
How to block a large number of pages from being indexed

Discussion in 'Google' started by bestoptimized, Jul 26, 2007.

  1. #1
    I have a site that has about 3500 pages indexed, but 3400 of them are supplemental. What I would like to do is block (with robots.txt) about 2000 product pages that are not the best quality (short descriptions, etc.). It is impossible to block these pages with a single line such as:
    Disallow: /catalog/
    because I have 600 products that I want to remain allowed.
    Could I put all of the product URLs that I want to disallow into robots.txt?
    I don't know whether Google could handle a robots.txt file that big.
    Has anyone tried this?
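    For illustration, a robots.txt built that way would just carry one Disallow line per unwanted product page, and the 600 pages to keep would simply not be listed. The paths below are made up; the real ones would depend on how the product URLs are structured:
    User-agent: *
    Disallow: /catalog/example-product-1.html
    Disallow: /catalog/example-product-2.html
    Disallow: /catalog/example-product-3.html
    (...and so on, roughly 2000 such lines)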
     
    bestoptimized, Jul 26, 2007 IP
  2. mvandemar

    mvandemar Notable Member

    Messages:
    2,409
    Likes Received:
    307
    Best Answers:
    0
    Trophy Points:
    230
    #2
    Create the robots.txt. Log into Google Webmaster Console. See if it validates.

    -Michael
     
    mvandemar, Jul 26, 2007 IP
  3. bestoptimized

    bestoptimized Peon

    Messages:
    159
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I didn't think about that! The only problem is that it will take a while to compile the list, and if it doesn't work it would be a waste of time. I might try it though.
     
    bestoptimized, Jul 26, 2007 IP
  4. mvandemar

    mvandemar Notable Member

    Messages:
    2,409
    Likes Received:
    307
    Best Answers:
    0
    Trophy Points:
    230
    #4
    Add 600 random non-existent addresses/directories first. It would take about 5 minutes to write a PHP program that would spit those out; then just view source and copy and paste into robots.txt. If the validator is going to balk at reading a file that big, it won't matter whether the paths actually exist or not.
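    A minimal sketch of the kind of throwaway PHP script being described (the path pattern and the count of 600 are arbitrary placeholders; the output just needs to look like a long robots.txt):

    <?php
    // Spit out 600 dummy Disallow lines; view source, then copy and paste into a test robots.txt.
    echo "User-agent: *\n";
    for ($i = 1; $i <= 600; $i++) {
        echo "Disallow: /catalog/dummy-product-" . $i . ".html\n";
    }
    ?>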

    -Michael
     
    mvandemar, Jul 26, 2007 IP
    bestoptimized likes this.
  5. bestoptimized

    bestoptimized Peon

    Messages:
    159
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Well, they have a limit of 5000 characters in the Webmaster tool, and from a little searching on Google most people say to keep the file below 15k.
    So if anyone has any ideas, I would like to hear them.
    The product pages are osCommerce with mod_rewrite, in case that helps.
     
    bestoptimized, Jul 26, 2007 IP
  6. mvandemar

    mvandemar Notable Member

    Messages:
    2,409
    Likes Received:
    307
    Best Answers:
    0
    Trophy Points:
    230
    #6
    If you're just trying to funnel PageRank only to the important pages, then why don't you use nofollow on the ones you don't want PageRank to go to?
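    For reference, that would typically mean adding rel="nofollow" to the internal links that point at the pages you don't want to pass PageRank to; the href here is just a placeholder:
    <a href="/catalog/low-priority-product.html" rel="nofollow">Product name</a>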

    Alternatively, you could just boost the sections that are in the supps, and get them into the regular index.

    -Michael
     
    mvandemar, Jul 26, 2007 IP
  7. oseymour

    oseymour Well-Known Member

    Messages:
    3,960
    Likes Received:
    92
    Best Answers:
    0
    Trophy Points:
    135
    #7
    Just curious... why do you want to block these pages? Supplemental results are not something to be afraid of...
     
    oseymour, Jul 26, 2007 IP
  8. bestoptimized

    bestoptimized Peon

    Messages:
    159
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #8
    It is well known that if you block the unimportant pages, the rankings of your important pages that are not blocked will go up, because they get more link power.

    Check out www.seobook.com/archives/001545.shtml
     
    bestoptimized, Jul 26, 2007 IP
  9. mvandemar

    mvandemar Notable Member

    Messages:
    2,409
    Likes Received:
    307
    Best Answers:
    0
    Trophy Points:
    230
    #9
    It's dead weight... the PageRank a page can pass on to other pages is divided among all the links on that page. If some of those links point to supplemental pages, which then won't pass anything on to other pages of the site themselves, that share is wasted.

    However, I think you need to focus on the linking, rather than on whether or not the pages are in the index. I'm pretty sure that blocking a page with robots.txt does not stop Google from assigning a portion of the PageRank to the links pointing to those pages.

    -Michael
     
    mvandemar, Jul 26, 2007 IP