G Indexing Pages With No Inbound Links

Discussion in 'Google' started by flagday, Jan 19, 2007.

  1. #1
    I decided to put a WordPress blog on my site, more as a back-end for adding new content than as the main site. This was about two weeks ago.

    So, I put it in its own directory (.com/news/) and started posting. My first posts were "TEXT TEXT TEXT ..." ad infinitum, just to get used to the categories, the URI rewriting, how photos would look, etc. I also put up a file in the root directory called "testex.html", which was a duplicate of index.html except that it had a PHP RSS parser showing my latest posts instead of the main content of index.html.

    There were NO inbound links to this subdirectory, but every page had an outbound link to the site homepage (...com/) in the blogroll, and testex.html linked out to almost every page on the site (but nothing linked in to testex.html).

    I don't know how it happened, but last night I was in Sitemaps, clicked on "pages indexed", and it came up with all these pages I thought would never get indexed. They were cached on January 15, about 10 days after I downloaded WordPress: pages like "Seventh Post - Image test" with "Text text text text..." in the excerpt beneath.

    All of these pages return a 404 now - I deleted them before I even knew they were indexed.

    I know, I should've made a robots.txt exclusion for the WP directory, but I am lazy. And I had no idea G would index these pages so quickly (or at all).

    Anyone see this happen before?

    More importantly, is there a way to remove pages from the index?
     
    flagday, Jan 19, 2007
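A minimal sketch of the robots.txt exclusion flagday mentions, checked with Python's standard `urllib.robotparser`. The domain `www.example.com`, the exact rules, and the `build_parser` helper are hypothetical stand-ins for the poster's setup (WordPress in `/news/`, test page at `/testex.html`). Keep in mind that robots.txt only asks compliant crawlers not to fetch those URLs; it does not by itself remove pages that are already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block all crawlers from the WordPress
# directory (/news/) and the test page (/testex.html), while leaving
# the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
Disallow: /testex.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "http://www.example.com"  # placeholder domain, not the poster's site
    for path in ["/", "/news/", "/news/seventh-post-image-test/", "/testex.html"]:
        verdict = "allowed" if rp.can_fetch("Googlebot", base + path) else "blocked"
        print(path, verdict)
```

With these rules the homepage stays crawlable, while everything under `/news/` and the test page are reported as blocked.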
  2. Kain

    #2
    Happens all the time.

    It's usually caused by viewing the pages in a browser with the Google toolbar installed.

    There's a theory that the built-in spyware adds any page you view to Google's crawl list.
     
    Kain, Jan 19, 2007
  3. mad4

    #3
    Maybe your sitemap generator added them to your sitemap?
     
    mad4, Jan 19, 2007
  4. sqeeze

    #4
    Neither the Google toolbar nor a sitemap generator. It's more likely the pages were indexed through the Ping-O-Matic notifications that WordPress sends by default. Check your blog's options at /wp-admin/options-writing.php.
     
    sqeeze, Jan 19, 2007
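To illustrate the mechanism sqeeze describes: when a post is published, a blog can send an XML-RPC "update ping" to a service such as Ping-O-Matic, which passes the update along to other services, so the new URL becomes known without any inbound link. The sketch below uses Python's standard `xmlrpc.client` and only builds the request body for the classic `weblogUpdates.ping` method. The blog name, URL, and `build_update_ping` helper are hypothetical, and WordPress's own ping code may use a different variant (such as `weblogUpdates.extendedPing`).

```python
import xmlrpc.client


def build_update_ping(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    Assumption: this is the classic two-argument ping (blog name, blog URL);
    it is illustrative, not a copy of WordPress's implementation.
    """
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


if __name__ == "__main__":
    # Placeholder values; the blog in this thread lived under /news/.
    body = build_update_ping("My News", "http://www.example.com/news/")
    print(body)
    # Actually sending the ping would look like this. It is left commented
    # out so the sketch never contacts a real server:
    # server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
    # server.weblogUpdates.ping("My News", "http://www.example.com/news/")
```

Emptying the update services list on that settings page should stop these pings from going out, which is the option sqeeze points to.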
  5. NetMidWest

    #5
    The Ping-O-Matic notification is one suspect.
    So are crawl caching proxy servers: running AdSense on those pages would also get them indexed.
    I used to think the toolbar might be one of the services the proxy servers use, but this post says otherwise.
     
    NetMidWest, Jan 19, 2007
  6. flagday

    #6
    Yep, that appears to be the culprit, since my "sitemap generator" is Notepad (I'm still learning ;) ).

    Thanks for the input, I'll be sure to change options next time I'm launching with WP.
     
    flagday, Jan 19, 2007
  7. thetafferboy83

    #7
    Where did you read that?
     
    thetafferboy83, Jan 19, 2007
  8. sqeeze

    #8
    Several years ago, when the G toolbar was a novelty, there was a lot of gossip about its real mission for G. One theory was that G was gathering information about surfers' activity on the web, and some said that sites without inbound links could be revealed to G through the toolbar. Unfortunately, I can neither confirm nor disprove these rumors.
     
    sqeeze, Jan 20, 2007
  9. oseymour

    #9
    oseymour, Jan 20, 2007