Robots.txt advice

Discussion in 'Blogging' started by newzone, Nov 22, 2007.

  1. #1
    newzone, Nov 22, 2007 IP
  2. arpitagarwal82

    arpitagarwal82 Notable Member

    #2
    The robots.txt file just tells search engine bots which pages are meant to be indexed and which are not.
    Using a robots.txt file helps in excluding the pages which you do not want to get indexed.
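
    For illustration only, a minimal robots.txt along those lines might look like this (the /private/ folder is just a placeholder, not a path mentioned in this thread):

    User-agent: *
    # /private/ is off-limits to all bots; every other URL stays crawlable (placeholder path)
    Disallow: /private/

    Anything not covered by a Disallow line stays crawlable, so a file like this does not block new posts.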
     
    arpitagarwal82, Nov 22, 2007 IP
  3. newzone

    newzone Well-Known Member

    #3
    Yes, but I want to know if the example robots.txt will let search engines index my new pages (new posts), not only the domain. I want to know if each post will be indexed as a separate page by search engines.
     
    newzone, Nov 22, 2007 IP
  4. arpitagarwal82

    arpitagarwal82 Notable Member

    #4
    Do you have a different page for each post (most blogs do)?
    The number of indexed pages does not depend much on the robots.txt file. Try getting good backlinks and interlinking the pages on your blog in a better manner.
     
    arpitagarwal82, Nov 22, 2007 IP
  5. newzone

    newzone Well-Known Member

    #5
    newzone, Nov 22, 2007 IP
  6. arpitagarwal82

    arpitagarwal82 Notable Member

    #6
    What is the URL of your blog?
     
    arpitagarwal82, Nov 22, 2007 IP
  7. apachehtaccess

    apachehtaccess Guest

    #7
    The robots.txt file you linked to, http://www.askapache.com/robots.txt, is GREAT; just do a search for site:www.askapache.com on Google and you will see how effective it is. The problem with what you are asking, though, is that you have it backwards: robots.txt doesn't tell anyone to index pages, it tells everyone which pages NOT to index. A sitemap sounds like the solution for you.
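
    As a rough sketch of what that sitemap could contain, a bare-bones XML sitemap with a single post entry might look like this (the example.com URL and the date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- one <url> entry per post you want search engines to know about (placeholder URL) -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/best-hosting/</loc>
        <lastmod>2007-11-25</lastmod>
      </url>
    </urlset>

    Each new post then just gets its own <url> entry; blog software or a sitemap plugin will usually regenerate the file automatically.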
     
    apachehtaccess, Nov 25, 2007 IP
  8. CypherHackz

    CypherHackz Well-Known Member

    #8
    Just using a simple robots.txt file is better than using a complicated structure. Here is mine.

    Robots are still able to crawl the new posts that I make. By the way, if you want to check whether your robots.txt works the right way, you can check it in Google Webmaster Tools (Sitemaps).

    -cypher.
     
    CypherHackz, Nov 27, 2007 IP
  9. lateuk

    lateuk Active Member

    #9
    As explained here:

    - A robots.txt file is used to stop certain/all search engines from seeing certain pages and/or folders.
    - The noindex meta tag is used to allow search engines to crawl pages, but they are told not to index them (see the example right after this list).
    - A sitemap is used to tell search engines which pages you would like indexed and when they were updated.
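
    To illustrate that second point, the noindex tag goes in the <head> of the page you want kept out of the index, roughly like this:

    <!-- tells compliant bots: follow the links on this page, but do not index it -->
    <meta name="robots" content="noindex, follow">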

    Late
     
    lateuk, Nov 28, 2007 IP
  10. newzone

    newzone Well-Known Member

    #10
    I deleted robots.txt. I'm so confused. I want every post to be indexed for the keyword I wrote it for; that is what I want. If I write a page about best hosting, I want www.myblog/best-hosting/ to be indexed and rank for "best hosting".
     
    newzone, Nov 28, 2007 IP
  11. CypherHackz

    CypherHackz Well-Known Member

    #11
    There are many things that Google uses to rank your page in the Google SERPs, not only the robots.txt.

    -cypher.
     
    CypherHackz, Nov 28, 2007 IP
  12. apachehtaccess

    apachehtaccess Guest

    #12
    newzone, the only sure way to get Google to index content is for Google to find a LINK to that content. So if Google already has your homepage indexed, a link on your homepage to /best-hosting/ will ensure that Google finds it.
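
    In plain HTML, that internal link is just something like this (the anchor text is only an example):

    <!-- ordinary internal link from the homepage to the post -->
    <a href="/best-hosting/">Best hosting</a>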
     
    apachehtaccess, Nov 28, 2007 IP
  13. newzone

    newzone Well-Known Member

    #13
    OK, I put the robots.txt file up again, thanks.
     
    newzone, Nov 28, 2007 IP
  14. itliberty

    itliberty Peon

    #14
    The only thing I would suggest adding to that robots.txt is the sitemap location...

    For example:
    User-agent: *
    Disallow: /includes/
    Disallow: /images/
    Sitemap: http://www.yourdomain.com/yoursitemap.xml

    And then, if you really want to go the extra mile (which it seems you do), use the old-fashioned Python sitemap generator found on the Google Webmaster site.

    Run it as a cron job or some other scheduled job as often as you feel the need to, and for the directory your posts go into, set it to "daily" using the option where it crawls your site looking for files. (In the sitemap, pages can be set to daily, weekly, or monthly.)

    The reason I suggest the generator from Google's site is that after you run it, it notifies Google that a new sitemap has been generated. You will surely give yourself the best chances this way.
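
    As a sketch of that scheduling step (the paths and file names below are assumptions, not details from the post), a nightly crontab entry for Google's sitemap_gen.py could look roughly like this:

    # regenerate the sitemap every night at 03:00 and let the script notify Google (placeholder paths)
    0 3 * * * /usr/bin/python /home/user/sitemap_gen/sitemap_gen.py --config=/home/user/sitemap_gen/config.xml

    The directories to crawl and the change-frequency values are set in the generator's config file, not on the cron line itself.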
     
    itliberty, Nov 28, 2007 IP