I did not find robots.txt in my site!

Discussion in 'HTML & Website Design' started by karnetics, Aug 17, 2007.

  1. #1
    I had a member ask me this question, so I decided to post my reply to help others with this same question.

    A robots.txt file does not come prepackaged with your web hosting service. You have to create it yourself. Here is how.

    1. Open your favorite text editor or HTML editor.
    2. Save the file as a plain text file named robots.txt (note the "s") in the root folder of your site, so it is reachable at yoursite.com/robots.txt.

    The robots.txt file tells search engine bots what not to spider. Say you have a folder called secrets, and you do not want search engines to pick up the HTML files under that folder. You then make an entry in the robots.txt file, for example:

    # Google
    User-agent: googlebot
    Disallow: /secrets/

    This entry tells Google's search engine spiders not to crawl anything under your secrets folder, which generally keeps the search engine from indexing the HTML files under that folder.
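
    If you want to double-check an entry like this before uploading it, here is a rough sketch using Python's standard urllib.robotparser module. The can_crawl helper and the www.example.com URLs are made up for illustration; only the robots.txt lines come from the example above.

```python
from urllib.robotparser import RobotFileParser

# The entry from the post, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
# Google
User-agent: googlebot
Disallow: /secrets/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("googlebot", "http://www.example.com/secrets/plan.html"),
        ("googlebot", "http://www.example.com/index.html"),
        ("msnbot", "http://www.example.com/secrets/plan.html"),
    ]:
        print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

    Note that the entry only names googlebot, so other bots are not restricted by it.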

    Here are other search engine bots:
    # MSN
    User-agent: msnbot

    # Yahoo
    User-agent: slurp

    # Others
    User-agent: *
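
    To put several of these user agents into one file, something like the sketch below could generate the text. build_robots_txt is a made-up helper, not part of any library. It assumes one group per user agent and writes an empty Disallow line for an agent with nothing blocked, which in the robots exclusion convention means that agent may crawl everything.

```python
def build_robots_txt(rules):
    """Build robots.txt text from {user_agent: [disallowed path prefixes]}.

    Hypothetical helper for illustration only.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is off limits for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt({
        "googlebot": ["/secrets/"],
        "msnbot": ["/secrets/"],
        "slurp": ["/secrets/"],
        "*": [],
    }))
```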

    I know what you are asking, "So how do I tell the search engine bots to crawl everything else within my site?"

    2. Adding a robots meta tag to all your HTML pages.
    Most search engine bots crawl your site without any help. They do not need a robots.txt file or a meta tag to tell them what to do, because following links and indexing pages is what they do by default. If you still want to spell it out, first edit robots.txt to tell the robots what not to crawl, then add this meta tag to all your HTML pages. The entry goes under <title>Your page title</title>, for example:

    <title>Your page title</title>
    <meta name="robots" content="FOLLOW,INDEX">

    This meta tag tells all robots to follow the links on the page and to index the page. Well-behaved robots check for robots.txt on their own before they crawl. Once a robot has read the robots.txt file, it captures everything on your site that is not disallowed there and sends that back to the search engine.
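
    A quick way to confirm the tag actually made it into a page is to parse the HTML and read it back. This is only a sketch using Python's standard html.parser module; RobotsMetaParser and robots_meta_directives are names I made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated and case-insensitive.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_meta_directives(html):
    """Return the set of robots meta directives found in html (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = (
        "<html><head><title>Your page title</title>"
        '<meta name="robots" content="FOLLOW,INDEX"></head></html>'
    )
    print(sorted(robots_meta_directives(page)))
```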

    3. How do I check to see if my site has been crawled?
    Navigate to your favorite search engine, for example Google. In the Google search bar, type either your website name or its title. If the spider has visited your site, it should show up in the search results, most likely as the first result, with the title and description of your site. Click the Cached link to see the data the search engine bot sent back to the search engine. It shows a snapshot of your website, including the images and text the bot captured. The time and date stamp on the snapshot page tells you when that search engine's bot last crawled your website.

    Things you need to know.
    a. Keep updating your website with new information.
    b. Most search engine bots visit your site about once a month, at around the same time each month. The more of your updates are in place before that visit, the better.
     
    karnetics, Aug 17, 2007 IP
  2. ednit

    #2
    This is good advice, but I would add something.

    If someone has a portion of their website that they don't want indexed because it has download information for a product they are selling, this url/folder should not be put in the robots.txt.

    The reason for this is because:
    1) the robots.txt file is accessible by anybody that types in yoursite.com/robots.txt
    2) if the download page/folder is listed in the robots.txt file, you're giving anybody who thinks to check your robots.txt file free access to whatever you're selling (unless it's a protected download, i.e. dynamic, not static)

    I say this cuz I get human visitors to some of my sites' robots.txt files, and if I were trying to steal a digital product, the robots.txt file would be the second place I looked for the download page.
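
    To show how little effort that takes, here is a rough Python sketch that pulls every Disallow path out of a robots.txt file. disallowed_paths is a made-up helper, and the /downloads/ebook/ entry is an invented example of the kind of folder I'm talking about.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value in robots_txt (hypothetical helper)."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    sample = (
        "# Google\n"
        "User-agent: googlebot\n"
        "Disallow: /secrets/\n"
        "Disallow: /downloads/ebook/  # invented example entry\n"
    )
    print(disallowed_paths(sample))
```

    Anyone could feed this the text they get from yoursite.com/robots.txt, so a path listed there is effectively advertised.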

    I just thought that point was important to state.
     
    ednit, Aug 17, 2007 IP
  3. karnetics

    #3
    Great point. Security is a big thing on the internet today, and most web developers would not think of this. Great point!!! :D

     
    karnetics, Aug 17, 2007 IP
  4. tadwestie

    #4
    I get how to view the cached copy of a site and see when Google came and spidered it. How can I tell when Yahoo has cached it?
     
    tadwestie, May 27, 2008 IP
  5. w0lfenst1en

    #5
    w0lfenst1en, May 27, 2008 IP
  6. casinouk

    #6
    If all of your pages are going to be public you really do not need it.
     
    casinouk, May 28, 2008 IP
  7. hillord

    #7
    My site doesn't update once a month. It's updated several times per day, so that's why my site gets indexed in Google faster with the right keyword.
     
    hillord, May 28, 2008 IP
  8. jason102178

    #8
    Hi, I'm not sure what I am doing, but I have a robot.txt and I tried putting it on my website and it doesn't work. How would I put it on my website?
     
    jason102178, Jun 4, 2008 IP
  9. w0lfenst1en

    #9
    Change the name from robot.txt to robots.txt :D
     
    w0lfenst1en, Jun 6, 2008 IP