How can you tell a site is blocking the bots?

Discussion in 'Search Engine Optimization' started by coopersPick, Jul 5, 2010.

  1. #1
    I am pretty sure it's robots.txt that stops a site from being indexed, but I wanted to make sure. If it is, how can I look at the source code to see whether they are blocking a bot from indexing, or is that not possible?
     
    coopersPick, Jul 5, 2010 IP
  2. magda

    magda Notable Member

    #2
    magda, Jul 5, 2010 IP
  3. coopersPick

    coopersPick Active Member

    #3
    I don't get it. So should it be in the URL? Or should I be looking for something in the source code?
     
    coopersPick, Jul 5, 2010 IP
  4. SEMSpot

    SEMSpot Peon

    #4
    Let's say you want to see the robots.txt file of widget.com. You would simply go to www.widget.com/robots.txt

    If they are allowing everything there, then look at the meta tags in the page's source code (they should be in the <head>, near the top) and see whether they are blocking anything from there.
     
    SEMSpot, Jul 5, 2010 IP
  5. Grimm

    Grimm Peon

    #5
    Check for nofollow on links as well. It tells robots not to follow the link, so the linked page will most likely not get indexed unless it has inbound links from other websites or pages that do not use nofollow.
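
    If you would rather not read the source by eye, here is a rough sketch of that check using Python's standard html.parser. The names NofollowLinkParser and find_nofollow_links and the sample HTML are made up for illustration; it only lists links whose rel attribute contains nofollow.

```python
from html.parser import HTMLParser


class NofollowLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens and attr.get("href"):
            self.nofollow_links.append(attr["href"])


def find_nofollow_links(html):
    """Return the hrefs of all nofollow links found in the given HTML text."""
    parser = NofollowLinkParser()
    parser.feed(html)
    parser.close()
    return parser.nofollow_links


# Hypothetical sample page, not taken from any real site.
SAMPLE_HTML = """
<a href="/followed.html">normal link</a>
<a href="/blocked.html" rel="nofollow">nofollow link</a>
<a href="/also-blocked.html" rel="external NoFollow">mixed rel</a>
"""

print(find_nofollow_links(SAMPLE_HTML))
# prints ['/blocked.html', '/also-blocked.html']
```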
     
    Grimm, Jul 5, 2010 IP
  6. coopersPick

    coopersPick Active Member

    #6
    I'm still a little confused. Can I pull up the source code and see a robots.txt in there, or no?
     
    coopersPick, Jul 8, 2010 IP
  7. Grimm

    Grimm Peon

    #7
    You can also try typing /robots.txt after the domain name directly in your browser's address bar.

    Ex. http://www.example.com/robots.txt
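
    The same thing can be sketched in Python if you want to work out the robots.txt address from any page URL. The function name robots_txt_url is just an illustrative choice; it relies only on the fact that robots.txt sits at the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep scheme and host, replace the path, and drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/some/page.html?x=1"))
# prints http://www.example.com/robots.txt
```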
     
    Grimm, Jul 8, 2010 IP
  8. coopersPick

    coopersPick Active Member

    #8
    And what do I need to look for once I type that in after the domain?
     
    coopersPick, Jul 8, 2010 IP
  9. Grimm

    Grimm Peon

    #9
    This can help you a lot: check this Google webmaster support information.

    Just watch out for robots.txt files like the one below, as they are meant to block all crawlers from crawling the whole website.

    User-agent: *
    Disallow: /
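
    If you want a program to read the rules for you, Python's standard urllib.robotparser can do it. This is only a sketch: the helper name is_blocked and the example URL are assumptions, and the rules are pasted in as text rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# The block-everything rules shown above, pasted in as plain text.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# For comparison: an empty Disallow line allows everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(BLOCK_ALL, "http://www.example.com/any/page.html"))  # prints True
print(is_blocked(ALLOW_ALL, "http://www.example.com/any/page.html"))  # prints False
```

    To check a live site instead, RobotFileParser also has set_url() and read(), which download the file before you call can_fetch().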
     
    Grimm, Jul 8, 2010 IP
  10. mvpsandeep

    mvpsandeep Active Member

    #10
    <meta name="robots" content="noindex, nofollow" />
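
    That tag sits in the page source, so it can also be found with a few lines of Python. A minimal sketch using the standard html.parser; RobotsMetaParser and robots_meta_directives are made-up names, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # content is a comma-separated list such as "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_meta_directives(html):
    """Return the set of robots meta directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical sample page using the tag from this post.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow" />'
    "</head><body>Hello</body></html>"
)

found = robots_meta_directives(SAMPLE_PAGE)
print("noindex" in found)  # prints True
```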
     
    mvpsandeep, Jul 8, 2010 IP
  11. dorthyjoseph

    dorthyjoseph Guest

    #11
    This is perfect....
     
    dorthyjoseph, Jul 9, 2010 IP
  12. earnincome

    earnincome Peon

    #12
    You are absolutely right. It is the only way to check robots.txt and the content in it.
     
    earnincome, Jul 9, 2010 IP
  13. manish.chauhan

    manish.chauhan Well-Known Member

    #13
    Since robots.txt is placed in the root folder, you can easily check your robots.txt file at yourdomain.com/robots.txt
     
    manish.chauhan, Jul 9, 2010 IP