what does it mean??

Discussion in 'Search Engine Optimization' started by wildstone, Apr 14, 2008.

  1. #1
    Hello friends,

    This is the robots.txt of one of my clients: datingsitefree4all.com/robots.txt
    Does it mean that the site is disallowed for search engines (SE)? But the source of the index page shows >> robot allow

    I'm a bit confused!!

    Please advise.
     
    wildstone, Apr 14, 2008 IP
  2. astup1didiot

    astup1didiot Notable Member

    #2
    Search engine crawlers read the robots.txt file before any meta tags, so pages blocked in robots.txt are never fetched and the crawler never gets to see their meta robots element.
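
A minimal sketch of that order of operations, using Python's standard `urllib.robotparser` and `html.parser`. The robots.txt rules, the page contents, and the bot name `ExampleBot` are all made up for illustration, and a dictionary stands in for real HTTP requests:

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical site contents, used instead of real HTTP requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /site/
"""
PAGES = {
    "/site/index.html": '<html><head><meta name="robots" content="index, follow"></head></html>',
    "/about.html": '<html><head><meta name="robots" content="noindex"></head></html>',
}


class MetaRobots(HTMLParser):
    """Collects the content of <meta name="robots"> if present."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.content = attrs.get("content")


def crawl(path, agent="ExampleBot"):
    """Return what a polite crawler learns about one path."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    # Step 1: robots.txt is consulted before the page is requested.
    if not rules.can_fetch(agent, path):
        return "blocked by robots.txt, page never fetched"
    # Step 2: only an allowed page is downloaded and its meta tag read.
    parser = MetaRobots()
    parser.feed(PAGES[path])
    return f"fetched, meta robots = {parser.content}"


print(crawl("/site/index.html"))
print(crawl("/about.html"))
```

The "index, follow" tag on the blocked page is never read, because the crawler stops at step 1.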
     
    astup1didiot, Apr 14, 2008 IP
  3. ajsa52

    ajsa52 Well-Known Member

    #3
    Your robots.txt file restricts search engine robots, but ONLY for the directories listed:
    /include, /design, /plugins, and /site

    Pages from other directories can still be crawled if they are in a sitemap or linked from other pages (on the same site or on different sites).
     
    ajsa52, Apr 14, 2008 IP
  4. wildstone

    wildstone Peon

    #4
    Does "Disallow: /site/" mean the SE crawler will not visit my whole site??

    Please advise.

    thanks in advance
     
    wildstone, Apr 14, 2008 IP
  5. astup1didiot

    astup1didiot Notable Member

    #5
    Actually, even if the hyperlink is found via another source (an external link), a page that is "not" currently indexed and is blocked via the robots.txt file won't get crawled, so its content won't be indexed. The first thing a "respectable" web robot following the robots exclusion protocol does is check robots.txt to see whether the current web page or its parent directory is blocked, before it starts crawling from that page.
     
    astup1didiot, Apr 14, 2008 IP
  6. wildstone

    wildstone Peon

    #6
    Under the site folder, there are lots of folders and files (the web pages)

    /public_html/site/public

    So, does that mean these pages will not be viewed by SE??

    thanks
     
    wildstone, Apr 14, 2008 IP
  7. astup1didiot

    astup1didiot Notable Member

    #7

    User-agent: *
    Disallow: /

    That will block the entire website, including all sub-directories and web pages. Using * as the user-agent tells all search engines that follow the robots exclusion protocol not to crawl anything from this website.
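
A quick way to check this rule is Python's standard `urllib.robotparser`. This is only a sketch: the bot names and paths below are hypothetical, and the parser shows how one well-behaved client reads the file, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse("""\
User-agent: *
Disallow: /
""".splitlines())

# Hypothetical bot names and paths; every combination is refused.
AGENTS = ("Googlebot", "ExampleBot")
PATHS = ("/", "/index.html", "/site/public/page.html")

for agent in AGENTS:
    for path in PATHS:
        print(agent, path, rules.can_fetch(agent, path))
```

Every line printed ends in `False`, since each path starts with `/` and the `*` group applies to any user-agent.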
     
    astup1didiot, Apr 14, 2008 IP
  8. ajsa52

    ajsa52 Well-Known Member

    #8
    I was talking about pages from other directories not listed in his robots.txt
    "Pages from other directories could be crawled ..." :)


    Your robots.txt is excluding these directories, and all files/directories under them:

    yoursite.com/include
    yoursite.com/design
    yoursite.com/plugins
    yoursite.com/site
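
To see the effect, here is a small sketch with Python's standard `urllib.robotparser`. The file contents are an assumption reconstructed from the four directories above (the real file is not quoted in this thread), and the bot name and test paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents, reconstructed from the four directories named above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /include/
Disallow: /design/
Disallow: /plugins/
Disallow: /site/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """True if the rules let this agent fetch the path."""
    return rules.can_fetch(agent, path)


print(allowed("/site/public/page.html"))  # under /site/, so blocked
print(allowed("/plugins/gallery.js"))     # under /plugins/, so blocked
print(allowed("/index.html"))             # not under any listed directory
print(allowed("/blog/post.html"))         # not under any listed directory
```

Disallow rules are plain prefix matches on the URL path. Written without the trailing slash, `Disallow: /site` would also match a path such as `/sitemap.xml`, so the slash matters.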
     
    ajsa52, Apr 14, 2008 IP
  9. astup1didiot

    astup1didiot Notable Member

    #9
    Ah, you're right. *slaps forehead*
     
    astup1didiot, Apr 14, 2008 IP
  10. wildstone

    wildstone Peon

    #10
    And what does this mean?

    User-agent: MediaPartners-Google
    User-agent: Adsbot-Google
    Disallow:
     
    wildstone, Apr 14, 2008 IP
  11. astup1didiot

    astup1didiot Notable Member

    #11
    That allows the Google AdSense bot and the AdWords Quality Score robot to crawl the entire site: an empty Disallow line blocks nothing.
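
A sketch of how such a group behaves next to a stricter catch-all group, again with Python's standard `urllib.robotparser`. The `*` group, the test path, and the bot name `ExampleBot` are assumptions added for contrast; only the first three lines of the file come from this thread:

```python
from urllib.robotparser import RobotFileParser

# The first group is quoted from the thread; the "*" group is an assumed
# example so that there is something for the ad bots to be exempt from.
ROBOTS_TXT = """\
User-agent: MediaPartners-Google
User-agent: Adsbot-Google
Disallow:

User-agent: *
Disallow: /site/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

PATH = "/site/public/page.html"
for agent in ("MediaPartners-Google", "Adsbot-Google", "ExampleBot"):
    print(agent, rules.can_fetch(agent, PATH))
```

The two named bots share one group, so the empty Disallow applies to both and they may fetch the path, while `ExampleBot` falls through to the `*` group and is refused.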
     
    astup1didiot, Apr 14, 2008 IP
  12. mikey1090

    mikey1090 Moderator Staff

    #12
    That would only block the sub folder "site".

    Why ban the engines from your site anyway?
     
    mikey1090, Apr 14, 2008 IP
  13. sheds

    sheds Peon

    #13
    What does it mean?

    User-agent: *
    Disallow:
     
    sheds, Apr 14, 2008 IP
  14. mobileshub

    mobileshub Peon

    #14
    Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally they obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is something like putting a note "Please, do not enter" on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

    The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the main directory and if they don't find it there, they simply assume that this site does not have a robots.txt file and therefore they index everything they find along the way. So, if you don't put robots.txt in the right place, do not be surprised that search engines index your whole site.
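
As a rough illustration of the location rule, this sketch derives the one robots.txt URL a crawler would look at for any given page. The helper name `robots_url` and the example.com addresses are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only robots.txt URL a crawler would check for this page."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/site/public/page.html?id=7"))
```

However deep the page sits, the result is always the file in the main directory of that host, so a robots.txt saved under `/site/` would simply never be requested.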

    The concept and structure of robots.txt were developed more than a decade ago. If you are interested in learning more, you can go straight to the Standard for Robot Exclusion, because this article deals only with the most important aspects of a robots.txt file. Next we will continue with the structure of a robots.txt file.
     
    mobileshub, Apr 14, 2008 IP
  15. ajsa52

    ajsa52 Well-Known Member

    #15
    That allows all robots complete access.
    See The Web Robots Pages for full details.
     
    ajsa52, Apr 14, 2008 IP
  16. poseidon

    poseidon Banned

    #16
    robots.txt just gives search engine crawlers and other crawlers directions on what to do with the site. It's a good way to stop them crawling your copyrighted images or important files.

    Do note that crawlers are not bound to follow it; if they are not programmed to honor it, there is nothing you can do.
     
    poseidon, Apr 14, 2008 IP
  17. Loy Maben

    Loy Maben Peon

    #17
    I would suggest you go through this:
    http://www.robotstxt.org/robotstxt.html

    Hope all your doubts are cleared up now!
     
    Loy Maben, Apr 15, 2008 IP
  18. IndieRetailer

    IndieRetailer Peon

    #18
    Hey Sanjoy (Wildstone)!!!!

    It means you are a criminal and have committed a great deal of FRAUD and THEFT against fellow DP'ers! You owe a LOT of people a LOT of money and you need to take responsibility for your actions! Come back to your thread and start taking a list of people you owe money to and start making payments!
    http://forums.digitalpoint.com/showthread.php?t=288426
     
    IndieRetailer, Apr 21, 2008 IP