Advanced Help With Robots.txt Allow & Disallow

Discussion in 'robots.txt' started by jvfconsulting, Apr 5, 2010.

  1. #1
    I noticed that none of our images were showing up in the Google Images search results. My guess is that this is because our images are located inside a folder in our CMS that is set to Disallow. Unfortunately, there is content within the /amass folder that we can't have crawled, so we're trying to find a way to allow spiders to crawl only the images folder within /amass. Is this possible? Would any of the scenarios below work for us, or are we just screwed? lol

    ROBOTS.TXT #1
    Sitemap: http://www.arthurassociates.com/sitemap.xml
    User-agent: *
    Allow: /amass/images/*?$
    Disallow: /amass
    Disallow: /cgi-bin

    ROBOTS.TXT #2
    Sitemap: http://www.arthurassociates.com/sitemap.xml
    User-agent: *
    Allow: /amass/images/
    Disallow: /amass
    Disallow: /cgi-bin

    ROBOTS.TXT #3
    Sitemap: http://www.arthurassociates.com/sitemap.xml
    User-agent: *
    Disallow: /amass/example1.aspx
    Disallow: /amass/example2.aspx
    Disallow: /amass/js/example1.js
    Disallow: /amass/js/example2.js
    Disallow: /cgi-bin
    Allow: /amass/images/
     
    jvfconsulting, Apr 5, 2010
  2. D3xter

    #2
    You can allow only Googlebot-Image to crawl a specific folder.
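
    As a rough illustration of the suggestion above, the sketch below puts a hypothetical robots.txt into a Python string and checks it with the standard library's urllib.robotparser. The group layout, the sample file names, and the allowed() helper are assumptions for illustration, not something posted in this thread. Allow is listed before Disallow because this particular parser applies the first rule that matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a dedicated group for Googlebot-Image,
# and a catch-all group that keeps every other crawler out of /amass.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /amass/images/
Disallow: /amass

User-agent: *
Disallow: /amass
Disallow: /cgi-bin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "http://www.arthurassociates.com" + path)


if __name__ == "__main__":
    # Sample paths are made up for the demonstration.
    for agent in ("Googlebot-Image", "SomeOtherBot"):
        for path in ("/amass/images/logo.png", "/amass/example1.aspx"):
            print(agent, path, allowed(agent, path))
```

    Under these assumptions only Googlebot-Image gets into /amass/images/, and every other crawler stays blocked from all of /amass.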
     
    D3xter, Apr 7, 2010
  3. jvfconsulting

    #3
    Thanks D3xter. After a lot of digging around, I thought I would see what Google does. Once I took a look at their robots.txt file, it all became clear to me: http://www.google.com/robots.txt

    See this chunk?

    Disallow: /safebrowsing
    Allow: /safebrowsing/diagnostic
    Allow: /safebrowsing/report_error/
    Allow: /safebrowsing/report_phish/

    It looks like all I needed to do to fix my current robots.txt was change it to something like this:

    Disallow: /amass
    Allow: /amass/images
    Allow: /amass/skins/default/images
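
    The reason a broad Disallow followed by narrower Allow lines can work is that Google documents picking the most specific (longest) matching rule rather than the first one listed. Below is a minimal sketch of that idea, assuming plain prefix matching with no wildcard support. The is_allowed() function and the sample paths are hypothetical, and this is not Google's actual implementation.

```python
# Rules from the post above, as (directive, path-prefix) pairs.
RULES = [
    ("disallow", "/amass"),
    ("allow", "/amass/images"),
    ("allow", "/amass/skins/default/images"),
]


def is_allowed(path, rules=RULES):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    verdict = True  # a path that matches no rule is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        allow = directive == "allow"
        # Prefer the longer prefix; on equal length prefer the Allow rule.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            verdict = allow
    return verdict


if __name__ == "__main__":
    for path in (
        "/amass/images/logo.png",
        "/amass/skins/default/images/bg.gif",
        "/amass/example1.aspx",
        "/index.html",
    ):
        print(path, is_allowed(path))
```

    With longest-match semantics the order of the lines does not change the result. Crawlers that stop at the first matching rule would still hit Disallow: /amass first, so listing the Allow lines before the Disallow is the safer layout if those crawlers matter.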
     
    jvfconsulting, Apr 7, 2010