I noticed that none of our images were showing up in Google Images search results, and my best guess is that it's because our images live inside a folder in our CMS that is disallowed in robots.txt. Unfortunately there is stuff we can't have crawled within the /amass folder, so we're trying to find a way to allow spiders to crawl only the images folder within /amass. Is this possible? Would any of the scenarios below work for us, or are we just screwed? lol

ROBOTS.TXT #1

Sitemap: http://www.arthurassociates.com/sitemap.xml

User-agent: *
Allow: /amass/images/*?$
Disallow: /amass
Disallow: /cgi-bin

ROBOTS.TXT #2

Sitemap: http://www.arthurassociates.com/sitemap.xml

User-agent: *
Allow: /amass/images/
Disallow: /amass
Disallow: /cgi-bin

ROBOTS.TXT #3

Sitemap: http://www.arthurassociates.com/sitemap.xml

User-agent: *
Disallow: /amass/example1.aspx
Disallow: /amass/example2.aspx
Disallow: /amass/js/example1.js
Disallow: /amass/js/example2.js
Disallow: /cgi-bin
Allow: /amass/images/
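In case it helps, here's the behavior I'm after, written out as a quick Python sketch (want_crawlable and the test paths are just made-up examples based on our folder structure, not anything real):

# What we want the crawler to end up doing, whichever robots.txt version gets us there.
def want_crawlable(path):
    # images inside /amass should be fair game for Google Images
    if path.startswith("/amass/images/"):
        return True
    # everything else under /amass, plus /cgi-bin, should stay blocked
    if path.startswith("/amass") or path.startswith("/cgi-bin"):
        return False
    return True

print(want_crawlable("/amass/images/team-photo.jpg"))  # True  - want this indexed
print(want_crawlable("/amass/example1.aspx"))          # False - keep this out
print(want_crawlable("/cgi-bin/search.cgi"))           # False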
Thanks D3xter. After a lot of digging around, I thought I would see what Google does. Once I took a look at Google's own robots.txt file, it all became clear to me: http://www.google.com/robots.txt

See this chunk?

Disallow: /safebrowsing
Allow: /safebrowsing/diagnostic
Allow: /safebrowsing/report_error/
Allow: /safebrowsing/report_phish/

Looks like what I needed to do to fix my current robots.txt was to change it to look something like this:

Disallow: /amass
Allow: /amass/images
Allow: /amass/skins/default/images
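To sanity-check the change before pushing it live, I also mocked up the precedence rule as I understand it from Google's documentation (the most specific matching path wins, and Allow beats Disallow on a tie). The helper and the test URLs below are just my own sketch, not anything official:

# Rough model of how Google resolves Allow vs. Disallow (per their docs):
# the longest matching path wins, and Allow beats Disallow on a tie.
RULES = [
    ("Disallow", "/amass"),
    ("Allow", "/amass/images"),
    ("Allow", "/amass/skins/default/images"),
]

def allowed(path):
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in RULES if path.startswith(prefix)]
    if not matches:
        return True        # no rule matches: crawlable by default
    return max(matches)[1] # longest prefix wins; Allow (True) beats Disallow on a tie

for url in ("/amass/images/team-photo.jpg",
            "/amass/skins/default/images/logo.png",
            "/amass/example1.aspx"):
    print(url, "->", "allowed" if allowed(url) else "blocked")

Running that, the two image folders come back allowed and the rest of /amass stays blocked, which is exactly what we were after.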