Everyone keeps pointing me to robots.txt, and I've read through the spec, but I still don't understand what I can do. I already have a robots.txt file in which the subfolder 'cart' is disallowed. The cart folder contains a file named update.php, a PHP script that updates a shopping cart and then redirects to the originating page. So pages throughout the website might have links to update.php with GET variables:

/cart/update.php?action=add&item=2
/cart/update.php?action=add&item=5
etc.

So, even though robots.txt prevents /cart/ from being "crawled", the links to update.php from pages outside of /cart/ are causing those URLs to show up in Google results. There is now a Google result for adding every item (there are thousands) to the cart. How can I omit these results, the ones with update.php?

- Micah
Disallowing /cart/ should disallow everything under it as well. Disallowing /cart/update.php would disallow everything that starts with it (all the update.php? URLs). You could also put rel="nofollow" on all links pointing to update.php if you like (but that won't help when other people link to them).
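To check how a Disallow rule is matched against the update.php URLs, here is a minimal sketch using Python's standard urllib.robotparser. The two-line robots.txt is an assumption about what the file looks like (the original was not posted), and /products/item2.html is a made-up path used only for contrast.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file was not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""

def is_crawlable(path, user_agent="*"):
    """Return True if the assumed robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    for path in (
        "/cart/update.php?action=add&item=2",
        "/cart/update.php?action=add&item=5",
        "/products/item2.html",  # hypothetical page outside /cart/
    ):
        print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```

Disallow rules are prefix matches, so the query string does not need its own rule. Keep in mind that this only tells you whether a compliant crawler may fetch the URL; it says nothing about whether a URL that is linked from elsewhere ends up listed in search results.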
You could add this line

<meta name="robots" content="noindex,nofollow">

within the <head>...</head> section of the pages you do not want to see in the index.

Jean-Luc
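If you want to confirm that a page really carries that tag, a rough sketch with Python's standard html.parser could look like this. The function name robots_directives and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex,nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)

def robots_directives(html):
    """Return the set of robots meta directives present in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

if __name__ == "__main__":
    sample = (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow">'
        "</head><body>Cart updated</body></html>"
    )
    print(sorted(robots_directives(sample)))
```

One caveat: a crawler can only see the meta tag if it is allowed to fetch the page, so the tag has no effect on URLs that robots.txt blocks from being crawled.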