I would like to write a script to get all the links within a domain, but I don't really know where to start. Searching through the page text for links might work, but it could get stuck in loops. Ideally I would like to grab all the URLs that a Google `site:yourdomain.com` search returns, but how would I do that? Any ideas or pointers?
Grabbing every link from the page text works well, as long as you store every page you have already indexed in a set (or array) so you don't end up in loops. If you want to use Google data, you will need to use the Google API from http://code.google.com/ , since scraping the content of google.com search results directly is not allowed by their terms of service.
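To illustrate the "remember what you've visited" idea, here is a minimal stdlib-only sketch: a breadth-first crawler that extracts `<a href>` links with `html.parser`, resolves them to absolute URLs, and only follows links whose host matches the start URL's domain. The `start_url` and `max_pages` parameters are just illustrative names; for anything serious you would probably reach for a library like Scrapy instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=100):
    """Breadth-first crawl restricted to start_url's domain.

    The `visited` set is what prevents the loops mentioned above:
    a URL is fetched at most once.
    """
    domain = urlparse(start_url).netloc
    visited = set()
    queue = [start_url]
    while queue and len(visited) < max_pages:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip unreachable or non-text pages
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if urlparse(absolute).netloc == domain and absolute not in visited:
                queue.append(absolute)
    return visited
```

Calling `crawl("http://yourdomain.com/")` returns the set of URLs reached. Note this sketch ignores robots.txt, query-string duplicates, and rate limiting, all of which a polite crawler should handle.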