What is the most efficient way to index a particular site? I am currently looking at using PHP to pull all the text from the page, strip it of formatting, and then save it to a MySQL database, but this seems really slow. Are there more efficient/faster ways to do this? How do search engines do it?
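For reference, here's a simplified version of the approach I described (the table and column names are just placeholders):

<?php
// Fetch a page, strip the markup, and store the plain text in MySQL.
$url  = 'http://example.com/page.html';
$html = file_get_contents($url);                  // pull the raw HTML
$text = strip_tags($html);                        // strip formatting/tags
$text = trim(preg_replace('/\s+/', ' ', $text));  // collapse whitespace

$pdo  = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO pages (url, body) VALUES (?, ?)');
$stmt->execute(array($url, $text));
?>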
In my opinion, submit the URL to Google, Yahoo, and a few directories, and make sure to insert the necessary meta tags.
You can use some Perl commands or PHP. I got started with wget after working with various spidering programs. You can learn a lot from the book "Spidering Hacks" from O'Reilly. There are Perl scripts, PHP libraries, libcurl, wget, etc. It all depends on what you're most comfortable with. Tell me more about your background and what you want to do.
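If you stick with PHP, the libcurl bindings are usually a better fetcher than file_get_contents(). A rough, untested sketch (URL and user-agent string are just examples):

<?php
// Fetch a page with PHP's cURL extension.
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);       // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, 'MyIndexer/0.1'); // identify your bot
curl_setopt($ch, CURLOPT_TIMEOUT, 10);                // don't hang on slow hosts
$html = curl_exec($ch);
curl_close($ch);
?>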
Go to digg.com and submit a few articles from your websites. Your site will be indexed within an hour.
There seem to be three different opinions here. Are you trying to index other sites into your own site? If so, then Google etc. are not the same thing, because they index you, not the other way around. Sphider, by contrast, is a script that indexes your own server directories; essentially, you index yourself.
The best way is to check where the part you want to index begins and where it ends. Then you have to find a pattern where you can start "cutting" the text you are interested in, and a pattern where you stop "cutting". I've been using this method for years, and it works unless they change the page's markup. Something like the sketch below.
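Here is a minimal sketch of that start/end marker method. The markers here are made up; replace them with patterns that actually surround the content on the page you're indexing:

<?php
// Return the text between a start marker and an end marker, or null
// if either marker is missing from the page.
function cut_between($html, $start, $end) {
    $from = strpos($html, $start);
    if ($from === false) return null;    // start marker not found
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    if ($to === false) return null;      // end marker not found
    return substr($html, $from, $to - $from);
}

$article = cut_between($html, '<div id="content">', '<div id="footer">');
?>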
Thanks for the help. Yes, I am trying to index text from other sites, not my own. Looks like PHP will do the trick.