I'm planning to build a PDF search engine website. Some well-known examples are pdfqueen.com and pdfgeni.com. As you can see, these sites store all of their searches on the site, so once your site is successful you will have thousands of searches per day to store. My question is: what software and database should be used for this? As I said, with a successful site like the examples above, the server load will be massive, so the combination of scripting language and database (SQL or something else) will be extremely important for keeping the site working. I was thinking of PHP with MySQL. What do you guys think? Is this the best combo, or are there other (better) solutions?
Why would the load be that bad? What kind of traffic are you anticipating? I wrote something somewhat similar: a spider in Perl that indexed the PDF files and stored them in a MySQL database. The front end was in PHP and ran a fulltext query to return results. This was for a company whose documents had been stored in PDF format for years, so there were thousands of documents that needed to be indexed.
My examples have an Alexa rank of about 3000, so their traffic is huge, and I assume their server load is very high as well. The PDFs come from Google (Google's API), so there's no need for a spider here. So I guess you think PHP/MySQL is the best way to go as well?
Probably so, yes. The language isn't nearly as important as database tuning. Tuning to take advantage of both query caching and fulltext searching will give you the best results.
What do you mean by those two terms? What's the difference? Should I use both in my script? Sorry, but I'm not a coder.
These two links will explain better than I could. http://dev.mysql.com/doc/refman/5.1/en/query-cache.html http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
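For anyone who isn't a coder, here is a rough sketch of what the two terms mean. It is plain Python, not MySQL or PHP, and the class name TinySearchIndex and its methods are made up for illustration only. The "index" part mimics the idea behind fulltext searching (look words up in a prepared word list instead of reading every document), and the "cache" part mimics the idea behind query caching (remember the answer to an identical query until the data changes). Real MySQL does far more than this, so treat it as a toy model under those assumptions.

```python
import re
from collections import defaultdict


class TinySearchIndex:
    """Toy model of two ideas: a full-text index and a query cache."""

    def __init__(self):
        self.titles = {}               # doc_id -> title
        self.index = defaultdict(set)  # word -> ids of documents containing it
        self.cache = {}                # exact query text -> stored result
        self.cache_hits = 0

    @staticmethod
    def _words(text):
        # Lowercase the text and keep runs of letters and digits as words.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_document(self, doc_id, title, body):
        self.titles[doc_id] = title
        for word in self._words(title + " " + body):
            self.index[word].add(doc_id)
        # The data changed, so any stored results may be stale: drop them.
        self.cache.clear()

    def search(self, query):
        # Query cache: only an identical query text counts as a hit.
        if query in self.cache:
            self.cache_hits += 1
            return list(self.cache[query])
        words = self._words(query)
        if not words:
            return []
        # Full-text lookup: documents that contain every word of the query.
        ids = set.intersection(*(self.index.get(w, set()) for w in words))
        result = [self.titles[i] for i in sorted(ids)]
        self.cache[query] = result
        return list(result)


idx = TinySearchIndex()
idx.add_document(1, "Annual report 2009", "Revenue and profit figures")
idx.add_document(2, "User manual", "How to install the printer")
print(idx.search("annual report"))  # answered from the word index
print(idx.search("annual report"))  # identical query, answered from the cache
print(idx.cache_hits)
```

The two ideas are complementary: the index makes a first-time search cheap, and the cache makes a repeated search nearly free, which matters when many visitors type the same popular query.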
I don't understand most of it because I'm not a coder, but thanks for the info. So basically, those two things should be built into the script when creating a PDF search engine website?
I'd hate to answer that without knowing exactly how this system will work, but probably so, yes. If you're storing the data found in the .pdf files in a database, then definitely.