What software/database to use for my script?

thepeach Active Member

Messages: 89

Likes Received: 0

Best Answers: 0

Trophy Points: 93

#1

I'm planning to build a PDF search engine website.

Some well known examples are:
pdfqueen.com
pdfgeni.com

As you can see, these sites store all of their searches on the site. So in the end (when your site is successful), you will have thousands of searches per day to store.
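A rough sketch of what "storing the searches" could look like, for illustration only. It uses Python's built-in `sqlite3` module as a stand-in for whatever database the site ends up on; the table name `searches`, its columns, and the helper names `record_search` and `top_searches` are all hypothetical and not taken from the example sites.

```python
import sqlite3


def record_search(conn, term):
    """Count one more occurrence of a search term (hypothetical helper)."""
    term = " ".join(term.lower().split())  # collapse case and extra spaces
    # Create the row on first sight, then bump its counter.
    conn.execute("INSERT OR IGNORE INTO searches (term, hits) VALUES (?, 0)", (term,))
    conn.execute("UPDATE searches SET hits = hits + 1 WHERE term = ?", (term,))
    conn.commit()


def top_searches(conn, limit=10):
    """Return the most frequent search terms as (term, hits) tuples."""
    rows = conn.execute(
        "SELECT term, hits FROM searches ORDER BY hits DESC, term LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (term TEXT PRIMARY KEY, hits INTEGER NOT NULL)")

for query in ["Pump manual", "pump  manual", "safety sheet"]:
    record_search(conn, query)

popular = top_searches(conn)
```

Storing one row per distinct term with a counter, rather than one row per search, keeps the table small even at thousands of searches per day.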

Now, my question is: what software/database should be used for this?

As I said, a successful site like the examples above will put a massive load on the server. So the combination of scripting language and database (SQL or something else) will be extremely important for keeping the site running.

I was thinking of combining PHP with MySQL.

What do you guys think? Is this the best combo or are there other (better) solutions?

thepeach, May 19, 2010 IP

Fervid Well-Known Member

Messages: 161

Likes Received: 3

Best Answers: 0

Trophy Points: 120

#2

Why would the load be that bad? What kind of traffic are you anticipating? I wrote something somewhat similar: a spider in Perl that indexed the PDF files and stored their contents in a MySQL database. The front end was in PHP and executed a fulltext query to return results. This was for a company whose documents had been stored in PDF format for years, so there were thousands of documents that needed to be indexed.
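To make the "index, then fulltext query" idea concrete, here is a minimal sketch in Python. It is not the Perl/PHP/MySQL code described above; it is a toy inverted index that plays the role MySQL's FULLTEXT index (queried with `MATCH ... AGAINST`) would play in that setup. The class name `TinyIndex`, the sample documents, and the all-words-must-match rule are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.titles = {}                  # document id -> title

    def add(self, doc_id, title, text):
        """Index one document (the text would come from the extracted PDF)."""
        self.titles[doc_id] = title
        for word in tokenize(title + " " + text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents that contain every query word."""
        words = tokenize(query)
        if not words:
            return []
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)


index = TinyIndex()
index.add(1, "Pump manual", "Installation and maintenance of the pump")
index.add(2, "Safety sheet", "Maintenance safety rules")

hits = index.search("maintenance pump")  # only document 1 contains both words
```

The point of the index is that a search only touches the entries for the query words instead of scanning every document, which is why a fulltext index copes with thousands of documents.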

Fervid, May 19, 2010 IP

thepeach Active Member

#3

Fervid said: ↑

Why would the load be that bad? What kind of traffic are you anticipating? I wrote something somewhat similar: a spider in Perl that indexed the PDF files and stored their contents in a MySQL database. The front end was in PHP and executed a fulltext query to return results. This was for a company whose documents had been stored in PDF format for years, so there were thousands of documents that needed to be indexed.

My examples have an Alexa rank of about 3000, so their traffic is huge. So I assume their server load will be very high as well.

The PDFs come from Google (Google's API), so there's no need for a spider here.

So I guess you think PHP/MySQL is the best way to go as well?

thepeach, May 19, 2010 IP

Fervid Well-Known Member

#4

thepeach said: ↑

My examples have an Alexa rank of about 3000, so their traffic is huge. So I assume their server load will be very high as well.

The PDFs come from Google (Google's API), so there's no need for a spider here.

So I guess you think PHP/MySQL is the best way to go as well?

Probably so, yes. The language isn't nearly as important as database tuning. Tuning to take advantage of both query caching and fulltext searching will give you the best results.
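As a rough illustration of what query caching buys you, here is a sketch in Python of a cache sitting in front of a slow search. It is not MySQL's query cache itself: the class `CachedSearch`, the `fake_database` function, and the choice to normalize case and whitespace before lookup are all assumptions made for the example. (The built-in MySQL query cache was removed in MySQL 8.0, so on newer versions a cache like this would live in the application.)

```python
class CachedSearch:
    """Wraps a search function and remembers results for repeated queries."""

    def __init__(self, backend):
        self.backend = backend    # function: query string -> list of results
        self.cache = {}           # normalized query -> cached results
        self.backend_calls = 0    # how often the database was actually hit

    def search(self, query):
        # Normalizing the query is a choice made for this sketch.
        key = " ".join(query.lower().split())
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]

    def invalidate(self):
        """Drop all cached results, e.g. after new documents are indexed."""
        self.cache.clear()


def fake_database(query):
    """Hypothetical stand-in for a slow fulltext query."""
    documents = ["pump manual.pdf", "pump safety.pdf", "invoice 2009.pdf"]
    return [name for name in documents if query in name]


engine = CachedSearch(fake_database)
first = engine.search("Pump")
second = engine.search("  pump ")  # same query after normalization: served from cache
```

On a search site many visitors type the same popular queries, so repeated searches are answered from memory and only new or invalidated queries reach the database.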

Fervid, May 19, 2010 IP

thepeach Active Member

#5

Fervid said: ↑

Tuning to take advantage of both query caching and fulltext searching will give you the best results.

What do you mean by those two terms? What's the difference? Should I use both in my script?

Sorry, but I'm not a coder.

thepeach, May 19, 2010 IP

Fervid Well-Known Member

#6

These two links will explain it better than I could.

http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

Fervid, May 19, 2010 IP

thepeach Active Member

#7

I don't understand most of it, because I'm not a coder, but thanks for the info.

So basically, those two things should be incorporated into the script when creating a PDF search engine website?

thepeach, May 19, 2010 IP

Fervid Well-Known Member

#8

I'd hate to answer that without knowing exactly how this system will work, but probably so, yes. If you're storing the data found in the .pdf files in a database, then definitely.

Fervid, May 19, 2010 IP
