What software/database to use for my script?

Discussion in 'Programming' started by thepeach, May 19, 2010.

  1. #1
    I'm planning on making a PDF search engine website.

    Some well known examples are:
    pdfqueen.com
    pdfgeni.com

    As you can see, these sites store all of their searches on the site. So in the end (when your site is successful), you will have thousands of searches per day to store.
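
For illustration only, here is a minimal sketch of one way to store searches without the table growing by one row per visitor: keep a single row per distinct search term plus a hit counter. It uses Python's built-in sqlite3 as a stand-in for whatever database is chosen; the table name `search_log` and the function names are assumptions made up for this example, not how pdfqueen.com or pdfgeni.com actually work.

```python
import sqlite3

def open_search_log(path=":memory:"):
    """Create the (hypothetical) search_log table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_log ("
        " term TEXT PRIMARY KEY,"
        " hits INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def record_search(conn, term):
    """Count one search for `term`, ignoring case and extra whitespace."""
    key = " ".join(term.lower().split())
    if not key:
        return
    with conn:  # run both statements in one transaction
        conn.execute(
            "INSERT OR IGNORE INTO search_log (term, hits) VALUES (?, 0)", (key,)
        )
        conn.execute(
            "UPDATE search_log SET hits = hits + 1 WHERE term = ?", (key,)
        )

def top_searches(conn, limit=10):
    """Return the most frequent searches as (term, hits) tuples."""
    return conn.execute(
        "SELECT term, hits FROM search_log ORDER BY hits DESC, term LIMIT ?",
        (limit,),
    ).fetchall()

conn = open_search_log()
for q in ["Linux manual", "linux  manual", "tax form"]:
    record_search(conn, q)
print(top_searches(conn))  # [('linux manual', 2), ('tax form', 1)]
```

With this layout, thousands of searches per day only add rows for terms that have never been seen before; repeated searches just bump a counter.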

    Now, my question is: what software/database should be used for this?

    Like I said, if you have a successful site like my examples above, the server load will be massive. So the combination of scripting language and database (SQL or something else) will be extremely important for how well the site works.

    I was thinking of combining PHP with MySQL.

    What do you guys think? Is this the best combo or are there other (better) solutions?
     
    thepeach, May 19, 2010 IP
  2. Fervid

    Fervid Well-Known Member

    Messages:
    161
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    120
    #2
    Why would the load be that bad? What kind of traffic are you anticipating? I wrote something somewhat similar: a spider in Perl that indexed the PDF files and stored their contents in a MySQL database. The front end was in PHP and executed a fulltext query to return results. This was for a company whose documents had been stored in PDF format for years, so there were thousands of documents that needed to be indexed.
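
To give a rough idea of what a fulltext index does for a setup like this, here is a small sketch in Python. It is not the Perl spider or the MySQL FULLTEXT index described above; it swaps in a hand-rolled in-memory inverted index (a map from each word to the documents containing it) purely to show the principle. The class name, file names, and sample text are all invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TinyFulltextIndex:
    """Toy inverted index: each word maps to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.titles = {}                  # document id -> title

    def add(self, doc_id, title, text):
        """Index the title and the extracted text of one document."""
        self.titles[doc_id] = title
        for word in tokenize(title + " " + text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return titles of documents containing every word of the query."""
        words = tokenize(query)
        if not words:
            return []
        # .get() avoids adding empty entries for words that were never indexed
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(self.titles[i] for i in ids)

index = TinyFulltextIndex()
index.add(1, "invoice_2009.pdf", "Annual invoice summary for 2009")
index.add(2, "handbook.pdf", "Employee handbook and annual leave policy")

print(index.search("annual"))        # ['handbook.pdf', 'invoice_2009.pdf']
print(index.search("annual leave"))  # ['handbook.pdf']
```

The point is that a search only looks up the words in the query instead of scanning the text of every document, which is why lookups stay fast as the number of indexed PDFs grows.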
     
    Fervid, May 19, 2010 IP
  3. thepeach

    thepeach Active Member

    Messages:
    89
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    93
    #3
    My examples have an Alexa rank of about 3000, so their traffic is huge. So I assume their server load will be very high as well.

    The PDFs come from Google (Google's API), so no need for a spider here.

    So I guess you think PHP/MySQL is the best way to go as well?
     
    thepeach, May 19, 2010 IP
  4. Fervid

    #4
    Probably so, yes. The language isn't nearly as important as database tuning. Tuning to take advantage of both query caching and fulltext searching will give you the best results.
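
As a rough illustration of the caching half of that advice, here is a sketch of the idea in Python: remember the results of a search so that the same query, asked again, never reaches the slow backend. This is an application-side stand-in written for this example, not MySQL's own query cache and not a tuning recipe; `CachedSearch`, `slow_search`, and the sample documents are hypothetical names.

```python
class CachedSearch:
    """Wrap a slow search function and remember results per normalized query."""

    def __init__(self, backend, max_entries=1000):
        self.backend = backend          # function: query string -> list of results
        self.max_entries = max_entries  # cap so the cache cannot grow forever
        self.cache = {}
        self.backend_calls = 0          # counts how often the backend really ran

    def search(self, query):
        key = " ".join(query.lower().split())
        if key in self.cache:
            return self.cache[key]
        self.backend_calls += 1
        results = self.backend(key)
        if len(self.cache) >= self.max_entries:
            # dicts keep insertion order, so this drops the oldest entry
            self.cache.pop(next(iter(self.cache)))
        self.cache[key] = results
        return results

    def invalidate(self):
        """Forget everything, e.g. after new documents have been indexed."""
        self.cache.clear()

DOCS = ["Linux manual.pdf", "Tax form 2010.pdf"]

def slow_search(query):
    """Stand-in for the real database query."""
    return [title for title in DOCS if query in title.lower()]

cached = CachedSearch(slow_search)
cached.search("Linux")
cached.search("  linux ")    # same query after normalization: served from cache
print(cached.backend_calls)  # 1
```

On a search site many visitors type the same popular queries, so even a simple cache like this can take a large share of the load off the database. The catch is that cached results go stale when the underlying data changes, hence the `invalidate` method.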
     
    Fervid, May 19, 2010 IP
  5. thepeach

    #5
    What do you mean by those two terms? What's the difference? Should I use both in my script?

    Sorry, but I'm not a coder. :)
     
    thepeach, May 19, 2010 IP
  6. Fervid

  7. thepeach

    #7
    I don't understand most of it, because I'm not a coder, but thanks for the info. :)

    So basically, those two things should be built into the script when creating a PDF search engine website?
     
    thepeach, May 19, 2010 IP
  8. Fervid

    #8
    I'd hate to answer that without knowing exactly how this system will work, but probably so, yes. If you're storing the data found in the .pdf files in a database, then definitely.
     
    Fervid, May 19, 2010 IP