Hello, I'm not sure if this is the right place to ask this question, but I have been searching for a script that can load a PDF file and read the text and images from a selected portion of it. For example, say I have a PDF of a newspaper page and I want to select an article, grab its text and images, and save them to a database. The selection needs to recognize font size and weight so it can distinguish the article's title from its body text. All of this needs to be done in a web interface. In simple terms, I need a PDF cropping script that can read the text and store it. Please help and advise on any available script that offers these features. I have been searching the net but have had no luck yet. Hopefully I have explained this clearly. Thanks.
Let me explain it a little more simply. All I am looking for is a script that can read a PDF file's text and identify the text size and weight, so it can tell which part is the title and which is the content. I don't need it to convert the page to an image, just to recognize the portion of the PDF file that I select.
Do you want this done automatically, or do you want to manually select the text and then have the site save it to a database? The first case is probably possible, though not easy, and it would be done in PHP. The second case would be fairly trivial, but it's done in JavaScript (and AJAX, unless you want a mess).
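Whichever route you take, the title-versus-content step could look roughly like the sketch below. It assumes some PDF text extractor has already given you text runs with a position, a font size, and a bold flag; the extraction itself is not shown. The `Span` class, the `split_title_and_body` function, and the 1.3 size ratio are made up for illustration and are not part of any library. It is written in Python for brevity, but the same logic ports to PHP or JavaScript.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Span:
    """One run of text as a PDF extractor might report it (hypothetical shape)."""
    text: str
    x: float      # left edge of the run on the page
    y: float      # top edge of the run on the page
    size: float   # font size in points
    bold: bool    # True if the font weight is bold


def in_selection(span, box):
    """Return True if the span's origin lies inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= span.x <= x1 and y0 <= span.y <= y1


def split_title_and_body(spans, box, ratio=1.3):
    """Split the spans inside the selected box into (title, body) strings.

    Assumption: body text is the most common size in the selection, so the
    median size stands in for it. A span counts as title text if it is at
    least `ratio` times that size, or if it is bold and larger than body text.
    """
    selected = sorted(
        (s for s in spans if in_selection(s, box)),
        key=lambda s: (s.y, s.x),  # reading order: top to bottom, left to right
    )
    if not selected:
        return "", ""
    body_size = median(s.size for s in selected)
    title, body = [], []
    for s in selected:
        is_title = s.size >= body_size * ratio or (s.bold and s.size > body_size)
        (title if is_title else body).append(s.text)
    return " ".join(title), " ".join(body)


# Made-up sample data standing in for one newspaper page.
SPANS = [
    Span("Council Approves", 50, 100, 24.0, True),
    Span("New Budget", 50, 130, 24.0, True),
    Span("The city council voted on Tuesday", 50, 170, 10.0, False),
    Span("to approve next year's budget.", 50, 185, 10.0, False),
    Span("Spending rises in most areas.", 50, 200, 10.0, False),
    Span("Weather: sunny", 400, 100, 10.0, False),  # outside the selection
]

title, body = split_title_and_body(SPANS, box=(40, 90, 300, 210))
print(title)
print(body)
```

The two strings returned are what you would then write to the database. A real newspaper page will need more than this (subheadings, captions, multi-column layouts), so treat the size ratio as a starting point to tune.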