Loading/uploading multiple PDFs, then indexing them for search

Discussion in 'C#' started by taurianthebull, Jan 25, 2008.

  1. #1
    Hi,
    How can I search for information in PDFs that users have uploaded?

    For example, if we upload 3 different PDFs to the server, how can we search their text and then display results from those PDF files?


    Any advice or suggestions?

    Thanks in advance.
     
    taurianthebull, Jan 25, 2008
  2. AstarothSolutions
    #2
    You will need a component to read the content of the PDF files.

    There is a possible problem with this idea, depending on where the users get their PDFs from. Some of the document-to-PDF converters and PDF printers that individuals often use convert text to images when creating the PDF, so there would be no text to read. If you want to be able to read these as well, you will also need an OCR (optical character recognition) component.
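
    To make the idea concrete, here is a minimal sketch of the indexing and search side only. It assumes the text of each PDF has already been pulled out by whichever reader component you pick; the class name PdfTextIndex and its members are hypothetical, not part of any library. A file that yields no text is set aside as a likely image-only PDF that would need OCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: extractedText is assumed to come from a separate
// PDF-reading component, which is not shown here.
public class PdfTextIndex
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')' };

    // word -> names of the files that contain it (case-insensitive)
    private readonly Dictionary<string, HashSet<string>> _index =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Files that produced no text, probably image-only PDFs needing OCR.
    public List<string> NeedsOcr { get; } = new List<string>();

    public void Add(string fileName, string extractedText)
    {
        if (string.IsNullOrWhiteSpace(extractedText))
        {
            NeedsOcr.Add(fileName);
            return;
        }

        var words = extractedText.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            if (!_index.TryGetValue(word, out var files))
            {
                files = new HashSet<string>();
                _index[word] = files;
            }
            files.Add(fileName);
        }
    }

    // Returns the files that contain every word of the query, sorted by name.
    public List<string> Search(string query)
    {
        var terms = query.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        IEnumerable<string> result = null;
        foreach (var term in terms)
        {
            if (!_index.TryGetValue(term, out var files))
                return new List<string>();
            result = result == null ? files : result.Intersect(files);
        }
        return result == null ? new List<string>() : result.OrderBy(f => f).ToList();
    }
}
```

    With three uploads added through Add, Search("sales") would return the names of the files whose text contains that word, and NeedsOcr would list any upload that came back empty from the reader.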
     
    AstarothSolutions, Jan 25, 2008
  3. teraeon
    #3
    I believe the Microsoft Indexing Service might do that for you. You can then query the Indexing Service for your search functionality. The only issue you might run into is if you have to search both the database and these PDFs at the same time. There are probably solutions out there for that, but the Indexing Service is free.
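
    As a rough sketch of what querying the Indexing Service from C# could look like, the code below goes through its OLE DB provider (MSIDXS). Several things here are assumptions rather than anything stated above: that a catalog already covers the upload folder, that a PDF IFilter is installed so the service can read PDF contents, and that doubling single quotes is enough to keep user input inside the search phrase. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;

public static class IndexingServiceSearch
{
    // Builds the query text for the Indexing Service SQL dialect.
    // Assumption: doubling single quotes keeps user input inside the phrase.
    public static string BuildQuery(string userText)
    {
        string safe = userText.Replace("'", "''");
        return "SELECT Filename, Path, Rank FROM SCOPE() " +
               "WHERE FREETEXT('" + safe + "') ORDER BY Rank DESC";
    }

    // Runs the query against the named catalog and returns matching file paths.
    // Assumption: the catalog exists and indexes the folder with the uploads.
    public static List<string> Search(string catalog, string userText)
    {
        var paths = new List<string>();
        using (var conn = new OleDbConnection("Provider=MSIDXS;Data Source=" + catalog))
        using (var cmd = new OleDbCommand(BuildQuery(userText), conn))
        {
            conn.Open();
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    paths.Add(reader["Path"].ToString());
                }
            }
        }
        return paths;
    }
}
```

    Since this only returns file matches, searching the database at the same time would still need a separate query whose results you merge in your own code.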
     
    teraeon, Jan 28, 2008