Need help working with PDF, DOC and image files

Discussion in 'PHP' started by rfeio, Mar 12, 2009.

  1. #1
    Hi,

    I would appreciate any help anyone could give me on this subject. I need to create a script that does:

    1. Upload files (allowed formats): PDF, Word Doc, RTF, JPG, GIF, BMP

    2. Validate that the file format uploaded corresponds to the ones allowed

    3. Search inside PDF, Word Doc and RTF files.


    I'm OK with uploading the files, but I don't know how to validate their formats or how to search inside a PDF, DOC or RTF file.

    Cheers,

    R
     
    rfeio, Mar 12, 2009
  2. Altari
    #2
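    You'll need to keep a list of the allowed MIME types and compare the uploaded file's MIME type against it.

    $allowedMimes = array("application/msword", "application/pdf", "application/rtf",
        "image/jpeg", "image/gif", "image/bmp");
    // $fileMime must hold the MIME type detected for the uploaded file
    if (!in_array($fileMime, $allowedMimes)) {
    // fail
    } else {
    // do something
    }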
    
    PHP:
    http://www.webmaster-toolkit.com/mime-types.shtml
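
The snippet above leaves open where $fileMime comes from. One way to get it, as a rough sketch only, is PHP's fileinfo extension, which inspects the file's bytes rather than trusting the name or the browser-supplied type. The helper names detectMime and isAllowedMime and the form field name 'fileField' are made up for illustration, the extra MIME aliases (text/rtf, image/x-ms-bmp) are assumptions about what the detector may report, and the type hints assume PHP 7 or later with fileinfo enabled.

```php
<?php
// Hypothetical helper: detect the MIME type from the file's contents.
// Returns an empty string when detection fails.
function detectMime(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    return $mime === false ? '' : $mime;
}

// Hypothetical helper: strict membership test against the allowed list.
function isAllowedMime(string $mime, array $allowedMimes): bool
{
    return in_array($mime, $allowedMimes, true);
}

$allowedMimes = array(
    'application/msword', 'application/pdf', 'application/rtf', 'text/rtf',
    'image/jpeg', 'image/gif', 'image/bmp', 'image/x-ms-bmp',
);

// Usage with an upload; 'fileField' is an assumed form field name.
if (isset($_FILES['fileField']) && is_uploaded_file($_FILES['fileField']['tmp_name'])) {
    $fileMime = detectMime($_FILES['fileField']['tmp_name']);
    echo isAllowedMime($fileMime, $allowedMimes) ? "accepted\n" : "rejected\n";
}
```

Checking the detected type instead of $_FILES[...]['type'] matters because the latter is sent by the browser and can be set to anything.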

    For searching, I think you'll need a PDF library and a DOC library, but I can't say for sure.
     
    Altari, Mar 12, 2009
  3. SmallPotatoes
    #3
    To search inside PDF files on Unix, Linux or Mac systems, you can use pdftotext, which extracts the text. The program catdoc does the same for most Word files. Both may need to be installed, as they are normally not part of the standard software set. If you are on shared hosting, they may already be there.
    % hostname -f
    yoohoo.dreamhost.com
    % which pdftotext
    /usr/bin/pdftotext
    % which catdoc
    catdoc not found
    Code (markup):
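
Calling those tools from PHP could look roughly like the sketch below. It assumes pdftotext and catdoc are on the PATH and that shell_exec is not disabled on the host; the function names buildExtractCommand, textContains and searchDocument are made up for illustration, and the nullable return type assumes PHP 7.1 or later. RTF is not handled here.

```php
<?php
// Build the shell command that writes a document's text to stdout.
// Returns null for file types this sketch does not handle.
function buildExtractCommand(string $path, string $extension): ?string
{
    $arg = escapeshellarg($path);            // never pass raw user input to the shell
    switch (strtolower($extension)) {
        case 'pdf':
            return "pdftotext $arg -";       // "-" sends the text to stdout
        case 'doc':
            return "catdoc $arg";            // catdoc prints to stdout by default
        default:
            return null;
    }
}

// Case-insensitive substring search; an empty needle never matches.
function textContains(string $text, string $needle): bool
{
    return $needle !== '' && stripos($text, $needle) !== false;
}

// Extract and search in one step; false when extraction is not possible.
function searchDocument(string $path, string $needle): bool
{
    $cmd = buildExtractCommand($path, pathinfo($path, PATHINFO_EXTENSION));
    if ($cmd === null) {
        return false;
    }
    $text = shell_exec($cmd);
    return is_string($text) && textContains($text, $needle);
}
```

For more than a handful of files it is probably better to extract the text once at upload time and store it (for example in a database column) than to run the converter on every search.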
     
    SmallPotatoes, Mar 12, 2009
  4. Stylesofts
    #4
    
    // Returns 0 when the extension is allowed, 1 otherwise.
    function checkExtension($extension)
    {
        $allowed = array('doc', 'pdf', 'rtf', 'jpeg', 'jpg', 'bmp', 'png', 'gif');
        return in_array(strtolower($extension), $allowed, true) ? 0 : 1;
    }

    // Returns the text after the last dot in the file name.
    function getExtension($fileName)
    {
        return pathinfo($fileName, PATHINFO_EXTENSION);
    }

    // $uploaddir and $random must be set before this point.
    if (is_uploaded_file($_FILES['fileField']['tmp_name']))
    {
        $fileName = basename($_FILES['fileField']['name']);
        $extension = getExtension($fileName);

        if (checkExtension($extension) != 0)
        {
            echo("<script>alert('Invalid file format')</script>");
            echo("<script>location.href='redirectlocation.php'</script>"); // redirects to the place you want
        }
        else
        {
            // Store the file only after the extension check has passed.
            move_uploaded_file($_FILES['fileField']['tmp_name'], $uploaddir.'/'.$random.$fileName);
            $files[] = $fileName;
            $orgDir = $uploaddir.'/';
        }
    }
    
    PHP:
     
    Stylesofts, Mar 13, 2009
  5. rfeio
    #5
    Thanks everyone!

    Any ideas on how I can do the search inside the documents?
     
    rfeio, Mar 13, 2009
  6. shallowink
    #6
    shallowink, Mar 13, 2009