Parsing Large Text File with Line Break Issues

Discussion in 'PHP' started by rederick, Feb 4, 2007.

  1. #1
    Hi,

    I'm wondering if someone can assist me with this. I have a large text file (241 MB). The format of the file is:
    "20528039"|"0"|"0"|"0.120"|"Vacaville"|""

    So it's separated by pipes and double quotes, no problem. I am using this code to parse it:

    
    $output = fgets($handle);
    $output = trim($output);
    $output = ltrim($output,'"'); //strip off first quote
    $output = substr($output,0,strlen($output)-1); //strip off last quote
    $output_array = explode('"|"',$output);

    $outputs = ""; // start empty so .= does not append to an undefined variable
    for($j=0;$j < sizeof($output_array);$j++)
    {
        // $columns holds the column names, in the same order as the fields
        $outputs .= trim($columns[$j])." = '".mysql_real_escape_string(trim($output_array[$j])) ."',\n";
    }
    $outputs = rtrim($outputs,",\n");
    PHP:

    So that works fine for the most part. The problem I am having is that some lines in the text file have line breaks inside the delimiters, something like this:
    "20528039"|"0"|"0"|"0.120"|"Vacaville"|"This is some
    text
    with
    the line breaks"|


    Now I "think" that fgets() stops at the first of these line breaks and gives me an incomplete line. I know there is a way to tell fgets() the line size, but I can't seem to find the best way to determine this value.
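
    For what it's worth, the length argument to fgets() only caps how many bytes are read; reading still stops at the first newline, so it cannot join a record that spans several lines. One possible workaround, as a rough sketch: keep appending lines until the number of double quotes read so far is even. This assumes every field is quoted and that no unescaped quote appears inside a field. The helper name read_record() and the in-memory sample data are made up for illustration.

```php
<?php
// Hypothetical helper: reads one logical record, which may span several lines.
// Assumption: an odd number of double quotes means we are still inside a field.
function read_record($handle)
{
    $record = fgets($handle);
    if ($record === false) {
        return false; // end of file
    }
    while (substr_count($record, '"') % 2 == 1) {
        $more = fgets($handle);
        if ($more === false) {
            break; // file ended in the middle of a record
        }
        $record .= $more;
    }
    return rtrim($record, "\r\n");
}

// Demo on an in-memory stream instead of the real file (invented sample data).
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\"1\"|\"Vacaville\"|\"This is some\ntext\"\n\"2\"|\"Davis\"|\"\"\n");
rewind($handle);

$records = array();
while (($record = read_record($handle)) !== false) {
    $records[] = $record;
}
fclose($handle);
```

    Each entry in $records is then one complete record, line breaks included, and can go through the same ltrim()/substr()/explode() steps as before.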

    I hope this makes some sense - any suggestions for me would be appreciated.

    Thank you
    Red.
     
    rederick, Feb 4, 2007
  2. Psychotomus1

    #2
    Use the split() function.
     
    Psychotomus1, Feb 4, 2007
  3. rederick

    #3
    Use split() instead of explode()?

    Or are you proposing that I read the whole file into an array? It's really huge, so I have to read it one line at a time.
     
    rederick, Feb 4, 2007
  4. Psychotomus1

    #4
    Whoops, I meant explode(). I didn't see you were already using it. explode() should handle values that span multiple lines.
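
    A quick check of that claim, as a sketch with a made-up sample record: explode() keeps an embedded newline inside a field, so the splitting step is fine once the whole record is in one string. The truncation comes from fgets(), which returns at the first newline.

```php
<?php
// Invented sample record with a line break inside the last field.
$record = "\"20528039\"|\"Vacaville\"|\"This is some\ntext\"";

// explode() on the full record: the newline stays inside the last field.
$fields = explode('"|"', trim($record, '"'));

// fgets() on the same data: only the text up to the first newline comes back.
$handle = fopen('php://memory', 'w+');
fwrite($handle, $record . "\n");
rewind($handle);
$first_read = fgets($handle);
fclose($handle);
```

    Here $fields has three entries and the last one still contains the line break, while $first_read ends after "This is some".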
     
    Psychotomus1, Feb 4, 2007
  5. clancey

    #5
    The following approach seems to work, assuming the start of each record is always the same:

    
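    $output = "";
    while (true)
    	{
    	$aline = fgets($handle); // false at end of file
    	// A line starting with "digits"| begins a new record, so at that point
    	// (or at end of file) $output holds one complete record.
    	if( $output !== "" && ($aline === false || preg_match( '/^"[0-9]+"\|/', $aline)) )
    		{
    		$output = preg_replace( '/[\r\n]+/'," ", $output);
    		$output = trim($output);
    		$output = ltrim($output,'"'); //strip off first quote
    		$output = substr($output,0,strlen($output)-1); //strip off last quote
    		$output_array = explode('"|"',$output);
    		// ... use $output_array for this record here ...
    		$output = "";
    		}
    	if( $aline === false ) break;
    	$output .= $aline;
    	}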
    
    PHP:
     
    clancey, Feb 4, 2007
  6. rederick

    #6
    Thank you Clancey, your help led me to the solution.

    I ended up detecting the end-of-record character, which was always a double quote (").
    If a line does not end with that character, I read another line until one does.

    It's not the best solution but it works.
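
    Another option that might be worth trying, sketched here under the assumption that the file follows CSV-style quoting with | as the delimiter: PHP's built-in fgetcsv() accepts a custom delimiter and enclosure, and it reads a quoted field that spans several lines as part of the same record. The sample data and the in-memory stream are invented for the demo; a length of 0 means the line length is not limited.

```php
<?php
// Invented sample data written to an in-memory stream instead of the real file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\"20528039\"|\"0\"|\"Vacaville\"|\"This is some\ntext\nwith\nthe line breaks\"\n");
fwrite($handle, "\"20528040\"|\"1\"|\"Davis\"|\"\"\n");
rewind($handle);

$rows = array();
// Delimiter is |, enclosure is ", escape character is a backslash.
while (($fields = fgetcsv($handle, 0, '|', '"', '\\')) !== false) {
    if ($fields === array(null)) {
        continue; // fgetcsv() reports a blank line as array(null)
    }
    $rows[] = $fields; // quotes already stripped, line breaks preserved
}
fclose($handle);
```

    With this approach the manual ltrim()/substr()/explode() steps are not needed, since each row arrives as an array of unquoted fields. A trailing | at the end of a record would simply show up as one extra empty field.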


    
    $handle = fopen($files, "r") or die(mailer($files." filename not found"));
    while (($output = fgets($handle)) !== false) {
        // A complete record ends with a double quote. If this line does not,
        // keep appending lines until one does (or the file ends).
        while (substr(rtrim($output, "\r\n"), -1) != '"' && ($bits = fgets($handle)) !== false) {
            $output .= $bits;
        }

        $output = trim($output);
        if ($output == '') {
            continue; // skip blank lines
        }
        $output = ltrim($output,'"'); //strip off first quote
        $output = substr($output,0,strlen($output)-1); //strip off last quote
        $output_array = explode('"|"',$output);

        $outputs = ""; // reset for each record
        for($j=0;$j < sizeof($output_array);$j++)
        {
            $outputs .= trim($columns[$j])." = '".mysql_real_escape_string(trim($output_array[$j])) ."',\n";
        }
        $outputs = rtrim($outputs,",\n");
        // ... use $outputs for this record here ...
    }
    fclose($handle);
    
    PHP:
     
    rederick, Feb 5, 2007