Searching for the same words in 3 .txt files

Discussion in 'Programming' started by arandon, Mar 16, 2013.

  1. #1
    Hello,
    I'd like some help with the following problem. I have three .txt files (A, B and C), each with 10-15k words. I'd like to know how many words (and which words) are shared by A-B, A-C and B-C, and by all three files (A-B-C).
    Could this be done in Excel? If so, how?
    Thanks for any help.
     
    arandon, Mar 16, 2013
  2. #2
    I would write a program in PHP to split the text into words, then add each word to a database. Then group the table by word and put the totals in another column. You would end up with a WORD | Number of Occurrences table. What do you think?
     
    projectWORD, Mar 25, 2013
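A minimal sketch of post #2's idea, assuming an in-memory PHP array stands in for the database table: array_count_values() does the job a GROUP BY word / COUNT(*) query would do. The function name wordFrequencies() and the sample sentence are made up for illustration; for a real file you would pass in the result of file_get_contents() instead.

```php
<?php
// Sketch: build the "WORD | Number of Occurrences" table from post #2,
// using an in-memory array in place of the database (an assumption;
// the post suggests a database table grouped by word).
function wordFrequencies(string $text): array
{
    // Lower-case so "The" and "the" count as the same word, then split
    // on runs of non-word characters, skipping empty pieces.
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_count_values() plays the role of GROUP BY word + COUNT(*).
    $counts = array_count_values($words);
    arsort($counts); // most frequent words first
    return $counts;
}

$counts = wordFrequencies('The cat saw the dog. The dog ran.');
foreach ($counts as $word => $n) {
    echo str_pad($word, 10), '| ', $n, "\n";
}
```

A frequency table on its own doesn't say which words two files share; you would build one per file (or add a column for the file name) and compare them.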
  3. #3
    Use Java.

    Read your file in, loop through it to find the specific word you're looking for, and output the results.
     
    Feriscool, Mar 25, 2013
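Post #3's loop-based search, sketched in PHP rather than Java so it matches the other examples in this thread. The helpers splitWords(), containsWord() and commonWordsByLoop() are hypothetical names, and the inputs are made-up strings rather than files. Each lookup scans the whole second list, so the cost grows with the product of the two list sizes; the array-function approaches later in the thread avoid that.

```php
<?php
// Sketch of the loop-based idea from post #3, written in PHP (the
// post suggests Java). Input is assumed to be text already read into a string.

function splitWords(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Linear search: loop over every word until the target is found.
function containsWord(array $words, string $target): bool
{
    foreach ($words as $word) {
        if ($word === $target) {
            return true;
        }
    }
    return false;
}

// For each distinct word of $a, search $b with the loop above.
// Cost is roughly len(a) * len(b) comparisons, which can get slow
// as the files grow.
function commonWordsByLoop(array $a, array $b): array
{
    $common = [];
    foreach (array_unique($a) as $word) {
        if (containsWord($b, $word)) {
            $common[] = $word;
        }
    }
    return $common;
}

$a = splitWords('red green blue green');
$b = splitWords('Blue yellow red');
print_r(commonWordsByLoop($a, $b));
```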
  4. #4
    Thanks to you both for the help. Sounds very logical; I'll try it soon.
     
    arandon, Mar 26, 2013
  5. #5
    In PHP, use preg_replace to turn every run of whitespace characters into a single space, then explode the result into an array. You can then use PHP's array_intersect function:

    http://php.net/manual/en/function.array-intersect.php

    to compare those arrays and get back a list of the words common to those files. (array_diff, by contrast, returns the words from the first array that are missing from the others.) If I have time later I'll try to remember to revisit this and put together a quick demo.
     
    deathshadow, Mar 28, 2013
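A minimal sketch of post #5's steps. One caution: getting the words common to two arrays needs array_intersect(); array_diff() returns the values of the first array that are missing from the others, so both are shown here for contrast. wordsFromText() is a hypothetical helper and the sample strings are made up. Because this splits only on whitespace, punctuation stays attached to words ("date." and "date" differ), which the preg_split('/\W+/') approach in post #7 avoids.

```php
<?php
// Sketch of post #5's steps: collapse whitespace, explode into an
// array, then compare arrays. The sample strings are made up.

function wordsFromText(string $text): array
{
    // Replace every run of whitespace with a single space, then trim so
    // explode() does not produce empty first/last elements.
    $normalized = trim(preg_replace('/\s+/', ' ', $text));
    return $normalized === '' ? [] : explode(' ', $normalized);
}

$a = wordsFromText("apple  banana\ncherry\tdate");
$b = wordsFromText("banana date\n\nfig");

// Words present in both arrays; array_values() renumbers the keys.
$shared = array_values(array_intersect($a, $b));

// For contrast: words in $a that are NOT in $b.
$onlyInA = array_values(array_diff($a, $b));

print_r($shared);
print_r($onlyInA);
```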
  6. #6
    I do something like what you want in Excel. First I convert each text file to a file with a single word per line, so I can drop each one into a single Excel column, one word per cell. Then you just compare Column A to Column B: I put the formula below in Column C. For each entry in Column B, it tells me whether that same value exists anywhere in Column A. You can then sort by those results so you have just the words that are in both Columns A and B, and so on.


    =IF(COUNTIF($A:$A, $B3)<>0, "In Column A", "Not In Column A")

    p.s. This is probably easier done in PHP, but I am old school and like to figure shit out with the tools I know.
     
    browntwn, Mar 28, 2013
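A sketch of the same two steps from post #6 done in PHP, for comparison with the Excel route. oneWordPerLine() and labelAgainst() are hypothetical names and the word lists are sample data. array_flip() plus isset() gives a hash lookup, so each row of column B is checked without scanning column A again. Unlike COUNTIF, this comparison is case-sensitive, so lower-case both lists first if that matters.

```php
<?php
// Sketch of post #6's Excel workflow in PHP. The sample word lists are
// made up; real input would come from file_get_contents().

// Step 1: turn running text into one word per line, ready to paste
// into a single Excel column (one word per cell).
function oneWordPerLine(string $text): string
{
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode("\n", $words);
}

// Step 2: the counterpart of
// =IF(COUNTIF($A:$A, $B3)<>0, "In Column A", "Not In Column A")
// applied to every row of column B. array_flip() turns column A's words
// into array keys, so isset() does the "exists anywhere" lookup.
// Note: case-sensitive, unlike COUNTIF.
function labelAgainst(array $columnA, array $columnB): array
{
    $lookup = array_flip($columnA);
    $rows = [];
    foreach ($columnB as $word) {
        $rows[] = [$word, isset($lookup[$word]) ? 'In Column A' : 'Not In Column A'];
    }
    return $rows;
}

echo oneWordPerLine("alpha beta\ngamma"), "\n\n";

$columnA = ['alpha', 'beta', 'gamma'];
$columnB = ['beta', 'delta', 'alpha'];
foreach (labelAgainst($columnA, $columnB) as [$word, $label]) {
    echo $word, "\t", $label, "\n";
}
```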
  7. #7
    Here we go, actual working tested code.

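    <?php
    // Read a file and return its distinct words, lower-cased so that
    // "The" and "the" match. PREG_SPLIT_NO_EMPTY drops the empty strings
    // preg_split() returns when the text starts or ends with a non-word char.
    function wordsIn($filename) {
    	$text = strtolower(file_get_contents($filename));
    	return array_unique(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY));
    }
    
    $file1 = wordsIn('content1.txt');
    $file2 = wordsIn('content2.txt');
    $file3 = wordsIn('content3.txt');
    
    // array_intersect() keeps the values found in every array passed to it;
    // array_diff() would instead return the words that are NOT shared.
    $common = array(
    	'content1.txt and content2.txt' => array_intersect($file1, $file2),
    	'content2.txt and content3.txt' => array_intersect($file2, $file3),
    	'content1.txt and content3.txt' => array_intersect($file1, $file3),
    	'all three files' => array_intersect($file1, $file2, $file3)
    );
    
    echo '
    	<h1>Like words in files demo</h1>';
    
    foreach ($common as $label => $words) {
    	// print_r(..., true) returns the dump as a string; without true it
    	// prints directly and returns true, which echo outputs as a stray "1".
    	echo '
    	<h2>Words common to ', $label, ' (', count($words), ')</h2>
    	<pre>', print_r(array_values($words), true), '</pre>';
    }
    ?>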
    The funny thing is, I completely forgot about preg_split, which saves a step.
     
    deathshadow, Mar 28, 2013
  8. #8
    It's a few lines in Python: create a dict mapping every word to the number of times it appears, and process each file word by word.
     
    matessim, Mar 30, 2013
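A sketch of post #8's dictionary idea, written in PHP (the post suggests Python) to match the other examples in the thread. As an adaptation, the associative array maps each word to the set of files it appears in rather than a raw count, so every pairwise and three-way overlap can be read off one map. wordsInAll() returns the words present in at least the named files, so the A-B list also includes words that are in C. Function names and sample texts are made up.

```php
<?php
// Sketch of post #8's dictionary idea in PHP: an associative array maps
// each word to the files it appears in. Recording *which* files (rather
// than a raw count) is an adaptation so overlaps can be read off the map.

function buildWordMap(array $texts): array
{
    $map = [];
    foreach ($texts as $name => $text) {
        $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // array_unique: record each word at most once per file.
        foreach (array_unique($words) as $word) {
            $map[$word][$name] = true;
        }
    }
    return $map;
}

// Words whose set of files includes every name in $names.
function wordsInAll(array $map, array $names): array
{
    $result = [];
    foreach ($map as $word => $files) {
        if (count(array_intersect_key(array_flip($names), $files)) === count($names)) {
            // Cast back to string: numeric words become integer keys.
            $result[] = (string) $word;
        }
    }
    sort($result);
    return $result;
}

$texts = [
    'A' => 'one two three four',
    'B' => 'two three five',
    'C' => 'three four five six',
];
$map = buildWordMap($texts);

foreach ([['A', 'B'], ['A', 'C'], ['B', 'C'], ['A', 'B', 'C']] as $names) {
    $common = wordsInAll($map, $names);
    echo implode('-', $names), ': ', count($common), ' (', implode(', ', $common), ")\n";
}
```

With real files, each entry of $texts would be the result of file_get_contents() on A, B or C.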