Need Text Parsing Script

Discussion in 'Services' started by mikem79, Mar 20, 2007.

  1. #1
    I need a script that reads records from a text file, cleans them up, then writes them to two other text files. The input file comes from the eBay API and consists of the titles of items for sale on eBay. Below is an outline of the control panel, followed by an example of input and output:

    Control Panel
    Name of input file: whatever.txt
    Name to assign to output file #1 (urls): something
    Suffix for output file #1: .txt, .word, and so on (check boxes plus a "wild card")
    Name to assign to output file #2 (links): somethingelse
    Suffix for output file #2: .txt, .word, and so on (check boxes plus a "wild card")
    output url: htttp://www.example.com (intentional typo)
    subdirectory: select "none" or designate a name... "guitars" in this example
    word separators: select "dash" or "underscore" or "none" (check box)
    file suffix: select ".html" or ".htm" or ".shtml" or ".php" (and so on; check boxes plus a "wild card")
    append <br> to end of links?: yes or no (check boxes)
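
    To illustrate how the settings above would combine, here is a minimal sketch in Python (not the PHP or Perl that the host offers, so it would need porting). The function names build_url and build_link and their parameters are made up for this example, and it assumes each title has already been reduced to a list of lowercase words. It uses correctly spelled "http" and "href" rather than the intentional typos in this post.

```python
def build_url(base_url, subdirectory, words, separator, suffix):
    """Join cleaned words into a URL such as base/subdirectory/word-word.html."""
    parts = [base_url.rstrip("/")]
    if subdirectory:  # assumption: "none" on the control panel arrives as an empty string
        parts.append(subdirectory.strip("/"))
    parts.append(separator.join(words) + suffix)
    return "/".join(parts)


def build_link(url, words, append_br):
    """Wrap the URL in an anchor tag whose text is the words separated by spaces."""
    text = " ".join(words)
    link = '<a href="' + url + '">' + text + "</a>"
    if append_br:
        link += "<br>"
    return link


if __name__ == "__main__":
    words = ["pink", "acoustic", "kids", "child", "guitars", "case", "accessories"]
    url = build_url("http://www.example.com", "guitars", words, "-", ".html")
    print(url)
    print(build_link(url, words, True))
```

    Choosing "underscore" or "none" as the word separator would simply mean passing "_" or "" as the separator argument.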

    Strong error trapping needed on control panel as well as a progress indicator and report on total time taken to run the script (e.g. "finished in 3.7654 seconds"....this sort of overly exact stuff makes me feel like a real techie...LOL).
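
    As a rough idea of what the error trapping and the timing report could look like, here is a small Python sketch. The setting keys (input_file, urls_file, and so on) and the function names are hypothetical, chosen only for this example; the real checks would depend on how the control panel is built. The progress indicator is left out.

```python
import time

# Assumption: the control panel choices map to these separator characters.
ALLOWED_SEPARATORS = {"dash": "-", "underscore": "_", "none": ""}


def validate_settings(settings):
    """Return a list of error messages; an empty list means the settings look usable."""
    errors = []
    for key in ("input_file", "urls_file", "links_file", "output_url"):
        if not str(settings.get(key, "")).strip():
            errors.append(key + " must not be empty")
    if settings.get("separator") not in ALLOWED_SEPARATORS:
        errors.append("separator must be dash, underscore or none")
    suffix = str(settings.get("file_suffix", ""))
    if not suffix.startswith(".") or len(suffix) < 2:
        errors.append("file_suffix must start with a dot, e.g. .html")
    if not isinstance(settings.get("append_br"), bool):
        errors.append("append_br must be yes or no")
    return errors


def run_timed(job):
    """Run job() and return its result plus a report like 'finished in 3.7654 seconds'."""
    start = time.perf_counter()
    result = job()
    elapsed = time.perf_counter() - start
    return result, "finished in {:.4f} seconds".format(elapsed)
```

    The script would call validate_settings first, show any messages on the control panel, and only wrap the actual file processing in run_timed once the list comes back empty.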


    Three typical records from the input file:
    NEW BLUE CUTAWAY Acoustic Guitars W/ CASE & GUITAR PACK
    HUMPHREY AUDIO MODS New Digitech 'BADDER' Monkey OD
    PINK ACOUSTIC kids Child GUITARS W/ CASE & ACCESSORIES

    Output file #1 ( urls ) (intentional typos to show entire url)
    htttp://www.example.com/guitars/new-blue-cutaway-acoustic-guitars-case-guitar-pack.html
    htttp://www.example.com/guitars/humphrey-audio-mods-new-digitech-badder-monkey-od.html
    htttp://www.example.com/guitars/pink-acoustic-kids-child-guitars-case-accessories.html

    Output file #2 (links) (intentional typos in link so it shows as text)
    <a hrref="http://www.example.com/guitars/new-blue-cutaway-acoustic-guitars-case-guitar-pack.html">new blue cutaway acoustic guitars case guitar pack</a><br>
    <a hrref="http://www.example.com/guitars/humphrey-audio-mods-new-digitech-badder-monkey-od.html">humphrey audio mods new digitech badder monkey od</a><br>
    <a hrref="http://www.example.com/guitars/pink-acoustic-kids-child-guitars-case-accessories.html">pink acoustic kids child guitars case accessories</a><br>

    The idea is to take whatever the input is from eBay (or elsewhere) and clean it up into "normal" looking text for use in making URLs and links. I am not sure of all the filters that will be necessary, but I can send you much longer input files if that would be helpful. However, God only knows what people will type into eBay, so I imagine that one big filter that removes everything that is not "a to z" or "0 to 9" and forces everything to lower case will probably suffice. Please advise.
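
    A minimal sketch of that one big filter, again in Python purely for illustration (the function name clean_title is made up here). One assumption goes beyond the filter as described: the sample output drops "W/" completely instead of leaving a stray "w", so the sketch skips that token before filtering.

```python
import re


def clean_title(title):
    """Lowercase a title and return its words with everything but a-z and 0-9 removed."""
    words = []
    for token in title.lower().split():
        if token == "w/":  # assumption: drop the "with" abbreviation, as the sample output does
            continue
        word = re.sub(r"[^a-z0-9]", "", token)
        if word:  # tokens such as "&" end up empty and are skipped
            words.append(word)
    return words


if __name__ == "__main__":
    for title in (
        "NEW BLUE CUTAWAY Acoustic Guitars W/ CASE & GUITAR PACK",
        "HUMPHREY AUDIO MODS New Digitech 'BADDER' Monkey OD",
    ):
        print(" ".join(clean_title(title)))
```

    Note that punctuation inside a word is removed without splitting it, so something like "rock-n-roll" would come out as "rocknroll", and accented letters are dropped; longer input files would show whether that matters.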

    I am looking for a server side solution. My host is on PHP 4.4.1 and there are a gazillion Perl modules installed, so I guess that means it can be a Perl script too if that is the language you want to use.

    Please PM me with a fixed price and a time frame for completion.

    Thanks.
     
    mikem79, Mar 20, 2007