How to read data faster?

Discussion in 'MySQL' started by aayybb, Jul 29, 2009.

  1. #1
    Hi,

    I have a text file containing a random list of people's info (call it A).
    I also have a MySQL table with existing customers' info (call it B).
    I am trying to read list A line by line and compare it against B (SELECT ... FROM B WHERE A.person = B.customer ...) to see whether each person in A is also in B.

    Since list A is huge (the text file is more than 3,500 pages when opened in Microsoft Word, one person per line), the program takes a LONG time and still hasn't finished comparing.

    Any suggestions for a better program flow? Thanks in advance for any help.
     
    aayybb, Jul 29, 2009 IP
  2. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #2
    Where is the B data located? 3,500 pages is fairly long, but I've used PHP and MySQL to parse text files of over 2M pages, so I know it is definitely possible to manage data of this size. Can you explain how the data is stored (format), and the relationship between the two data sets?
     
    jestep, Jul 29, 2009 IP
  3. aayybb

    aayybb Peon

    Messages:
    128
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    The A text file is located in one of the directories on the website, and I will use PHP code to read it. It contains info belonging to people, but it is much, much bigger than B.
    The B data is stored in a MySQL database table. It is customer info.
    I need to see whether any of the entries in A already exist in B, then compile a new list of those that do and make a new table for it.
    A can have duplicate people. For example, there might be 20 entries for the same person in A, each with a different interest.
    I am thinking of reading A line by line and checking whether each entry/line exists in B (using a SELECT statement). If it exists, then store (INSERT) it in the new table. If there are 2 million entries in A, that will run 2 million SELECT statements, which will take a long time.
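    Roughly, this is what I have in mind (just a sketch -- the table, column, and file names below are placeholders, not my real schema):

    PHP:
    <?php
    // Row-by-row approach: one SELECT per line of A (placeholder names:
    // customers = B, matched = the new table, people_a.txt = the A file).
    $db = mysqli_connect('localhost', 'user', 'pass', 'mydb');

    $fh = fopen('people_a.txt', 'r');
    while (($line = fgets($fh)) !== false) {
        $person = mysqli_real_escape_string($db, trim($line));

        // This per-line SELECT is the slow part.
        $check = mysqli_query($db, "SELECT 1 FROM customers WHERE customer = '$person' LIMIT 1");
        if (mysqli_num_rows($check) > 0) {
            mysqli_query($db, "INSERT INTO matched (person) VALUES ('$person')");
        }
    }
    fclose($fh);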
     
    aayybb, Jul 29, 2009 IP
  4. ziv

    ziv Peon

    Messages:
    18
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Maybe consider bulk-loading the entire text file into a temporary table; then you can run the whole process with a single JOIN.
    Does that make sense to you?
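    Something along these lines, as a rough sketch (the table, column, and file names are just examples, and the file has to be in a line/CSV format that LOAD DATA can parse):

    PHP:
    <?php
    // Sketch: bulk-load A into a temporary table, then match it against B with one JOIN
    // instead of millions of single-row SELECTs. All names here are placeholders.
    $db = mysqli_connect('localhost', 'user', 'pass', 'mydb');

    mysqli_query($db, "CREATE TEMPORARY TABLE temp_a (person VARCHAR(255), INDEX (person))");

    // LOAD DATA LOCAL INFILE reads a client-side file and needs local_infile enabled;
    // use plain LOAD DATA INFILE (and the FILE privilege) for a server-side file.
    mysqli_query($db, "LOAD DATA LOCAL INFILE '/path/to/people_a.txt'
                       INTO TABLE temp_a
                       LINES TERMINATED BY '\\n' (person)");

    // One set-based query does the whole comparison.
    mysqli_query($db, "INSERT INTO matched (person)
                       SELECT DISTINCT t.person
                       FROM temp_a AS t
                       JOIN customers AS c ON c.customer = t.person");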
     
    ziv, Jul 29, 2009 IP
  5. aayybb

    aayybb Peon

    Messages:
    128
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Text file A has a lot of info I don't need, so I don't know whether loading it into a temp table will save processing time. I won't know until I try it. I am still thinking about how to read, compare, and save some of the info in an efficient way. Thanks for the suggestion.
     
    aayybb, Jul 30, 2009 IP
  6. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #6
    The method you suggested would probably work. You will most likely need to temporarily remove any max execution time limit (see the example below). I would test on a very small data set (a few hundred lines) before running the full script.
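    For example, at the top of the script (set_time_limit() lifts PHP's execution time limit for that run; the memory value is only an example in case you end up caching a lot of rows):

    PHP:
    <?php
    // 0 = no execution time limit for this script run.
    set_time_limit(0);
    // Optional: raise the memory limit if the script buffers a lot of data.
    ini_set('memory_limit', '512M');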

    Is this a one-time move, or would you need to do this again later?
     
    jestep, Jul 31, 2009 IP
  7. chisara

    chisara Peon

    Messages:
    141
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #7
    If you are not able to do it using a table and a join, you can speed up your comparison by sorting the input file A by person.
    As long as the next line you read from the file has the same user ID, you don't need to query MySQL, since your last query already has the result. If the average person has 20 interests, you should get a factor-of-20 reduction in SELECT statements. By loading persons 1,000 at a time and caching them in a hash array, you could speed it up even further.
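    A rough sketch of that idea (assuming the A file has already been sorted by person, e.g. with the command-line sort tool; the table, column, and file names are placeholders):

    PHP:
    <?php
    // A is assumed to be pre-sorted by person, so consecutive lines for the same
    // person reuse the previous lookup instead of running another SELECT.
    $db = mysqli_connect('localhost', 'user', 'pass', 'mydb');

    $fh = fopen('people_a_sorted.txt', 'r');
    $lastPerson = null;
    $lastExists = false;

    while (($line = fgets($fh)) !== false) {
        $person = trim($line);

        if ($person !== $lastPerson) {
            // Only hit MySQL when the person changes.
            $escaped = mysqli_real_escape_string($db, $person);
            $check = mysqli_query($db, "SELECT 1 FROM customers WHERE customer = '$escaped' LIMIT 1");
            $lastExists = mysqli_num_rows($check) > 0;
            $lastPerson = $person;
        }

        if ($lastExists) {
            $escaped = mysqli_real_escape_string($db, $person);
            mysqli_query($db, "INSERT INTO matched (person) VALUES ('$escaped')");
        }
    }
    fclose($fh);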

    Although I concur that you should convert your input file to CSV, use an import/LOAD DATA mechanism to quickly load the data into MySQL, and then use a join to find your results, since this is what SQL is made for.

    For the Netflix contest, using PostgreSQL, I performed joins between a table of 100M records and a table of 2M records, so your small dataset should not be an issue :)
     
    chisara, Aug 3, 2009 IP
  8. basia

    basia Well-Known Member

    Messages:
    263
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    155
    #8
    Make sure you do a select on an indexed field. You might have to add an index to your table if one doesn't exist already.
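    For example (the table and column names are just placeholders -- use whatever column you actually compare against):

    PHP:
    <?php
    // One-off: add an index on the column used in the WHERE clause / JOIN condition.
    $db = mysqli_connect('localhost', 'user', 'pass', 'mydb');
    mysqli_query($db, "ALTER TABLE customers ADD INDEX idx_customer (customer)");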
     
    basia, Aug 5, 2009 IP