1. Advertising
    y u no do it?

    Advertising (learn more)

    Advertise virtually anything here, with CPM banner ads, CPM email ads and CPC contextual links. You can target relevant areas of the site and show ads based on geographical location of the user if you wish.

    Starts at just $1 per CPM or $0.10 per CPC.

Query looking for duplicates in different tables (2011-2016) and return flag for each year

Discussion in 'MySQL' started by pmf123, Mar 22, 2017.

  1. #1
    I have a group of tables of first,middle,last,street,city,state,zip,zip4phone - one for each year 2011-2016.

    I wish to make a query (php/mysql) that will return as follows, or similar:

    select first,middle,last,street,city,state,zip,zip4,2011flag,2012flag,2013flag,2014flag,2015flag,2016flage where first = 'john' and last = 'smith'

    so if the same name and address appears in more than one year table, i just want a flag to say that, not a duplicate line.
    so results may be as follows:

    john,a,smith,some street, some city, some state,zip,zip4,phone,y,y,y,n,n,n
    john,j,doe,some street, some city, some state,zip,zip4,phone,n,n,n,n,n,y
    lisa,g,jones,some street, some city, some state,zip,zip4,phone,n,n,y,y,n,n

    any help would be much appreciated?


    EXAMPLE:

    Table 2011
    Victor,X,Trigo,123 main st,Kearney,NJ,01110,1234,9734567890


    Table 2012
    Victor,X,Trigo,123 main st,Kearney,NJ,01110,1234,9734567890


    Table 2013
    Victor,X,Trigo,123 main st,Kearney,NJ,01110,1234,9734567890


    Table 2014
    Victor,X,Trigo,321 main st,Kearney,NJ,01110,1234,9734567890


    Table 2015
    Victor,X,Trigo,321 main st,Kearney,NJ,01110,1234,9734567890


    Table 2016
    Victor,X,Trigo,123 main st,Kearney,NJ,01110,1234,9734567890
    Victor,X,Trigo,321 main st,Kearney,NJ,01110,1234,9734567890



    Query where first = ‘victor’ and last = ‘trigo’ should give:


    first,middle,last,street,city,state,zip,zip4,phone,2011 flag,2012 flag,2013 flag,2014 flag,2015 flag,2016 flag


    Victor,X,Trigo,123 main st,Kearney,NJ,01110,1234,9734567890,y,y,y,n,n,y
    Victor,X,Trigo,321 main st,Kearney,NJ,01110,1234,9734567890,n,n,n,y,y,y
    SEMrush

    Match up entries with matching first,middle,last,street,city,state,zip,zip4 and set flag for each database they show in
     
    Last edited: Mar 22, 2017
    pmf123, Mar 22, 2017 IP
    SEMrush
  2. sarahk

    sarahk iTamer Staff

    Messages:
    23,827
    Likes Received:
    3,216
    Best Answers:
    91
    Trophy Points:
    615
    #2
    You can query each of the tables and join the results together with a "Union" and then use "Case" in each query to compare the current record to other years but you'll need to consider your database structure for the future.

    I tried to create a fiddle but the site seems to be playing up. Here's my simplified version of what you're doing - untested, unfortunately
    create table `test2011` (
      `id` int(11) not null auto_increment,
      `address` varchar(100) default null,
      `firstname` varchar(25) default null,
      `lastname` varchar(25) default null,
      primary key (`id`)
    );
    
    create table `test2012` (
      `id` int(11) not null auto_increment,
      `address` varchar(100) default null,
      `firstname` varchar(25) default null,
      `lastname` varchar(25) default null,
      primary key (`id`)
    );
    
    create table `test2013` (
      `id` int(11) not null auto_increment,
      `address` varchar(100) default null,
      `firstname` varchar(25) default null,
      `lastname` varchar(25) default null,
      primary key (`id`)
    );
    
    insert into  `test2011`
    values
    (null, '1 Beach Rd', 'John', 'Smith')
    , (null, '2 Queen St', 'Sally', 'Smith')
    , (null, '3 High St', 'Sally', 'Ridge');
    
    insert into  `test2012`
    values
    (null, '4 Main Rd', 'Peter', 'Williams')
    , (null, '2 Queen St', 'Sally', 'Smith')
    , (null, '5 View Rd', 'Gemma', 'Leigh');
    
    insert into  `test2013`
    values
    (null, '1 Beach Rd', 'John', 'Smith')
    , (null, '6 Queen St', 'Brenda', 'Allsop')
    , (null, '7 Valley St', 'Katrina', 'May');
    Code (markup):
    and the monster query, you have more years and more fields but you should be able to see the pattern
    select `test2011`.*
    , 'y' as `c2011`
    , case when `test2012`.`id` is null then 'n' else 'y' end as `c2012`
    , case when `test2013`.`id` is null then 'n' else 'y' end as `c2013`
    from `test2011`
    left join on `test2012`
    on (`test2012`.`firstname` = `test2011`.`firstname` and `test2012`.`lastname` = `test2011`)
    left join on `test2013`
    on (`test2013`.`firstname` = `test2011`.`firstname` and `test2013`.`lastname` = `test2011`)
    union
    select `test2012`.*
    , 'n' as `c2011`
    , 'y' as `c2012`
    , case when `test2013`.`id` is null then 'n' else 'y' end as `c2013`
    from `test2012`
    left join on `test2013`
    on (`test2013`.`firstname` = `test2012`.`firstname` and `test2013`.`lastname` = `test2012`)
    where not exists (select `id`
                      from `test2011`
                      where `test2011`.`firstname` = `test2012`.`firstname`
                        and `test2011`.`lastname` = `test2012`.`lastname`)
    union
    select `test2013`.*
    , 'n' as `c2011`
    , 'n' as `c2012`
    , 'y' as `c2013`
    from `test2013`
    where not exists (select `id`
                      from `test2011`
                      where `test2011`.`firstname` = `test2012`.`firstname`
                        and `test2011`.`lastname` = `test2012`.`lastname`)
    and not exists (select `id`
                      from `test2012`
                      where `test2012`.`firstname` = `test2013`.`firstname`
                        and `test2012`.`lastname` = `test2013`.`lastname`)
    Code (markup):
     
    sarahk, Mar 22, 2017 IP
  3. pmf123

    pmf123 Well-Known Member

    Messages:
    1,389
    Likes Received:
    69
    Best Answers:
    0
    Trophy Points:
    165
    #3
    I will try this out on the system when I am in front of it later tonight.

    I'm sorry but I don't know how to add the other years, these queries are beyond me, and I really appreciate your help.

    I don't know how modifying the data structure would help, I'm assuming you mean creating one database with all the tables combined and having these columns already. That would be hard and possibly impractical, as there are over 200 million records in each, and I get a new data file each year. I don't fancy one table with one billion + records... or am i missing something?
     
    pmf123, Mar 22, 2017 IP
  4. sarahk

    sarahk iTamer Staff

    Messages:
    23,827
    Likes Received:
    3,216
    Best Answers:
    91
    Trophy Points:
    615
    #4
    I'd recommend one table with the names and addresses, another with whatever the transaction is that they've done

    If you've got 200 million records in each table then this query is going to be a pig to test so you'll want to create a snapshot with fewer records until you have it right.
     
    sarahk, Mar 22, 2017 IP
  5. pmf123

    pmf123 Well-Known Member

    Messages:
    1,389
    Likes Received:
    69
    Best Answers:
    0
    Trophy Points:
    165
    #5
    its just names and addresses, no transactions...

    i can easily create test tables of a few hundred records with some duplicate entries to test this
     
    pmf123, Mar 22, 2017 IP
  6. sarahk

    sarahk iTamer Staff

    Messages:
    23,827
    Likes Received:
    3,216
    Best Answers:
    91
    Trophy Points:
    615
    #6
    I was using the word transaction to refer to the event that caused them to be recorded in your list for the year.

    I'm assuming you actually need this info and aren't just trying to dedupe the latest list. This will have a huge overhead and you might be better off creating temporary tables or views and processing it in parts. Does the server the database is on have anything mission critical? Any dramas if it crashes or slows down?

    My sql had a few glitches, this works
    
    select `test2011`.*
    , 'y' as `c2011`
    , case when `test2012`.`id` is null then 'n' else 'y' end as `c2012`
    , case when `test2013`.`id` is null then 'n' else 'y' end as `c2013`
    from `test2011`
    left join `test2012`
    on (`test2012`.`firstname` = `test2011`.`firstname` and `test2012`.`lastname` = `test2011`.lastname)
    left join  `test2013`
    on (`test2013`.`firstname` = `test2011`.`firstname` and `test2013`.`lastname` = `test2011`.lastname)
    union
    select `test2012`.*
    , 'n' as `c2011`
    , 'y' as `c2012`
    , case when `test2013`.`id` is null then 'n' else 'y' end as `c2013`
    from `test2012`
    left join  `test2013`
    on (`test2013`.`firstname` = `test2012`.`firstname` and `test2013`.`lastname` = `test2012`.lastname)
    where not exists (select `t11`.`id`
                      from `test2011` as `t11`
                      where `t11`.`firstname` = `test2012`.`firstname`
                        and `t11`.`lastname` = `test2012`.`lastname`)
    union
    select `test2013`.*
    , 'n' as `c2011`
    , 'n' as `c2012`
    , 'y' as `c2013`
    from `test2013`
    where not exists (select `id`
                      from `test2011` as `t11`
                      where `t11`.`firstname` = `test2013`.`firstname`
                        and `t11`.`lastname` = `test2013`.`lastname`)
    and not exists (select `id`
                      from `test2012` as `t12`
                      where `t12`.`firstname` = `test2013`.`firstname`
                        and `t12`.`lastname` = `test2013`.`lastname`)
    Code (markup):
     
    sarahk, Mar 22, 2017 IP
  7. pmf123

    pmf123 Well-Known Member

    Messages:
    1,389
    Likes Received:
    69
    Best Answers:
    0
    Trophy Points:
    165
    #7
    The server does have a few other databases on it that are queried a couple of times a second.

    No I'm trying to generate an address history for a selected name from our databases.

    Thanks again for your time, I will give this a go and see how the server fares.

    Like you said, it may be worthwhile doing the queries separately and storing temporarily, and then use another query to find the history.
     
    pmf123, Mar 22, 2017 IP
  8. sarahk

    sarahk iTamer Staff

    Messages:
    23,827
    Likes Received:
    3,216
    Best Answers:
    91
    Trophy Points:
    615
    #8
    ok, if you are doing it for a selected name the overhead will be a lot lower - just adding in the name to the where reduces the load.
     
    sarahk, Mar 22, 2017 IP