So, I have a table called finances and over time people may have one or many records in this table. I'm wanting to find the people who have the biggest gap between their records example data From this dummy data I'd expect member #1 to show up because the gap between '07 and '15 is the greatest. However I need the query to be smart enough to not show up the gap between '06 and '15 because it knows there is the '07 record. I've started with this query which finds me all the datediffs and doesn't worry about records in the middle SELECT f1.membership_id , ROUND(DATEDIFF(f2.startdate, f1.startdate)/365) as `gap` FROM `finances` as `f1`, `finances` as `f2` WHERE f1.membership_id = f2.membership_id and f1.startdate < f2.startdate order by gap DESC; Code (markup): So I add in subquery to omit the records where there is something in the middle - it runs, but it returns nothing. SELECT f1.membership_id , ROUND(DATEDIFF(f2.startdate, f1.startdate)/365) as `gap` FROM `finances` as `f1`, `finances` as `f2` WHERE f1.membership_id = f2.membership_id and f1.startdate < f2.startdate and NOT EXISTS (select * from `finances` `f3` where f3.membership_id = f1.membership_id and (f3.startdate between f1.startdate and f2.startdate) ) order by gap DESC; Code (markup): http://sqlfiddle.com/#!9/0b378/15/0 Any ideas?
Using the ordered table like you have (ordered on ID+Date), I would have the query only compare consecutive records for each ID. I don't use MySQL so can't give you any code samples and I am having a brain fart trying to do it in FoxPro.
Well first of all, the JOIN isn't going to be the best way to get the gap because it's going to return the first record in the database that matches (which might not even be the first since it's not based on a sort, rather the creation order (for the most part). It's probably a good use case for user variables where you can both set and pass the variable along to the next record like so... You end up creating a derived table so you can first sort the data in the order you want to evaluate the records, then you just end up doing a GROUP BY on the derived table. Anyway, I think this is what you want... SET @last_membership_id := 0, @last_startdate := '0000-00-00'; SELECT membership_id, MAX(gap) AS max_gap FROM ( SELECT *, ROUND(DATEDIFF(startdate, @last_startdate := IF(@last_membership_id = membership_id, @last_startdate, startdate)) / 365) as `gap`, @last_membership_id := membership_id, @last_startdate := startdate FROM `finances` ORDER BY membership_id, startdate ) as tmp_table GROUP BY membership_id ORDER BY membership_id; Code (SQL): Some additional info on MySQL user variables: https://dev.mysql.com/doc/refman/5.6/en/user-variables.html
Hmm, that pulled out some mental stuff I haven't used in a long while - oracle and all that. On my real data that gave me a max_gap of either null or 0. Null makes sense for those with only 1 finance record 0 would be a data entry stuff up Should be lots of 1s and some with bigger numbers.
I use sqlyog to hit the database - this isn't going into a script at this point - maybe sqlyog queries don't like the variables? If so, you'd expect to get errors. Out of sheer perversity I decide to try changing around the syntax of the query and would you believe that it was the BETWEEN that was broken! I changed it to straight date comparisons and it worked. A tad annoyed because I'd actually checked the mysql reference to make sure it worked on dates!
Both BETWEEN and date columns can be screwy in themselves, so when you combine the two, who knows what will happen. hehe I got so annoyed with MySQL time/date columns a few years ago I've never used them since... instead use INT(10) to store timestamps. Then I didn't have to deal with timezone issues or other weirdness since you can just deal with normal integers then.