SQL time aggregate functions
Here’s some thinking out loud.
How do I construct a SQL aggregate function over time to measure wall clock hours? Say I work in corporate security and I have information about employees entering and leaving a building so I know how long Jim and Pam were in the building individually and I want to know how many minutes *anybody* was in the building? A security log might look like this:
| Person | Enter | Exit | aggregate minutes |
|---|---|---|---|
| Jim | 7:00AM | 11:30AM | 270 |
| Pam | 8:00AM | 9:00AM | 60 |
| Pam | 10:00AM | 11:45AM | 105 |
From such a log, I’ve got a total of 285 minutes when someone was in the building. In a standard language like C or perl this might not be too hard to solve by iterating over the records and a bit array for minutes of the day which could be set “on” when people are present, or by iterating through records (sort by start time and duration and compute overlaps and do some counting) but right now I’m not sure how to do it in SQL.
More thinking out loud to come as I make progress or get increasingly frustrated with this…
Update: I think I figured out the right way to approach this. As far as I know, no DB server implementation offers such a time aggregate facility so that’s probably not the right tool to throw at this job. This is data I have (and need this computation performed on) is in a database built on a Ruby on Rails web application and the simpler way to handle this will probably to recalculate that aggregate value as records are entered or modified from the web UI and then retrieve that value directly from the database rather than trying to compute it from the constituent records. Design-wise I’m not sure which approach I prefer. This is a pivot on DB data and feels best performed in some post-processing context (with the flexibility to pivot in different ways and see “minutes in the office [by day|by people in a department|by hour|etc]”) but I also know there is only one axis this will ever be pivoted on, so there’s a fine case for removing that business logic from the database. And since I don’t know how to do it in SQL, that’s the way I’ll probably go ![]()
rolando said,
September 12, 2008 @ 10:40 am
i think before you solve this brain buster you should learn about tags, Confusion McSloppyTable.
rolando said,
September 12, 2008 @ 10:41 am
ahh, you improved it since i posted…and your comment thing erased my tag
rolando said,
September 12, 2008 @ 10:41 am
pre tag damnit!
Patrick said,
September 12, 2008 @ 10:45 am
Why would I send a <pre> tag to do a <th>’s job?
Scott Sorheim said,
October 2, 2008 @ 9:57 am
Wow, this is really the last thing I needed to get distracted by today, but for some reason I couldn’t resist. I set up a basic table the had StartDate and EndDate columns. I pumped in a few lines of sample data, then, I created this query:
SELECT MinStart, MaxEnd FROM (
SELECT TOP 100 PERCENT
COALESCE((
SELECT min(StartDate)
FROM EmpTimes b
WHERE a.StartDate between b.StartDate and b.EndDate
), 0
) AS MinStart,
COALESCE((
SELECT max(enddate)
FROM EmpTimes b
WHERE a.EndDate between b.StartDate and b.EndDate
), 0
) AS MaxEnd
FROM EmpTimes a
ORDER BY StartDate, EndDate
) AS T1
GROUP BY MinStart, MaxEnd
Which, for my small set of data anyway, groups all of the overlapping chunks of time. I got something like this:
MinStart | MaxEnd
2008-10-01 09:00:00.000 | 2008-10-01 17:00:00.000
2008-10-01 18:00:00.000 | 2008-10-01 19:00:00.000
2008-10-01 19:30:00.000 | 2008-10-01 22:00:00.000
From here on out, the only work left would be to sum the minutes from those blocks of time.
Is there a prize?! There’s probably an easier way, but that was my first shot and it worked.