Tuesday, April 16, 2013

Better Visualization Of Baby Sleep

My wife uses an app from Similac to track various things about our baby: diaper changes, sleep schedules, and feeding schedules. Friends of ours recommended it to us, and we've recommended it to other people.

But I find its visualization for sleep schedules lacking. Anything beyond the daily view just shows you how many hours your baby slept on a given day; it doesn't show you when on that day your baby slept. And if a sleep session starts on one day and ends on another, the hours are only counted for the first day. In the summary the app provides, this probably doesn't matter; the hours will even out. But it's confusing.

Fortunately, the app exports its data. So I figured I could get the export and then produce the chart I had in my head.

The first step was cleaning the data. Here's a sample of what the app sends.
Start Time	4/11/13, 2:09 PM
Duration	2 hrs 11 min
Time of Day	Daytime
Laid Down Awake	No
Not very program friendly, but it's easy enough to fix with awk.

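# Wrap a value in double quotes for CSV output.
function quote(string) {
    return "\"" string "\""
}

BEGIN {
    # Split fields on a comma, space, or tab.
    FS = "[, \t]"
    print quote("line") "," quote("start_date") "," quote("start_time") "," quote("duration")
}

# Remember the date and timestamp until the matching Duration line arrives.
/Start Time/ {
    date = $3
    date_and_timestamp = quote(date) "," quote(date " " $5 " " $6)
}

# Hours and minutes, converted to total minutes.
/Duration.*hr.*min/ {
    print quote(NR) "," date_and_timestamp "," quote((($2 * 60) + $4))
}

# Whole hours only.
/Duration.*hrs?$/ {
    print quote(NR) "," date_and_timestamp "," quote(($2 * 60))
}

# Minutes only.
$3 ~ /min$/ {
    print quote(NR) "," date_and_timestamp "," quote($2)
}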

That turns a chunk of text like the one above (and its variations) into a line like this:

"8","4/11/13","4/11/13 2:09 PM","131"


(I add the line number at the beginning so that R has a primary key to work with on the import.)
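For reference, here is a minimal sketch of how a CSV in that shape could be read into R. The first sample row is the one shown above; the second row and the variable names (csv_text, sleep) are made up for illustration, and the date formats are an assumption based on that sample line.

```r
# Sample rows in the shape the awk script emits.
# The second row is invented for illustration.
csv_text <- '"line","start_date","start_time","duration"
"8","4/11/13","4/11/13 2:09 PM","131"
"13","4/11/13","4/11/13 9:00 PM","300"'

# Read every column as character first, then convert explicitly.
sleep <- read.csv(text = csv_text, colClasses = "character")

# Parse the date columns. UTC is an assumption that sidesteps
# daylight-saving gaps. Note that %p (AM/PM) parsing depends on the locale.
sleep$start_date <- as.Date(sleep$start_date, format = "%m/%d/%y")
sleep$start_time <- as.POSIXct(sleep$start_time,
                               format = "%m/%d/%y %I:%M %p", tz = "UTC")

# The awk script writes durations in minutes.
sleep$duration <- as.numeric(sleep$duration)

str(sleep)
```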

Once I had the CSV, I pulled it into R. As usual in R, drawing the chart was straightforward once the data was correct. Here's the meat of it:


rect(xleft=data$sleep_offset,
     ybottom=data$y_value-.25,
     xright=(data$sleep_offset + data$duration),
     ytop=data$y_value+.25,
     col=data$rect_color,
     border=NA)
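The rect() call relies on columns that have to be derived first. The sketch below shows one way that might be done, not necessarily how the original script does it. The helper name draw_sleep_chart, the axis range, the tick labels, and the default color are all assumptions; only the rect() call itself comes from the snippet above. It assumes durations and offsets are in minutes.

```r
# Hypothetical helper: derive the columns rect() needs, then draw.
draw_sleep_chart <- function(data) {
  # Minutes between midnight of the row's date and the start of the session.
  midnight <- as.POSIXct(format(data$start_date), tz = "UTC")
  data$sleep_offset <- as.numeric(difftime(data$start_time, midnight,
                                           units = "mins"))

  # One horizontal band per date, earliest date at the top.
  days <- sort(unique(data$start_date))
  data$y_value <- length(days) - match(data$start_date, days) + 1

  # Assumed default color when the data does not carry one.
  if (is.null(data$rect_color)) data$rect_color <- "darkgreen"

  # Empty canvas from 6 PM the night before to 6 AM the morning after.
  plot(0, 0, type = "n", xlim = c(-360, 1800),
       ylim = c(0.5, length(days) + 0.5),
       xlab = "", ylab = "", xaxt = "n", yaxt = "n")
  axis(1, at = seq(-360, 1800, by = 360),
       labels = c("6 PM", "12 AM", "6 AM", "12 PM", "6 PM", "12 AM", "6 AM"))
  axis(2, at = seq_along(days), labels = rev(format(days, "%m/%d")), las = 1)

  rect(xleft = data$sleep_offset,
       ybottom = data$y_value - .25,
       xright = (data$sleep_offset + data$duration),
       ytop = data$y_value + .25,
       col = data$rect_color,
       border = NA)

  invisible(data)
}

# Made-up sample data: two sessions on the same day.
sleep <- data.frame(
  start_date = as.Date(c("2013-04-11", "2013-04-11")),
  start_time = as.POSIXct(c("2013-04-11 14:09", "2013-04-11 21:00"),
                          tz = "UTC"),
  duration   = c(131, 300)
)
chart_data <- draw_sleep_chart(sleep)
```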


But even my cleaned data needed some more cleaning within R. The exported data has the same problem as the app's charts when a sleep session crosses midnight. You might see an entry for 4/11/13, 9:00 PM with a duration of five hours, but you won't see any entry for 4/12/13, midnight to 2:00 AM.

So I first added new entries that duplicated those records but carried a "start date" (which determines the position on the y-axis) from the other side of midnight. The hypothetical 9:00 PM sleep session above would produce another row of data whose start date was 4/12/13 and whose start time was still 4/11/13 9:00 PM. That made the "sleep offset" (the position on the x-axis) negative, which is what I wanted.
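A rough sketch of that duplication step follows. The function name add_next_day_rows and the sample data are hypothetical, and it assumes durations and offsets are in minutes and that no session runs longer than 24 hours.

```r
# Made-up sample data: an afternoon nap and a session that crosses midnight.
sleep <- data.frame(
  start_date = as.Date(c("2013-04-11", "2013-04-11")),
  start_time = as.POSIXct(c("2013-04-11 14:09", "2013-04-11 21:00"),
                          tz = "UTC"),
  duration   = c(131, 300)
)

# Hypothetical helper: copy every session that runs past midnight onto
# the following day's row, keeping its original start time.
add_next_day_rows <- function(data) {
  midnight_after <- as.POSIXct(format(data$start_date + 1), tz = "UTC")
  end_time <- data$start_time + data$duration * 60  # POSIXct counts seconds
  spill <- data[end_time > midnight_after, ]
  spill$start_date <- spill$start_date + 1

  out <- rbind(data, spill)

  # Offset in minutes from midnight of the row's date.
  # It is negative for the duplicated rows.
  midnight <- as.POSIXct(format(out$start_date), tz = "UTC")
  out$sleep_offset <- as.numeric(difftime(out$start_time, midnight,
                                          units = "mins"))
  out
}

with_spill <- add_next_day_rows(sleep)
print(with_spill)
```

With this sample, the 9:00 PM session gets a second row dated 4/12/13 with an offset of -180 minutes, three hours before that day's midnight.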

Finally, I wanted to draw any chunks of sleep that fell outside of a given date (because of the overlap) in a different color. So I broke each overlap row in two at midnight. To use the same example, that gives one row for 4/11/13 with a start time of 9:00 PM and a duration of three hours, and another row for 4/12/13 from midnight to 2:00 AM. I stored the color in the data as well, so I could tell R to just draw the rectangles based on each row's coordinates, using the color specified in one of the fields.
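One way the split could be sketched, assuming each row already has a sleep_offset and a duration in minutes relative to midnight of its own date. The function name split_at_midnight and the two color names are placeholders, not the values used for the actual chart.

```r
# Made-up rows: the 9:00 PM session appears on its own day and,
# with a negative offset, on the following day.
rows <- data.frame(
  start_date   = as.Date(c("2013-04-11", "2013-04-11", "2013-04-12")),
  sleep_offset = c(849, 1260, -180),
  duration     = c(131, 300, 300)
)

# Hypothetical helper: cut each session at the edges of its row's
# calendar day and color the pieces by whether they fall inside that day.
split_at_midnight <- function(data, in_day = "darkgreen",
                              out_of_day = "lightgreen") {
  day_len <- 24 * 60
  pieces <- lapply(seq_len(nrow(data)), function(i) {
    start <- data$sleep_offset[i]
    end   <- start + data$duration[i]
    # Cut points: the session's two ends plus any day edge strictly inside.
    edges <- c(0, day_len)
    cuts  <- sort(c(start, edges[edges > start & edges < end], end))
    left  <- head(cuts, -1)
    right <- tail(cuts, -1)
    data.frame(
      start_date   = data$start_date[i],
      sleep_offset = left,
      duration     = right - left,
      rect_color   = ifelse(left >= 0 & right <= day_len, in_day, out_of_day),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

pieces <- split_at_midnight(rows)
print(pieces)
```

Storing rect_color on each piece is what lets a single vectorized rect() call draw everything, since rect() accepts a vector of colors alongside the vectors of coordinates.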

Here's what I came up with. Note that this particular chart is based on fake data (since I have a tool that makes that easy), because I didn't want to expose the baby's personal data for all the world to see. But the chart gives the gist.



Each horizontal line is a day, with the night before and morning after visible on the chart but unobtrusive. The dark green represents sleep sessions within that calendar day, spanning the time listed on the x-axis. I often feel that every visualization I make ends up being a small multiple, but it is true that I often want to quickly look at a large mass of data and make comparisons within it.

The visualization is a work in progress. I'd like to put the total on the right, and a friend suggested that I add a heat map to show when parents have the best chance of getting in a long nap.

But compare this grid to what, in the app, would simply be a line graph with a single number for each day. That doesn't tell you anything about, say, how long your baby's been sleeping at night this week versus last week. Or whether her individual sleep sessions have gotten longer. We recently put up blackout curtains, and so we can see what effect that's had on the baby's sleep. We can see particularly sleepless days (or particularly sleepful ones) and correlate them with other variables such as her mood and sleep schedule.

Of course, even with the awk and R scripts written, the data still has to be emailed to me, and I still have to process it. But I think the extra detail is worth the small effort to get it.
