
Massive data analysis of NYC Citi Bike data

posted by Jason Kottke   Jan 13, 2016

Late last year, Todd Schneider did a big data analysis of taxi and Uber usage in NYC. This morning, he posted the results of a similar analysis for Citi Bike.

But unlike the taxi data, Citi Bike includes demographic information about its riders, namely gender, birth year, and subscriber status. At first glance that might not seem too revealing, but it turns out that it’s enough to uniquely identify many Citi Bike trips. If you know the following information about an individual Citi Bike trip:

1. The rider is an annual subscriber
2. Their gender
3. Their birth year
4. The station where they picked up a Citi Bike
5. The date and time they picked up the bike, rounded to the nearest hour

Then you can uniquely identify that individual trip 84% of the time! That means you can find out where and when the rider dropped off the bike, which might be sensitive information. Because men account for 77% of all subscriber trips, it’s even easier to uniquely identify rides by women: if we restrict to female riders, then 92% of trips can be uniquely identified.
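To make the uniqueness claim concrete, here is a minimal sketch of how such a percentage could be computed. It is not Schneider's actual code. The field names (`user_type`, `gender`, `birth_year`, `start_station`, `start_time`) and the toy trips are assumptions made up for illustration, not the real Citi Bike column names or data.

```python
from collections import Counter
from datetime import datetime, timedelta


def round_to_hour(ts):
    """Round a timestamp to the nearest hour (30 minutes past rounds up)."""
    return (ts + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0)


def quasi_id(trip):
    """The combination of attributes an observer is assumed to know about a trip."""
    return (
        trip["gender"],
        trip["birth_year"],
        trip["start_station"],
        round_to_hour(trip["start_time"]),
    )


def unique_fraction(trips):
    """Share of subscriber trips whose quasi-identifier matches no other subscriber trip."""
    subs = [t for t in trips if t["user_type"] == "Subscriber"]
    if not subs:
        return 0.0
    counts = Counter(quasi_id(t) for t in subs)
    unique = sum(1 for t in subs if counts[quasi_id(t)] == 1)
    return unique / len(subs)


# Hypothetical toy data: the first two trips collide after rounding to the hour,
# so neither of them can be singled out; the next two are unique.
trips = [
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "A", "start_time": datetime(2015, 7, 1, 8, 10)},
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "A", "start_time": datetime(2015, 7, 1, 8, 20)},
    {"user_type": "Subscriber", "gender": "F", "birth_year": 1985,
     "start_station": "A", "start_time": datetime(2015, 7, 1, 8, 15)},
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "B", "start_time": datetime(2015, 7, 1, 8, 40)},
    {"user_type": "Customer", "gender": None, "birth_year": None,
     "start_station": "A", "start_time": datetime(2015, 7, 1, 8, 5)},
]

print(unique_fraction(trips))
```

Restricting `trips` to one gender before calling `unique_fraction` would give the per-gender figure in the same way. A trip counts as identifiable here only when its combination of attributes appears exactly once among subscriber trips.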
