Advertise here with Carbon Ads

This site is made possible by member support. โค๏ธ

Big thanks to Arcustech for hosting the site and offering amazing tech support.

When you buy through links on kottke.org, I may earn an affiliate commission. Thanks for supporting the site!

kottke.org. home of fine hypertext products since 1998.

๐Ÿ”  ๐Ÿ’€  ๐Ÿ“ธ  ๐Ÿ˜ญ  ๐Ÿ•ณ๏ธ  ๐Ÿค   ๐ŸŽฌ  ๐Ÿฅ”

Massive data analysis of NYC Citi Bike data

Late last year, Todd Schneider did a big data analysis of taxi and Uber usage in NYC. This morning, he posted the results of a similar analysis for Citi Bike.

But unlike the taxi data, Citi Bike includes demographic information about its riders, namely gender, birth year, and subscriber status. At first glance that might not seem too revealing, but it turns out that it’s enough to uniquely identify many Citi Bike trips. If you know the following information about an individual Citi Bike trip:

1. The rider is an annual subscriber
2. Their gender
3. Their birth year
4. The station where they picked up a Citi Bike
5. The date and time they picked up the bike, rounded to the nearest hour

Then you can uniquely identify that individual trip 84% of the time! That means you can find out where and when the rider dropped off the bike, which might be sensitive information. Because men account for 77% of all subscriber trips, it’s even easier to uniquely identify rides by women: if we restrict to female riders, then 92% of trips can be uniquely identified.