kottke.org. home of fine hypertext products since 1998.

๐Ÿ”  ๐Ÿ’€  ๐Ÿ“ธ  ๐Ÿ˜ญ  ๐Ÿ•ณ๏ธ  ๐Ÿค   ๐ŸŽฌ  ๐Ÿฅ”

kottke.org posts about Todd Schneider

Massive data analysis of NYC Citi Bike data

Late last year, Todd Schneider did a big data analysis of taxi and Uber usage in NYC. This morning, he posted the results of a similar analysis for Citi Bike.

But unlike the taxi data, Citi Bike includes demographic information about its riders, namely gender, birth year, and subscriber status. At first glance that might not seem too revealing, but it turns out that it’s enough to uniquely identify many Citi Bike trips. If you know the following information about an individual Citi Bike trip:

1. The rider is an annual subscriber
2. Their gender
3. Their birth year
4. The station where they picked up a Citi Bike
5. The date and time they picked up the bike, rounded to the nearest hour

Then you can uniquely identify that individual trip 84% of the time! That means you can find out where and when the rider dropped off the bike, which might be sensitive information. Because men account for 77% of all subscriber trips, it’s even easier to uniquely identify rides by women: if we restrict to female riders, then 92% of trips can be uniquely identified.
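To make the uniqueness claim concrete, here is a minimal Python sketch of how such a measurement could be done. It is not Schneider's code: the field names, station labels, and toy records below are all hypothetical, and the real Citi Bike files use their own schema. The idea is to build a key from gender, birth year, pickup station, and pickup time rounded to the nearest hour, then count how many subscriber trips have a key that no other trip shares.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical toy records; field names are illustrative, not the real schema.
TRIPS = [
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "Station A", "start_time": datetime(2015, 6, 1, 8, 10)},
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "Station A", "start_time": datetime(2015, 6, 1, 8, 20)},
    {"user_type": "Subscriber", "gender": "F", "birth_year": 1985,
     "start_station": "Station A", "start_time": datetime(2015, 6, 1, 8, 15)},
    {"user_type": "Subscriber", "gender": "M", "birth_year": 1980,
     "start_station": "Station A", "start_time": datetime(2015, 6, 1, 8, 40)},
    {"user_type": "Customer", "gender": None, "birth_year": None,
     "start_station": "Station A", "start_time": datetime(2015, 6, 1, 8, 5)},
]


def round_to_nearest_hour(ts):
    """Round a timestamp to the nearest hour (half hours round up)."""
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)


def trip_key(trip):
    """The combination of facts an observer is assumed to know about a trip."""
    return (
        trip["gender"],
        trip["birth_year"],
        trip["start_station"],
        round_to_nearest_hour(trip["start_time"]),
    )


def unique_fraction(trips):
    """Share of subscriber trips whose key matches no other subscriber trip."""
    subscribers = [t for t in trips if t["user_type"] == "Subscriber"]
    if not subscribers:
        return 0.0
    counts = Counter(trip_key(t) for t in subscribers)
    unique = sum(1 for t in subscribers if counts[trip_key(t)] == 1)
    return unique / len(subscribers)


if __name__ == "__main__":
    print(unique_fraction(TRIPS))
```

In the toy data, the two 1980-born men who leave Station A at 8:10 and 8:20 collapse into the same key, so neither of their trips is identifiable from these facts alone, while the other two subscriber trips are. On real data the same counting would run over millions of rows, where a dataframe or SQL `GROUP BY` would be the more practical tool.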


Massive data analysis of NYC taxi and Uber data

Todd Schneider used a couple of publicly available data sets (NYC taxis, Uber) to explore various aspects of how New Yorkers move about the city. Some of the findings include the rise of Uber:

Let’s add Uber into the mix. I live in Brooklyn, and although I sometimes take taxis, an anecdotal review of my credit card statements suggests that I take about four times as many Ubers as I do taxis. It turns out I’m not alone: between June 2014 and June 2015, the number of Uber pickups in Brooklyn grew by 525%! As of June 2015, the most recent data available when I wrote this, Uber accounts for more than twice as many pickups in Brooklyn compared to yellow taxis, and is rapidly approaching the popularity of green taxis.

…the plausibility of Die Hard III’s taxi ride to stop a subway bombing:

In Die Hard: With a Vengeance, John McClane (Willis) and Zeus Carver (Jackson) have to make it from 72nd and Broadway to the Wall Street 2/3 subway station during morning rush hour in less than 30 minutes, or else a bomb will go off. They commandeer a taxi, drive it frantically through Central Park, tailgate an ambulance, and just barely make it in time (of course the bomb goes off anyway…). Thanks to the TLC’s publicly available data, we can finally address audience concerns about the realism of this sequence.

…where “bridge and tunnel” folks go for fun in Manhattan:

The most popular destinations for B&T trips are in Murray Hill, the Meatpacking District, Chelsea, and Midtown.

…the growth of north Williamsburg nightlife:

(Image: Taxi Uber Data)

…the privacy implications of releasing taxi data publicly:

For example, I don’t know who owns one of these beautiful oceanfront homes on East Hampton’s exclusive Further Lane (exact address redacted to protect the innocent). But I do know the exact Brooklyn Heights location and time from which someone (not necessarily the owner) hailed a cab, rode 106.6 miles, and paid a $400 fare with a credit card, including a $110.50 tip.

as well as average travel times to the city’s airports, where investment bankers live, and how many people pay with cash vs. credit cards. Read the whole thing, and if you want to play around with the data yourself, Schneider posted all of his scripts and know-how on GitHub.

Update: Using summaries published by the New York City Taxi & Limousine Commission, Schneider takes a look at how taxi usage in NYC is shrinking and how usage of Uber is growing.

This graph will continue to update as the TLC releases additional data, but at the time I wrote this in April 2016, the most recent data shows yellow taxis provided 60,000 fewer trips per day in January 2016 compared to one year earlier, while Uber provided 70,000 more trips per day over the same time horizon.

Although the Uber data only begins in 2015, if we zoom out to 2010, it’s even more apparent that yellow taxis are losing market share.

Lyft began reporting data in April 2015, and expanded aggressively throughout that summer, reaching a peak of 19,000 trips per day in December 2015. Over the following 6 weeks, though, Lyft usage tumbled back down to 11,000 trips per day as of January 2016, a decline of over 40%.
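The Lyft figure is easy to check from the two rounded trip counts quoted above. A quick Python sketch, where `percent_change` is just an illustrative helper name:

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a decline)."""
    return (new - old) / old * 100.0


# Rounded daily trip counts quoted in the text: December 2015 peak vs. January 2016.
lyft_change = percent_change(19_000, 11_000)
print(f"{lyft_change:.1f}%")  # -42.1%
```

Since the inputs are rounded to the nearest thousand, the result is only approximate, but it lines up with the "over 40%" in the quote.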