kottke.org posts about Metadata

Why metadata REALLY matters for the future of e-books
Aug 09 2010

Okay, I'll chase ONE new story today. But it's about this fundamental problem of converting old media objects into new ones, and I get to dig up some old blog posts too, so I feel like I'm still in character.

Google Books claims to have counted all the books in the world: "129,864,880 of them. At least until Sunday." But as Ars Technica points out, that number is dubious:

Google's counting method relies entirely on its enormous metadata collection--almost one billion records--which it winnows down by throwing out duplicates and non-book items like CDs. The result is a book count that's arrived at by a kind of process of elimination. It's not so much that Google starts with a fixed definition of "book" and then combs its records to identify objects with those characteristics; rather, the GBS algorithm seeks to identify everything that is clearly not a book, and to reject all those entries. It also looks for collections of records that all identify the same edition of the same book, but that are, for whatever reason (often a data entry error), listed differently in the different metadata collections that Google subscribes to.

But the problem with Google's count, as is clear from the GBS count post itself, is that GBS's metadata collection is riddled with errors of every sort. Or, as linguist and GBS critic Geoff Nunberg put it last year in a blog post, Google's metadata is a "train wreck: a mish-mash wrapped in a muddle wrapped in a mess."
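To make that process of elimination concrete, here's a rough sketch in Python. It is not Google's algorithm; the field names (`title`, `author`, `year`, `kind`), the list of non-book kinds, and the normalization rule are all assumptions made up for illustration. It throws out records that are clearly not books, then collapses records that appear to describe the same edition but were entered slightly differently.

```python
import re

# Hypothetical record kinds treated as "clearly not a book".
NON_BOOK_KINDS = {"cd", "dvd", "map", "microfilm"}


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so near-identical entries match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def count_books(records):
    """Count distinct editions by elimination: drop non-books, then merge duplicates."""
    editions = set()
    for rec in records:
        # Step 1: reject anything that is clearly not a book.
        if rec.get("kind", "").lower() in NON_BOOK_KINDS:
            continue
        # Step 2: records with the same normalized title, author, and year
        # are assumed to be the same edition listed in different catalogs.
        key = (normalize(rec["title"]), normalize(rec["author"]), rec["year"])
        editions.add(key)
    return len(editions)


records = [
    {"title": "Moby-Dick", "author": "Melville, Herman", "year": 1851, "kind": "book"},
    {"title": "Moby Dick", "author": "melville herman", "year": 1851, "kind": "book"},
    {"title": "Moby-Dick", "author": "Melville, Herman", "year": 1930, "kind": "book"},
    {"title": "Moby-Dick (audio)", "author": "Melville, Herman", "year": 2005, "kind": "CD"},
]

print(count_books(records))  # 2: the 1851 edition (merged) and the 1930 edition
```

Notice what happens if one of those 1851 records has the year mistyped as 1815: the sketch counts it as a third edition. A count built this way is only as good as the metadata underneath it.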

It's not just Google that has a problem. I wrote a post for Wired.com last week ("Why Metadata Matters for the Future of E-books") about how increased reliance on metadata was affecting publishers of new books, who also depend heavily on digital search -- and generally how bibliographic and legal arcana around e-books affect what we see and how we come to see it more than you'd think.

But I wish I'd added Google's woeful records to the piece. It's not like I didn't know about it; here's the title of a post I wrote a year ago, also citing Nunberg's post when it first appeared at Language Log: "Scholars to Google: Your Metadata Sucks".
