Tags and kottke.org  AUG 17 2005

A few months ago, I began tagging my remaindered links with keywords toward some still-unspecified goal. For instance, this recent post about an interview with Ruth Reichl got tagged with "nyc food restaurants ruthreichl books interviews". As I said, I haven't figured out what to do with them yet, but the other day I whipped up a little PHP script to see how the kottke.org tagspace was shaping up. Here are a few results:

# of entries tagged: 933
total # of tags: 3960
# of distinct tags: 1376
tags per entry: 4.244
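
For the curious, numbers like these fall out of a few lines of counting. What follows is a rough Python sketch, not the PHP script mentioned above; the `tag_stats` function name and the sample entries (apart from the Ruth Reichl tag string) are made up for illustration, and it assumes each entry's tags are stored as one space-separated string.

```python
from collections import Counter


def tag_stats(entries, top_n=20):
    """Summarize a list of space-separated tag strings, one string per entry."""
    # Split each entry into its tags and skip entries that have none.
    tag_lists = [entry.split() for entry in entries if entry.split()]
    # Count how many times each tag is used across all tagged entries.
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    return {
        "entries_tagged": len(tag_lists),
        "total_tags": total,
        "distinct_tags": len(counts),
        "tags_per_entry": round(total / len(tag_lists), 3) if tag_lists else 0.0,
        "most_popular": counts.most_common(top_n),
    }


# Hypothetical sample data; only the first string comes from the post.
sample = [
    "nyc food restaurants ruthreichl books interviews",
    "nyc food",
    "science lists",
]
stats = tag_stats(sample, top_n=2)
print(stats)
```

As a sanity check on the real figures, 3960 total tags over 933 tagged entries works out to 4.244 tags per entry.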

Most popular tags (#):
science (80)
nyc (80)
movies (80)
business (73)
food (68)
photography (62)
funny (57)
books (53)
lists (53)
www (43)
music (43)
weblogs (40)
art (39)
design (34)
restaurants (34)
sports (34)
apple (33)
google (32)
technology (29)
nostalgia (27)

That's a fairly accurate description of both what the site is about and what I am interested in. Two of my favorite tags are "lists" and "bestof". Here's a sampling from each of those tags:

100 people who are qualified to carry the "Bad Mothafucka" wallet besides Pulp Fiction's Jules Winnfield
Photo essay of the Hubble Telescope's top ten discoveries
50 Things to Do with Your iPod
Twelve ways to think differently
Pickup Lines Used by Mario [of Mario Bros. fame]
20 things gamers want from the next generation of game consoles
Money Magazine on the 50 smartest things you can do with your money
40 things that only happen in the movies
24 different ways to lace your shoes

Is Shaq the greatest NBA player of all time?
Spin names Radiohead's OK Computer the best album from the last 20 years
BusinessWeek Design Award winners for 2005
BBC Radio 4 poll results for Greatest Philosopher Ever!!
New bookmark: interesting Flickr photos from the last 24 hours, automagically determined

The dream is to go back and tag every single entry on the site -- currently ~8700 -- but it would take me approximately forever and I'm not sure it's worth the time and debilitating injuries to my wrists and fingers from all the typing. I've thought about a few alternative approaches (and their associated downsides):

  • Feed all my URLs into del.icio.us via the API and scrape out the tags most commonly associated with those links and posts. I literally haven't looked at the API, so I don't know if this is even possible. Also, I'm not sure I want to trust the del.icio.us community to collaboratively tag my posts and links...there would probably be a significant amount of correction and addition of tags by hand.
  • Use Yahoo's Term Extraction service to build a list of keywords based on an analysis of my posts and the content of the pages I point to within a post or remaindered link. I have no idea how well this would work in practice, especially in returning terms that make good tags. Probably a lot of hand-correction here too.
  • Get my readers (that's you!) to tag them for me using the list of tags I've already used as a guideline. Unfortunately, you should never trust anyone over 30 or anyone who has access to an HTML textarea into which they can type anything they want. Given enough time, I could probably come up with a system that minimizes the damage a particular malcontent could do, but as with the other two options, I'm still left with a fair amount of correction by hand. A bigger problem I have with this option is there's a lot in it for me (and the site), but I'm not sure there's any real incentive for any of you to spend 20 minutes tagging kottke.org posts (I believe this chore would be the first entry in the dictionary under "mindless busywork"), so I'd feel weird about asking.
  • Some combination of the above approaches.
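
On the reader-tagging option, here is one guess at what "minimizing the damage" could look like. It is a minimal Python sketch under two assumptions of mine, neither of which is a settled design: the existing tag list is treated as a strict whitelist rather than a guideline, and a tag only sticks once at least two readers have suggested it. The names `clean_tags` and `agreed_tags`, the vocabulary, and the sample submissions are all hypothetical.

```python
import re
from collections import Counter


def clean_tags(raw, vocabulary, max_tags=10):
    """Reduce free-form textarea input to known tags, in first-seen order."""
    # Lowercase the input and pull out runs of letters and digits only, so
    # markup and punctuation never survive as part of a tag.
    words = re.findall(r"[a-z0-9]+", raw.lower())
    kept = []
    for word in words:
        # Keep a word only if it is already a known tag and not a repeat.
        if word in vocabulary and word not in kept:
            kept.append(word)
    return kept[:max_tags]


def agreed_tags(submissions, vocabulary, min_votes=2):
    """Keep tags suggested in at least min_votes separate submissions."""
    votes = Counter()
    for raw in submissions:
        # Each submission contributes at most one vote per tag.
        votes.update(clean_tags(raw, vocabulary))
    return sorted(tag for tag, n in votes.items() if n >= min_votes)


# Hypothetical vocabulary and reader submissions for a single post.
vocabulary = {"nyc", "food", "restaurants", "books", "interviews", "science"}
submissions = [
    "NYC food <script>alert(1)</script>",
    "food, nyc, spam spam spam",
    "books nyc",
]
print(agreed_tags(submissions, vocabulary))
```

The whitelist means nobody can introduce new tags, which is a real cost, and none of this removes the need for a pass of correction by hand.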

So yeah, that's where I am with the tagging.
