Since swearing off Technorati a couple of years ago, I’ve been checking back every few months to see if the situation has improved. The site is definitely more responsive but their data problems seemingly remain, at least with regard to kottke.org; Google Blog Search gives consistently better results and easy access to RSS feeds of searches.
Technorati recently introduced something called the Technorati Authority number, which is a fancy name for the number of blogs linking to a site in the last six months. Curious as to where kottke.org fell on the authority scale, I checked out the top 100 blogs list. Not there, so I proceeded to the “Everything in the known universe about kottke.org” page where a portion of that huge cache of kottke.org knowledge was the authority number: 5,094. Looking at the top 100 list, that should put the site at #47, nestled between The Superficial and fishki.net, but it’s not there. Technorati also currently states that kottke.org hasn’t been updated in the last day, despite several updates since then and my copy of MT pinging Technorati after each update.
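For what it's worth, the authority number as described above is simple enough to sketch in a few lines of Python. This is only an illustration of the definition (distinct blogs linking in during the last six months) and of how a score should slot into a ranked list. The function names, the 183-day window, the sample links, and the list of scores are all made up for the example; none of it is Technorati's actual code or data.

```python
from datetime import date, timedelta

def authority(links, today, window_days=183):
    """Count distinct blogs that linked to a site within the window.

    `links` is a list of (blog, date_linked) pairs. The 183-day window is
    an assumed stand-in for "the last six months".
    """
    cutoff = today - timedelta(days=window_days)
    return len({blog for blog, linked_on in links if linked_on >= cutoff})

def expected_rank(score, top_scores):
    """1-based position a score would take in a list ranked high to low."""
    return 1 + sum(1 for s in top_scores if s > score)

# Hypothetical data, for illustration only.
today = date(2007, 1, 1)
links = [
    ("blog-a.example", date(2006, 12, 1)),
    ("blog-a.example", date(2006, 12, 15)),  # same blog again, counted once
    ("blog-b.example", date(2006, 9, 3)),
    ("blog-c.example", date(2005, 1, 1)),    # too old, outside the window
]
site_authority = authority(links, today)

# Where a score of 5,094 would land in a made-up list of scores.
rank = expected_rank(5094, [9000, 6000, 5100, 5000, 4000])
```

If a site's score says it belongs at a given position and the published list skips it, either the score or the list is wrong.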
Maybe kottke.org has been intentionally excluded because I’ve been so hard on them in the past. Or maybe it’s just a glitch (or two) in their system. Or maybe it’s an indication of larger problems with their service. In any case, as the company is attempting to offer an authentic picture of the blogosphere, this doesn’t seem like the type of rigor and accuracy that should send reputable media sources like the BBC, Washington Post, NY Times, and the Wall Street Journal scurrying to their door looking for reliable data about blogs.
Update: As of 3:45pm EST, the top 100 list has been updated to include kottke.org. The site also picked up this post right away, but failed to note a subsequent post published a few minutes later.
Kevin Burton looks at the Technorati “data” and discovers that since the number of daily postings is growing linearly, the number of active blogs is probably growing linearly too…which means that the exponential growth of the blogosphere touted repeatedly by Technorati and parroted by mainstream media outlets is actually the growth of dead blogs.
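The arithmetic behind that argument can be sketched roughly as follows. The big assumption, which is mine and not necessarily stated in Burton's post, is that an active blog posts at a more or less constant rate, so active blogs are just daily posts divided by that rate. Every number below is invented to show the shape of the curves; none of it is Technorati data.

```python
def active_blogs(daily_posts, posts_per_blog_per_day):
    """Estimate active blogs, assuming a constant posting rate per blog."""
    return daily_posts / posts_per_blog_per_day

# Invented series: total blogs double each period (exponential),
# while daily posts rise by a fixed amount each period (linear).
periods = range(5)
total = [1_000_000 * 2 ** t for t in periods]
daily_posts = [400_000 + 100_000 * t for t in periods]
rate = 0.5  # assumed posts per active blog per day

active = [active_blogs(p, rate) for p in daily_posts]
dead = [t - a for t, a in zip(total, active)]
dead_share = [d / t for d, t in zip(dead, total)]

for t in periods:
    print(f"period {t}: total={total[t]:,} active={active[t]:,.0f} "
          f"dead share={dead_share[t]:.0%}")
```

Under those assumptions the active count creeps up in a straight line while the total doubles, so nearly all of the headline growth ends up in the dead column.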
Update: I fucked up on this post and you should reread it if you’ve read it before. After reading this post by Niall Kennedy, I checked and found that I have mentioned or linked to the site for Freakonomics 5 times (1 2 3 4 5), not 13. The other 8 times, I either linked to a post on the Freakonomics blog that was unrelated to the book, had the entry tagged with “freakonomics” (tags are not yet exposed on my site and can’t be crawled by search engines), or I used the word “Freakonomists”, not “Freakonomics”. Bottom line: the NY Times listing is still incorrect, Google and Yahoo picked up all the posts where I actually mentioned “Freakonomics” in the text of the post but missed the 2 links to freakonomics.com, Google Blog Search got 2/3 (& missed the 2 links), Technorati got 1/3 (& missed the 2 links), and IceRocket, Yahoo Blog Search, BlogPulse, & Bloglines whiffed entirely. Steven Levitt would be very disappointed in my statistical fact-checking skills right now. :(
I wish Niall had emailed me about this instead of posting it on his site, but I guess that’s how weblogs work, airing dirty laundry instead of trying to get it clean. Fair enough…I’ve publicly complained about the company he works for (Technorati) instead of emailing someone at the company about my concerns, so maybe he had a right to hit back. Perhaps a little juvenile on both our parts, I’d say. (Oh, and I turned off the MT search thing that Niall used to check my work. I’m not upset he used it, but I’m irritated that it seems to be on by default in MT…I never intended for that search interface to be public.)
The NY Times recently released their list of the most blogged about books of 2005. Their methodology in compiling the list:
This list links to a selection of Web posts that discuss some of the books most frequently mentioned by bloggers in 2005. The books were selected by conducting an automated survey of 5,000 of the most-trafficked blogs.
Unsurprisingly, the top spot on the list went to Freakonomics. I remembered mentioning the book several times on my site (including this interview with author Steven Levitt around the release of the book), so I checked out the citations they had listed for it. According to the Times, Freakonomics was cited by 125 blogs, but not once by kottke.org, a site that by any measure is one of the most-visited blogs out there. A quick search in my installation of Movable Type yielded 5 mentions of the book (not the 13 I originally reported) on kottke.org in the last 9 months. I had also mentioned Blink, Harry Potter, Getting Things Done, Collapse, The Wisdom of Crowds, The Singularity is Near, and State of Fear, all of which appear in the top 20 of the Times’ list and none of which are cited by the Times as having been mentioned on kottke.org in 2005.
I chalked this up to a simple error of omission, but then I started checking around some more. Google’s main index returned only three distinct mentions of Freakonomics on kottke.org. Google Blog Search returned two results. Yahoo: 3 results (0 results on Yahoo’s blog search). Technorati only found one result (I’m not surprised). Many of the blog search services don’t even let you search by site, so IceRocket, BlogPulse, and Bloglines were of no help. (See above for corrections.) I don’t know where the Times got their book statistics from, but it was probably from one of these sites (or a similar service).
Granted this is just one weblog, which I only checked into because I’m the author, but it’s not like kottke.org is hard to find or crawl. The markup is pretty good, fairly semantic, and hasn’t changed too much for the past two years. The subject in question is not off-topic…I post about books all the time. And it’s one of the more visible weblogs out there…lots of links in to the front page and specific posts and a Google PR of 8. So, my point here is not “how dare the Times ignore my popular and important site!!!” but that the continuing overall suckiness of searching blogs is kind of amazing and embarrassing given the seemingly monumental resources being applied to the task. It’s forgivable that the Times would not have it exactly right (especially if they’re doing the crawling themselves), but when companies like Technorati and Google are setting themselves up as authorities on how large the blogosphere is, what books and movies people are reading/watching, and what the hot topics online are but can’t properly catalogue the most obvious information out there, you’ve got to wonder a) how good their data really is, and b) if what they are telling us is actually true.
 Full disclosure: I am the author of kottke.org.
 This is an important point…these observations are obviously a starting point for more research about this. But this one hole is pretty gaping and fits well with what I’ve observed over the past several months trying to find information on blogs using search engines.
 I say only pretty good because it’s not validating right now because of entity and illegal character errors, which I obviously need to wrestle with MT to correct at some point. But the underlying markup is solid.
Google finally launches a blog search service. The default search is by relevance, which I’m not sure is correct, and it’s pretty bare bones so far, but I’m sure that many other people will be saying so long, Technorati. Also available in Blogger flavor. (via waxy)
That’s it. I’ve had it. No more Technorati. I’ve used the site for, what, a couple of years now to keep track of what people were saying about posts on kottke.org and searching blogs for keywords or current events. During that time, it’s been down at least a quarter of the time (although it’s been better recently), results are often unavailable for queries with large result sets (i.e. this is only going to become a bigger problem as time goes on), and most of the rest of the time it’s slow as molasses.
When it does return results in a timely fashion for links to kottke.org, the results often include old links that I’ve seen before in the results set, sometimes from months ago. And that’s to say nothing of the links Technorati doesn’t even display. The “kottke.org” smart list in my newsreader picks up stuff that Technorati never seems to get, and that’s only pulling results from the ~200 blogs I read, most of which are not what you’d call obscure. What good is keeping track of 14 million blogs if you’re missing 200 well-known ones? (And trackbacks perform even better…this post got 159 trackbacks but only 93 sites linking to it on Technorati.)
Over the past few months, I’ve been comparing the results from PubSub to those of Technorati and PS is kicking ass. Technorati currently says that 19 sites have linked to me in the past 6 days (and at least four of those are old and/or repeats…one is from last September, fer chrissakes) while PubSub has returned 38 fresh, unrepeated results during that same time. (Not that PubSub is all roses and sunshine either…the overlap between the result sets is surprisingly small.)
While their search of the live web (the site’s primary goal) has been desperately in need of a serious overhaul, Technorati has branched out into all sorts of PR-getting endeavors, including soundbiting the DNC on CNN, tags (careful, don’t burn yourself on the hot buzzword), and all sorts of XML-ish stuff for developers. Which is all great, but get the fricking search working first! As Jason Fried says, better to build half a product than a half-assed product. I know it’s a terrifically hard problem, but Figure. It. Out.
As for the acquisition rumors, I don’t know who’d buy such a mess, but if someone does, I look forward to them improving it to a usable level. Pretty much everyone I talk to in the industry thinks the site sucks and we’ve just been waiting for it to get better because, well, it would have to at some point, wouldn’t it? Well, I’m tired of waiting. Goodbye, Technorati…your url will darken the door of my browser no longer.
Update: For the short amount of time I’ve been using it, IceRocket’s blog search seems to work quite well. Thanks to Kevin for pointing me in that direction.
I love that Davenetics still shows up in these graphs of the top blogs on Technorati. I read Davenetics daily but the only reason it is on the list is because it’s linked in a default Blogger template. If T’rati actually looked at their “statistics” instead of just using them to market to us, this sort of thing would be pretty easy to spot (if the ratio of the # of links vs. the # of sites linking is close to 1.0, the site may not belong on the list). (Oh, and Binary Bonsai is suspect as well…its high rank is at least partially due to a default link on a popular WordPress template.)
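That ratio check is about a five-line job. Here's a rough sketch in Python; the function name, the 0.05 tolerance, and the sample numbers are all mine and purely illustrative, and a ratio near 1.0 is only a hint that a site deserves a second look, not proof of a template link.

```python
def looks_like_template_link(total_links, linking_sites, tolerance=0.05):
    """Flag a site whose links-per-linking-site ratio sits close to 1.0.

    A ratio near 1.0 means almost every linking site links exactly once,
    which is the pattern a default template link would produce. The
    tolerance is an arbitrary, assumed threshold.
    """
    if linking_sites == 0:
        return False  # nothing links here, so there is nothing to flag
    ratio = total_links / linking_sites
    return abs(ratio - 1.0) <= tolerance

# Made-up numbers: one link from nearly every linking site, versus a site
# that its linkers come back to about three times each.
suspect = looks_like_template_link(10_000, 9_800)
organic = looks_like_template_link(12_000, 4_000)
```

A flagged site would still need a human (or at least a look at where on the page the links sit) before being dropped from a top 100 list.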