Repeat after me: inbound links do not indicate either readership or influence. Plus, Technorati’s top 100 data remains dirty.
kottke.org. home of fine hypertext products since 1998.
Reader comments
Matt, Aug 12, 2004 at 10:13PM
Technorati's top 100 data remains dirty
A quick glance and I see plastic.com, which is only up there because it's a default link in tens of thousands of default blogspot templates (the pre-CSS versions, it was a dark template many used). Are there others? As much as I love Dave Pell and Davenetics, I don't recall seeing Davenetics linked in too many blogrolls. Where'd that one come from?
Am I missing other obvious problems in the top 100? Drudge should be news instead of a blog?
Ramanan, Aug 12, 2004 at 10:17PM
Agreed. Most LiveJournal, Xanga, AsianAvenue, etc., etc., users would be quite influential based on a measurement such as this. Many online web communities promote linking within the community; buddy lists and what have you. This sort of link-incest mucks up a metric such as inbound links.
Matt, Aug 12, 2004 at 10:17PM
Ah, I see, Davenetics is in that default template as well. Here's a good example of a blog using the template.
dave pell, Aug 12, 2004 at 11:19PM
Hey, what can I say? I'm huge among template users from the late nineties, specifically those from Brazil. What I can't believe is that someone would actually believe that my third most popular site would somehow be LESS influential than the NY Times and CNN.
Just as a brief example, search for "my cat mister winters" on google. Guess who comes up numero uno?
And CNN and NY Times? Zilch.
Come on. I'm a player.
jkottke, Aug 13, 2004 at 12:09AM
Here are some other problems with the Top 100 (ordering may have changed by the time you read this):
#2: ScottWater writes a .NET based blogging tool...the "powered by" banner links back to his site (example).
#5: MSN Groups?
#7: Photo Matt is primarily responsible for the open source WordPress and for some weird reason (payment for all the free work he does?), he puts a link to his Web site on the default template.
#10: Balmasque. Not quite sure what the deal is with this one, but it seems to have tons of incoming links from blog*spot sites that contain no content and have no incoming links themselves. Could be legit, I guess. Template designer?
#11: Mike Little's Journalized, see #7.
#12: geek ramblings, see #7.
#16: interney.com provides a JavaScript-based stats tracker that you can put on your site.
#19: Penny Arcade! Weblog? Folks are probably there for the comics.
#25: Wanker.
#26: Erin at Suicide Girls, alterna-porn, but I think it counts as a weblog.
#27: Bryan Bell designs blog templates with a link back to his site on them.
Ok, stopping there. I realize it's hard, the list needs some gardening (which is probably the last thing on Technorati's to-do list), and scraping weblogs for links is a messy business, but when people are using this data for research or journalistic purposes, I have a bit of a problem with it. (Hypocrite! But this is only a weblog, so it's ok, right?)
BTW, the graph linked above appears in this month's Wired** accompanying a Clay Shirky article on mapping the different kinds of blogs on the power law curve. Wired being Wired, the graph looks pretty but by my reckoning at least four and possibly five of the data points don't belong there: Plastic, Davenetics, Penny Arcade, interney.com, and maybe Balmasque. Don't pretty graphs deserve fact checkers? And never mind that, once again, links != influence. Traffic stats from Alexa would probably be more illuminating. Or PageRank-weighted link statistics from Google, kinda like what Daypop does with its Ranked By Daypop Score list.
**I believe it's the exact same data points, but I'll check (and correct) tomorrow when I get to work and have access to the magazine.
jkottke, Aug 13, 2004 at 12:39AM
I'm also highly skeptical about the accuracy of Technorati's claim of tracking 3,500,000 blogs. I'm not saying they're being deliberately misleading (again, scraping blogs is a messy business) and I don't quite know how to go about proving/disproving it, but it just doesn't seem right. Google currently indexes 4.2 billion pages. If a typical weblog has just 10-15 pages and Google indexes all the weblogs that Technorati does**, weblogs comprise ~1% of all pages in Google's index. And that 10-15 pages figure may be low...Google indexes at least 7,000 pages from kottke.org.
** Certainly not a given, but you've got to think that while Technorati gets weblogs that Google misses, the reverse is also true.
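The back-of-the-envelope estimate above can be checked in a few lines. This is only a sketch: the blog and index counts are the figures quoted in the comment, the 10-15 pages per blog is the commenter's own guess, and the function name is made up for illustration.

```python
# Figures quoted in the comment above.
TECHNORATI_BLOGS = 3_500_000
GOOGLE_INDEX_PAGES = 4_200_000_000


def blog_share_of_index(pages_per_blog: int) -> float:
    """Fraction of Google's index that would be weblog pages, assuming
    every blog Technorati tracks is fully indexed by Google."""
    return TECHNORATI_BLOGS * pages_per_blog / GOOGLE_INDEX_PAGES


for pages in (10, 15):
    share = blog_share_of_index(pages)
    print(f"{pages} pages per blog: {share:.2%} of the index")
```

At 10 pages per blog the share comes out a little under 1%, and at 15 pages a little over, which is where the "~1%" figure comes from.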
Matt, Aug 13, 2004 at 1:43AM
The real question is -- how can this make me money?
Michael S., Aug 13, 2004 at 6:14AM
Is this all links, ever? Technorati does have this information, but I've personally linked to Slate 233 times, and it's just not possible that I account for 5% of Slate's links. It also seems unlikely that less than 0.5% of weblogs have ever linked to the New York Times.
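The two percentages in this comment imply some absolute numbers, which a quick calculation makes explicit. A sketch only: it assumes the "0.5% of weblogs" is measured against the 3,500,000 blogs Technorati claims to track (a figure from elsewhere in this thread), and the function names are hypothetical.

```python
TRACKED_BLOGS = 3_500_000  # Technorati's claimed total, quoted elsewhere in the thread


def implied_total_links(own_links: int, share: float) -> float:
    """Total link count implied if own_links makes up `share` of it."""
    return own_links / share


def blogs_in_share(share: float, tracked: int = TRACKED_BLOGS) -> float:
    """Number of blogs that a given share of the tracked total represents."""
    return tracked * share


# 233 links being 5% of the total implies a total of this many links.
print(round(implied_total_links(233, 0.05)))
# 0.5% of the tracked blogs is this many blogs.
print(round(blogs_in_share(0.005)))
```

In other words, the listing would have to credit Slate with only a few thousand links, and the New York Times with fewer than about 17,500 linking blogs, for the commenter's percentages to hold.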
jkottke, Aug 13, 2004 at 10:23AM
I believe it's the exact same data points, but I'll check (and correct) tomorrow when I get to work and have access to the magazine.
Looked at the graph in Wired and it's the same data.
Michael, Aug 13, 2004 at 10:48AM
If one is going to look for weblog influence, you have to look at links within the text of a page and discount or even ignore sidebar links. This issue came up during all the power law stuff a year (?) or so ago. Trouble is, at this time I have seen no indication that anyone is doing that kind of discrimination or, if they are, how. The evidence, as you point out, is pretty clear that most sites count all links as equal.
Of course discriminating between sidebar links and editorial links would be a very difficult task, but I think it's possible to develop some methodology that might work. Members at Technorati, for example, could be asked to use a specific CSS class for sidebar/linkbar links, and then Technorati's algorithms could be adjusted to weight those differently. They wouldn't get everyone, but if they got enough people to do it, the information gained in the aggregate could be extrapolated to a larger population. Just a thought.
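The CSS-class idea could look something like the sketch below. Nothing here reflects how Technorati actually works: the class name `blogroll`, the two weights, and the function name are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

SIDEBAR_CLASS = "blogroll"  # hypothetical class that bloggers would agree to use
SIDEBAR_WEIGHT = 0.1        # arbitrary: a sidebar link counts for a tenth of a vote
EDITORIAL_WEIGHT = 1.0      # a link inside a post counts in full


class WeightedLinkParser(HTMLParser):
    """Accumulates a weight per link target, discounting marked sidebar links."""

    def __init__(self):
        super().__init__()
        self.weights = {}  # href -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        classes = (attr_map.get("class") or "").split()
        weight = SIDEBAR_WEIGHT if SIDEBAR_CLASS in classes else EDITORIAL_WEIGHT
        self.weights[href] = self.weights.get(href, 0.0) + weight


def weighted_links(html: str) -> dict:
    """Return {href: weight} for every link found in the page."""
    parser = WeightedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.weights


page = (
    '<p>Worth reading: <a href="http://a.example/">this post</a>.</p>'
    '<ul><li><a class="blogroll" href="http://b.example/">B</a></li></ul>'
)
print(weighted_links(page))
```

Summing these weights across all crawled pages, instead of counting raw links, is one way the "weight those differently" step might be done. Pages that never adopt the class would still be counted at full weight, which is the gap the comment proposes to fill by extrapolation.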
Gene, Aug 13, 2004 at 11:01AM
I always thought this was a better take on influence. Also, Google should buy Technorati. Not only would they do a better job of spidering, they would prolly be able to develop a useful algorithm to estimate influence. In the meantime, I'm going to hire Wil Wheaton to pitch my new line of terry-cloth sweats.
Beerzie Yoink, Aug 13, 2004 at 11:04AM
Drudge is a blog? Hm.
tim, Aug 13, 2004 at 11:53AM
rss feeds don't contain sidebar links. and i'd hazard a guess that smaller sites (your inbounders) more often provide full text feeds. so perhaps a combination approach, with full text rss weighted more heavily if present.
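A sketch of the feed-based approach: pull links only from the body of each feed item, so sidebar links never enter the count. It assumes an RSS 2.0 feed whose `description` elements carry the full post as escaped HTML; the function and class names are made up, and the weighting of full-text feeds versus excerpts is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_in_feed(rss_xml: str) -> list:
    """Return the links found inside item descriptions of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    found = []
    for item in root.iter("item"):
        body = item.findtext("description") or ""
        collector = HrefCollector()  # fresh parser per item
        collector.feed(body)
        collector.close()
        found.extend(collector.hrefs)
    return found


SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>A post</title>
<description>&lt;p&gt;See &lt;a href="http://a.example/post"&gt;this&lt;/a&gt;.&lt;/p&gt;</description>
</item></channel></rss>"""

print(links_in_feed(SAMPLE_FEED))
```

Feeds that carry only excerpts would yield few or no links this way, which is presumably why the comment suggests combining it with page scraping.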
jim winstead, Aug 13, 2004 at 12:02PM
i wouldn't be so quick to call penny arcade a non-weblog. the daily postings from the authors of the comic are about as weblog-ish as you can get.
Christine, Aug 13, 2004 at 12:07PM
Photo Matt actually puts links to the 4-5 core developers of WordPress, not just himself. So using that reasoning, they should all be in the top 100 - and they are not. I think his site has actually reached that level of readership. (Then again, I've been reading it for years, so maybe I'm biased?)
jkottke, Aug 13, 2004 at 12:21PM
i wouldn't be so quick to call penny arcade a non-weblog. the daily postings from the authors of the comic are about as weblog-ish as you can get.
The front page is definitely a weblog, but most of the inbound links are to the comics, not the weblog.
Photo Matt actually puts links to the 4-5 core developers of WordPress, not just himself. So using that reasoning, they should all be in the top 100 - and they are not. I think his site has actually reached that level of readership. (Then again, I've been reading it for years, so maybe I'm biased?)
They are all in the top 100. Here's an example of a site that uses the default template. Three of the four developers linked there I listed in the post above and the fourth one (Alex King) is at #13 (I forgot to include that one). So Matt gets most of his links from the default WordPress template. Actually, you can kinda tell which of the top 100 are legit by comparing the # of incoming blogs with the # of incoming links for each blog...if they are almost the same, the numbers for that site are artificially inflated.
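The blogs-versus-links comparison can be written as a simple check. This is a rough sketch of the rule of thumb, not anything Technorati publishes: the 5% tolerance for "almost the same" is an assumption, and the sample counts are invented.

```python
def looks_template_inflated(inbound_blogs: int, inbound_links: int,
                            tolerance: float = 0.05) -> bool:
    """Flag a site whose inbound link count is nearly equal to its inbound
    blog count, i.e. almost every linking blog links exactly once, as a
    default template would. `tolerance` is an assumed cutoff."""
    if inbound_links <= 0:
        return False
    return abs(inbound_links - inbound_blogs) / inbound_links <= tolerance


# Invented numbers for illustration only.
print(looks_template_inflated(5000, 5100))  # one link per blog, more or less
print(looks_template_inflated(2000, 6000))  # blogs link repeatedly, in posts
```

The reasoning is that a site people actually write about gets linked several times from the same blog over time, while a template link shows up once per blog and never again.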
Jordan, Aug 13, 2004 at 1:14PM
I'm #84 on the Top 100 because my site is one of the links in the default b2evolution template. I feel a little guilty about it. A little. If they used a PageRank-like scheme instead, this would not be the case, since almost all of those b2evo sites just are half-baked blogs with one or two "Testing!" entries.
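To show what a "PageRank-like scheme" changes, here is a minimal power-iteration sketch. It is a textbook-style simplification, not Google's implementation, and the graph is invented: sites A and B each have exactly one inbound link, but A's comes from a well-linked hub while B's comes from a blog nobody links to.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Invented graph: "hub" is linked by three blogs and links to A;
# "lonely" has no inbound links and links to B.
GRAPH = {
    "x1": ["hub"], "x2": ["hub"], "x3": ["hub"],
    "hub": ["A"],
    "lonely": ["B"],
}
ranks = pagerank(GRAPH)
print(ranks["A"] > ranks["B"])
```

A raw inbound-link count treats A and B as equal; the weighted score does not. Note that in this simple form every page still carries a small baseline score, so thousands of empty "Testing!" blogs would still add up to something. The scheme discounts such links rather than ignoring them.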
Matt, Aug 13, 2004 at 1:16PM
I suppose one could think of Drudge as a sort of news-centric "Remaindered Links", though they are clearly not remaindered, but the focus.
Craig C., Aug 13, 2004 at 2:14PM
I'm just endlessly amused that Ensign Crusher is more popular than President Bush.
Peter Cooper, Aug 13, 2004 at 5:34PM
It's a bit old school, but other than RSS your crawler could always look at links, see if there are lots more in proximity on new lines (or as list items) and ignore those. Very few blogs post long lists of links in posts, particularly on new lines or in lists, whereas this is how nearly all blogrolls are done. With the popularity of RSS now, however, this shouldn't be an issue, although a couple of years ago (before MT was widespread), it would have worked.
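The list-item half of this heuristic might be sketched as follows. It only handles links inside `<ul>`/`<ol>` lists, not runs of links on consecutive lines, and both thresholds are assumptions picked for illustration.

```python
from html.parser import HTMLParser

MIN_RUN = 4             # assumed: this many links in one list looks like a blogroll
MAX_TEXT_PER_LINK = 20  # assumed: allowed characters of non-link text per link


class BlogrollFilter(HTMLParser):
    """Keeps links that look editorial and drops link-heavy lists."""

    def __init__(self):
        super().__init__()
        self.kept = []    # links judged to be editorial
        self.lists = []   # stack of open lists: {"hrefs": [...], "text": int}
        self.in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists.append({"hrefs": [], "text": 0})
        elif tag == "a":
            self.in_anchor = True
            href = dict(attrs).get("href")
            if href:
                bucket = self.lists[-1]["hrefs"] if self.lists else self.kept
                bucket.append(href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False
        elif tag in ("ul", "ol") and self.lists:
            current = self.lists.pop()
            hrefs = current["hrefs"]
            is_blogroll = (len(hrefs) >= MIN_RUN
                           and current["text"] <= MAX_TEXT_PER_LINK * len(hrefs))
            if not is_blogroll:
                self.kept.extend(hrefs)

    def handle_data(self, data):
        # Count only text outside link labels, inside the innermost open list.
        if self.lists and not self.in_anchor:
            self.lists[-1]["text"] += len(data.strip())


def editorial_links(html: str) -> list:
    """Return links that do not sit in a blogroll-like list."""
    parser = BlogrollFilter()
    parser.feed(html)
    parser.close()
    return parser.kept
```

Links in a list are only judged when the list closes, so the returned order can differ slightly from document order. A post that legitimately contains a long bare list of links would be misclassified, which matches the comment's own caveat that few blogs do this.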
Steven Marshall, Aug 16, 2004 at 11:21PM
It's all just a popularity contest anyway.
Ryan C., Oct 06, 2004 at 3:34PM
Craig C.: That's because his speech and writing are better than The Pres. ;)
Randy Peterman, Oct 06, 2004 at 4:29PM
I write a Statistics plugin for WordPress and if there's one thing I've learned in all of the coding: trends change. When Google is old and crusty (it will be some day, unless they re-invent themselves) we'll all be glad we use search engine X (if there are still search engines). I link to PhotoMatt.net in my blog regularly because he's often got good content. Just like I link to kottke.org when I find something in remainders useful or fun.
P Scott, Oct 07, 2004 at 11:11AM
You claim that Drudge is not a blog but news.
Could a newsblog just be links to other sites without commentary? Why not?
Besides, Drudge is in many blogs' blogrolls, so most do think of Drudge as a blog.