
kottke.org. home of fine hypertext products since 1998.

๐Ÿ”  ๐Ÿ’€  ๐Ÿ“ธ  ๐Ÿ˜ญ  ๐Ÿ•ณ๏ธ  ๐Ÿค   ๐ŸŽฌ  ๐Ÿฅ”

Technorati is now tracking 1,000,000 weblogs

Technorati is now tracking 1,000,000 weblogs.

Reader comments

Matt Haughey, Oct 01, 2003 at 2:29AM

Why hasn't anyone ever really looked at how Technorati determines what counts as a blog? I don't believe the Technorati numbers myself; I think it's greatly inflated.

Why? Sometimes in Technorati results I see every category of a Radio weblog counted as a separate blog. I dug through about 30 URLs before I found an example. Check out Merlin's cosmos: it says that 64 blogs are pointing 71 links at him, so there should be very few repeat listings, right?

Scroll down to the Radio blog pointing at Merlin's site called monkinetic weblog. The author must have a sidebar link to kungfugrippe, and by the looks of it he has 13 categories on his Radio blog, which show up as 13 separate blogs. With 71 links from 64 blogs, this entire list should contain at most 7 repeat listings (Jish is one), yet here we have 13 links from a single blog, so the numbers don't add up.
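The arithmetic behind that complaint can be spelled out in a few lines. This is only a back-of-the-envelope sketch: the function names are illustrative, not anything Technorati exposes, and treating the 13 monkinetic entries as the only duplicates in the list is a simplifying assumption.

```python
def max_links_from_one_blog(total_links: int, distinct_blogs: int) -> int:
    """Upper bound on how many links a single blog can contribute.

    Each distinct blog accounts for at least one link, so at most
    total_links - distinct_blogs links are repeats; in the extreme case
    one blog holds all of them in addition to its own first link.
    """
    if distinct_blogs <= 0 or total_links < distinct_blogs:
        raise ValueError("expected total_links >= distinct_blogs > 0")
    return total_links - distinct_blogs + 1


def corrected_blog_count(distinct_blogs: int, listings_for_one_blog: int) -> int:
    """Distinct-blog count after merging one blog's listings into a single entry."""
    return distinct_blogs - (listings_for_one_blog - 1)


# Figures from Merlin's cosmos: 71 links from 64 "blogs".
bound = max_links_from_one_blog(71, 64)
print(bound)  # 8, i.e. at most 7 repeat links

# monkinetic weblog appears 13 times, more than the bound allows, so the
# 64 figure must be counting it more than once. Merged into one blog:
print(corrected_blog_count(64, 13))  # 52
```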

This isn't the fault of Radio's design; it's how Sifry coded his algorithm to tell a blog apart from the other pages of that same blog. For some reason it doesn't work quite right for Radio blogs hosted on their own domains (personally, I've never seen the problem on the UserLand-hosted Radio sites).

I bet the counting of LiveJournal sites is also wonky, since their URLs aren't that predictable and other pages might be showing up as separate blogs.

Swami Prem, Oct 01, 2003 at 2:32AM

Who was the lucky winner to own the one millionth blog?

Matt Haughey, Oct 01, 2003 at 2:35AM

I think it's greatly inflated.

Actually, I'm probably overdoing it a bit by saying "greatly," but it could be off by a lot if enough sites with weird URL schemes are being miscounted (and I don't see why an MT blog couldn't trip the algorithm too). From my own tracking of results, I'd say it's off by at least 10%, and it could be higher depending on how widespread the problem is.

David Sifry, Oct 01, 2003 at 3:45AM

You're right, Radio is somewhat messed up in that it attempts to count each "category" as a separate blog. We go through the database regularly and cull that crap out. If you continue to see any results that look funky, please send an email to [email protected] and let us know.

I'm pretty sure the LJ stuff is accurate, though; you'd be amazed at how many people are posting over there.

I'm working really hard to make sure the Technorati database is accurate and clean, but wacky things happen all the time, and expecting 100% accuracy is, of course, impossible. Still, I really believe the numbers are pretty accurate.

Matt Haughey, Oct 01, 2003 at 4:36AM

Radio is somewhat messed up in that it attempts to count each "category" as a separate blog

How does Radio do the separate-blog stuff? Does it ping weblogs.com for each category when you make a post?

I'm pretty sure the LJ stuff is accurate though

When I was going through a bunch of cosmos pages looking for good examples of the previous problem, I found some results with a single LJ post listed 5-10 times, but there were so many results that I couldn't tell whether they were treated as one blog with many links or as many blogs (they all seemed to point at the same URL).

wacky things happen all the time

I noticed that TypePad blogs are counted twice: once for the root URL, foo.typepad.com, and again for the default blog directory, foo.typepad.com/bar (the same files are served in both places).
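To make the double-counting concrete, here is a hypothetical sketch of collapsing URLs into one key per weblog before counting. It is not how Technorati works; it assumes Radio category pages live under a `/categories/<name>/` path and that a TypePad root URL can be mapped to its default directory through an alias table (`TYPEPAD_DEFAULT_DIRS` is invented for illustration).

```python
from urllib.parse import urlsplit

# Hypothetical alias table: TypePad host -> default blog directory. A real
# system would have to learn this, e.g. by noticing identical content.
TYPEPAD_DEFAULT_DIRS = {"foo.typepad.com": "/bar"}


def canonical_blog_key(url: str) -> str:
    """Map a page URL to a key meant to identify the weblog it belongs to."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    # Assumption: Radio category pages live under ".../categories/<name>/",
    # so everything from "/categories/" onward is dropped.
    if "/categories/" in path:
        path = path.split("/categories/", 1)[0] + "/"
    # Assumption: a TypePad root URL serves the same blog as its default directory.
    if host.endswith(".typepad.com") and path == "/":
        path = TYPEPAD_DEFAULT_DIRS.get(host, "") + "/"
    return host + path.rstrip("/")


def count_distinct_blogs(urls: list[str]) -> int:
    """Count weblogs rather than URLs by deduplicating on the canonical key."""
    return len({canonical_blog_key(u) for u in urls})


urls = [
    "http://foo.typepad.com/",
    "http://foo.typepad.com/bar/",
    "http://www.example.com/",
    "http://www.example.com/categories/linux/",
    "http://www.example.com/categories/music/",
]
print(count_distinct_blogs(urls))  # 2 rather than 5
```

Pattern rules like these are brittle: a TypePad blog on its own domain would need yet another alias entry, and any site that happens to use a `/categories/` path would be merged as well.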

Swami Prem, Oct 01, 2003 at 6:44AM

Oh, and what about a TypePad blog that has its own domain name? Does that mean it will be counted three times?

Nick, Oct 01, 2003 at 9:31AM

Dave's right that it can be very difficult to filter out "false" Radio weblogs; we've had that problem ourselves. I'm not doubting his assessment of the number of LJ sites either, but that's an area where we scratched our heads for some time. The problem with LJ, or any of the blog hosting groups, is that "failures" of those central servers will often cause a few thousand sites to point simultaneously to some default list of links (I distinctly remember a day when a page from the PHP manual jumped to the top of our 4-hour trends list). I think we've got them under control now, however.

Our site is currently tracking around 150,000 weblogs, nowhere near the million Technorati has. I wonder if one reason for the difference is that we delete URLs that don't respond to our robots after a certain number of tries. Typically, if a site comes back, it finds a way to get added back into the system. This keeps our database leaner and keeps our robots reading "actual" pages instead of waiting on errors.
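The pruning policy described above can be sketched as a failure counter per URL. This is a hypothetical illustration, not the commenter's actual system: the threshold of 5 is invented (the comment only says "a certain number of tries"), and resetting the count on a successful fetch, so that only consecutive failures count, is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FeedTracker:
    """Tracks weblog URLs and drops ones that keep failing to respond."""

    max_failures: int = 5  # hypothetical threshold
    failures: dict[str, int] = field(default_factory=dict)

    def add(self, url: str) -> None:
        # (Re-)adding a URL starts it with a clean failure count.
        self.failures[url] = 0

    def record_fetch(self, url: str, ok: bool) -> None:
        if url not in self.failures:
            return  # already pruned or never added
        if ok:
            self.failures[url] = 0  # assumption: only consecutive failures count
        else:
            self.failures[url] += 1
            if self.failures[url] >= self.max_failures:
                del self.failures[url]  # stop spending robot time on it

    def tracked(self) -> list[str]:
        return sorted(self.failures)


tracker = FeedTracker(max_failures=3)
tracker.add("http://alive.example/")
tracker.add("http://dead.example/")
for _ in range(3):
    tracker.record_fetch("http://dead.example/", ok=False)
tracker.record_fetch("http://alive.example/", ok=False)
tracker.record_fetch("http://alive.example/", ok=True)
print(tracker.tracked())  # ['http://alive.example/']
```

A site that comes back can simply be passed to `add` again, which mirrors the observation that such sites tend to find their way back into the system.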

Either way, "about one million" is a nice round number to point to for those of us trying to show how quickly blogging is growing around the world.

jkottke, Oct 01, 2003 at 10:03AM

The million number seems fairly accurate. Maciej's Blog Census puts the number at 1.35 million with an estimate of ~890,000 that are active.

megnut, Oct 01, 2003 at 10:15AM

I think "active" is key here; I've noticed a lot of totally dead blogs (I mean not updated since 2000/2001) appearing in Technorati lately.

megnut, Oct 01, 2003 at 10:15AM

Wait, "a lot" is too strong. "A fair amount" would be more accurate.

jim winstead, Oct 01, 2003 at 11:03AM

just as another few data points: according to blo.gs, 136,955 blogs have updated in the last week, 272,764 have updated in the last month, 391,042 have updated in the last two months, and 34,753 new blogs have been added in the last week (unfortunately, i haven't been keeping track of that for long).

this includes all blogs that ping weblogs.com, and that show up in the blogger.com changes feed, and a few other sources (and that ping blo.gs directly, of course).

this does almost totally exclude livejournal.com users.

Gene, Oct 02, 2003 at 11:50AM

Speaking of active, this Marlow post on churn rate and this Blogcensus follow-up have some good information about blog activity. The Blogcensus post shows that 5% of their sample had been abandoned (more than 52 weeks since the last post). I wonder how Technorati's numbers would compare.
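Both kinds of activity figure in this thread, the blo.gs "updated in the last week/month" counts and the Blogcensus 52-week cutoff for "abandoned," reduce to comparing each blog's last-post time against a window. A minimal sketch, assuming a list of last-post timestamps is available; the sample dates below are made up purely for illustration and are not real data.

```python
from datetime import datetime, timedelta

# Blogcensus's cutoff for "abandoned", as quoted above.
ABANDONED_AFTER = timedelta(weeks=52)


def updated_within(last_posts: list[datetime], now: datetime, window: timedelta) -> int:
    """Count blogs whose most recent post falls inside the window (blo.gs-style)."""
    return sum(1 for t in last_posts if now - t <= window)


def abandoned_fraction(last_posts: list[datetime], now: datetime) -> float:
    """Share of blogs with no post for more than ABANDONED_AFTER."""
    if not last_posts:
        raise ValueError("empty sample")
    stale = sum(1 for t in last_posts if now - t > ABANDONED_AFTER)
    return stale / len(last_posts)


# Made-up sample of last-post dates, purely for illustration.
now = datetime(2003, 10, 2)
sample = [
    datetime(2003, 9, 30),
    datetime(2003, 9, 1),
    datetime(2003, 8, 1),
    datetime(2002, 1, 1),
]
print(updated_within(sample, now, timedelta(days=7)))   # 1
print(updated_within(sample, now, timedelta(days=60)))  # 2
print(abandoned_fraction(sample, now))                  # 0.25
```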

