Pair of weblog articles on Salon  MAY 10 2002

A pair of articles on Salon today about weblogs: "Much ado about blogging" by Scott Rosenberg and "Use the blog, Luke" by Steven Johnson (oy, those titles!). As Scott notes in his piece, many recent articles on weblogs have largely missed the point (or at least the point that's most interesting to me), focusing on big names and politics instead of the bigger picture of weblogs' impact (both good and bad) on people, online culture, and behavior, so it's nice to see something good for a change.

In his article, Johnson uses kottke.org as an example of a typical weblog and how it's not as useful as it could be. I couldn't agree more. I love all the content that weblogs produce, but finding it when you need it is a different matter (I talked about this at the NetMedia conference last year). I'm frustrated with kottke.org because it's not as effective as an information resource as it could be. It's hard to find something here that's not today's information. I don't have categories, keywords, or other metadata for posts, and the search mechanism is not optimized for weblog use (it searches by page; weblogs need search results returned on a per-post basis). My site isn't smart...it doesn't make connections between current posts and older posts (either on my site or elsewhere) like it should (and like Google and Amazon do with their content). I can't even display single posts on their own page. It's pretty pathetic.

None of the tools out there offer exactly what I want. Movable Type does categories, but specifying multiple categories per post is a bit of a pain...and there's no search or room for other types of metadata. Vanilla (which is quite impressive) is an interesting cross between a weblog and a Wiki, but it has the same problems that all Wiki software does: it's not software for writers, it's software for people who like to Wiki (e.g. "dynasnips can be included in your snips by simply inserting {!dynasnip-name} in the content while editing"...people who just want to write don't want to deal with that crap).

Johnson's solution to navigating the info glut (for all weblogs, not just individual ones) is to use a few standardized tags on weblogs (BlogML? XML DTD for weblogs? DiaryXML?) so that third-party tools can come along, grab the content, shove it into some categories based on the standardized tags, and do the searching/sorting/comparing for us. He's talking about a semantic Web.

The problem is that implementing a weblog universe-wide system of tags and categories is like herding lots and lots of cats. No one will agree on which tags and categories to use. If a de facto standard set of tags does emerge, getting people to implement it will be hard. Tools could be programmed to include BlogML so that people don't have a choice, but chances are that each tool's implementation of it will be slightly different and therefore close to useless. It took the Web 6 or 7 years to come up with a format that everyone pretty much agrees on, and it's based upon one piece of metadata: the date. It's possible, more or less, to organize the content from all weblogs by date. I dunno...I've thought about this a lot in the past and I just don't see a top-down system of categories working in this situation. A Semantic Web would be very useful for everyone using the Web, but unless some major paradigm shift occurs in how people approach the Web, it's not going to happen anytime soon.

(And thus ends a quickly written, incoherent ramble. Any comments?)

There are 37 reader comments

Ryan  MAY 10 2002 1:50PM

Wait... it's hard to do multiple categories in MT? Not from where I sit...

Unrelated: did you ever buy that meal at Burger King that you collected money for via Paypal? I sent you 25 cents and wasn't kept abreast of new updates. I may ask for my money back.

Ed  MAY 10 2002 2:03PM

The one problem of a semantic web is, as you suggested, getting everyone to agree on a common set of variables. But there is a much greater obstacle currently in place. At the moment, it remains exceedingly difficult to not only catalog every active weblog but to classify its particular slant, content and nature and target it intelligently towards the need of a user looking for a specific style or a specific subject matter. While a prospective weblog reader looking for a new blog can always type in a set of search terms followed by the word "blog" into Google, this is largely a disastrous gauge in attempting to figure out just what the hell any given weblog is about or in determining the topics or interests that a particular blog is bound to share. One must rely these days upon whether a nutty phrase or a particular link appears to get a sense of what displays on the search results page. And even then, the serious blog seeker is more likely to open up ten separate windows for each link, having to expend the time to wade through an infinity of words and design to get a sense of whether s/he likes any given blog.

But going back to a semantic web of blogs, even if something along the lines of BlogML is introduced, there will inevitably be the need for some kind of central hub with which to classify and interpret the information that is sent out. Who exactly would control that? Would such a collection of information be used to create a database of personal information, such as Amazon wishlists and personal preferences, in which bloggers are inundated by spam? And how could the filtering of this information be applied in a manner that is not only intelligible, but tailored to the interests of a particular user? Right now, the only real way in which a particular blog has any hope of rising to the top of Blogdex or the search engines is through referrals. But the blogs at the top may not always be the ones that possess the most pertinent content or the information desired. The real question here is whether a system can be devised that does not catalog and discriminate on the basis of linkage, but on a basis of the content itself.

Of course, gauging a weblog by content involves a decided opinion, much as the accumulation of various opinions of people who link to Kottke results in Kottke.org rising up the linkage ladder. To a certain extent, this is all very good, as a link itself involves, in most cases, a favorable nod. But when it's very possible that 40 million bloggers could be wrong, popularity approaches a Nielsen/Arbitron-like homogenization of perceived quality.

Any blogging standard that is to be implemented would have to be implemented or regulated in some way throughout the course of all blogs. And it would have to be something that could be easily applied to even a Geocities site. But given the independent-minded nature of the Web, who would have that? Who could agree on a BC (Blog Consortium) and how could such an organization find a representative democratic voice on the personal web front that was accepted by those who stodgily adhere to the nonexistent notion of "the A list"?

Bill Seitz  MAY 10 2002 3:00PM

I also agree trying to control tags is silly, and probably unnecessary.

mathowie  MAY 10 2002 3:02PM

Jason, have you seen Josh's endquote lately?

Every outside link is blue and underlined; internal links have dashed underlines and are lower contrast than outbound links. Every inbound link shows a page of all mentions of that phrase in previous posts; in a way, it is quite similar to the vanilla wiki/blog concept. I would say that with an automated system like Josh's for previous subject linkage, plus multiple categories and a working search engine that did brute force grabs of data from specific posts, you'd be pretty close to having a useful weblog tool.

Now get to coding!

Graham  MAY 10 2002 3:05PM

FWIW, there is a Movable Type search engine (by Jay Allen, but no longer supported/developed by him, and [I think] to appear in future versions of MT). Also, one could, at least if starting fresh, use the title and excerpt fields for metadata - the layouts are customizable enough that they could either be hidden from the reader and not from bots, or the other way round. (I wouldn't relish inserting all that stuff in years' worth of content, though.)

Lukas Bergstrom  MAY 10 2002 3:11PM

Everyone intuitively believes that a good website should have extensive internal links -- "here are three other posts on the same topic". The problem is that it's a lot of work. What I'd like is to have a tool that analyzes a post semantically (just prior to storing it in the db) and suggests other posts that may be related. I click the ones I agree with, and it automatically creates bidirectional links between them. Maybe Tinderbox would allow something like this?
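A rough sketch of the kind of tool Lukas describes, under heavy assumptions: "semantic analysis" is swapped for plain keyword overlap (Jaccard similarity), the stopword list is a tiny made-up one, and the function names `suggest_related` and `link_both_ways` are hypothetical. It only shows the shape of the workflow: score older posts against the new one, let the author accept suggestions, then record links in both directions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "is", "it", "for", "that", "this", "with"}

def keywords(text):
    """Lowercased words of a post, minus the stopwords above."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def suggest_related(new_text, posts, limit=3):
    """Rank stored posts by Jaccard overlap with the new post's keywords."""
    new_words = keywords(new_text)
    scored = []
    for post_id, text in posts.items():
        old_words = keywords(text)
        union = new_words | old_words
        score = len(new_words & old_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, post_id))
    # Highest score first; ties broken by post id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored[:limit]]

def link_both_ways(links, post_a, post_b):
    """Record a bidirectional link between two posts."""
    links[post_a].add(post_b)
    links[post_b].add(post_a)

posts = {
    1: "Movable Type does categories but has no search",
    2: "Burger King meal paid for via Paypal",
    3: "Weblog search should return posts, not pages",
}
new_id, new_text = 4, "Why weblog search needs categories and posts"

suggestions = suggest_related(new_text, posts)
links = defaultdict(set)
for accepted in suggestions:  # pretend the author clicked every suggestion
    link_both_ways(links, new_id, accepted)

print(suggestions)
print(dict(links))
```

In a real tool the author would pick from the suggestions rather than accept them all, and the scoring would need to be much smarter than shared words.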

mbaze  MAY 10 2002 3:12PM

But wait...why do we even want a structure like this for the web? Categorizing weblogs? For what? The beauty of the web is that it is unstructured. It is serendipitous. It's free-form. It's what its name implies - a web. In the case of blogs, I thoroughly enjoy randomly jumping from link to link, unsure whether I will be pleasantly surprised or let down by what I find. How mundane - how confining - it would be to scan through a monstrous blog index where everything was neatly ordered by category. Leave the web be. Please.

anil  MAY 10 2002 3:14PM

The only way any weblog schema (and all the current DTDs are sorely lacking) would take off is if it (1) were built into all the major publishing tools and (2) provided a benefit to the site's visitors when an author used it.

Realistically, the only way we can assign categories to weblog microcontent after the fact is by assigning semantic value to individual URLs and then using that information to classify the content of the weblogs that link to that URL.
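One way to picture anil's after-the-fact approach, as a minimal sketch: assume someone has already assigned topics to individual URLs (the `url_topics` table below is invented, as are the example.com addresses), then give each weblog the topics of whatever it links to, most frequent first. The function name `classify_blogs` is hypothetical.

```python
from collections import Counter

def classify_blogs(url_topics, blog_links):
    """Give each blog the topics of the URLs it links to, most frequent first."""
    result = {}
    for blog, urls in blog_links.items():
        counts = Counter()
        for url in urls:
            # URLs nobody has classified contribute nothing.
            counts.update(url_topics.get(url, []))
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[blog] = [topic for topic, _ in ranked]
    return result

# Hypothetical data: topics assigned to URLs, and the links each blog has made.
url_topics = {
    "http://example.com/blogml": ["weblogs", "xml"],
    "http://example.com/semantic-web": ["xml", "metadata"],
    "http://example.com/opera-review": ["opera"],
}
blog_links = {
    "blog-a": ["http://example.com/blogml", "http://example.com/semantic-web"],
    "blog-b": ["http://example.com/opera-review", "http://example.com/unknown"],
}

topics = classify_blogs(url_topics, blog_links)
print(topics)
```

The hard part, of course, is the table of URL topics itself, which this sketch simply takes as given.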

moz  MAY 10 2002 3:36PM

i don't think the journal format and providing informational content really mesh well. even if you can categorize your journal entries very well -- discussions about cars breaking down are indexed under "cars", "breaking", and can be easily referenced by cross-indexing the two categories and looking through the results -- what you end up with remains short journal entries that rarely do the subjects justice.

my opinion is that if you are serious about providing topical content regarding one thing (such as weblog management systems) or another, don't try to slap some tool over your journal writings and categorize it thusly. instead, pool your writings and create your own subsections that have nothing to do with your journal. content and content retrieval would both be better focused.

jfournier  MAY 10 2002 3:40PM

I'd be interested in designing a tool that works like that. I am an avid PHP/mySQL system designer. I'm currently working on rolling my own blogging system, so if someone could give a list of suggestions that I could actually implement for experimentation, I'd be willing to give it a go...

kshay  MAY 10 2002 4:32PM

I don't see that a categorization system necessarily has to be top-down.

The key would be to let writers create their own arbitrary categories. In other words, a structure for tagging a post with a category would be standardized, but the category itself wouldn't be.

Of course, if I want anyone to find my writings through this system, I can't go around making up categories that nobody else would ever think to use. But if I and my friends don't like the set of categories you and your friends are using, we'll simply "standardize" on our own.

Look at LiveJournal "Interests" for an example of how such a system might pan out. Yes, at the top of the popularity list you have Movies, Music, etc. But there's also an Interest for pretty much every band you could name, and I just found 11 people interested in "fancy pants" and 25 interested in "lebanese food."

At least three important requirements for something like this to have a chance of working: A) there's a central hub that collects, but does not dictate, all the categories that are in use (it might be reasonable to impose some basic standards of capitalization and punctuation); B) it's easy to tag a given post with multiple categories; and C) some thought should be given to making categories (optionally) hierarchical, so you don't end up with a huge flat mess like the aforementioned LiveJournal listings.

If the right conditions were in place, we might see idiosyncratic taxonomies emerge within the system that would be every bit as free-form and bottom-up as the web itself.
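A minimal sketch of the hub kshay outlines, with everything named here being an assumption: a `CategoryHub` class that collects whatever categories writers use without rejecting any, a `normalize` helper that applies basic capitalization and punctuation rules, multiple categories per post, and an optional hierarchy written with `/` as a separator (that separator is an invented convention, not part of any existing tool).

```python
import re
from collections import Counter

def normalize(category):
    """Lowercase, drop punctuation, collapse whitespace; '/' marks optional hierarchy."""
    parts = []
    for part in category.split("/"):
        cleaned = re.sub(r"[^a-z0-9 ]+", "", part.lower())
        cleaned = " ".join(cleaned.split())
        if cleaned:
            parts.append(cleaned)
    return "/".join(parts)

class CategoryHub:
    """Collects whatever categories writers use; it never rejects one."""

    def __init__(self):
        self.counts = Counter()   # how many posts use each category
        self.posts = {}           # post URL -> list of normalized categories

    def tag(self, post_url, categories):
        """Attach any number of categories to a post."""
        normalized = sorted({normalize(c) for c in categories} - {""})
        self.posts[post_url] = normalized
        self.counts.update(normalized)
        return normalized

    def children(self, parent):
        """Categories filed under a parent, e.g. 'music' -> 'music/jazz'."""
        prefix = normalize(parent) + "/"
        return sorted(c for c in self.counts if c.startswith(prefix))

hub = CategoryHub()
hub.tag("http://example.com/a", ["Lebanese Food", "Music/Jazz"])
hub.tag("http://example.com/b", ["lebanese food!", "music / jazz", "Fancy Pants"])

print(hub.counts.most_common())
print(hub.children("Music"))
```

Nothing here dictates which categories exist; popular ones simply float to the top of the counts, much like the LiveJournal Interests list.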

Greg  MAY 10 2002 5:01PM

This all reminds me too much of the promise that was to be Xanadu.

"Xanadu, if it existed, would be better than the World Wide Web.

In Xanadu, you could link to any public document. Also, you could easily discover the origin of all the links into any document. For instance, you can start at a verse of the Bible and find all the other documents that had links to this verse.

The ability to place links in any document and to follow links backward is known by the Xanadu programmers as "extrinsic, bi-directional linking"."

Wired Magazine - 3.06

Azrael Brown  MAY 10 2002 6:43PM

I'm on the "why bother?" bandwagon -- I'm not sure it's necessary. Libraries, sure, but when you start breaking down the useful "chunks" into daily paragraphs posted by random, unverifiably credible writers, it's not going to be much more useful than what we have now. Even web search engines categorize things on entire pages, not paragraphs. Libraries assign Dewey decimals for entire books, not by chapters. Context would be lost, too.

(here's where I admit I didn't completely read everything linked or said by kottke :)

Anyways, some of this might be solved via weblogs which aren't spit out into HTML flat-files: if posts were stored in a database, dynamically created on page loads, searches & sorting would be more useful than trying to add XML madness & internal meta-references to HTML documents (SQL could even be used to collect from multiple data sources, using current standards). But, then you start getting into a Slashdot-style format...which might not be a bad starting point for creating what you're talking about. Plastic is a pretty good example of the sort of thing you're looking to create, methinks.
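To make the database idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The `posts` table, its columns, and the sample rows are all made up for illustration; the point is only that once each post is its own row, a search returns individual posts (with dates to sort on) rather than whole pages.

```python
import sqlite3

# In-memory database with one row per post (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, posted TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (posted, body) VALUES (?, ?)",
    [
        ("2002-05-08", "Notes from a conference talk about weblogs"),
        ("2002-05-09", "Lunch at a burger place"),
        ("2002-05-10", "Two Salon articles about weblogs and metadata"),
    ],
)

def search_posts(conn, term):
    """Return (id, posted) for each post whose body contains term, newest first."""
    rows = conn.execute(
        "SELECT id, posted FROM posts WHERE body LIKE ? ORDER BY posted DESC",
        (f"%{term}%",),
    )
    return rows.fetchall()

hits = search_posts(conn, "weblogs")
print(hits)
```

A plain `LIKE` match is about as crude as search gets, but it already answers per post, which is the thing page-based search can't do.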

Steven Garrity  MAY 10 2002 7:05PM

Jason, since I originally wrote the article on BlogML you linked to, the readers have taken it in quite a different direction than I had intended. I wasn't thinking of a back-end storage format to make weblog articles more portable. Rather, I was thinking of a syntax embedded in HTML that helps Google and other indexes know the difference between a post and a page.

Have you considered rolling your own? I'm in somewhat of a unique scenario in which I have access to code libraries and the help of developers at my workplace. This put me in a great position to develop a simple database driven platform. It didn't take very long, and having all of your content in an SQL database is about as flexible as it could ever be. If a BlogML format is ever determined, I could probably output all 500 of my posts to it in a few minutes.

I realize that not everyone has a team of developers at their disposal - but then again, not everyone is dating a pyra founder ;-)

As for categorization, I've steered clear only because I couldn't imagine a scenario in which a user would come to my site and want to browse the archives by a particular topic - there just isn't enough volume. However, after running the site for two years, I'm starting to consider the possibility.

Steven Garrity  MAY 10 2002 7:12PM

...and...

Isn't it a bummer that when multiple posts are on one page (which they are on almost all sites, at least while they are on the home page), links from those posts don't have specific referers? For example, if you are permanently linked from kottke's sidebar, and then occasionally linked in posts on the homepage, your webstats can't differentiate. If only the browsers could know that a particular part of a page had its own permalink - referers could be much more specific - perhaps I'm trying to invent frames...

Elan  MAY 10 2002 8:22PM

I'm sure a useful algorithm (and usable interface for searching) combing metadata that we contribute to each of our posts could be created as long as we can standardize on the metadata. If we could do that, then these search engines could understand where a post begins and ends, and eventually, with a mass of content and a search thesaurus, we could create a more useful search and browse service. As for implementing it, only Blogger, Movable Type, Mr. Winer and a few others need to put the plugs into their software and it would work automatically (only the contribution of the metadata while posting would be required by bloggers).

CQNY  MAY 10 2002 8:44PM

first there is blog, then there's people asking for blogs to be useful when they needed it. c'mon, it's a blog, not a database. when did blogging, whining and complaining become so "programming" related? it kills the fun, and makes it mandatory and, worst of all, exclusive.

just let us know when some master minds of the web had decided what mortals shall be using for their blogs.

vanderwal  MAY 10 2002 9:32PM

I have been playing around with the multiple categories on my site for some time. My next step is to put them in a hierarchy. There are some folks working on a cross-site categorization experiment, but that is a few months away.

I started the categories for myself to make it easier to find related links and comments. I have recently opened up a category list (at least the ones I have used; there are over 130 of them) to public use. The interface for entry is: a text box for the entry, select location, select entry type (essay, journal, weblog), click categories (laid out in two columns), then submit.

Rolling one's own is fun. Using an open tool that is near what you like and open for modifications may be a good bet for people interested. Perl and/or PHP tied to MySQL or XML are good easy access points.

jonah  MAY 10 2002 10:23PM

I think that Kevin's site is one of the most 'usable' with categories and single pageable posts.

jkottke  MAY 10 2002 10:35PM

"How mundane - how confining - it would be to scan through a monstrous blog index where everything was neatly ordered by category. Leave the web be. Please."

If such a thing existed, you could still read weblogs however you wanted. You can also browse the Web without using any search engines or hierarchical directories. You can look for books at the library without consulting the card catalog. You can find something good on TV without consulting the TV Guide. Some of us would prefer to have those tools available to help us find what we're interested in.

"Xanadu, if it existed, would be better than the World Wide Web."

That's the comedy and the tragedy of the Web. It's so easy to use that everyone did. It's not as powerful or smart as it could be, but if it had been too complicated from the get-go, no one would have used it.

"just let us know when some master minds of the web had decided what mortals shall be using for their blogs."

Come on. No one is telling anyone what to use their weblog for. I'm talking about organizing the information for use after publication. Whether you want to participate in order to help people find information on your site is your decision. You could even keep your weblog off the Web entirely if you don't want anyone to find anything.

luke  MAY 10 2002 11:24PM

What about ye olde yahoo directories and that kind of thing? Isn't that just a big catalogue of web sites, with meta info and a search engine? Can this concept be expanded to include info in general?

Anyhoo...

We should remember content 'shelf life' too. A piece of content has time-based value too, i.e. it might be useful because it is really new, or really old, or was posted at the same time as x, y or z. The amount of time-based value it has will vary from person to person, or category to category to some degree. Content's time-based value needs to be weighed against its topical or categorical value. I think.
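One very simplified way to weigh the two kinds of value against each other, purely as a sketch: treat "time-based value" as freshness that halves every so many days, and blend it with a topical score between 0 and 1. The function name `combined_score`, the half-life, and the weighting are all assumptions, and this only covers the "useful because it is really new" case, not content that is valuable because it is old.

```python
def combined_score(topical, age_days, half_life_days=30.0, topical_weight=0.5):
    """Blend a 0..1 topical score with a freshness score that halves every half-life."""
    recency = 0.5 ** (age_days / half_life_days)
    return topical_weight * topical + (1.0 - topical_weight) * recency

# A brand-new post that is only loosely on topic...
fresh = combined_score(topical=0.2, age_days=0)
# ...versus a month-old post that is squarely on topic.
old_but_on_topic = combined_score(topical=1.0, age_days=30)

print(fresh, old_but_on_topic)
```

Since the right balance varies by person and by category, `half_life_days` and `topical_weight` would presumably be per-reader or per-category settings rather than fixed numbers.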

Also, being able to string together a few relevant posts within a broad category of posts would be useful, so you have big, broad categories, and highly specific topics which string together say
Also keep an eye on Drupal (also see Drop.org) for taxonomy funk (in CVS - they are moving towards v4 atm). Taxo term RSS feeds ahoy!

Kevin Fox  MAY 10 2002 11:34PM

Thanks for the thumbs-up, Jonah. It's interesting timing, actually. Next week I'm going to start a very open, public project to redesign my site, using by-the-book interaction design principles, because though categories and single-post pages are handy, I'm not convinced they provide enough for the basic user types.

In my experience (and anyone else who wants to chime in, please do so), the main types of people (on which personas could easily be based) are the avid reader, the occasional reader, the person who followed a link from another site, and the person coming from a search engine. These four people have entirely different wants and needs, not to mention the want of the site owner to make the relevant information accessible, while ideally also showing serendipitous information, and giving the reader a clear idea of what else is available on the site, without inundating them or obstructing their primary task.

Anyhow, just something that seemed relevant in this thread. I'll be talking about it a lot more on Fury next week, and after we redesign Fury, then we can start doing a real heuristic analysis on Jakob's site, which really, really needs it.

mattpfeff  MAY 11 2002 1:37AM

Would the semantic tags need to be "weblog universe-wide"?

Johnson talks about "guardian" weblogs, and tracking information on relatively small circles of sites. And there would probably be very real disadvantages to tracking information from too wide a range of sites -- the results you'd thereby get might be useful for a marketing agency, but an individual probably wouldn't trust or like them very much (they would ultimately just average out, by definition).

So an interesting test case might just be a small number of trusted sites, written by people whose views and thoughts are relatively compelling. If it worked, and if Johnson is right, it would have an immediate and lasting value that would not be diminished by any other sites' failure to use the same semantic tags -- though, if it really did work, other sites would then (finally) have an incentive to adopt them.

Thurman Faulk  MAY 11 2002 2:30AM

I've been working on a great project for ThreeRing -- see link on my name. It's a database-driven weblogging tool that has just been released in Beta, and it already has numerous features that other tools don't have, such as auto link-back and integrated page maintenance (i.e., you can create additional pages easily, instead of through some odd hacking of templates).

We're currently in the process of beefing up the search features, one of the things that Mr. Kottke said he feels is lacking in existing tools. Many of the concepts we're implementing are borrowed from corporate CMS systems -- categories, keywords, related content, item-based (instead of page-based) searches, etc.

It runs on Windows 2000, but that's a feature, not a bug. The *nix market is pretty saturated with reasonably good tools, so we decided to fill the gaps instead of reinventing the wheel (Sorry, I mix metaphors when I'm tired). If anyone's interested in a PHP or other port, lemme know.

ricardo vacapinta  MAY 11 2002 4:33AM

I'll take my usual contrarian stance and insist that a major point here is being missed. For me, categorization of content *within* a weblog is less important than categorization of blogs themselves. Someone above mentioned an analogy to libraries, where books are categorized but not chapters within the books themselves. I think the initial focus should be to categorize the books.

Posts within a weblog have no value to me without the context of the weblog itself. If I am looking for an informed opinion on, say, the current opera at the SF opera house, I don't want to be presented, in my search, with a hundred blog entries telling me about how someone saw their ex-boyfriend at a recent performance. I am looking for someone who writes frequently and passionately about opera, a developed connoisseur, one whose weblog metadata may include a shortlist of themes and strengths (externally given or self-described) which enhance the authority and quality of their post.

Let's get away from the whole binary search string-keyword mentality. What I want to type into my engine is something like:

[opera buff, location san francisco, topic aida]

and have it do all the and/or logic, parsing, metadata searches and give me the post I am asking for - relevant and authoritative.
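A toy version of that query, assuming weblogs carry their own metadata (a location and a short list of themes) separately from per-post topics. The `Blog` and `Post` classes, the `find_posts` function, and the sample blogs are all hypothetical; the sketch only shows how blog-level metadata can filter out the "saw my ex at the opera" entries that a keyword match alone would return.

```python
from dataclasses import dataclass, field

@dataclass
class Blog:
    name: str
    location: str
    themes: set = field(default_factory=set)   # blog-level metadata

@dataclass
class Post:
    blog: Blog
    title: str
    topics: set = field(default_factory=set)   # post-level metadata

def find_posts(posts, theme, location, topic):
    """Posts on a topic, but only from blogs whose own metadata matches."""
    return [
        p for p in posts
        if theme in p.blog.themes
        and p.blog.location == location
        and topic in p.topics
    ]

# Made-up blogs and posts for illustration.
opera_blog = Blog("Aria Notes", "san francisco", {"opera buff"})
diary_blog = Blog("Daily Diary", "san francisco", {"personal"})
posts = [
    Post(opera_blog, "Thoughts on the new Aida staging", {"aida"}),
    Post(diary_blog, "Saw my ex at Aida last night", {"aida"}),
    Post(opera_blog, "Season preview", {"tosca"}),
]

matches = find_posts(posts, theme="opera buff", location="san francisco", topic="aida")
print([p.title for p in matches])
```

Whether the themes are self-described or externally given, and whether they confer any real authority, is left entirely open here.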

Steven Garrity  MAY 11 2002 6:16AM

Some well implemented examples: xPlane's xBlog has a great 'archived in' feature that does a good job of preserving the typical chronological home page format while also adding good simple categories (with nice clean URLs too). Seems useful and not over-categorization.

Also, a friend of mine built a PHP-based gallery for his personal photos. Notice how there is a hierarchy of categories - but any photo can exist in any combination of cats. On the back end, the categories are just checked off like attributes when you add a new photo.

np  MAY 11 2002 7:27AM

Keyword search is pretty good on your site.

Daypop finds your relevant content based on keywords (along with others) if it's fresh.

Would it be too difficult to program/ask Daypop to use search boxes for keyword combinations where you can also specify dates, or certain blogs, or categories?

http://www.neuroprosthesis.org/blogger.html

Thurman Faulk  MAY 11 2002 9:46AM

Ricardo: I'll take my usual contrarian stance and insist that a major point here is being missed. For me, categorization of content *within* a weblog is less important than categorization of blogs themselves.

Another thing we've been working on at ThreeRing (but it probably won't make it into the 1.0 release) is a sort of category hierarchy that can be used to categorize weblogs, as well as individual posts within weblogs.

There are so many weblogs (and weblog readers) that such a thing is quickly becoming necessary.

David Lee Rogde  MAY 11 2002 10:30AM

I'd like to see more blogs make use of images, and the world around them; after all, it's a monitor and it's more image friendly than text friendly. Too many blogs seem to be attempting to be syndicated columnists, and too many are traditional diaries, and neither makes very good use of the medium. Open up, let us see your world, I want to see it!

nf0  MAY 12 2002 7:03PM

There is actually a very good search addon for Movable Type; you can get it here. This won't help much if you're using any other system, but it's very effective.

James  MAY 13 2002 9:41PM

I tossed up a word association link on my blog to ferret out like posts. It is currently too biased towards the shortest of posts, but is interesting nonetheless. What I love most about the code is that the links returned often have little in common with the source post. This is because there is little consistency in what I write about.

James  MAY 14 2002 3:08PM

As it turns out word association and keyword searching are pretty related, so now I've got a search box up too. Yummy.

scottandrew  MAY 14 2002 4:20PM

I dunno, it seems we're up against a wall, given the wide topical nature of weblogs. Categories help, but only drill down so far.

I recently hacked my copy of Movable Type to include a "metadata" field, so I could associate important (IMO) keywords with a particular post. I currently use it to power my Google It! links, but I've been thinking about different ways to use it to power a "more like this" search. But then, I have to pay attention to the quality of the keywords. For the average weblogger, this seems like a lot of extra work.

It seems to me the problem is that only so much metadata can be inferred by automated processes like search engines. Not to mention that context pretty much flies out the window. I may do a search for "Arafat" and find an interesting weblog post, but that post alone may not be enough to give me a sense whether the poster is pro-Palestinian, pro-Israel or neither. And while I agree with Ricardo's points in spirit, I can't see how any process can truly bestow "authority."

The idea of a semantic web is very appealing but what I've read of it (especially the TBL stuff) seems awfully abstract, leans heavily on RDF, and seems a long way from having useful implementations.

candy  MAY 18 2002 4:24AM

"None of the tools out there offer exactly what I want"
you guys really should try www.nucleuscms.org, it does!

irott  MAY 19 2002 10:51PM

Suggestion: greymatter?

This thread is closed to new comments. Thanks to everyone who responded.
