Automatic discovery of RSS feeds

posted Jun 3 @ 12:08 AM by Jason Kottke

Now that all of kottke.org is in MT, I can start worrying about what to do with things like RSS feeds. I’ve been following the developments concerning the automatic discovery of RSS feeds, written about extensively on dive into mark (more here and here). Basically, you insert the following code:

into your Web page and then all an RSS aggregator needs to do is check that Web page for your RSS feed instead of you having to provide the aggregator with a specific URL. Pretty slick really.
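
The markup in question is a link element in the head of the page. Here is a minimal sketch, in Python, of what an aggregator might do with it. The sample page, its /index.xml href, and the names FeedLinkFinder and discover_feed are assumptions made up for illustration; they are not taken from any particular aggregator.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# MIME type conventionally used to advertise an RSS feed in a link tag.
FEED_TYPES = {"application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that points at RSS."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def discover_feed(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the first advertised feed, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.feeds[0]) if finder.feeds else None


# Hypothetical page: one stylesheet link and one feed link.
SAMPLE_PAGE = """<html><head>
<title>Example weblog</title>
<link rel="stylesheet" type="text/css" href="/styles.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.xml">
</head><body></body></html>"""

if __name__ == "__main__":
    print(discover_feed(SAMPLE_PAGE, "http://example.com/"))
```

The stylesheet link is skipped because its rel value is not "alternate", so only the feed link is reported.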

However, I have a couple of concerns about how this works:

1. My understanding is that when a Web browser loads a page, it downloads all the documents referenced in the link tags. That’s how stylesheets work. Does this mean that every time someone loads up my Web site, they’re going to get this RSS file as well, whether they want it or not? For popular sites, depending on the size of the RSS file, that could add up to several megabytes in additional bandwidth…and possible additional bandwidth charges. Does the “rel” attribute being set to “alternate” take care of this?

2. Do the aggregators need to check my Web page each time they download the RSS file or are they going to cache the location and then only check the Web page once a week or so for a possible location update? Again, serving two files when only one is called for could get costly, especially if an aggregator is calling for it multiple times a day.

Can anyone shed some light on this?

Reader comments

jjg Jun 02, 2002 at 11:10PM

1. The browser does not load every document referenced in the link tags. If the "rel" attribute is not set to "stylesheet", most browsers (IE, Netscape, Mozilla, Opera) just ignore the tag. iCab is the only browser I know of that does anything with any link tag with a "rel" attribute other than "stylesheet". Link tags have lots of intriguing possibilities, virtually all of which have gone untapped.

2. I guess it depends on what the people writing the aggregator software choose to do. I believe the original idea was that the link tag would tell the aggregator "don't subscribe to this file, go get that other one instead" if someone tried to subscribe to the wrong file. In that scenario, presumably the aggregator would never come back to your HTML page. Probably the best scenario would be that, if the RSS file goes 404, the aggregator would then go back to the HTML page to check for a new URL in the link tag.
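
The "best scenario" described here (cache the feed URL, and return to the HTML page only when the feed goes 404) could be sketched roughly as follows. This is an illustration under stated assumptions, not how any real aggregator is known to work: Subscription, poll, and the fetch and discover callables are hypothetical names, and the network is passed in as plain functions so the policy can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Both callables are stand-ins supplied by the caller (hypothetical names):
Fetch = Callable[[str], Tuple[int, str]]   # URL -> (HTTP status, body)
Discover = Callable[[str], Optional[str]]  # site URL -> feed URL from its link tag


@dataclass
class Subscription:
    site_url: str
    feed_url: Optional[str] = None  # cached once discovered


def poll(sub: Subscription, fetch: Fetch, discover: Discover) -> Optional[str]:
    """Return the feed body, reading the HTML page only when it is needed."""
    if sub.feed_url is None:
        # First subscription: resolve the feed URL from the HTML page.
        sub.feed_url = discover(sub.site_url)
        if sub.feed_url is None:
            return None
    status, body = fetch(sub.feed_url)
    if status == 404:
        # The feed has moved: go back to the HTML page for a new URL.
        moved_to = discover(sub.site_url)
        if moved_to is None or moved_to == sub.feed_url:
            return None
        sub.feed_url = moved_to
        status, body = fetch(sub.feed_url)
    return body if status == 200 else None
```

Under this policy the HTML page is requested once at subscription time and again only after a 404, so routine polling costs the site one file per check rather than two.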

Anil Jun 03, 2002 at 12:06AM

Mozilla can handle link tags, too. It was pulled from 1.0 at the last minute due to performance issues, but I'm sure by 1.1 we'll see the LINK toolbar return to Mozilla.

When activated, it appears on any page with a link tag, and provides navigation to referenced pages.

Which, incidentally, gives me a lot of hope. Because the link tag will be the savior of the web. I believe it.

Steven Garrity Jun 03, 2002 at 6:54AM

I've noticed that Mozilla makes another use of the LINK tag. Rather than looking for sitename.com/favicon.ico for the favourites icon, as Internet Explorer does, they use the LINK tag to link to the icons.

Looking for root/favicon.ico, as IE does, suffers the same problems that Jason points out for autodetecting RSS - on some sites I manage, favicon.ico is one of the top requested files, whether it exists or not.

Mark Pilgrim Jun 03, 2002 at 9:34AM

To clarify: iCab does not auto-download the LINKed file, it merely displays the TITLE in a drop-down menu. As others have noted, Mozilla used to (and will again someday) display all such LINKs in the Link Toolbar, if it is enabled in the View menu. Lynx displays them along the top of the document.

And no, news aggregators like Radio and Amphetadesk will only check the LINK tag if the user enters the main site URL to try to subscribe; the aggregator then caches the RSS feed URL and never goes back to the HTML.

Other scripts (such as my auto-linkbacks on my weblog) can take advantage of the LINK tag to expose the RSS feed for a site, given the site URL.

The LINK tag is part of the HTML 4 spec, documented here:

Mark Pilgrim Jun 03, 2002 at 9:35AM

http://www.w3.org/TR/REC-html40/types.html#type-links

Other interesting uses for LINK tags: establishing the place of a page within a hierarchy, using rel="home", rel="up", rel="prev", rel="next" LINK tags. I do this both on my weblog and in my book. View source:
http://diveintomark.org/archives/2002/06/02.html
http://diveintopython.org/odbchelper_list.html

Joe Jun 03, 2002 at 10:27AM

"2. Do the aggregators need to check my Web page each time they download the RSS file..."

Aggie doesn't do this. It uses the web page to resolve the URL of the RSS file. It stores only the URL of the RSS file and discards the info about the web page.

jkottke Jun 03, 2002 at 10:55AM

Thanks guys. Sounds like my concerns aren't a problem at all.

And sorry about the lack of preview on the comments here. Still working out the kinks.

Ben Hammersley Jun 03, 2002 at 3:59PM

Mozilla DOES display all Link Rels in the Link toolbar. Go to View/Show-Hide/Show Navigation Bar and turn it on. Go to http://rss.benhammersley.com and click on an entry's permalink to see it in very-much-full-effect

Mark Pilgrim Jun 03, 2002 at 4:34PM

Ben: unfortunately, the Link toolbar has been removed from Mozilla 1.0 due to performance concerns. See: http://bugzilla.mozilla.org/show_bug.cgi?id=102992

justin Jun 03, 2002 at 9:50PM

jason: just in case you aren't aware, you're already publishing an RSS feed (i won't point to it though since you haven't yet made up your mind). i'm sure you can get rid of it if you so choose, but MT puts it there by default, and it is publicly accessible.

jkottke Jun 03, 2002 at 10:24PM

Yeah, I know the feed is there. Just haven't told anyone about it yet.

Ben Hammersley Jun 04, 2002 at 11:15AM

Mark: Not in RC3 it's not. In fact I have it on all the time, and I'm using build 2002053104 right now. I think that's an old bug...

Shmuel Jun 04, 2002 at 11:44AM

What about pages (a site) that have more than one RSS file associated with them? For instance, say Jason chose to keep a list of books he was reading on the front page of his site along with the regular blog content. How would the tools respond to two links to RSS files from the same page?

Example:

Mark Pilgrim Jun 04, 2002 at 12:09PM

Ben: I'm running 20020529 and the option is just gone from the View menu. Maybe it's been fixed; that'd be great.

Shmuel: multiple RSS files are fine, just have one LINK tag for each of them, with different titles. I do this for the category-specific feeds on my site. View-source on diveintomark.org for an example.
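
As a rough illustration of the multiple-feed case, here is a Python sketch of how a tool might list every advertised feed along with its title, so the user can pick one. The sample page, the "Weblog" and "Books" titles, their hrefs, and the names TitledFeedFinder and list_feeds are all invented for the example; they are not copied from diveintomark.org or from any aggregator.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TitledFeedFinder(HTMLParser):
    """Collects (title, href) pairs for every RSS link tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and a.get("href"):
            # The title is what tells two feeds on one page apart.
            self.feeds.append((a.get("title", ""), a["href"]))


def list_feeds(html: str) -> List[Tuple[str, str]]:
    """Return all advertised feeds in document order."""
    finder = TitledFeedFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical front page advertising two feeds with different titles.
TWO_FEEDS = """<head>
<link rel="alternate" type="application/rss+xml" title="Weblog" href="/index.xml">
<link rel="alternate" type="application/rss+xml" title="Books" href="/books.xml">
</head>"""

if __name__ == "__main__":
    for title, href in list_feeds(TWO_FEEDS):
        print(title, href)
```

What a given tool does with the list (take the first entry, or ask the user) is up to its author; the sketch only shows that the titles make the choice possible.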

mattjacob Jun 05, 2002 at 4:00PM

In build 2002053012 (v1.0 release), the option is also no longer there. I hope they bring it back.

This thread is closed to new comments. Thanks to everyone who responded.
