Automatic discovery of RSS feeds  JUN 03 2002

Now that all of kottke.org is in MT, I can start worrying about what to do with things like RSS feeds. I've been following the developments concerning the automatic discovery of RSS feeds, written about extensively on dive into mark (more here and here). Basically, you insert the following code:

<link rel="alternate" type="application/rss+xml" title="RSS" href="url/to/rss/file">

into your Web page, and then all an RSS aggregator needs to do is check that Web page for your RSS feed, instead of you having to provide the aggregator with a specific URL. Pretty slick, really.
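For the aggregator side, here is a minimal sketch in Python. It assumes the page's HTML has already been downloaded and uses only the standard library's html.parser. The names discover_feeds and FeedLinkFinder are hypothetical, as is the example.com URL below. Relative href values are resolved against the page URL, because a tag like the one above often points at a path rather than a full URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects autodiscovery <link> tags from an HTML page (hypothetical helper)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.feeds = []  # (title, absolute feed URL) pairs, in page order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names (LINK, REL, ...),
        # but not attribute values, so those are lowercased here.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()  # rel may hold several link types
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and a.get("href"):
            # Resolve relative hrefs such as "/index.xml" against the page URL.
            self.feeds.append((a.get("title", ""), urljoin(self.page_url, a["href"])))


def discover_feeds(html_text, page_url):
    """Return every RSS feed a page advertises via rel="alternate"."""
    finder = FeedLinkFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.feeds
```

Suppose a page at http://example.com/ contains that tag with href="/index.xml". In that case, discover_feeds returns [("RSS", "http://example.com/index.xml")]. The sketch returns every match with its title, rather than just the first. That leaves the choice to the user when a page advertises more than one feed.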

However, I have a couple of concerns about how this works:

1. My understanding is that when a Web browser loads a page, it downloads all the documents referenced in the <link> tags. That's how stylesheets work. Does this mean that every time someone loads up my Web site, they're going to get this RSS file as well, whether they want it or not? For popular sites, depending on the size of the RSS file, that could add up to several megabytes in additional bandwidth...and possible additional bandwidth charges. Does the "rel" attribute being set to "alternate" take care of this?

2. Do the aggregators need to check my Web page each time they download the RSS file or are they going to cache the location and then only check the Web page once a week or so for a possible location update? Again, serving two files when only one is called for could get costly, especially if an aggregator is calling for it multiple times a day.

Can anyone shed some light on this?

There are 15 reader comments

jjg  JUN 02 2002 11:10PM

1. The browser does not load every document referenced in the <link> tags. If the "rel" attribute is not set to "stylesheet", most browsers (IE, Netscape, Mozilla, Opera) just ignore the tag. iCab is the only browser I know of that does anything with a link tag whose "rel" attribute is something other than "stylesheet". Link tags have lots of intriguing possibilities, virtually all of which have gone untapped.

2. I guess it depends on what the people writing the aggregator software choose to do. I believe the original idea was that the link tag would tell the aggregator "don't subscribe to this file, go get that other one instead" if someone tried to subscribe to the wrong file. In that scenario, presumably the aggregator would never come back to your HTML page. Probably the best scenario would be that, if the RSS file goes 404, the aggregator would then go back to the HTML page to check for a new URL in the link tag.
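The 404 fallback suggested in point 2 can be sketched in Python. This illustrates the idea only; it does not describe how any particular aggregator behaves. The function fetch_feed, the subscription dict layout, and the fetch and discover callables are all hypothetical. The caller passes them in, so the logic can be tested without a network. The fallback only works if the aggregator keeps the page URL alongside the cached feed URL.

```python
from urllib.error import HTTPError


def fetch_feed(subscription, fetch, discover):
    """Fetch a subscribed feed, revisiting the HTML page only after a 404.

    subscription: dict with "page_url" and "feed_url" keys (hypothetical layout).
    fetch(url) -> str: downloads a URL and raises HTTPError on failure.
    discover(html, page_url) -> list of feed URLs found in LINK tags.
    """
    try:
        # Normal case: go straight to the cached feed URL; the page is not touched.
        return fetch(subscription["feed_url"])
    except HTTPError as err:
        if err.code != 404:
            raise  # other errors are not evidence that the feed moved
    # The feed has gone 404: check the page's LINK tag for a new location.
    page_url = subscription["page_url"]
    candidates = discover(fetch(page_url), page_url)
    if not candidates:
        raise LookupError("no feed LINK tag found on " + page_url)
    subscription["feed_url"] = candidates[0]  # cache the new location
    return fetch(subscription["feed_url"])
```

In practice, fetch could wrap urllib.request.urlopen, which raises HTTPError for a 404. The HTML page is requested only when the cached feed URL stops working.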

Anil  JUN 03 2002 12:06AM

Mozilla can handle link tags, too. Its Link toolbar was pulled from 1.0 at the last minute due to performance issues, but I'm sure we'll see it return by 1.1.

When activated, it appears on any page with a link tag, and provides navigation to referenced pages.

Which, incidentally, gives me a lot of hope, because the link tag will be the savior of the web. I believe it.

Steven Garrity  JUN 03 2002 6:54AM

I've noticed that Mozilla makes another use of the LINK tag. Rather than looking for sitename.com/favicon.ico for the favourites icon, as Internet Explorer does, it uses the LINK tag to point to the icon:

<LINK REL="icon" HREF="images/mozilla-16.png" TYPE="image/png">

Looking for root/favicon.ico, as IE does, suffers from the same problem Jason points out for autodetecting RSS: on some sites I manage, favicon.ico is one of the most requested files, whether it exists or not.
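The difference between the two approaches can be sketched in Python. A client that honours the LINK tag requests only the file named there, and it guesses /favicon.ico only when no such tag exists. This is an illustrative sketch, not Mozilla's or IE's actual code. The names icon_url and IconLinkFinder are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="icon"> on a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        # Matches rel="icon" and rel="shortcut icon"; REL/HREF names are lowercased by the parser.
        if "icon" in a.get("rel", "").lower().split() and a.get("href"):
            self.href = a["href"]


def icon_url(html_text, page_url):
    """Return the one icon URL a client would request for this page."""
    finder = IconLinkFinder()
    finder.feed(html_text)
    finder.close()
    if finder.href:
        return urljoin(page_url, finder.href)
    # No LINK tag declared: fall back to guessing favicon.ico at the site root.
    return urljoin(page_url, "/favicon.ico")
```

Take the uppercase tag above on a page at http://example.com/. For it, icon_url gives http://example.com/images/mozilla-16.png. The root favicon.ico request happens only for pages that declare no icon.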

Mark Pilgrim  JUN 03 2002 9:35AM

To clarify: iCab does not auto-download the LINKed file; it merely displays the TITLE in a drop-down menu. As others have noted, Mozilla used to (and will again someday) display all such LINKs in the Link Toolbar, if it was enabled in the View menu. Lynx displays them along the top of the document.

And no, news aggregators like Radio and AmphetaDesk only check the LINK tag if the user enters the main site URL when trying to subscribe; they then cache the RSS feed URL and never go back to the HTML.
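The subscribe-time behaviour described here can be sketched in Python. The sketch assumes a hypothetical fetch(url) that returns the response body as text. It also assumes a hypothetical discover(html, page_url) that returns the feed URLs found in LINK tags. The feed check is deliberately crude and is not taken from Radio, AmphetaDesk, or Aggie.

```python
import re

# Crude feed sniffing: an optional XML declaration, then an <rss> (0.9x/2.0)
# or <rdf:RDF> (RSS 1.0) root element. Comments or a DOCTYPE would defeat it.
FEED_START = re.compile(r"\s*(<\?xml[^>]*\?>\s*)?<(rss|rdf:RDF)\b")


def resolve_subscription(url, fetch, discover):
    """Return the single URL an aggregator would store for a subscription.

    fetch(url) -> str and discover(html, page_url) -> list of feed URLs are
    hypothetical helpers supplied by the caller.
    """
    body = fetch(url)
    if FEED_START.match(body):
        return url  # the user pasted the feed URL itself
    feeds = discover(body, url)
    if not feeds:
        raise LookupError("no RSS LINK tag found at " + url)
    # Only the feed URL is returned for storage; the page URL is dropped,
    # so the HTML page is never fetched again for this subscription.
    return feeds[0]
```

Because only the returned feed URL is kept, later polls cost one request each, for the RSS file alone.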

Other scripts (such as my auto-linkbacks on my weblog) can take advantage of the LINK tag to expose the RSS feed for a site, given the site URL.

The LINK tag is part of the HTML 4 spec, documented here:

http://www.w3.org/TR/REC-html40/types.html#type-links

Other interesting uses for LINK tags: establishing the place of a page within a hierarchy, using rel="home", rel="up", rel="prev", and rel="next" LINK tags. I do this both on my weblog and in my book. View source:
http://diveintomark.org/archives/2002/06/02.html
http://diveintopython.org/odbchelper_list.html
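Here is a sketch of how a client such as a link toolbar might collect these hierarchy links. It is written in Python with the standard html.parser. The names nav_links and NavLinkFinder are hypothetical, as are the example.com URLs. It keeps the first href for each of home, up, prev and next, and resolves it against the page URL. (In HTML 4, "prev" is the link type for the previous document in a series.)

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

NAV_RELS = ("home", "up", "prev", "next")  # hierarchy rels this sketch looks for


class NavLinkFinder(HTMLParser):
    """Maps each navigation rel to the URL of the first LINK that declares it."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        if not a.get("href"):
            return
        # A single rel attribute may list several space-separated link types.
        for rel in a.get("rel", "").lower().split():
            if rel in NAV_RELS and rel not in self.links:
                self.links[rel] = urljoin(self.page_url, a["href"])


def nav_links(html_text, page_url):
    finder = NavLinkFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.links
```

Other rels, such as stylesheet or alternate, are ignored. A toolbar built this way shows only the navigation buttons a page actually declares.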

Joe  JUN 03 2002 10:27AM

"2. Do the aggregators need to check my Web page each time they download the RSS file..."

Aggie doesn't do this. It uses the web page to resolve the URL of the RSS file. It stores only the URL of the RSS file and discards the info about the web page.

jkottke  JUN 03 2002 10:55AM

Thanks guys. Sounds like my concerns aren't a problem at all.

And sorry about the lack of preview on the comments here. Still working out the kinks.

Ben Hammersley  JUN 03 2002 3:59PM

Mozilla DOES display all LINK rels in the Link toolbar. Go to View/Show-Hide/Show Navigation Bar and turn it on, then go to http://rss.benhammersley.com and click on an entry's permalink to see it in very-much-full effect.

Mark Pilgrim  JUN 03 2002 4:34PM

Ben: unfortunately, the Link toolbar has been removed from Mozilla 1.0 due to performance concerns. See: http://bugzilla.mozilla.org/show_bug.cgi?id=102992

justin  JUN 03 2002 9:50PM

jason: just in case you aren't aware, you're already publishing an RSS feed (i won't point to it, though, since you haven't made up your mind yet). i'm sure you can get rid of it if you so choose, but MT puts it there by default, and it is publicly accessible.

jkottke  JUN 03 2002 10:24PM

Yeah, I know the feed is there. Just haven't told anyone about it yet.

Ben Hammersley  JUN 04 2002 11:15AM

Mark: Not in RC3 it's not. In fact I have it on all the time, and I'm using build 2002053104 right now. I think that's an old bug...

Shmuel  JUN 04 2002 11:44AM

What about pages (or whole sites) that have more than one RSS file associated with them? For instance, say Jason chose to keep a list of books he was reading on the front page of his site alongside the regular blog content. How would the tools respond to two links to RSS files from the same page?

Example:


Mark Pilgrim  JUN 04 2002 12:09PM

Ben: I'm running 20020529 and the option is just gone from the View menu. Maybe it's been fixed; that'd be great.

Shmuel: multiple RSS files are fine, just have one LINK tag for each of them, with different titles. I do this for the category-specific feeds on my site. View-source on diveintomark.org for an example.
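On the publishing side, the one-tag-per-feed approach can be sketched as a small Python helper that a page template might call. The helper feed_link_tags is hypothetical, and so are the example feeds, which echo Shmuel's books list. The titles are what distinguish the feeds from one another, so each title is HTML-escaped before it goes into the attribute.

```python
from html import escape


def feed_link_tags(feeds):
    """Render one autodiscovery LINK tag per (title, url) pair, in order."""
    template = '<link rel="alternate" type="application/rss+xml" title="{}" href="{}">'
    # escape(..., quote=True) turns &, <, > and quotes into entities so a
    # category name like "Books & reading" can't break the attribute.
    return "\n".join(
        template.format(escape(title, quote=True), escape(url, quote=True))
        for title, url in feeds
    )
```

Calling feed_link_tags([("All entries", ".../index.xml"), ("Books & reading", ".../books.xml")]) with real URLs yields two LINK tags, one per line. Each tag carries its own title.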

mattjacob  JUN 05 2002 4:00PM

In build 2002053012 (v1.0 release), the option is also no longer there. I hope they bring it back.

This thread is closed to new comments. Thanks to everyone who responded.
