kottke.org posts about search

History of film and broadcasting search engine
Aug 20 2013

[Image: Citizen Kane magazine cover]

Lantern is a search engine for the books, periodicals, and catalogs contained in the Media History Digital Library. If you are a fan or student of pre-1970s American film and broadcasting, this looks like a goldmine. Here are some of the periodical titles and the years available:

Variety 1905-1926
Photoplay 1914-1943
Movie Classic 1931-1937
Home Movies and Home Talkies 1932-1934
Talking Machine World 1921-1928

(via candler blog)

Search data finds unknown drug side effects
Mar 06 2013

Whenever I start to feel sick, I hit the Internet and start searching for more information about my symptoms. When a doctor writes me a prescription and I start feeling something unexpected, I search the web for side effects. And I'm not the only one whose first instinct is to turn my head and search. So many of us have adopted this behavior that researchers are gathering valuable information by studying our search queries and "have for the first time been able to detect evidence of unreported prescription drug side effects before they were found by the Food and Drug Administration's warning system."

Worrisome Facebook Graph Search queries
Jan 23 2013

Facebook's new Graph Search can be used to find some very unusual, disturbing, and potentially dangerous things. Like "Married people who like Prostitutes", "Family members of people who live in China and like Falun Gong", and "Islamic men interested in men who live in Tehran, Iran".

Get the old Twitter timeline back (with @replies!)
Jul 23 2012

Earlier today I shared a quick way to read a links-only version of your Twitter stream using Twitter's new "people you follow" search filter. More than three years ago, Twitter removed @replies to people you don't follow from people's streams... e.g. if I follow Jack Dorsey on Twitter and you don't, you won't see my "@jack That's great, congrats!" tweet in your stream. With the "people you follow" search filter, you now have the option of seeing all those @replies again: just do a search for some gibberish with the not operator in front of it. (But obviously not that gibberish because then you'll miss tweets with that link in it. Get yer own gibberish!)

Two things that I wish worked but don't: -@ and -# for searches that exclude @replies and #hashtags.

Update: Andy Baio reminds me that you can filter out @replies and #hashtags with "-filter:replies" and "-filter:hashtags". Which makes things a bit more interesting. Using the "people you follow" filter in combination with other filters, you can see your Twitter stream in all sorts of different ways:

- Only links
- Only links excluding Foursquare, Instagram, or whatever...
- Without links
- Without links and @replies (which is kind of an amazingly old school way to read Twitter)

You can also use it to read your stream with certain terms excluded...say if you didn't want to read anything about the Presidential candidates, SXSW, Rupert Murdoch, the Yankees, or Gawker. I know other tools let you filter tweets in your stream in different ways, but this is the first time Twitter allows people to do it on their site, even if it is through the back door.
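
For the curious, here's a rough sketch of how those filter combinations could be composed as plain query strings. The operators `filter:links`, `-filter:replies`, and `-filter:hashtags` are the ones mentioned above; the helper name `build_query` and the excluded terms are made up for illustration, and the searches originally linked in the list may well have been phrased differently.

```python
def build_query(only=(), without=(), exclude_terms=()):
    """Compose a search query string from filter names and excluded terms.

    only          -> filters to require, e.g. "links" becomes "filter:links"
    without       -> filters to negate, e.g. "replies" becomes "-filter:replies"
    exclude_terms -> plain words to negate with a leading "-"
    """
    parts = [f"filter:{name}" for name in only]
    parts += [f"-filter:{name}" for name in without]
    parts += [f"-{term}" for term in exclude_terms]
    return " ".join(parts)


# The four views from the list above, as hypothetical query strings.
only_links = build_query(only=["links"])
links_minus_services = build_query(only=["links"],
                                   exclude_terms=["foursquare", "instagram"])
no_links = build_query(without=["links"])
old_school = build_query(without=["links", "replies"])

for query in (only_links, links_minus_services, no_links, old_school):
    print(query)
```

Each string would then be typed into the search box with the "people you follow" filter switched on.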

Turn your Twitter stream into your friends' linkblog
Jul 23 2012

A few weeks ago, Twitter added an option to search the tweets of only the people you follow. This is useful for several different reasons (try searching for [recent pop culture key phrase] to see what I mean) but for those who use Twitter primarily to find cool links to read/watch, it's an unexpected gift. To view your Twitter stream filtered to include only tweets containing links, just do a search for "http". Simple but powerful.

ps. Who knows if they're interested in this or not, but by a) making their entire archive available to search and b) allowing people to limit their search to their friends + 1-2 degrees of separation, Twitter could significantly better the search experience offered by Google et al in maybe 25-30% of all search cases. This is what Google is attempting to do with Google+ but Twitter could beat them to the punch.

Update: The search above, while quick, is also dirty in that it will include non-link tweets like "My favorite protocol is HTTP". The official Twitter way is to use "filter:links", which will avoid that problem.

You can also filter out the links from your Twitter stream by negating the http search (this no longer works...), but you'll have to wade through all the @replies.

Google Image Search recursion
Jan 13 2012

This is mesmerizing: using Google Image Search and starting with a transparent image, this video cycles through each subsequent related image, over 2900 in all.

(via ★mattb)

Searching the web for planes in the sky
Nov 17 2011

If you search Wolfram Alpha for "planes overhead", it returns a list of planes passing over your current location along with a sky map of where to look.

[Image: Planes sky map]

Do a barrel roll
Nov 03 2011

The most fun on the internet right now: go to Google and search for "do a barrel roll" (no quotes). Whee!

Happy birthday, big GOOG
Sep 27 2011

Google is thirteen today...back in 1998 when the site was still hosted at http://google.stanford.edu, Keith Dawson gave the search engine its first online coverage in English on the fondly remembered Tasty Bits From the Technology Front.

This site, one of the few rigorous academic research projects on Web searching, presents a demonstration database -- only 25M documents -- that already blows past most of the existing search engines in returning relevant nuggets. Google employs a concept of Page Rank derived from academic citation literature. Page Rank equates roughly to a page's importance on the Web: the more inbound links a page has, and the higher the importance of the pages linking to it, the higher its Page Rank.
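
The Page Rank idea quoted above can be sketched in a few lines. What follows is a toy power-iteration version, not Google's actual code: the damping factor of 0.85, the iteration count, the `pagerank` function name, and the five-page `toy_web` graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical five-page web: "c" has the most inbound links.
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"], "e": ["d"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this made-up graph, "c" comes out on top because it has the most inbound links, and "d" ranks well largely because the important page "c" links to it, which is the second half of the description above.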

Searching Vonnegut's story shapes
Sep 02 2011

Austin Kleon explicitly tied the last two posts together and fed Kurt Vonnegut's story shape graphs into Google Correlate's search by drawing feature. This is SO GOOD.

[Image: Vonnegut correlate]

Google's search by drawing feature
Sep 02 2011

This is kind of amazing...you draw a graph and Google Correlate finds query terms whose popularity matches the drawn curve. I drew a bell curve, a very rough one peaking in 2007, and it matches a bunch of searches for "myspace".

[Image: Google Correlate]

This fits beautifully with the previous post about Vonnegut's story shape graphs.

Autocomplete map of the United States
Dec 08 2010

Dorothy Gambrell looked up all of the state names on Google and made a map of what the autocomplete suggestions were. Here's part of it:

[Image: Autocomplete map]

Lots of sports and schools.

Threaten customers for SEO? Go directly to jail.
Dec 07 2010

The dickwad who threatened his customers as an SEO tactic (detailed here in the NY Times) was arrested on Monday by federal agents.

The merchant, Vitaly Borker, 34, who operates a Web site called decormyeyes.com, was charged with one count each of mail fraud, wire fraud, making interstate threats and cyberstalking. The mail fraud and wire fraud charges each carry a maximum sentence of 20 years in prison. The stalking and interstate threats charges carry a maximum sentence of five years.

He was arrested early Monday by agents of the United States Postal Inspection Service. In an arraignment in the late afternoon in United States District Court in Lower Manhattan, Judge Michael H. Dolinger denied Mr. Borker's request for bail, stating that the defendant was either "verging on psychotic" or had "an explosive personality." Mr. Borker will be detained until a preliminary hearing, scheduled for Dec. 20.

Hard-Coding Bias in Google "Algorithmic" Search Results
Nov 17 2010

Have you noticed that when you search Google for the answer to a mathematical calculation, the only result it lists is Google's own? I mean, just look at this obvious result tampering:

[Image: Google Bias]

This "hard-coding" of calculation answers as the top search result goes against the company's supposed policy promising completely algorithmic and unbiased results. How are other mathematical calculation sites supposed to compete against the Mountain View search and math giant? What if 45 times 12 isn't actually 540? (I checked the calculation on Wolfram Alpha several times and on my iPhone calculator and 540 appears to be correct. For now.)

And this isn't even Google's most egregious transgression. As Eric Meyer points out, Google is blocking private correspondence between private parties. That means that grandmothers aren't getting necessary information about erectile dysfunction, people aren't finding out where they can play Texas Hold 'Em online, and the queries of Nigerian foreign ministers are going unanswered. There are millions of dollars sitting in a bank somewhere and all they need is a loan to get it out! Google! This. Is. Un. Acce. Ptable!

P.S. I think this "research" is obvious and the conclusions are misleading and biased. But then I don't have a Ph.D. from Harvard, so what do I know?

Not your father's PageRank
Feb 26 2010

Steven Levy on how Google's search algorithm has changed over the years.

Take, for instance, the way Google's engine learns which words are synonyms. "We discovered a nifty thing very early on," Singhal says. "People change words in their queries. So someone would say, 'pictures of dogs,' and then they'd say, 'pictures of puppies.' So that told us that maybe 'dogs' and 'puppies' were interchangeable. We also learned that when you boil water, it's hot water. We were relearning semantics from humans, and that was a great advance."

But there were obstacles. Google's synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein's theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. "Hot dog" would be found in searches that also contained "bread" and "mustard" and "baseball games" -- not poached pooches. That helped the algorithm understand what "hot dog" -- and millions of other terms -- meant. "Today, if you type 'Gandhi bio,' we know that bio means biography," Singhal says. "And if you type 'bio warfare,' it means biological."

Or in simpler terms, here's a snippet of a conversation that Google might have with itself:

A rock is a rock. It's also a stone, and it could be a boulder. Spell it "rokc" and it's still a rock. But put "little" in front of it and it's the capital of Arkansas. Which is not an ark. Unless Noah is around.

Google's Super Bowl ad
Feb 08 2010

It didn't feature an athletic woman with a flimsy bra throwing a hammer through a screen, but I thought Google's Super Bowl ad was pretty well done:

Google DNS
Dec 03 2009

Google announced their public DNS server today. I'm using it right now. There's been a bunch of speculation as to why Google is offering this service for free but the reason is pretty simple: they want to speed up people's Google search results. In 2006, Google VP Marissa Mayer told the audience at the Web 2.0 conference that slowing a user's search experience down even a fraction of a second results in fewer searches and less customer satisfaction.

Marissa ran an experiment where Google increased the number of search results to thirty. Traffic and revenue from Google searchers in the experimental group dropped by 20%.

Ouch. Why? Why, when users had asked for this, did they seem to hate it?

After a bit of looking, Marissa explained that they found an uncontrolled variable. The page with 10 results took .4 seconds to generate. The page with 30 results took .9 seconds.

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

Former Amazon employee Greg Linden backs up Mayer's claim:

This conclusion may be surprising -- people notice a half second delay? -- but we had a similar experience at Amazon.com. In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.

Google's new search engine
Aug 11 2009

Google is developing their next-generation search engine and needs your help in testing it out.

For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google's web search. It's the first step in a process that will let us push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions. The new infrastructure sits "under the hood" of Google's search engine, which means that most users won't notice a difference in search results.

(via waxy)

Google's search dominance
May 29 2009

After I heard Microsoft's announcement of yet-another-iteration of their search engine (named Bing), I went to look at the stats for kottke.org for the past month to see how many visitors each search engine sent to the site. I couldn't believe how dominant Google was.

Search engine | Visitors | Share
Google | 262,946 | 93.8%
MS Live | 4,307 | 1.5%
Yahoo | 4,036 | 1.4%
MSN | 2,796 | 1.0%

It's a small sample and doesn't match up with Comscore's numbers (Google: 64.2%, Yahoo: 20.4%, MS: 8.2%), but wow. As a comparison, the numbers for a year ago for kottke.org had Google at 91%, Yahoo at 4.9%, and Live at 0.7%.

Federated search
May 22 2009

At some event called the Churchill Club Top Tech Trends, VC Steve Jurvetson had an interesting idea about the future direction of search.

He said the aggregate power of distributed human activity will trump centralized control. His main point was that Google, and other search engines that analyze the Web and links, are much less useful than a (theoretical) search engine that knows not what people have linked to (as Google does), but rather what pages are open on people's browsers at the moment that people are searching. "All the problems of search would be solved if search relevance was ranked by what browsers were displaying," he said.

I like that idea a lot, but it got me thinking: how many instances of Firefox can you run on a cheapo Linux box, how many tabs could you have open in each of those browsers, and would that be more or less cost effective than the search term gaming that currently happens? In other words, good luck with that!

WolframAlpha launches
May 18 2009

If you're skeptical of WolframAlpha (as I was), you should watch this introduction by Stephen Wolfram. The comparison to Google (usually "is WolframAlpha a Google killer?") is not a good one but the new service could learn a little something from the reigning champion: hide the math. One of the geniuses of Google is that it took simple input and gave simple output with a whole lot of complexity in between that no one saw and few people cared about. Plus the underlying premise of the complex computation was simplified, branded (PageRank!), and became a value proposition for Google: here's what the web itself thinks is important about your query.

The country's new robots.txt file
Jan 20 2009

Here's a small and nerdy measure of the huge change in the executive branch of the US government today. Here's the robots.txt file from whitehouse.gov yesterday:

User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /omb/search
Disallow: /omb/query.html
Disallow: /expectmore/search
Disallow: /expectmore/query.html
Disallow: /results/search
Disallow: /results/query.html
Disallow: /earmarks/search
Disallow: /earmarks/query.html
Disallow: /help
Disallow: /360pics/text
Disallow: /911/911day/text
Disallow: /911/heroes/text

And it goes on like that for almost 2400 lines! Here's the new Obamafied robots.txt file:

User-agent: *
Disallow: /includes/

That's it! BTW, the robots.txt file tells search engines what to include and not include in their indexes. (thx, ian)
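
If you want to see what those rules mean to a well-behaved crawler, Python's standard `urllib.robotparser` module can read them. This is only a sketch: the old rules are a three-line excerpt of the file quoted above rather than all ~2400 lines, and the `ExampleBot` user agent, the `load_rules` and `allowed` helpers, and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# A short excerpt of the old file (the real one ran to almost 2400 lines).
old_excerpt = load_rules("""User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /help
""")

# The new file, in full.
new_rules = load_rules("""User-agent: *
Disallow: /includes/
""")


def allowed(parser, path, agent="ExampleBot"):
    """Would a crawler obeying these rules be permitted to fetch this path?"""
    return parser.can_fetch(agent, "http://www.whitehouse.gov" + path)


for path in ("/help", "/search", "/includes/header.html"):
    print(path, allowed(old_excerpt, path), allowed(new_rules, path))
```

Strictly speaking the file asks crawlers not to fetch certain paths; nothing enforces it, which is why it only matters for crawlers that choose to honor it.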

Update: Nearly four months later, the White House's robots.txt file is still short...only four lines.

User-agent: *
Disallow: /includes/
Disallow: /search/
Disallow: /omb/search/

TinEye image search
Jan 07 2009

TinEye is an image search engine. You give it an image and it'll find it on the web for you. If it works -- I didn't get to try it too much because it was down -- this is great for chasing down attribution and finding other pix by the same photographer and such. (via master kalina)

Google Book Search now does magazines
Dec 10 2008

Google Book Search has added a few magazines to their repertoire.

Today, we're announcing an initiative to help bring more magazine archives and current magazines online, partnering with publishers to begin digitizing millions of articles from titles as diverse as New York Magazine, Popular Mechanics, and Ebony.

At least I think it's a few magazines...it might be thousands but there's no way (that I can find) to view a list of magazines on offer.

Update: Spellbound and Thomas Gruber have lists of some of the magazines on offer.

Search correlations with StateStats
Dec 03 2008

StateStats is hours of fun. It tracks the popularity of Google searches per state and then correlates the results to a variety of metrics. For instance:

Mittens - big in Vermont, Maine, and Minnesota, moderate positive correlation with life expectancy, and moderate negative correlation with violent crime. (Difficult to commit crimes while wearing mittens?)

Nascar - popular in North and South Carolina, strong positive correlation with obesity, and moderate negative correlation with same sex couples and income.

Sushi - big in NY and CA, moderate positive correlation with votes for Obama, and moderate negative correlation with votes for Bush.

Gun - moderate positive correlation with suicide and moderate negative correlation with votes for Obama. (Obama is gonna take away your guns but, hey, you'll live.)

Calender (misspelled) - moderate positive correlation with illiteracy and rainfall and moderate negative correlation with suicide.

Diet - moderate positive correlation with obesity and infant mortality and moderate negative correlation with high school graduation rates.

Kottke - popular in WI and MN, moderate positive correlation with votes for Obama, and moderate negative correlation with votes for Bush.

Cuisine - This was my best attempt at a word with strong correlations that wasn't overly clustered in an obvious way (e.g. blue/red states, urban/rural, etc.). Strong positive correlation with same sex couples and votes for Obama and strong negative correlation with energy consumption and votes for Bush.

I could do this all day. A note on the site about correlation vs. causality:

Be careful drawing conclusions from this data. For example, the fact that walmart shows a moderate correlation with "Obesity" does not imply that people who search for "walmart" are obese! It only means that states with a high obesity rate tend to have a high rate of users searching for walmart, and vice versa. You should not infer causality from this tool: In the walmart example, the high correlation is driven partly by the fact that both obesity and Walmart stores are prevalent in the southeastern U.S., and these two facts may have independent explanations.

Can you find any searches that show some interesting results? Strong correlations are not that easy to find (although foie gras is a good one). (thx, ben)
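
For anyone wondering what "moderate" or "strong" correlation boils down to, here's a minimal sketch of a Pearson correlation coefficient, which is one common way to measure it. I don't know that StateStats computes it this way, and the per-state numbers below are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Invented numbers for five imaginary states: relative search volume for a
# term, and some per-state metric such as an obesity rate.
search_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
obesity_rate = [22.0, 24.5, 25.0, 28.0, 30.5]

r = pearson(search_rate, obesity_rate)
print(round(r, 3))
```

A value near 1 or -1 says the two columns move together across states. As the note above warns, it says nothing about whether the people doing the searching are the ones driving the metric.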

2001, a search odyssey
Oct 01 2008

Google has released a search engine that only searches their index from 2001. kottke.org is in there. (via waxy)

Rogers Cadenhead has beaten me to the...
Dec 20 2007

Rogers Cadenhead has beaten me to the punch in calculating the winner of the Dave Winer/Martin Nisenholtz Long Bet pitting the NY Times vs. blogs to see who ranks higher in end of the year search results for the 5 most important news stories of 2007. The winner? Wikipedia.

The Times has really improved their position in Google since 2005...opening up their archives helped, I bet.

There are indications that Google is changing...
Oct 24 2007

There are indications that Google is changing their PageRank algorithm, possibly to penalize sites running paid links or too many cross-promotional links across blog networks. Affected sites include Engadget, Forbes, and Washington Post. Even Boing Boing, which I think had been at 9, is down to 7. You can check a site's PR here.

Depending on the site, 30-40% of a site's total traffic can come from search engines, much of that from Google. It will be interesting to see how much of an impact the PR drop will have on their traffic and revenue. (thx, my moon my mann)

Update: Just got the following from the editor of a site that got its PR bumped down. He says:

Two weeks ago I lost 80% of my search traffic due to, I believe, using ads from Text-Link-Ads, which does not permit the "nofollow" attribute on link ads. That meant an overall drop of more than 44% of my total traffic. It also meant a 65%-95% drop in Google AdSense earnings per day and a loss of PageRank from 7 to 6.

He has removed the text links from his site and is negotiating with Google for reinstatement but estimates a loss in revenue of $10,000 for the year due to this change. And this is for a relatively small site...the Engadget folks must be freaking out.

What if the Google homepage were optimized...
Oct 17 2007

What if the Google homepage were optimized for Google? (via magnetbox)

Speaking of cool Etsy shops, elastiCo is...
Sep 17 2007

Speaking of cool Etsy shops, elastiCo is selling pillows and tshirts with the most popular Google News search terms printed on them.

A rerun, because it came up at...
Jun 07 2007

A rerun, because it came up at dinner the other night: EPIC 2014, the recent history of technology and the media as told from the vantage point of 7 years in the future. "2008 sees the alliance that will challenge Microsoft's ambitions. Google and Amazon join forces to form Googlezon. Google supplies the Google Grid and unparalleled search technology. Amazon supplies the social recommendation engine and its huge commercial infrastructure."

An interesting somewhat-inside look at Google's search...
Jun 04 2007

An interesting somewhat-inside look at Google's search technology. I found this interesting: "When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds." No matter how hard CNN or Digg or Twitter works to harness their audience to break news, hooking up Google search queries to Google News in a useful manner would likely scoop them all every time.

Google is the crossword puzzler's best friend.
May 28 2007

Google is the crossword puzzler's best friend. Several of the top 100 searches on a given day are for crossword clues. This was more apparent a few days ago but it looks like they've started to filter the crossword terms out. More here. (thx, peggy & jonah)

kottke.org banned from Technorati top 100?
May 10 2007

Since swearing off Technorati a couple of years ago, I've been checking back every few months to see if the situation has improved. The site is definitely more responsive but their data problems seemingly remain, at least with regard to kottke.org; Google Blog Search gives consistently better results and easy access to RSS feeds of searches.

Technorati recently introduced something called the Technorati Authority number, which is a fancy name for the number of blogs linking to a site in the last six months. Curious as to where kottke.org fell on the authority scale, I checked out the top 100 blogs list. Not there, so I proceeded to the "Everything in the known universe about kottke.org" page where a portion of that huge cache of kottke.org knowledge was the authority number: 5,094. Looking at the top 100 list, that should put the site at #47, nestled between The Superficial and fishki.net, but it's not there. Technorati also currently states that kottke.org hasn't been updated in the last day, despite several updates since then and my copy of MT pinging Technorati after each update.

Maybe kottke.org has been intentionally excluded because I've been so hard on them in the past. Or maybe it's just a glitch (or two) in their system. Or maybe it's an indication of larger problems with their service. Either way, as the company is attempting to offer an authentic picture of the blogosphere, this doesn't seem like the type of rigor and accuracy that should send reputable media sources like the BBC, Washington Post, NY Times, and the Wall Street Journal scurrying to their door looking for reliable data about blogs.

Update: As of 3:45pm EST, the top 100 list has been updated to include kottke.org. The site also picked up this post right away, but failed to note a subsequent post published a few minutes later.

Google buys Doubleclick for $3.1 billion. My assertion...
Apr 14 2007

Google buys Doubleclick for $3.1 billion. My assertion more than four years ago that Google is not a search engine isn't looking too shabby.

Google Apps
Feb 22 2007

The NY Times today:

On Thursday, Google, the Internet search giant, will unveil a package of communications and productivity software aimed at businesses, which overwhelmingly rely on Microsoft products for those functions.

The package, called Google Apps, combines two sets of previously available software bundles. One included programs for e-mail, instant messaging, calendars and Web page creation; the other, called Docs and Spreadsheets, included programs to read and edit documents created with Microsoft Word and Excel, the mainstays of Microsoft Office, an $11 billion annual franchise.

kottke.org from April 2004:

Google isn't worried about Yahoo! or Microsoft's search efforts...although the media's focus on that is probably to their advantage. Their real target is Windows. Who needs Windows when anyone can have free unlimited access to the world's fastest computer running the smartest operating system? Mobile devices don't need big, bloated OSes...they'll be perfect platforms for accessing the GooOS. Using Gnome and Linux as a starting point, Google should design an OS for desktop computers that's modified to use the GooOS and sell it right alongside Windows ($200) at CompUSA for $10/apiece (available free online of course). Google Office (Goffice?) will be built in, with all your data stored locally, backed up remotely, and available to whomever it needs to be (SubEthaEdit-style collaboration on Word/Excel/PowerPoint-esque documents is only the beginning). Email, shopping, games, music, news, personal publishing, etc.; all the stuff that people use their computers for, it's all there.

When you swing a hammer in the vicinity of so many nails, you're bound to hit one on the head every once in a while. Well, I got it in the general area of the nail, anyway.

FindSounds is a search engine for sounds.
Feb 21 2007

FindSounds is a search engine for sounds. Here's a collection of bee sounds. Bzzzz....

Jeffrey Toobin, the New Yorker's legal writer...
Feb 01 2007

Jeffrey Toobin, the New Yorker's legal writer, has penned a piece about Google's book scanning efforts and the legal challenges it faces. Interestingly, both Google and the publishers who are suing them say that the lawsuit is basically a business negotiation tactic. However, according to Larry Lessig, settling the lawsuit might not be the best thing for anyone outside the lawsuit: "Google wants to be able to get this done, and get permission to resume scanning copyrighted material at all the libraries. For the publishers, if Google gives them anything at all, it creates a practical precedent, if not a legal precedent, that no one has the right to scan this material without their consent. That's a win for them. The problem is that even though a settlement would be good for Google and good for the publishers, it would be bad for everyone else."

How to disable the stupid Snap Preview...
Jan 29 2007

How to disable the stupid Snap Preview things that are popping up on everyone's site these days. (via df)

Google is now including YouTube videos in...
Jan 26 2007

Google is now including YouTube videos in Google Video search results. I love the smell of synergy in the morning.

No nofollow
Jan 22 2007

All links on Wikipedia now automatically use the "nofollow" attribute, which means that when Google crawls the site, none of the links it comes across get any PageRank from appearing on Wikipedia. SEO contest concerns aside, this also has the effect of consolidating Wikipedia's power. Now it gets all the Google juice and doesn't pass any of it along to the sources from which it gets information. Links are currency on the web and Wikipedia just stopped paying it forward, so to speak.

It's also unclear how effective nofollow is in curbing spam. It's too hard for spammers to filter out which sites use nofollow and which do not and much easier & cheaper just to spam everyone and everywhere. Plus there's a not-insignificant echo effect of links in Wikipedia articles getting posted elsewhere so the effort is still worth it for spammers.
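
To make the mechanics concrete: "nofollow" is just a token in a link's `rel` attribute, so checking whether a page passes along any Google juice is a matter of reading its anchor tags. Here's a small sketch using Python's standard `html.parser`; the `LinkAudit` class name and the sample markup are made up for illustration and are not Wikipedia's actual HTML.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort the links on a page by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical snippet of article markup.
sample = (
    '<p><a href="http://example.com/source" rel="nofollow">source</a> '
    '<a href="/wiki/Internal_page">internal</a> '
    '<a href="http://example.org/" rel="external nofollow">other</a></p>'
)

audit = LinkAudit()
audit.feed(sample)
print("passes credit:", audit.followed)
print("nofollow:", audit.nofollowed)
```

A spammer would have to run something like this against every target site to skip the nofollow ones, which is the extra effort the paragraph above suggests they won't bother with.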

Dick Cheney's Google searches. "lynne cheney MySpace"
Dec 08 2006

Dick Cheney's Google searches. "lynne cheney MySpace"

Search, always dead
Oct 30 2006

Via Tim O'Reilly comes this comment from Bill Burnham:

A couple of months ago I had the pleasure of moderating a panel at TIECon on the Search Industry. Peter Norvig, Google's Director of Research, made one comment in particular that stood out in my mind at the time. In response to a question about the prospects for the myriad of search start-ups looking for funding Peter basically said, and I am paraphrasing somewhat, that search start-ups, in the vein of Google, Yahoo, Ask, etc. are dead. Not because search isn't a great place to be or because they can't create innovative technologies, but because the investment required to build and operate an Internet-scale, high performance crawling, indexing, and query serving farm was now so great that only the largest Internet companies had a chance of competing.

For Norvig to say what he did seems a little crazy, given the company he works for. The first time that search died was back in 1998. Yahoo, Altavista, Hotbot, Webcrawler, and other sites had the search game all sewn up. They were all about the same in terms of quality and people found what they were looking for much of the time. No one needed another search engine, and starting a search company in such a mature market seemed like folly. Around that time, Google became a company and eventually the world figured out it really did need another search engine.

Not sure if this is the actual...
Oct 10 2006

Not sure if this is the actual code or not, but the source code for MS-DOS 6.0 appears to be available on Google Code Search (by way of a search for "microsoft confidential"). More Google Code Search goodies here. (thx, aj)

O students! Pray teachers! Behold: a Shakespeare search engine.
Sep 14 2006

O students! Pray teachers! Behold: a Shakespeare search engine.

Simply Google, a one-pager for navigating and...
May 31 2006

Simply Google, a one-pager for navigating and searching all of Google's offerings.

Dictionary words
May 30 2006

I've been keeping track of words which return a link to a dictionary definition of the word in Google. Dictionary words are those that are written but not written about, haven't been subject to the corporate/band/blog word grab, or aren't otherwise popular words.

germane
paucity
reticent
cantankerous
suppositious
abstruse
whinge
assiduous
surreptitious
proclivity
disparaging
sporadically
hypertrophied
pallor
acerbic
surfeit

Many of the Dictionary.com Words of the Day are probably dictionary words as well.

"The Google search box is like the...
May 24 2006

"The Google search box is like the Tardis -- there's a lot more inside that little box than you expect".

Steven Johnson responds to (blasts? slams?) the...
May 11 2006

Steven Johnson responds to (blasts? slams?) the endangered joy of serendipity piece I just linked to, arguing that the web is a much better serendipity engine than the library. (BTW, I think Steven is part machine himself...after posting that link, I took out the trash and ducked out to get something at the bodega around the corner and when I got back, there's a message from him in my inbox with a link to his rant. Jesus.)

William McKeen on the "endangered joys ofMay 11 2006

William McKeen on the "endangered joys of serendipity". "Do people browse anymore? We have become such a directed people. We can target what we want, thanks to the Internet. It's efficient, but dull."

pb fills us in on how heMay 05 2006

pb fills us in on how he finds lost URLs. In addition to the techniques he lists, I use the search function on my newsreader, whose content also gets indexed by Spotlight, so that works as well.

Google can be used for finding scientificApr 27 2006

Google can be used for finding scientific papers that are more popular (and influential?) than their number of citations would otherwise indicate. "The technique might also emerge as a more useful measure of scientific impact than merely the number of citations alone."

kottke.org: #1 Google search result for "nudeApr 18 2006

kottke.org: #1 Google search result for "nude paddleball players". (thx, jonah)

John Battelle's book, The Search, is notMar 08 2006

John Battelle's book, The Search, is not available on Google Book Search because his publisher, Penguin, is suing Google over Book Search. On Penguin's decision, Battelle says, "I totally disagree with it" and "It's very irritating to me". (via jb)

Ouch for Amazon and A9: Udi Manber heads to Google.Feb 09 2006

Ouch for Amazon and A9: Udi Manber heads to Google.

Business 2.0 imagines Google's future: as The Media,Feb 02 2006

Business 2.0 imagines Google's future: as The Media, as The Internet, its death, and as God.

Blogs versus the NY Times in GoogleJan 30 2006

In 2002, Dave Winer of Scripting News and Martin Nisenholtz of the New York Times made a Long Bet about the authority of weblogs versus that of NY Times in Google:

In a Google search of five keywords or phrases representing the top five news stories of 2007, weblogs will rank higher than the New York Times' Web site.

I decided to see how well each side is doing by checking the results for the top news stories of 2005. Eight news stories were selected and an appropriate Google keyword search was chosen for each one of them. I went through the search results for each keyword and noted the positions of the top results from 1) "traditional" media, 2) citizen media, 3) blogs, and 4) nytimes.com. Finally, the scores were tallied and an "actual" winner (blogs vs. nytimes.com) and an "in-spirit" winner (any traditional media source vs. any citizen media source) were calculated. (For more on the methodology, definitions, and caveats, read the methodology section below.)

So how did the NY Times fare against blogs? Not very well. For eight top news stories of 2005, blogs were listed in Google search results before the Times six times; the Times came first only twice. The in-spirit winner was traditional media by a 6-2 score over citizen media. Here are the specific results:

1) Hurricane Katrina hits New Orleans.
Search term: "hurricane katrina"

3. Top citizen media result (Wikipedia)
13. Top media result (CNN)
56. Top NY Times mention (NY Times).
61. Top blog result (Kaye's Hurricane Blog)

Winner (in spirit): Citizen media
Winner (actual): NY Times

2) Big changes in the US Supreme Court (Rehnquist dies, O'Connor retires, Roberts appointed Chief Justice, Harriet Miers rejected).
Search term: "harriet miers"

4. Top media result (Washington Post)
5. Top citizen media result (Wikipedia)
8. Top NY Times mention (NY Times)
11. Top blog result (TalkLeft)

Winner (in spirit): Media
Winner (actual): NY Times

3) Terrorists bomb London, killing 52.
Search term: "london bombing"

1. Top media result (CNN)
2. Top citizen media result (Wikipedia)
21. Top blog result (Schneier on Security)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

4) First elections in Iraq after Saddam.
Search term: "iraq election"

1. Top media result (BBC News)
6. Top blog result (Iraq elections newswire)
6. Top citizen media result (Iraq elections newswire)
14. Top NY Times mention (NY Times)

Winner (in spirit): Media
Winner (actual): Blogs

5) Terri Schiavo legal fight and death.
Search term: "terri schiavo"

2. Top blog result (Abstract Appeal)
2. Top citizen media result (Abstract Appeal)
4. Top media result (CNN)
65. Top NY Times mention (NY Times)

Winner (in spirit): Citizen media
Winner (actual): Blogs

6) Pope John Paul II dies and Cardinal Joseph Ratzinger appointed Pope Benedict XVI.
Search term: "pope john paul ii death"

1. Top media result (CNN)
3. Top citizen media result (Wikipedia)
58. Top blog result (The Pope Blog: Pope Benedict XVI)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

7) The Israeli withdrawal from the Gaza Strip.
Search term: "gaza withdrawal"

1. Top media result (Worldpress.org)
31. Top blog result (Simply Appalling)
31. Top citizen media result (Simply Appalling)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

8) The investigation into the Valerie Plame affair, Judith Miller, Scooter Libby indicted, etc.
Search term: "scooter libby indicted"

1. Top media result (CNN)
15. Top blog result (Seven Generational Ruminations)
15. Top citizen media result (Seven Generational Ruminations)
43. Top NY Times mention (NY Times)

Winner (in spirit): Media
Winner (actual): Blogs

And just for fun here's a search for "judith miller jail" (not included in the final tally):

1. Top media result (Washington Post)
3. Top blog result (Gawker)
3. Top citizen media result (Gawker)
No NY Times article appears in the first 100 results (even though there are several matching articles on the Times site).

In covering the jailing of their own reporter, the Times lagged in the Google results behind such informational juggernauts as Drinking Liberally, GOP Vixen, and Feral Scholar.

Winner (in spirit): Media
Winner (actual): Blogs

Here are the overall results, excluding the Judith Miller search:

Overall winner (in spirit): Media (beating citizen media 6-2).
Overall winner (actual): Blogs (beating the NY Times 6-2).
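
For anyone who wants to check the arithmetic, here is a minimal sketch of the tally in Python. The rank numbers are copied from the eight listings above; the names (`RESULTS`, `beats`, `tally`) and the rule that a missing result always loses are my own assumptions for illustration, not part of the bet's official terms.

```python
from collections import Counter

# Rank of the top result in each category, copied from the listings above.
# None means no result appeared in the first 100.
RESULTS = {
    "hurricane katrina":       {"media": 13, "citizen": 3,  "blog": 61, "nyt": 56},
    "harriet miers":           {"media": 4,  "citizen": 5,  "blog": 11, "nyt": 8},
    "london bombing":          {"media": 1,  "citizen": 2,  "blog": 21, "nyt": None},
    "iraq election":           {"media": 1,  "citizen": 6,  "blog": 6,  "nyt": 14},
    "terri schiavo":           {"media": 4,  "citizen": 2,  "blog": 2,  "nyt": 65},
    "pope john paul ii death": {"media": 1,  "citizen": 3,  "blog": 58, "nyt": None},
    "gaza withdrawal":         {"media": 1,  "citizen": 31, "blog": 31, "nyt": None},
    "scooter libby indicted":  {"media": 1,  "citizen": 15, "blog": 15, "nyt": 43},
}


def beats(rank_a, rank_b):
    """Return True if rank_a is a better (lower) position than rank_b.

    Assumption: a missing rank (None) always loses to a present one.
    """
    if rank_a is None:
        return False
    if rank_b is None:
        return True
    return rank_a < rank_b


def tally(results):
    """Count the 'actual' (blogs vs. NY Times) and 'in-spirit' winners."""
    actual, spirit = Counter(), Counter()
    for ranks in results.values():
        actual["Blogs" if beats(ranks["blog"], ranks["nyt"]) else "NY Times"] += 1
        spirit["Citizen media" if beats(ranks["citizen"], ranks["media"]) else "Media"] += 1
    return actual, spirit


actual, spirit = tally(RESULTS)
print("Actual:", dict(actual))
print("In spirit:", dict(spirit))
```

None of the eight searches produced a tie between the two sides being compared, so the sketch does not need a tie-breaking rule; a real resolution of the bet would have to define one.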

Some observations:

  • My feeling is that Mr. Nisenholtz will likely lose his bet come 2007. Even though nytimes.com fares very well in getting linked to by the blogosphere, it does very poorly in Google. This isn't exactly surprising given that most NY Times articles disappear behind a paywall after a week and some of their content (TimesSelect) isn't publicly accessible at all. Also, I didn't look too closely at the HTML markup of the NY Times, but it could also be that it's not as well optimized for Google as that of some weblogs and other media outlets.
  • "www.nytimes.com" has a PageRank of 10/10, higher than that of "www.cnn.com" (9/10), yet stories from CNN consistently appeared higher in the search results than those from the Times. The Times clearly has overall authority according to Google, but when it comes to specific instances, it falls short. In some cases, a NY Times story didn't even appear in the first 100 search results for these keyword searches.
  • By 2007, it may be difficult to differentiate a blog from a traditional media source. All of the Gawker and Weblogs, Inc. sites are presented in a blog format and are referred to as blogs, but otherwise how are they distinguishable from traditional media? Engadget paid to send 12 people to cover the CES technology conference, probably as many as or more than the Times sent. The Sundance film festival was heavily covered by paid writers for both companies as well. In the spirit in which this bet was made, I'd have a hard time counting any of their sites as blogs. (And what about kottke.org? I get paid to write it. Am I still a member of the citizen media or have I crossed over?)
  • Choosing appropriate news stories and keywords for those stories was difficult in some cases. Katrina was a no-brainer, but was the Terri Schiavo story really one of the top eight news stories of 2005? Resolving the methodology for this bet in 2007 will be tricky. I wonder how the Long Bets Foundation will handle its determination of the victory.
  • Wikipedia does very well in Google results for topical search terms. Overall, traditional media still dominates (in first appearance as well as number of results), but blogs and Wikipedia do very well in some instances.
  • What do these results mean? Probably not a whole lot. Nisenholtz asserts that "[news] organizations like the Times can provide that far more consistently than private parties can" while Winer says that "in five years, the publishing world will have changed so thoroughly that informed people will look to amateurs they trust for the information they want". It's difficult to draw any conclusions on this matter based on these results. Contrary to what most people believe, PageRank has a bias, a point of view. That POV is based largely (but not entirely) on what people are linking to. As someone said in the discussion of this bet, this bet is about Google more than influence or reputation, so these results probably tell us more about how Google determines influence on a keyword basis rather than how readers of online informational sources value or rate those sources. Do web users prefer the news coverage of blogs to that of the NY Times? I don't think you can even come close to answering that question based on these results.

Methodology and caveats

The eight news stories were culled from various sources (Lexis-Nexis, Wikipedia, NY Times) and narrowed down to the top stories that would have been prominently covered in both the NY Times and blogs.

The keyword phrase for each of the eight stories was selected by the trial and error discovery of the shortest possible phrase that yielded targeted search results about the subject in question. In some cases, the keyword phrase chosen only returned results for a part of a larger news story. For instance, the phrase "pope john paul" was not specific enough to get targeted results, so "pope john paul ii death" was used, but that didn't give results about the larger story of his death, the conclave to select a new pope, and the selection of Cardinal Joseph Ratzinger as Pope Benedict XVI. In the case of "katrina", that single keyword was enough to produce hundreds of targeted search results for both Hurricane Katrina and its aftermath. Keyword phrases were not tinkered with to promote or demote particular types of search results (i.e. those for blogs or nytimes.com); they were only adjusted for the relevance of overall results.

The searches were all done on January 27, 2006 with Google's main search engine, not their news specific search.

Since the spirit of the bet deals with the influence of traditional media versus that of citizen-produced media, I tracked the top traditional media (labeled just "media" above) results and the top citizen media results in addition to blog and nytimes.com results. For the purposes of this exercise, relevant results were those that linked to pages that an interested reader would use as a source of information about a news story. For citizen media, this meant pages on Wikipedia, Flickr (in some cases), weblogs, message boards, wikis, etc. were fair game. For traditional media, this meant articles, special news packages, photo essays, videos, etc.

The distinctions between "media" and citizen media, and between relevant and non-relevant results, mattered in only one instance. Harriet Miers's Blog!!!, a fictional satire written as if the author were Harriet Miers, was the third result for this keyword phrase, but since the blog was not an informational resource, I excluded it. In all other cases, it was pretty clear-cut.

Chris Anderson has one of the bestDec 22 2005

Chris Anderson has one of the best descriptions I've read of collective knowledge systems like Google, Wikipedia, and blogs: they're probabilistic systems "which sacrifice perfection at the microscale for optimization at the macroscale".

Blog search still sucks (a little)Dec 21 2005

Update: I fucked up on this post and you should reread it if you've read it before. After reading this post by Niall Kennedy, I checked and found that I have mentioned or linked to the site for Freakonomics 5 times (1 2 3 4 5), not 13. The other 8 times, I either linked to a post on the Freakonomics blog that was unrelated to the book, had the entry tagged with "freakonomics" (tags are not yet exposed on my site and can't be crawled by search engines), or I used the word "Freakonomists", not "Freakonomics". Bottom line: the NY Times listing is still incorrect, Google and Yahoo picked up all the posts where I actually mentioned "Freakonomics" in the text of the post but missed the 2 links to freakonomics.com, Google Blog Search got 2/3 (& missed the 2 links), Technorati got 1/3 (& missed the 2 links), and IceRocket, Yahoo Blog Search, BlogPulse, & Bloglines whiffed entirely. Steven Levitt would be very disappointed in my statistical fact-checking skills right now. :(

I wish Niall had emailed me about this instead of posting it on his site, but I guess that's how weblogs work, airing dirty laundry instead of trying to get it clean. Fair enough...I've publicly complained about the company he works for (Technorati) instead of emailing someone at the company about my concerns, so maybe he had a right to hit back. Perhaps a little juvenile on both our parts, I'd say. (Oh, and I turned off the MT search thing that Niall used to check my work. I'm not upset he used it, but I'm irritated that it seems to be on by default in MT...I never intended for that search interface to be public.)

------

The NY Times recently released their list of the most blogged about books of 2005. Their methodology in compiling the list:

This list links to a selection of Web posts that discuss some of the books most frequently mentioned by bloggers in 2005. The books were selected by conducting an automated survey of 5,000 of the most-trafficked blogs.

Unsurprisingly, the top spot on the list went to Freakonomics. I remembered mentioning the book several times on my site (including this interview with author Steven Levitt around the release of the book), so I checked out the citations they had listed for it. According to the Times, Freakonomics was cited by 125 blogs, but not once by kottke.org, a site that by any measure is one of the most-visited blogs out there.[1] A quick search in my installation of Movable Type yielded 5 (not 13, as originally stated) mentions of the book on kottke.org in the last 9 months. I had also mentioned Blink, Harry Potter, Getting Things Done, Collapse, The Wisdom of Crowds, The Singularity is Near, and State of Fear, all of which appear in the top 20 of the Times' list and none of which are cited by the Times as having been mentioned on kottke.org in 2005.

I chalked this up to a simple error of omission, but then I started checking around some more. Google's main index returned only three distinct mentions of Freakonomics on kottke.org. Google Blog Search returned two results. Yahoo: 3 results (0 results on Yahoo's blog search). Technorati only found one result (I'm not surprised). Many of the blog search services don't even let you search by site, so IceRocket, BlogPulse, and Bloglines were of no help. (See above for corrections.) I don't know where the Times got their book statistics from, but it was probably from one of these sites (or a similar service).

Granted this is just one weblog[2], which I only checked into because I'm the author, but it's not like kottke.org is hard to find or crawl. The markup is pretty good [3], fairly semantic, and hasn't changed too much for the past two years. The subject in question is not off-topic...I post about books all the time. And it's one of the more visible weblogs out there...lots of links in to the front page and specific posts and a Google PR of 8. So, my point here is not "how dare the Times ignore my popular and important site!!!" but is that the continuing overall suckiness of searching blogs is kind of amazing and embarrassing given the seemingly monumental resources being applied to the task. It's forgivable that the Times would not have it exactly right (especially if they're doing the crawling themselves), but when companies like Technorati and Google are setting themselves up as authorities on how large the blogosphere is, what books and movies people are reading/watching, and what the hot topics online are but can't properly catalogue the most obvious information out there, you've got to wonder a) how good their data really is, and b) if what they are telling us is actually true.

[1] Full disclosure: I am the author of kottke.org.

[2] This is an important point...these observations are obviously a starting point for more research about this. But this one hole is pretty gaping and fits well with what I've observed over the past several months trying to find information on blogs using search engines.

[3] I say only pretty good because it's not validating right now because of entity and illegal character errors, which I obviously need to wrestle with MT to correct at some point. But the underlying markup is solid.

Google search for "i don't read kottke"Dec 14 2005

Google search for "i don't read kottke" versus a search for "i don't read boing boing". Nottke** wins, 39 to 37! Sit on it, Cory!

** Nottke = not Kottke, coinage by John Gruber.

Amazon/Alexa is opening up their index,Dec 13 2005

Amazon/Alexa is opening up their index, letting people access the raw data, processing power, and even the crawlers. What a huge idea. (via bb)

AIGA Voice has an interview with PeterDec 12 2005

AIGA Voice has an interview with Peter Morville about his new book, Ambient Findability. A question from the interview that everyone responsible for a web site should be asking themselves (emphasis mine): "Can [people] find your content, products and services despite your website?" Love that.

Book author to her publishing company: your lawsuit is not helping me or my bookOct 20 2005

I got an email this morning from a kottke.org reader, Meghann Marco. She's an author and struggling to get her book out into the hands of people who might be interested in reading it. To that end, she asked her publisher, Simon & Schuster, to put her book up on Google Print so it could be found, and they refused. Now they're suing Google over Google Print, claiming copyright infringement. Meghann is not too happy with this development:

Kinda sucks for me, because not that many people know about my book and this might help them find out about it. I fail to see what the harm is in Google indexing a book and helping people find it. Anyone can read my book for free by going to the library anyway.

In case you guys haven't noticed, books don't have marketing like TV and Movies do. There are no commercials for books, this website isn't produced by my publisher. Books are driven by word of mouth. A book that doesn't get good word of mouth will fail and go out of print.

Personally, I hope that won't happen to my book, but there is a chance that it will. I think the majority of authors would benefit from something like Google Print.

She has also sent a letter of support to Google which includes this great anecdote:

Someone asked me recently, "Meghann, how can you say you don't mind people reading parts of your book for free? What if someone xeroxed your book and was handing it out for free on street corners?"

I replied, "Well, it seems to be working for Jesus."

And here's an excerpt of the email that Meghann sent me (edited very slightly):

I'm a book author. My publisher is suing Google Print and that bothers me. I'd asked for my book to be included, because gosh it's so hard to get people to read a book.

Getting people to read a book is like putting a cat in a box. Especially for someone like me, who was an intern when she got her book deal. It's not like I have money for groceries, let alone a publicist.

I feel like I'm yelling and no one is listening. Being an author can really suck sometimes. For all I know speaking up is going to get me blacklisted and no one will ever want to publish another one of my books again. I hope not though.

[My book is] called 'Field Guide to the Apocalypse' It's very funny and doesn't suck. I worked really hard on it. It would be nice if people read it before it went out of print.

As Tim O'Reilly, Eric Schmidt, and Google have argued, I think these lawsuits against Google are a stupid (and legally untenable) move on the part of the publishing industry. I know a fair number of kottke.org readers have published books...what's your take on the situation? Does Google Print (as well as Amazon's "Search Inside the Book" feature) hurt or help you as an author? Do you want your publishing company suing Google on your behalf?

Google finally launches a blog search service.Sep 14 2005

Google finally launches a blog search service. The default search is by relevance, which I'm not sure is correct, and it's pretty bare bones so far, but I'm sure that many other people will be saying so long, Technorati. Also available in Blogger flavor. (via waxy)

Use the Technorati Accelerator to "search onSep 08 2005

Use the Technorati Accelerator to "search on any URL and get the same response you would have to wait thirty seconds for on their site". Zing!

These are the people in my (Web) neighborhoodAug 25 2005

In reaction to some ads of questionable value being placed on some of O'Reilly's sites (response from Tim O'Reilly), Greg Yardley has written a thoughtful piece on selling PageRank called I am not responsible for making Google better:

Google, Yahoo, Microsoft and the other big search engine companies aren't public utilities - they're money-making, for-profit enterprises. It's time to stop thinking of search engines as a common resource to be nurtured, and start thinking of them as just another business to compete with or cooperate with as best suits your individual needs.

I love the idea that after more than 10 years of serious corporate interest in the Web that it's still up to all of us and our individual decisions. The search engines in particular are based on our collective action; they watch and record the trails left as we scatter the Web with our thoughts, commerce, conversations, and connections.

Me? I tend to think I need Google to be as good a search engine as it can be and if I can help in some small way, I'm going to. As corny as it sounds, I tend to think of the sites I frequent as my neighborhood. If the barista at Starbucks is sick for a day, I'm not going to jump behind the counter and start making lattes, but if there's a bit of litter on the stoop of the restaurant on the corner, I might stop to pick it up. Or if I see some punk slipping a candy bar into his pocket at the deli, I may alert the owner because, well, why should I be paying for that guy's free candy bar every time I stop in for a soda?

Sure those small actions help those particular businesses, but they also benefit the neighborhood as a whole and, more importantly, the neighborhood residents. If I were the owner of a business like O'Reilly Media, I'd be concerned about making Google or Yahoo less useful because that would make it harder for my employees and customers to find what they're looking for (including, perhaps, O'Reilly products and services). As Greg said, the Web is still largely what we make of it, so why not make it a good Web?

GoogleOS? YahooOS? MozillaOS? WebOS?Aug 23 2005

Before we get going, here are some alternate titles for this post, just to give you an idea of what I'm trying to get at before I actually, you know, get at it:

  • You're probably wondering why Yahoo bought Konfabulator
  • An update on Google Browser, GooOS and Google Desktop
  • A platform that everyone can stand on and why Apple, Microsoft, and, yes, even Google will have to change their ways to be a part of it
  • The next killer app: desktop Web servers
  • Does the Mozilla Foundation have the vision to make Firefox the most important piece of software of this decade?
  • Web 3.0
  • Finally, the end of Microsoft's operating system dominance

Now that your hyperbole meter has pegged a few times, hopefully the rest of this will seem tame in comparison. (And apologies for the length...I got rolling and, oops, 2500 words. But many of them are small so...)

Way back in October 2004, this idea of how the Web as a platform might play out popped into my head, and I've been trying to motivate myself into writing it down ever since. Two recent events, Yahoo's purchase of Konfabulator and Google's release of a new beta version of Google Desktop have finally spurred me into action. But back to October. At the Web 2.0 conference, Stewart pulled me aside and said something like, "I think I know what Google is doing with Google Browser." From a subsequent post on his site:

I've had this post about Adam Bosworth, Alchemy and the Google browser sitting around for months now and it is driving me crazy, because I want all the credit for guessing this before it happens. So, for the record, if Google is making a browser, and if it is going to be successful, it will be because there is a sophisticated local caching framework included, and Google will provide the reference apps (replying to emails on Gmail or posting messages to Google groups while on the plane).

At the time, Adam Bosworth had been recently hired by Google for purposes unknown. In a blog post several months before he was hired, Bosworth mused about a "new browser":

In this entry, I'm going to discuss how I imagine a mobilized or web services browser handles changes and service requests when it isn't connected. This is really where the peddle hits the metal. If you just read data and never ever alter it or invoke related services (such as approving an expense report or booking a restaurant) then perhaps you might not need a new browser. Perhaps just caching pages offline would be sufficient if one added some metadata about what to cache. Jean Paoli has pointed out to me that this would be even more likely if rather than authoring your site using HTML, you authored it as XML "pages" laid out by the included XSLT stylesheets used to render it because then you could even use the browser to sort/filter the information offline. A very long time ago when I was still at Microsoft (1997) we built such a demo using XSLT and tricky use of Javascript to let the user do local client side sorting and filtering. But if you start actually trying to update trip reports, approve requests, reserve rooms, buy stocks, and so on, then you have Forms of some sort, running offline, at least some of the time, and code has to handle the inputs to the "Forms" and you have to think through how they are handled.

A couple weeks later, Google introduced the first iteration of their Desktop Search. To me, the least interesting thing about GDS was the search mechanism. Google finally had an application that installed on the desktop and, even better, it was a little Web server that could insert data from your local machine into pages you were browsing on google.com. That was a new experience: using a plain old Web browser to run applications locally and on the Web at the same time.

So this is my best guess as to how an "operating system" based on the Web (which I will refer to as "WebOS") will work. There are three main parts to the system:

  • The Web browser (along with other browser-ish applications like Konfabulator) becomes the primary application interface through which the user views content, performs services, and manages data on their local machine and on the Web, often without even knowing the difference. Something like Firefox, Safari, or IE...ideally browser agnostic.
  • Web applications of the sort we're all familiar with: Gmail, Flickr, and Bloglines, as well as other applications that are making the Web an ever richer environment for getting stuff done. (And ideally all Ajaxed up to provide an experience closer to that of traditional desktop apps.)
  • A local Web server to handle the data delivery and content display from the local machine to the browser. This local server will likely be highly optimized for its task, but would be capable of running locally installed Web applications (e.g. a local copy of Gmail and all its associated data).

That's it. Aside from the browser and the Web server, applications will be written for the WebOS and won't be specific to Windows, OS X, or Linux. This is also completely feasible, I think, for organizations like Google, Yahoo, Apple, Microsoft, or the Mozilla Foundation to make happen (more on this below).
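
To make the third part a bit more concrete, here is a rough sketch of what a tiny local Web server might look like, using only Python's standard library. Everything specific in it is hypothetical: the `/mail` path, the `LOCAL_MAIL` sample data, and port 8765 are placeholders I made up, and this is not how Google Desktop or any real product works. The point is just that a browser pointed at 127.0.0.1 can be handed data that lives only on your machine.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for data stored only on the local machine.
LOCAL_MAIL = [
    {"id": 1, "subject": "Trip report"},
    {"id": 2, "subject": "Expense approval"},
]


def route(path):
    """Map a request path to (status, content_type, body_bytes)."""
    if path == "/mail":
        return 200, "application/json", json.dumps(LOCAL_MAIL).encode("utf-8")
    return 404, "text/plain", b"unknown local resource"


class LocalDataHandler(BaseHTTPRequestHandler):
    """Answer GET requests from the browser with locally stored data."""

    def do_GET(self):
        status, content_type, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8765):
    """Serve on this machine only (127.0.0.1); blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), LocalDataHandler).serve_forever()
```

Calling `serve()` and visiting http://127.0.0.1:8765/mail in any browser would return the sample data as JSON. Binding to 127.0.0.1 rather than all interfaces is a deliberate choice in this sketch, so that other machines on the network cannot read your local data.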

Compared to "standalone" Web apps and desktop apps, applications developed for this hypothetical platform have some powerful advantages. Because they run in a Web browser, these applications are cross platform (assuming that whoever develops such a system develops the local Web server part of it for Windows, OS X, Linux, your mobile phone, etc.), just like Web apps such as Gmail, Basecamp, and Salesforce.com. You don't need to be on a specific machine with a specific OS...you just need a browser + local Web server to access your favorite data and apps.

For application developers, the main advantage is that instead of writing two or more programs for multiple platforms (one for the Web, one for Windows, etc.), they can write one app that will run on any machine with the WebOS using the same code base. Bloglines and NetNewsWire both do about the same thing and have radically different codebases (Bloglines uses HTML/JavaScript + some sort of backend programming/scripting language while NNW is a Cocoa app only for OS X), but a version of Bloglines developed for the above platform could utilize a single codebase.

You also get the advantages of locally run applications. You can use them when you're not connected to the Internet. There could be an icon in the Dock that fires up Gmail in your favorite browser. For applications using larger files like images, video, and audio, those files could be stored and manipulated locally instead of waiting for transfer over the Internet.

There are also disadvantages to WebOS applications, not the least of which[1] is that HTTP+JavaScript+XHTML+CSS+Flash is not as robust in providing functionality and user interaction as true desktop applications written in Cocoa or Visual Basic. But as Paul Graham points out, Web applications may be good enough[2]:

One thing that might deter you from writing Web-based applications is the lameness of Web pages as a UI. That is a problem, I admit. There were a few things we would have really liked to add to HTML and HTTP. What matters, though, is that Web pages are just good enough.

Web pages weren't designed to be a UI for applications, but they're just good enough. And for a significant number of users, software that you can use from any browser will be enough of a win in itself to outweigh any awkwardness in the UI. Maybe you can't write the best-looking spreadsheet using HTML, but you can write a spreadsheet that several people can use simultaneously from different locations without special client software, or that can incorporate live data feeds, or that can page you when certain conditions are triggered. More importantly, you can write new kinds of applications that don't even have names yet.

And how about these new kinds of applications? Here's how I would envision a few apps working on the WebOS:

  • Gmail. While online, you read your mail at gmail.com, but it also caches your mail locally so when you disconnect, you can still read it. Then when you connect again, it sends any replies you wrote offline, just like Mail.app or Outlook does. Many people already use Gmail (or Yahoo Mail) as their only email client...imagine if it worked offline as well.
  • A Web version of iTunes. Just like the desktop version of iTunes, except in the browser. Manages/plays audio files stored locally, with an option to back them up on the server (using .Mac or similar) as well. iTunes already utilizes information from the Internet so well (Web radio, podcasting, iTMS, CDDB, etc.) that it's easy to imagine it as a Web app. (And why stop at audio...video would work equally well.)
  • Flickr. Manage image files locally and on Flickr's server in the browser. You could even do some rudimentary photo manipulation (brightness, contrast, red-eye correction, etc.) in the browser using JavaScript or even Flash. Prepare a bunch of photos for uploading to Flickr while on the plane ride home and they automatically sync when you next connect to the Internet.
  • Newsreader. Read sites while offline (I bet this is #1 on any Bloglines user's wish list). Access your reading list from any computer with a browser (I bet this is #1 on any standalone newsreader user's wish list).
  • File backup. A little WebOS app that helps you back up your files to Apple's .Mac service, your ISP, or someone like Google. You'll specify what you want backed up and when through the browser and the backup program will take care of the rest.
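The offline pattern running through these examples (cache locally, queue changes while disconnected, sync on reconnect) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `OfflineMailbox`, its method names, and the `send` callback are hypothetical stand-ins, not anything Gmail or any other service actually exposes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OfflineMailbox:
    """Hypothetical local cache plus outbox for a Web mail app."""
    send: Callable[[str], None]                       # stand-in for the real network call
    online: bool = False
    cached: List[str] = field(default_factory=list)   # mail readable offline
    outbox: List[str] = field(default_factory=list)   # replies written offline

    def receive(self, message: str) -> None:
        # Keep a local copy so the message is still readable after disconnecting.
        self.cached.append(message)

    def reply(self, message: str) -> None:
        # Send right away when connected; otherwise hold the reply locally.
        if self.online:
            self.send(message)
        else:
            self.outbox.append(message)

    def connect(self) -> None:
        # On reconnect, flush queued replies in the order they were written.
        # A reply leaves the queue only after send() returns without error.
        self.online = True
        while self.outbox:
            self.send(self.outbox[0])
            self.outbox.pop(0)

    def disconnect(self) -> None:
        self.online = False


sent: List[str] = []                      # records what "reached the server"
box = OfflineMailbox(send=sent.append)
box.receive("Lunch tomorrow?")
box.reply("Sure, noon works.")            # offline: queued, not sent
queued_before = list(box.outbox)
box.connect()                             # back online: the queue is flushed
```

The same shape would plausibly cover the Flickr case too: photos prepared on the plane sit in the outbox until the next connection.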

I'm looking at the rest of the most commonly used apps on my Powerbook and there aren't too many of them that absolutely need to be standalone desktop applications. Text editor, IM[3], Word, Excel, FTP, iCal, address book...I could imagine versions of these running in a browser.

So who's going to build these WebOS applications? Hopefully anyone with XHTML/JavaScript/CSS skills, but that depends on how open the platform is. And that depends on whose platform it is. Right now, there are five organizations who are or could be moving in this direction:

  • Google. If Google is not thinking in terms of the above, I will eat danah's furriest hat. They've already shifted the focus of Google Desktop with the addition of Sidebar and changing the name of the application (it used to be called Google Desktop Search...and the tagline changed from "Search your own computer" to the more general "Info when you want it, right on your desktop"). To do it properly, I think they need their own browser (with bundled Web server, of course) and they need to start writing their applications to work on OS X and Linux (Google is still a Windows company)[4]. Many of the moves they've made in the last two years have been to outflank Microsoft, and if they don't use Google Desktop's "insert local code into remote sites" trick to make whatever OS comes with people's computers increasingly irrelevant, they're stupid, stupid, stupid. Baby step: make Gmail readable offline.
  • Yahoo. I'm pretty sure Yahoo is thinking in these terms as well. That's why they bought Konfabulator: desktop presence. And Yahoo has tons of content and apps that they would like to offer on a WebOS-like platform: mail, IM, news, Yahoo360, etc. Challenge for Yahoo: widgets aren't enough...many of these applications are going to need to run in Web browsers. Advantages: Yahoo seems to be more aggressive in opening up APIs than Google...chances are if Yahoo develops a WebOS platform, we'll all get to play.
  • Microsoft. They're going to build a WebOS right into their operating system...it's likely that with Vista, you sometimes won't be able to tell when you're using desktop applications or when you're at msn.com. They'll never develop anything for OS X or for Linux (or for browsers other than IE), so its impact will be limited. (Well, limited to most of the personal computers in the world, but still.)
  • Apple. Apple has all the makings of a WebOS system right now. They've got the browser, a Web server that's installed on every machine with OS X, Dashboard, iTMS, .Mac, Spotlight, etc. All they're missing is the applications (aside from the Dashboard widgets). But like Microsoft, it's unlikely that they'll write anything for Windows or Linux, although if OS X is going to run on cheapo Intel boxes, their market share may be heading in a positive direction soon.
  • The Mozilla Foundation. This is the most unlikely option, but also the most interesting one. If Mozilla could leverage the rapidly increasing user base of Firefox and start bundling a small Web server with it, then you've got the beginnings of a WebOS that's open source and for which anyone, including Microsoft, Google, Yahoo, and anyone with JavaScript chops, could write applications. To market it, they could refer to the whole shebang as a new kind of Web browser, something that sets it apart from IE, a true "next generation" browser capable of running applications no matter where you are or what computer (or portable device) you're using.
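Several of these scenarios hinge on a small Web server bundled with the browser and running on the user's own machine. As a rough sketch of what that piece amounts to, here is a tiny server bound to the loopback address, using only Python's standard library. The handler name and the page it returns are made up for illustration; nothing here reflects how any of these companies would actually build it.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LocalAppHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one page of a locally running app."""

    def do_GET(self):
        body = b"<h1>Hello from a local app</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


# Bind to 127.0.0.1 only, so the server is not reachable from other machines.
# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), LocalAppHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Play the part of the browser: request the page, ignoring any system proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/") as response:
    status = response.status
    page = response.read().decode("utf-8")

server.shutdown()
server.server_close()
```

Binding to the loopback address is the cautious default here, given the security worries about a browser that can reach local data.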

So yeah, that's the idea of the WebOS (as I see it developing) in a gigantic nutshell. The reality of it will probably be a lot messier and take a lot longer than most would like. If someone ends up doing it, it will probably not be as open as it could be and there will likely be competing Web platforms just as there are now competing search engines, portals, widget applications (Konfabulator, Dashboard, Google Desktop Sidebar), etc., but hopefully not. There's lots more to discuss, but I'm going to stop here before this post gets even more ridiculously long. My thanks if you even made it this far.

[1] Actually, the biggest potential problems with all this are the massive security concerns (a Web browser that has access to data on your local hard drive?!!!??) and managing user expectations (desktop/web app hybrids will likely be very confusing for a lot of users). Significant worries to be sure, but I believe the advantages will motivate the folks developing the platform and the applications to work through these concerns.

[2] For more discussion of Web applications, check out Adam Rifkin's post on Weblications.

[3] Rumor has it that Google is releasing an IM client soon (more here). I'll be pretty surprised if it's not significantly Web-based. As Hotmail proved for email, there's no reason that IM has to happen in a desktop app (although the alerting is problematic).

[4] Maybe Google thinks they can't compete with Apple's current offerings (Spotlight, Dashboard, Safari, iPhoto) on their own platform, but that's not a good way of thinking about it. Support as many people as you can on as many different architectures as you can; that's the advantage of a Web-based OS. Microsoft certainly hasn't thought of Apple as a serious competitor in the OS space for a long time...until Web applications started coming of age recently, Microsoft's sole competitor has been Microsoft.

Google introduces a new (beta) version of Aug 22 2005

Google introduces a new (beta) version of Google Desktop featuring Sidebar, their answer to Dashboard and Konfabulator. Here's more on Google's move from the Times, which also includes speculation on the possible release of an IM client this week.

Odd size comparison of Yahoo and Google Aug 15 2005

Odd size comparison of Yahoo and Google indices. I think their assumption (that a "series of random searches to both search engines should return more than twice as many results from Yahoo! than Google") is pretty flawed. The number of returned results could vary because of the sites' different optimizations for dictionary words, for searches with small result sets, and differences in how their search algorithms include or exclude relevant results. Put it this way: if I'm looking for a frying pan in my apartment, I'm gonna refine my search to the kitchen and not worry about the rest of the house, no matter how large it is. (via /.)

David Filo and Jerry Yang are organizing Jul 26 2005

David Filo and Jerry Yang are organizing the entire WWW into a hierarchical category system. They've named their site "Yahoo".

As We May Think by Vannevar Bush Jul 20 2005

As We May Think by Vannevar Bush. This influential essay that introduces Bush's Memex concept was published 60 years ago this month.

Google images search for before and after photos Jun 27 2005

Google images search for before and after photos.

Search engines still aren't that good at Trivial Pursuit Jun 24 2005

Search engines still aren't that good at Trivial Pursuit.

Nokia.com comes up first in a Jun 19 2005

Nokia.com comes up first in a Google search for "motorola mobile phones". I suspect it's because Motorola's site isn't optimized for Google (lots of Flash, little text) and a difference in usage: it's "cell phones" in the US versus "mobile phones" in Europe (where Nokia is from).

Google now offering customized home page, with May 20 2005

Google now offering customized home page, with weather, stocks, movies, etc.
