kottke.org. home of fine hypertext products since 1998.

kottke.org posts about google

Blog search still sucks (a little)

Update: I fucked up on this post and you should reread it if you’ve read it before. After reading this post by Niall Kennedy, I checked and found that I have mentioned or linked to the site for Freakonomics 5 times (1 2 3 4 5), not 13. The other 8 times, I either linked to a post on the Freakonomics blog that was unrelated to the book, had the entry tagged with “freakonomics” (tags are not yet exposed on my site and can’t be crawled by search engines), or I used the word “Freakonomists”, not “Freakonomics”. Bottom line: the NY Times listing is still incorrect, Google and Yahoo picked up all the posts where I actually mentioned “Freakonomics” in the text of the post but missed the 2 links to freakonomics.com, Google Blog Search got 2/3 (& missed the 2 links), Technorati got 1/3 (& missed the 2 links), and IceRocket, Yahoo Blog Search, BlogPulse, & Bloglines whiffed entirely. Steven Levitt would be very disappointed in my statistical fact-checking skills right now. :(

I wish Niall had emailed me about this instead of posting it on his site, but I guess that’s how weblogs work, airing dirty laundry instead of trying to get it clean. Fair enough…I’ve publicly complained about the company he works for (Technorati) instead of emailing someone at the company about my concerns, so maybe he had a right to hit back. Perhaps a little juvenile on both our parts, I’d say. (Oh, and I turned off the MT search thing that Niall used to check my work. I’m not upset he used it, but I’m irritated that it seems to be on by default in MT…I never intended for that search interface to be public.)

———

The NY Times recently released their list of the most blogged about books of 2005. Their methodology in compiling the list:

This list links to a selection of Web posts that discuss some of the books most frequently mentioned by bloggers in 2005. The books were selected by conducting an automated survey of 5,000 of the most-trafficked blogs.

Unsurprisingly, the top spot on the list went to Freakonomics. I remembered mentioning the book several times on my site (including this interview with author Steven Levitt around the release of the book), so I checked out the citations they had listed for it. According to the Times, Freakonomics was cited by 125 blogs, but not once by kottke.org, a site that by any measure is one of the most-visited blogs out there.[1] A quick search in my installation of Movable Type yielded 5 mentions of the book on kottke.org in the last 9 months (I originally reported 13; see the update above). I had also mentioned Blink, Harry Potter, Getting Things Done, Collapse, The Wisdom of Crowds, The Singularity is Near, and State of Fear, all of which appear in the top 20 of the Times’ list and none of which are cited by the Times as having been mentioned on kottke.org in 2005.

I chalked this up to a simple error of omission, but then I started checking around some more. Google’s main index returned only three distinct mentions of Freakonomics on kottke.org. Google Blog Search returned two results. Yahoo: 3 results (0 results on Yahoo’s blog search). Technorati only found one result (I’m not surprised). Many of the blog search services don’t even let you search by site, so IceRocket, BlogPulse, and Bloglines were of no help. (See above for corrections.) I don’t know where the Times got their book statistics from, but it was probably from one of these sites (or a similar service).
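For anyone who wants to redo the arithmetic, here is a rough Python sketch of how the corrected tallies from the update at the top of this post compare. It assumes a denominator of 5 (the 3 posts that name the book in their text plus the 2 links to freakonomics.com); the update itself scores the blog search tools against the 3 text mentions only, so treat these percentages as one possible reading rather than the official score. The names `found`, `recall`, and `recall_table` are made up for illustration.

```python
# Corrected counts taken from the update at the top of this post.
# Assumption: every service is scored against all 5 real mentions
# (3 posts naming the book in the text + 2 links to freakonomics.com).
TOTAL_MENTIONS = 5

# Number of those mentions each service surfaced for kottke.org.
found = {
    "Google": 3,
    "Yahoo": 3,
    "Google Blog Search": 2,
    "Technorati": 1,
    "IceRocket": 0,
    "Yahoo Blog Search": 0,
    "BlogPulse": 0,
    "Bloglines": 0,
}


def recall(hits, total=TOTAL_MENTIONS):
    """Fraction of the known mentions that a service found."""
    return hits / total


def recall_table(counts, total=TOTAL_MENTIONS):
    """Rows of (service, hits, recall), best first, ties broken by name."""
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, hits, recall(hits, total)) for name, hits in rows]


if __name__ == "__main__":
    for name, hits, r in recall_table(found):
        print(f"{name:<20} {hits}/{TOTAL_MENTIONS}  {r:.0%}")
```

Even on this generous reading, no service finds more than three of the five mentions, and half of them find none.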

Granted this is just one weblog[2], which I only checked into because I’m the author, but it’s not like kottke.org is hard to find or crawl. The markup is pretty good [3], fairly semantic, and hasn’t changed too much for the past two years. The subject in question is not off-topic…I post about books all the time. And it’s one of the more visible weblogs out there…lots of links in to the front page and specific posts and a Google PR of 8. So, my point here is not “how dare the Times ignore my popular and important site!!!” but is that the continuing overall suckiness of searching blogs is kind of amazing and embarrassing given the seemingly monumental resources being applied to the task. It’s forgivable that the Times would not have it exactly right (especially if they’re doing the crawling themselves), but when companies like Technorati and Google are setting themselves up as authorities on how large the blogosphere is, what books and movies people are reading/watching, and what the hot topics online are but can’t properly catalogue the most obvious information out there, you’ve got to wonder a) how good their data really is, and b) if what they are telling us is actually true.

[1] Full disclosure: I am the author of kottke.org.

[2] This is an important point…these observations are obviously a starting point for more research about this. But this one hole is pretty gaping and fits well with what I’ve observed over the past several months trying to find information on blogs using search engines.

[3] I say only pretty good because it’s not validating right now because of entity and illegal character errors, which I obviously need to wrestle with MT to correct at some point. But the underlying markup is solid.


Google search for “i don’t read kottke”

Google search for “i don’t read kottke” versus a search for “i don’t read boing boing”. Nottke** wins, 39 to 37! Sit on it, Cory!

** Nottke = not Kottke, coinage by John Gruber.


Hmm, this sounds like fun, an API

Hmm, this sounds like fun, an API for the Google homepage.


Wow, an interactive transit map for NYC.

Wow, an interactive transit map for NYC. I haven’t kept up with all the Google/Yahoo Maps subway mashups, but this one is pretty impressive. Click start and end points and it tells you which subway to board and how long the trip will take, including walking time.


Yahoo! buys del.icio.us…muxway is

Yahoo! buys del.icio.us…muxway is all growed up. There’s an interesting story in here somewhere about how Yahoo! is hiring/buying the “alpha geeks” (hackers, tinkerers, accidental entrepreneurs) while Google seemingly isn’t (it goes for Ph.D.s and computer scientists) and what effect that could have on each company’s development.


Seven key principles that Google uses to

Seven key principles that Google uses to make their employees more effective. “At Google, the role of the manager is that of an aggregator of viewpoints, not the dictator of decisions.”


Google Desktop 2 is out of beta. This

Google Desktop 2 is out of beta. This release includes new sidebar panels and support for scriptable plug-ins. No Mac version yet.


New version of Yahoo Maps catches up

New version of Yahoo Maps catches up to Google Maps and does them one or two better. Quite the homage, though. (via df)


George Dyson visits Google on the 60th

George Dyson visits Google on the 60th anniversary of John von Neumann’s proposal for a digital computer. A quote from a Googler — “We are not scanning all those books to be read by people. We are scanning them to be read by an AI.” — highlights a quasi-philosophical question about Google Print…if a book is copied but nobody reads it, has it actually been copied? (Or something like that.)


Nerdy Halloween costumes alert: Ricky dressed as

Nerdy Halloween costumes alert: Ricky dressed as Google Image Search. I know someone out there is planning their Web 2.0 or folksonomy costume. Let’s see it!


Google is launching something called Google Base

Google is launching something called Google Base soon…an open web database type thingie. From what little info there is, this sounds very cool. (via waxy)


Merlin is collecting funny eBay ads from

Merlin is collecting funny eBay ads from Google. “Looking for Handjob? Find exactly what you want today. www.eBay.com”. Dictionary.com used to have Amazon ads tied to search terms that would say things like “Buy crack cocaine at Amazon” or “Buy hookers at Amazon”. I for one welcome our new robot marketing overlords.


Parable about Google’s Library Project and copyright (

Parable about Google’s Library Project and copyright (discussed here last week). “All I have to do is borrow the CDs or DVDs, downloaded music or video or whatever, copy them, and then offer some sort of ‘fair use’ excerpt index service, just like Google is doing with the books. It’s the perfect gimmick.”


Book author to her publishing company: your lawsuit is not helping me or my book

I got an email this morning from a kottke.org reader, Meghann Marco. She’s an author and struggling to get her book out into the hands of people who might be interested in reading it. To that end, she asked her publisher, Simon & Schuster, to put her book up on Google Print so it could be found, and they refused. Now they’re suing Google over Google Print, claiming copyright infringement. Meghann is not too happy with this development:

Kinda sucks for me, because not that many people know about my book and this might help them find out about it. I fail to see what the harm is in Google indexing a book and helping people find it. Anyone can read my book for free by going to the library anyway.

In case you guys haven’t noticed, books don’t have marketing like TV and Movies do. There are no commercials for books, this website isn’t produced by my publisher. Books are driven by word of mouth. A book that doesn’t get good word of mouth will fail and go out of print.

Personally, I hope that won’t happen to my book, but there is a chance that it will. I think the majority of authors would benefit from something like Google Print.

She has also sent a letter of support to Google which includes this great anecdote:

Someone asked me recently, “Meghann, how can you say you don’t mind people reading parts of your book for free? What if someone xeroxed your book and was handing it out for free on street corners?”

I replied, “Well, it seems to be working for Jesus.”

And here’s an excerpt of the email that Meghann sent me (edited very slightly):

I’m a book author. My publisher is suing Google Print and that bothers me. I’d asked for my book to be included, because gosh it’s so hard to get people to read a book.

Getting people to read a book is like putting a cat in a box. Especially for someone like me, who was an intern when she got her book deal. It’s not like I have money for groceries, let alone a publicist.

I feel like I’m yelling and no one is listening. Being an author can really suck sometimes. For all I know speaking up is going to get me blacklisted and no one will ever want to publish another one of my books again. I hope not though.

[My book is] called ‘Field Guide to the Apocalypse’ It’s very funny and doesn’t suck. I worked really hard on it. It would be nice if people read it before it went out of print.

As Tim O’Reilly, Eric Schmidt, and Google have argued, I think these lawsuits against Google are a stupid (and legally untenable) move on the part of the publishing industry. I know a fair number of kottke.org readers have published books…what’s your take on the situation? Does Google Print (as well as Amazon’s “Search Inside the Book” feature) hurt or help you as an author? Do you want your publishing company suing Google on your behalf?


Investing is risky?

From a Washington Post article about google.org, Google’s philanthropic effort:

Shareholder activists said Google’s charitable commitment raises questions about whether this is an appropriate use of company cash or whether company founders Sergey Brin and Larry Page ought to make donations to their favorite causes personally. The foundation of Bill Gates, the founder and chairman of Microsoft Corp. and the nation’s richest person according to Forbes, gave away more than a billion dollars last year to fight poverty, hunger and disease around the world. But Gates donates through a personal foundation, rather than through Microsoft itself.

“The board of directors should make it clear to the company’s founders what should be personal and what should be corporate,” said Patrick S. McGurn, special counsel to Institutional Shareholder Services Inc. “Google is spending shareholders’ money, and it raises questions if there is not a valid corporate purpose.”

Shareholder activists? You’ve got to be kidding me. You’d think that stock shareholders are a bunch of babies that need their noses wiped and hands held to go potty or something. If you don’t want to support Google’s philanthropic efforts and think that they’re throwing your money away by doing so, there’s an easy way to opt out: DON’T BUY GOOGLE STOCK. It’s a free country and open market…vote with your money on what you think is a “valid corporate purpose”. There are thousands of other companies to invest in that are doing other things, many of which operate exactly the same…nice and safe and by the book. The information on what these companies are doing with their shareholders’ money is freely available…get informed about what you’re buying. Given their P/E ratio, unique corporate approach, and incredible rate of growth, Google might just be the riskiest large-cap stock opportunity out there, but the potential upside (as well as the downside) is a lot greater than all of those companies playing it safe. As long as it’s stated (and I believe Google certainly has made their views very clear), risk isn’t something from which shareholders should be warned away.


VGMap is a library developed at Eyebeam

VGMap is a library developed at Eyebeam that lets you overlay arbitrary data and graphics onto Google Maps with Flash. Since you can dump anything you want into a Flash movie, you’re free to annotate Google Maps with anything you want, from audio clips to banner ads of businesses. As an example, they’ve overlaid the NYC subway onto a map of Manhattan.


Google Reader is Google’s RSS/Atom reader.

Google Reader is Google’s RSS/Atom reader.


Working offline

Back when I wrote about how a WebOS might work (basically XHTML/JS web apps that run on the desktop as well), I got a lot of responses along the lines of: with internet access becoming more ubiquitous (broadband, wifi, wireless broadband, WiMax, etc.), there will be less and less need for applications that don’t need a connection to the network to function. When you can literally get a fast, cheap internet connection anywhere, you don’t need a version of Gmail that works offline and so that’s not going to drive the development of this WebOS thing you’re talking about.

I’ve been thinking for several weeks about why I think that’s wrong and I’ve come up with a couple ideas.

1. Fast, cheap internet everywhere? Hoo boy, wake me when that happens…you’ll likely find me driving my hydrogen-powered hovercar with ESP to my paperless office.

2. For many people, the more you get used to having access to your applications/data/etc., the more important that access becomes. Let’s say 98% of the applications you use are entirely on the web (with no offline capabilities) and you’re online almost all the time wherever you go. Then the network winks out for 1/2 an hour. Or Salesforce.com is down for a couple hours. That last little inch is going to be painful. And no use telling me that sounds insane because I’ve seen the madness and fear in people’s eyes while they clutch their Crackberries, furiously reading email mere minutes away from the office and the full-speed, full-screen experience.

3. The offline thing is a good way for companies to bootstrap the WebOS. I think most people have a sense that the apps they use in their browser are more alive, more social, more connected, even if they can’t articulate that feeling. And whether it’s true or not (Gmail isn’t actually more “connected” than Outlook), companies can market the “aliveness” of their web apps (even when they run offline) versus the “deadness” of desktop apps.


Ning is a platform on which you

Ning is a platform on which you can build your own social software…your own craigslist or del.icio.us. We were just talking about something like this the other day at Eyebeam, a MMORPG in which you write applications to adventure together or fight each other in a world instead of characters. Google, Yahoo, and Microsoft should be kicking themselves that they didn’t think of this…this is the perfect WebOS app, like Dashboard, Konfabulator, and Desktop, but multi-user and on the web. (via waxy)


Google and NASA have announced plans to

Google and NASA have announced plans to collaborate on projects like “large-scale data management, massively distributed computing, bio-info-nano convergence, and encouragement of the entrepreneurial space industry”. In 6 months, Yahoo will announce a collaboration with the Russian Space Agency to launch original content into space. Microsoft will announce in a year that they’ve had space travel capabilities built into Office for years now but no one uses them…in two years’ time, they’ll completely reorg around manned missions to Mars.


Tim O’Reilly op-ed about the Authors Guild’s

Tim O’Reilly op-ed about the Authors Guild’s lawsuit against Google regarding their Library Project. “Obscurity is a far greater threat to authors than copyright infringement, or even outright piracy”. The op-ed follows Tim’s earlier post on the subject.


Profile of Google’s Marissa Mayer, Google’s answer

Profile of Google’s Marissa Mayer, Google’s answer to Apple’s Jonathan Ive. She grew up about 100 miles from me in northern WI.


Cool Google Maps photo of the taxi

Cool Google Maps photo of the taxi lot at JFK. (via new york hack)


Neat article on Charlie Ayers, Google’s former

Neat article on Charlie Ayers, Google’s former chef, and his future plans to open his own eco-aware restaurant.


Google finally launches a blog search service.

Google finally launches a blog search service. The default search is by relevance, which I’m not sure is correct, and it’s pretty bare bones so far, but I’m sure that many other people will be saying so long, Technorati. Also available in Blogger flavor. (via waxy)


Dan Gillmor on Google’s unnecessary arrogance. I

Dan Gillmor on Google’s unnecessary arrogance. I believe some of what people call Google’s arrogance isn’t that at all, but they are still a deeply weird company.


Robert Cringely: Google may have peaked (“What

Robert Cringely: Google may have peaked (“What if search and PageRank and AdSense are Google’s corporate apex?”) and Microsoft may have more to worry about from Apple if they start distributing older versions of OSX (the Intel version) for free on iPods.


The NY Times takes Google to task

The NY Times takes Google to task for blacklisting Cnet after it published some publicly available information about Google CEO Eric Schmidt. I wonder if there’s a useful distinction to be made between implicitly available information and explicitly broadcast information?


News.com ruminates about Google building a

News.com ruminates about Google building a collection of tools that serve as a replacement OS. Where have we heard that recently? You’re welcome for the story idea and thanks for the non-link, guys…tech journalism at its finest. I hereby institute a policy of not linking to you for a year.
Michael via email: “please tell me you were kidding”. Well, mostly yes, particularly about the no link policy thing (it’s actually going to be two years).


These are the people in my (Web) neighborhood

In reaction to some ads of questionable value being placed on some of O’Reilly’s sites (response from Tim O’Reilly), Greg Yardley has written a thoughtful piece on selling PageRank called I am not responsible for making Google better:

Google, Yahoo, Microsoft and the other big search engine companies aren’t public utilities - they’re money-making, for-profit enterprises. It’s time to stop thinking of search engines as a common resource to be nurtured, and start thinking of them as just another business to compete with or cooperate with as best suits your individual needs.

I love the idea that after more than 10 years of serious corporate interest in the Web, it’s still up to all of us and our individual decisions. The search engines in particular are based on our collective action; they watch and record the trails left as we scatter the Web with our thoughts, commerce, conversations, and connections.

Me? I tend to think I need Google to be as good a search engine as it can be and if I can help in some small way, I’m going to. As corny as it sounds, I tend to think of the sites I frequent as my neighborhood. If the barista at Starbucks is sick for a day, I’m not going to jump behind the counter and start making lattes, but if there’s a bit of litter on the stoop of the restaurant on the corner, I might stop to pick it up. Or if I see some punk slipping a candy bar into his pocket at the deli, I may alert the owner because, well, why should I be paying for that guy’s free candy bar every time I stop in for a soda?

Sure those small actions help those particular businesses, but they also benefit the neighborhood as a whole and, more importantly, the neighborhood residents. If I were the owner of a business like O’Reilly Media, I’d be concerned about making Google or Yahoo less useful because that would make it harder for my employees and customers to find what they’re looking for (including, perhaps, O’Reilly products and services). As Greg said, the Web is still largely what we make of it, so why not make it a good Web?