Last month it was revealed that JK Rowling of Harry Potter fame recently published the crime novel The Cuckoo’s Calling under the pseudonym Robert Galbraith. How’d she get outed? Turns out it was a tip-off from her law firm, but with just an anonymous tip there wasn’t much to go on. Time to call in the language experts. Ben Zimmer at the WSJ has a good general report but the real meat of things is in a post that computer science professor Patrick Juola wrote for the Language Log blog.
I was given e-text copies of Cuckoo to compare against Rowling’s own The Casual Vacancy, Ruth Rendell’s The St. Zita Society, P.D. James’ The Private Patient and Val McDermid’s The Wire in the Blood. […]
I actually ran four separate types of analyses focusing on four different linguistic variables. While anything can in theory be an informative variable, my work focuses on variables that are easy to compute and that generate a lot of data from a given passage of language. One variable that I used, for example, is the distribution of word lengths. Each novel has a lot of words, each word has a length, and so one can get a robust vector of “X% of the words in this document have exactly Y letters.” Using a distance formula (for the mathematically minded, I used the normalized cosine distance formula instead of the more traditional Euclidean distance you remember from high school), I was able to get a measurement of similarity, with 0.0 being identity and progressively higher numbers being greater dissimilarity.
Of the 11 sections of Cuckoo, six were closest (in distribution of word lengths) to Rowling, five to James. No one else got a mention. […]
Does this prove that Rowling wrote Cuckoo? Of course not. All it really “proves” — suggests, rather — is that out of the four authors studied, the most likely candidate author is probably Rowling.
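For the curious, here is a rough sketch of what the word-length comparison described above might look like in Python. This is not Juola's actual code. The function names, the letters-only tokenizer, the 20-letter cap, and the choice of "1 minus cosine similarity" as the distance are all assumptions made for illustration; the post only says that a normalized cosine distance was used, with 0.0 meaning identical.

```python
import math
import re
from collections import Counter

MAX_LEN = 20  # assumption: words longer than this share the last bucket


def word_length_distribution(text, max_len=MAX_LEN):
    """Return a vector whose entry n-1 is the fraction of words with n letters."""
    # Assumption: a "word" is any run of ASCII letters, so "don't" counts as two words.
    lengths = [min(len(w), max_len) for w in re.findall(r"[A-Za-z]+", text)]
    if not lengths:
        raise ValueError("text contains no words")
    counts = Counter(lengths)
    total = len(lengths)
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def cosine_distance(u, v):
    """1 - cosine similarity: 0.0 for identical directions, larger for less similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def closest_author(section_text, reference_texts):
    """Return the candidate whose reference text is nearest to the section,
    along with the distance to every candidate."""
    target = word_length_distribution(section_text)
    distances = {
        author: cosine_distance(target, word_length_distribution(text))
        for author, text in reference_texts.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Toy strings standing in for full novels; real use would load the e-texts.
    references = {
        "Author A": "a dog ran to the hut",
        "Author B": "extraordinarily complicated vocabulary everywhere",
    }
    best, all_distances = closest_author("the cat sat on the mat", references)
    print(best, all_distances)
```

Running `closest_author` once per section and counting the winners would give a tally like the six-to-five split reported in the post. As the quote itself stresses, that only ranks the candidates you supplied; it cannot tell you the true author is among them.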