Saturday, October 15, 2005

New ways to get lost in Amazon.

Have you noticed the new features on Amazon? I haven't seen any announcements on their main page, but if you go to an individual book now, you can find all kinds of cool new things. I'm not finding them for every book, and it seems they wouldn't be possible for books that are not open to the "search inside" function. But take, for example, "Dress Your Family in Corduroy and Denim." Let's look at the SIPs -- or "statistically improbable phrases":
Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements.
For "Dress Your Family," they've only come up with: "eight black men." If you've read the book, you know what that refers to. If you click on the phrase, you get all the other books with that phrase: here. Useful? Possibly not in this case, though still interesting, in a rather random way.
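For the curious, here is a rough sketch of how something like SIPs could be computed. Amazon doesn't publish its improbability score, so the ratio used below (a phrase's rate in the book divided by its rate across the whole collection) is purely an assumption, as are the function names, the three-word phrase length and the "at least twice" cutoff.

```python
import re
from collections import Counter


def phrases(text, n=3):
    """Return every run of n consecutive words in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, corpus, n=3, top=5):
    """Rank a book's phrases by how over-represented they are in the book.

    `corpus` is a list of texts standing in for "all Search Inside! books"
    and should include the book itself. The score is a guess, not Amazon's.
    """
    book_counts = Counter(phrases(book, n))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(phrases(text, n))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        if count < 2:
            continue  # a phrase has to recur in the book to be a candidate
        book_rate = count / book_total
        # +1 smoothing avoids dividing by zero if the corpus omits the book
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate

    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    book = ("Eight black men came in. The eight black men left. "
            "It was a nice day. It was a nice day.")
    others = ["It was a nice day today.",
              "It was a nice day again.",
              "It was a nice day indeed."]
    print(improbable_phrases(book, [book] + others, top=1))
```

In this toy collection "it was a nice day" shows up everywhere, so it scores low, while a phrase that recurs only in the one book floats to the top.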

Then there are the CAPs (capitalized phrases):
Aunt Monie, Monie Changes Everything, Baby Einstein, The Girl Next Door, The Ship Shape, Full House, Nuit of the Living Dead, Blood Work, North Carolina, Slumus Lordicus, Kwik Pik, Great Dane, Puta Lid, Saint Nicholas, Anne Frank, The End of the Affair, Royal Pavilion, Who's the Chef, Apple Pan, The Empire
Each is clickable. You can then see other books that frequently use that capitalized phrase. Well, no one else is using Slumus Lordicus yet, but here are the Anne Frank references.

Then there's the concordance:
Concordance is an alphabetized list of the most frequently occurring words in a book, excluding common words such as "of" and "it." The font size of a word is proportional to the number of times it occurs in the book. Hover your mouse over a word to see how many times it occurs, or click on a word to see a list of book excerpts containing that word.
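A concordance like the one described above is simple enough to sketch. This is only an illustration of the idea, not Amazon's code: the stop-word list is a tiny made-up sample, and the font sizing (a base size scaled by each word's count relative to the least frequent word shown) is an assumption about what "proportional" might mean.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to", "was"}


def concordance(text, top=100, base_size=10):
    """Return (word, count, font_size) for the most frequent words, A to Z."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    most_common = Counter(words).most_common(top)
    if not most_common:
        return []
    smallest = min(count for _, count in most_common)
    # Font size grows in proportion to how often the word occurs.
    return sorted((word, count, base_size * count / smallest)
                  for word, count in most_common)


if __name__ == "__main__":
    for word, count, size in concordance("The dog and the cat and the dog"):
        print(f"{word}: {count} times, font size {size}")
```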
There are also "text stats," showing you how easy or hard the book is to read. Sedaris, it turns out, is awfully easy to read -- 7th grade level. Doesn't say how funny he is, though.

Let me check another, similarly funny, but much darker book I like, "Running With Scissors." Hey, that's even easier to read! Let's try "The Curious Incident of the Dog in the Night-Time." Oh, that's easy too. Hmmmm.... that's got a SIP of "bloody dog," so who else is SIP-ing "bloody dog"?

James Joyce! All right, then. Let's get the text stats for "Ulysses." I see that's easy to read too, so they say. According to the Flesch-Kincaid analysis, it's written at less than a 7th grade level, so if you're not ready to tackle "Dress Your Family in Corduroy and Denim"...

UPDATE: You can check text stats for blogs at this website. In case you're wondering, this blog has the following numbers:
Gunning Fog Index 10.31
Flesch Reading Ease 66.29 (higher is easier, with 100 being the easiest)
Flesch-Kincaid Grade 7.13
The corresponding numbers for "Ulysses," in the same order, are 9.0, 68.1 and 6.8. Do you find this confusing? Generally, I think it's a good sign if your ease-of-reading stats seem low for the difficulty of the material.
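If you want to see where numbers like these come from, the three scores use standard published formulas based on words per sentence and syllables per word. The sketch below applies them; the syllable counter is a crude vowel-group heuristic of my own choosing (an assumption, so expect results to differ from Amazon's or any website's), and "complex" words for the Fog index are simply taken to be those with three or more syllables.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text):
    """Compute all three scores for a piece of text (heuristic counts)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Count runs of sentence-ending punctuation; assume at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    per_word = [count_syllables(w) for w in words]
    syllables = sum(per_word)
    complex_words = sum(1 for s in per_word if s >= 3)
    n = len(words)
    return {
        "gunning_fog": gunning_fog(n, sentences, complex_words),
        "flesch_reading_ease": flesch_reading_ease(n, sentences, syllables),
        "flesch_kincaid_grade": flesch_kincaid_grade(n, sentences, syllables),
    }


if __name__ == "__main__":
    print(text_stats("The cat sat. The dog ran."))
```

Note that nothing in these formulas looks at meaning: short sentences and short words score as "easy" whatever they say, which may be part of why "Ulysses" comes out at a 7th grade level.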
