Friday, April 28, 2006

Context clues

In the "Inside Dictionaries" session at ACES, Erin McKean spent a lot of time talking about the importance of the corpus, a huge electronic databank of writing and speech that helps identify how words are used in context.

The contents of the Oxford English Corpus hit 1 billion words Wednesday, and the Associated Press (through ASAP) has an explainer on it.

McKean is her usual charming self. On additions to the corpus:
"We're always happy to hear from people who want to have their text in," says Erin McKean, editor-in-chief for American Dictionaries for Oxford University Press. "It's like volunteering your body for science, only you don't have to die."
And the example du jour was not asshat; it was pre-game, "to drink a lot of alcohol before an event where no alcohol will be served" (or where it will be too expensive, I'd add).

I often hear people bagging on the ASAP concept, but let's say this for the service: The story didn't have much of a headline on it, but what was there wasn't wrong. The same can't be said for the regular AP headline: "English Language Hits 1 Billion Words." (Perhaps we can blame the hoopla over the millionth-word claim?) Yes, there are a billion words in the corpus, but most of them repeat. As McKean points out, the word "the" alone is there 50 million times.

Read more about the corpus here.
Read more about bad (and good) corpus headlines here.

