Meeting the Challenge of Language Change in Text Retrieval with
Machine Translation Techniques
Miles Efron (http://people.lis.illinois.edu/~mefron)
Graduate School of Library and Information Science
501 E. Daniel St. Champaign, IL 61820
This project is funded by a Google
Digital Humanities Award. The work aims to improve people's
ability to find information in large collections of books, such as the
collection created by Google Books.
In particular, we are focusing on historical language change.
Google Books contains millions of books in English. But English
is a moving target. Fourteenth-century vernacular is very
different from its twentieth-century counterpart. Thus a query issued
in modern English will often fail to find related Middle English
documents. People researching the history of a proverb such as "many hands make light work" or
finding literary allusions to the Shield of Achilles (a common example
of ekphrasis, a poetic trope) can find historically diverse passages
only by issuing queries in many forms and styles.
To improve on this situation, we are applying cross-language information
retrieval models to the problem of retrieving passages from
historically diverse corpora. The primary goal of this project is
to posit statistical models (and build software that instantiates them)
that allow a single query to retrieve relevant information in
documents from a wide variety of English historical periods.
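
As a rough illustration of the general idea, and not the project's actual model, the sketch below scores documents with a translation-based query likelihood, one common way of framing cross-language retrieval. Each modern query term is allowed to "translate" into older spellings before matching. The translation table, its probabilities, the example spellings, the smoothing weight lam, and the function name translation_score are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical translation table: P(variant spelling | modern term).
# Spellings and probabilities are invented for illustration only.
TRANSLATION = {
    "many": {"many": 0.7, "manye": 0.3},
    "hands": {"hands": 0.6, "handes": 0.4},
    "make": {"make": 0.6, "maken": 0.4},
    "light": {"light": 0.6, "lyght": 0.4},
    "work": {"work": 0.5, "werk": 0.5},
}


def translation_score(query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Log-likelihood of the query given the document, where each query
    term may be realised in the document as any of its listed variants.
    Document estimates are smoothed with collection-wide estimates."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term in query:
        # Terms missing from the table only match themselves.
        variants = TRANSLATION.get(term, {term: 1.0})
        p_doc = sum(p * doc_counts[w] / doc_len for w, p in variants.items())
        p_coll = sum(p * coll_counts[w] / coll_len for w, p in variants.items())
        # Small floor avoids log(0) for terms unseen in the whole collection.
        total += math.log(max(lam * p_doc + (1 - lam) * p_coll, 1e-12))
    return total


# Toy collection: a modern passage, an older-spelling passage, an unrelated one.
docs = {
    "modern": "many hands make light work".split(),
    "older": "manye handes maken lyght werk".split(),
    "unrelated": "the shield of achilles".split(),
}
coll_counts = Counter(w for tokens in docs.values() for w in tokens)
coll_len = sum(coll_counts.values())

query = "many hands make light work".split()
scores = {
    name: translation_score(query, tokens, coll_counts, coll_len)
    for name, tokens in docs.items()
}
```

In this toy setting the older-spelling passage shares no exact tokens with the modern query, yet it scores above the unrelated passage because the variants in the table carry probability mass to it. In a real system the translation probabilities would have to be estimated from data rather than written by hand.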