Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques

Principal Investigator: 

Miles Efron
Graduate School of Library and Information Science
501 E. Daniel St. Champaign, IL 61820

This project is funded by a Google Digital Humanities Award. The work aims to improve people's ability to find information in large collections of books, such as the collection created by the Google Books project.

In particular, we are focusing on historical language change. Google Books contains millions of books in English, but English is a moving target: fourteenth-century vernacular is very different from its twentieth-century counterpart. Thus a query issued in modern English will often fail to find related Middle English documents. People researching the history of a proverb such as "many hands make light work," or looking for literary allusions to the Shield of Achilles (a common example of ekphrasis, a poetic trope), can find historically diverse passages only by issuing queries in many forms and styles.

To improve on this situation, we are applying cross-language information retrieval models to the problem of retrieving passages from historically diverse corpora. The primary goal of this project is to develop statistical models (and build software that implements them) that allow a single query to retrieve relevant information from documents spanning a wide variety of English historical periods.
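
As a rough illustration of the general idea, the sketch below scores documents with a simple translation-based query likelihood, one common way to frame cross-language retrieval. It is not the project's actual model or software. The translation table, its weights, the older spellings, and the function name score are all hypothetical and invented for this example.

```python
import math
from collections import Counter

# Hypothetical translation table: for each modern query term, a weight for
# each document word that might "translate" to it. All values and spellings
# here are invented for illustration, not estimated from real data.
TRANSLATION = {
    "many": {"many": 0.7, "manye": 0.2, "mony": 0.1},
    "hands": {"hands": 0.7, "hondes": 0.3},
    "make": {"make": 0.7, "maken": 0.3},
    "work": {"work": 0.7, "werk": 0.3},
}


def score(query, doc_tokens, translation, epsilon=1e-6):
    """Log-likelihood of the query given the document.

    For each query term q, sum weight(q, w) * P(w | document) over the
    document words w listed for q. A term with no table entry matches
    only itself. epsilon (an assumed smoothing constant) avoids log(0).
    """
    if not doc_tokens:
        return len(query) * math.log(epsilon)
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    log_score = 0.0
    for q in query:
        variants = translation.get(q, {q: 1.0})
        p = sum(weight * counts[w] / total for w, weight in variants.items())
        log_score += math.log(p + epsilon)
    return log_score


query = ["many", "hands", "make", "light", "work"]
older_doc = "manye hondes maken light werk".split()   # invented spelling
unrelated_doc = "the shield of achilles".split()

with_translation = score(query, older_doc, TRANSLATION)
exact_match_only = score(query, older_doc, {})
unrelated = score(query, unrelated_doc, TRANSLATION)
```

With the table in place, the single modern query scores the older-spelling passage above the unrelated one, and above the score the same passage receives under exact matching alone. In practice the weights would have to be estimated from data rather than written by hand.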