A National Endowment for the Humanities (NEH) grant will make many of the first books printed in the Americas available for the first time in digital full-text format, thanks to innovations in optical character recognition (OCR) technology.

The University of Texas at Austin is one of six recipients of a Digital Humanities Implementation Grant from the NEH. The $215,000 grant will fund "Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros," a project to extend current open-source OCR technology so that it can transcribe 16th-century texts. LLILAS Benson Latin American Studies and Collections will administer the grant as part of its new Digital Scholarship program.

The tool developed under the project will be used to produce transcriptions of the digitized books in the Primeros Libros de las Américas collection, which currently includes over 330 copies of books printed in the Americas before 1601. Books in the collection include text in Spanish, Latin and several indigenous Latin American languages, including Nahuatl, once spoken by the Aztecs and still spoken by some 1.5 million people. UT Libraries and the Benson Latin American Collection are founding members of the international Primeros Libros consortium, which currently has over 20 member libraries across the Americas and Europe.

The ability of scholars and students to work with these early texts in digital form has been limited by the difficulty of transcribing early-modern books. Such books contain variable typefaces, typesetting and spelling, as well as multilingual text, none of which conventional OCR software is designed to recognize. The goal of this project is to develop and implement new methods for automatically transcribing such books. The resulting transcriptions will help scholars shed light on a period of historical transition from oral culture to the rise of literacy and the birth of the scientific method.
