ARI researcher awarded NEH Digital Humanities Advancement grant


Story by Alexandra Goodman

According to Dr. Ana Lučić, Staff Research Scientist in the Algorithms and Software group at the Illinois Applied Research Institute, everything we read, regardless of the medium through which we access the text, is likely to contain peritext, the material that frames the main body of what we are reading.

Despite advances in technology, Lučić notes, “At the end of the day, our reading practices have not changed that much, we are still accessing text and we still have this extraneous information that frames the content we are reading. This can be the number of views an article had, the advertisement that we see as we browse content on the internet, the index at the end of the physical copy of the book, or the illustration and texture of the hardcover jacket of the book we are reading.” The problem Lučić would like to tackle is how to detect this peritext automatically and differentiate it from the core body of the work.

In January 2024, Lučić was awarded a Digital Humanities Advancement grant from the National Endowment for the Humanities to develop a method to computationally identify these peritextual elements within the digitized book collections in the HathiTrust Digital Library. In digitized books, peritextual elements include publisher-developed material outside of the main text, such as an introduction, a table of contents, and a bibliography.

In addition to Lučić at ARI, the project team includes two consultants, Professor John Shanahan at DePaul University in Chicago and Professor Robin Burke at the University of Colorado, Boulder, as well as a four-person advisory board: Professor J. Stephen Downie, HathiTrust Research Center; Glen Layne-Worthey, HathiTrust Research Center; Professor Peter Organisciak, University of Denver; and Amy Kirchhoff, JSTOR/Ithaka.

The awarded project will seek to develop a supervised machine learning model to classify the core text and peritext of full-text digitized works in the HathiTrust Digital Library. The model will rely on about 1,000 works from the library for which peritext boundaries have been established.
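To make the idea of supervised classification concrete, here is a minimal sketch in Python. It is not the project's model: the article does not say which algorithm or features the team will use, so this example assumes a simple word-count naive Bayes classifier working page by page, and the function names (`tokenize`, `train`, `classify`) and the six toy "pages" are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:()[]\"'"

def tokenize(page_text):
    """Lowercase the page and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT).lower() for w in page_text.split())
    return [w for w in words if w]

def train(labeled_pages):
    """Count words per label from (page_text, label) pairs."""
    word_counts = defaultdict(Counter)
    page_counts = Counter()
    for text, label in labeled_pages:
        page_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"word_counts": word_counts, "page_counts": page_counts, "vocab": vocab}

def classify(model, page_text):
    """Return the label with the highest naive Bayes log-probability."""
    total_pages = sum(model["page_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_pages in model["page_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_pages / total_pages)  # prior from label frequency
        for word in tokenize(page_text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-labeled pages standing in for the annotated works.
TOY_PAGES = [
    ("Table of contents chapter one page 1 chapter two page 15", "peritext"),
    ("Index abbey 12 anchor 48 bibliography 210", "peritext"),
    ("Copyright published by the press all rights reserved", "peritext"),
    ("She walked along the river and thought about her brother", "core"),
    ("The rain had stopped when he reached the old house", "core"),
    ("They talked for hours about the summer she had left", "core"),
]

model = train(TOY_PAGES)
print(classify(model, "Index of chapter titles and page numbers"))
print(classify(model, "He thought about the river and the rain"))
```

A real system would presumably draw on richer signals than word counts, such as a page's position in the volume or its layout, and would be evaluated against the works whose peritext boundaries are already known.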

The team will investigate whether this method of identifying textual elements could supplement existing metadata that describe a library holding. The project also seeks to represent the structure of each work in visualizations that will accompany the HathiTrust volumes, so that users can intuit the structure and the amount of peritext in a given volume.
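One way to picture such a visualization is a simple strip in which each cell stands for a run of pages. The Python sketch below is only an assumption about what a structure view might look like, not the team's design; the names `structure_strip` and `peritext_share`, the `#` and `.` symbols, and the 20-page example volume are all hypothetical.

```python
def structure_strip(page_labels, width=40):
    """Compress per-page labels into a strip: '#' = peritext, '.' = core text."""
    if not page_labels:
        return ""
    n = len(page_labels)
    width = min(width, n)  # never use more cells than there are pages
    cells = []
    for i in range(width):
        chunk = page_labels[i * n // width:(i + 1) * n // width]
        peritext_pages = sum(1 for label in chunk if label == "peritext")
        # Mark the cell as peritext when at least half of its pages are.
        cells.append("#" if peritext_pages * 2 >= len(chunk) else ".")
    return "".join(cells)

def peritext_share(page_labels):
    """Fraction of pages labeled as peritext."""
    if not page_labels:
        return 0.0
    return sum(1 for label in page_labels if label == "peritext") / len(page_labels)

# Hypothetical 20-page volume: front matter, main text, back matter.
labels = ["peritext"] * 4 + ["core"] * 14 + ["peritext"] * 2

print(structure_strip(labels))            # one cell per page
print(structure_strip(labels, width=10))  # two pages per cell
print(f"{peritext_share(labels):.0%} of pages are peritext")
```

Even a strip this crude shows at a glance where the front and back matter sit and roughly how much of the volume they occupy.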

This project expands upon previous work that developed out of Reading Chicago Reading (RCR), an undertaking based at DePaul University in which Lučić participated. RCR is another NEH-funded project, in which researchers analyzed books selected as part of the citywide public library program One Book, One Chicago, along with other associated data, and built a predictive computer model based on their findings.

As part of that earlier project, researchers manually identified the core text and peritextual elements of almost 80 books, which Lučić said was tedious, time-consuming work. That inspired the research team to wonder if these elements could be separated in an easier, automated way.

Lučić said the results of this new NEH project could benefit anyone whose research involves textual analysis, particularly at scale using large repositories. She hopes the model can be applied to other repositories, with minor modifications as needed to account for the different features of works and collections.

Automating the process of identifying those peritextual elements not only makes the work easier but could also lead to more accurate findings for future researchers.

“[Publishers] can add a lot of extra information that can skew your analysis and can steer it potentially in a different direction,” she said.

The visualizations the team plans to develop would make trends in peritext easier to identify and make such information more accessible to users.