The DIL has teamed up with William L. Andrews, foremost expert on slave narratives, to transform UNC’s Documenting the American South North American Slave Narratives digital collection into a searchable database of people and places. The database would enable search and analysis within and across narratives and support spatial and network visualizations. This approach would provide additional layers of exploration for scholars and students and could assist genealogists and descendants of slaves interested in tracing their family histories.
During the 2012-2013 academic year, our team combined automation with crowd-sourced refinement to develop a complete set of names and places from a small subset of the 275 narratives. We automatically extracted capitalized words, which might reasonably be proper names or locations. From there, a team of undergraduate students verified whether each word captured by the parser was in fact the name of an actual person or place.
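The extraction step can be illustrated with a minimal sketch. It assumes a simple regular expression for runs of capitalized words; the pattern, the `extract_candidates` function, and the sample sentence are hypothetical and are not the project's actual parser.

```python
import re
from collections import Counter

# Hypothetical pattern: one capitalized word, optionally followed by more
# capitalized words separated only by whitespace (e.g. "New Orleans").
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_candidates(text: str) -> Counter:
    """Count every capitalized word or run of capitalized words in the text."""
    return Counter(match.group(0) for match in CANDIDATE.finditer(text))


# Invented sample sentence, used only to exercise the function.
sample = "Solomon Northup was taken from Washington. He was sold in New Orleans."
candidates = extract_candidates(sample)

for phrase, count in sorted(candidates.items()):
    print(f"{phrase}: {count}")
```

A rule this simple also captures sentence-initial words such as "He", which is one reason the candidates then needed human verification.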
DIL staff member Chien-Yi Hou then built a custom interface that let students go through each narrative and confirm how each instance of a word should be treated. Two students verified every narrative; where the two students agreed, we accepted their judgment. Where they disagreed, we relied on scholarly expertise for arbitration.
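The double-verification workflow might be modeled along these lines. This is a sketch under stated assumptions: each student's work is represented as a mapping from a hypothetical (word, position) key to a yes/no judgment, and the `reconcile` function is an illustration, not the code behind the actual interface.

```python
from typing import Dict, List, Tuple

# Hypothetical key: the word plus its position in the narrative, so that
# each instance of a word is judged separately.
Key = Tuple[str, int]


def reconcile(
    first: Dict[Key, bool], second: Dict[Key, bool]
) -> Tuple[Dict[Key, bool], List[Key]]:
    """Split doubly judged instances into agreed labels and disputed keys."""
    agreed: Dict[Key, bool] = {}
    disputed: List[Key] = []
    for key in first.keys() & second.keys():
        if first[key] == second[key]:
            agreed[key] = first[key]  # both students gave the same answer
        else:
            disputed.append(key)  # hand off to an expert for arbitration
    return agreed, sorted(disputed)


# Invented judgments: True means "this is a name or place".
student_a = {("Platt", 12): True, ("He", 40): False, ("Red", 77): True}
student_b = {("Platt", 12): True, ("He", 40): False, ("Red", 77): False}

agreed, disputed = reconcile(student_a, student_b)
print("Agreed:", agreed)
print("Needs arbitration:", disputed)
```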
After a semester hiatus, we resumed work on the project in Spring 2014. That semester, we focused our efforts on improving text mining (via Natural Language Processing) by developing syntactical rules for training our parser. We also experimented with extracting other information, including sentiment/affect and social roles. Working with Annie Chen, a PhD student at UNC’s School of Information and Library Science, we began developing an interface for visualizing affect and social roles. We conducted a user study of university faculty and NC K-12 teachers to begin testing the interface.
During the current academic year (2014-2015), we are resuming our work on Named Entity Recognition (NER) to develop an overarching index of people and places. We are using human annotators to verify and improve the NER output. At least two annotators will review the work of the parser; measuring inter-rater reliability will help us refine the text extraction process.
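Inter-rater reliability can be measured in several ways. As one illustration, the sketch below computes Cohen's kappa for two annotators; the choice of kappa, the label names, and the sample annotations are assumptions made for this example and do not describe the statistic the project uses.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item list.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented annotations of four parser candidates.
annotator_1 = ["person", "place", "none", "person"]
annotator_2 = ["person", "place", "person", "person"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 indicates that annotators agree far more often than chance would predict; low values point to candidate types or guidelines that need refinement.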
Annotators will create a list of names and synonyms (for instance, Solomon Northup is also called Platt), and assign social roles to each individual (such as enslaved person, free person of color, etc.). The results will be visualized in Annie’s interface, which she continues to enhance, with the aim of improving its usability and, we hope, making it available for more general use.
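One way to picture the resulting index is as a record per person with a lookup from every known name to that record. The sketch below is a hypothetical data model: the `Person` class, its field names, and `build_alias_index` are assumptions for illustration, not the project's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Person:
    """Hypothetical index entry: one individual, their aliases, and roles."""
    canonical_name: str
    aliases: List[str] = field(default_factory=list)
    social_roles: List[str] = field(default_factory=list)


def build_alias_index(people: Iterable[Person]) -> Dict[str, Person]:
    """Map every name variant (lowercased) to the person it refers to."""
    index: Dict[str, Person] = {}
    for person in people:
        for name in [person.canonical_name, *person.aliases]:
            index[name.lower()] = person
    return index


# Illustrative entry based on the alias example mentioned above.
northup = Person(
    canonical_name="Solomon Northup",
    aliases=["Platt"],
    social_roles=["free person of color", "enslaved person"],
)
index = build_alias_index([northup])
print(index["platt"].canonical_name)
```

With an index like this, a search for any alias resolves to the same individual, so mentions under different names can be counted and visualized together.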
As we work to refine the text extraction, we will keep experimenting and exploring ways of visualizing the collection, based on affect, social roles, networks, and topic modeling. Categorizing and visualizing characters by their social roles, in addition to displaying affect patterns in an interactive interface, will help users see historical or literary trends across the entire corpus, select passages to assign in classes, or focus their close readings of specific passages. We would also like to map the places mentioned across the collection to see if we can discern geographic patterns across the corpus.
Want to play with the collection now? Check out DocSouth Data, a new initiative by UNC Libraries to make the content of DocSouth more usable.