DIL Publishes “Big Historical Data” Feature Extraction


Digital Innovation Lab members Richard Marciano, Bobby Allen, Chien-Yi Hou and Pam Lach have published “‘Big Historical Data’ Feature Extraction” in the most recent issue of the Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives (Volume 9, Issue 1-2, 2013).


In the 1930s, the Home Owners’ Loan Corporation (HOLC), a New Deal federal agency, surveyed hundreds of U.S. cities, producing a national map collection that documented the demographic, economic, infrastructural, and ethnic status of tens of thousands of neighborhoods across the country. The resulting collection of so-called redlining maps is one of the preeminent urban and racial surveys conducted in the history of the United States. We at the Digital Innovation Lab of the University of North Carolina at Chapel Hill are building a national digital collection of these paper maps, currently housed at the National Archives in Washington, D.C., and are using this collection to explore semiautomated feature extraction techniques on large historical content. Our methodology is based on supervised classification image processing techniques. We use a commercial tool called ArcScan, an extension to the popular ESRI ArcGIS software, to extract tens of thousands of neighborhood boundaries that can then be saved as vector overlays and used to drive the development of new types of research interfaces. We then present examples of these new types of interfaces. Finally, we describe the potential impact of linking vectorized national collections together and the need for further research in this area, including hybrid approaches that involve large-scale crowdsourcing.

The article is available through UNC Library’s e-journal subscription.