P³: Connecting People, Place, Past explores the intersection of people and places in the past by harnessing big historical data sets. P³ is a project to develop a software platform for harvesting and spatializing historical data from the most comprehensive publicly available sources of information about everyday life in early-20th-century America: city directories, urban ground plans, newspapers, and census enumerations. Funding for the pilot year of P³ (2011–2012) was provided by an Interdisciplinary Initiative Grant from the UNC-CH College of Arts and Sciences.
During the 2011–2012 academic year, the Lab explored techniques for automating the extraction of data from city directories published across North Carolina. City directories are rich historical resources that provide valuable details about the individuals and businesses in a community. Hundreds of late-19th- and early-20th-century N.C. city directories from the North Carolina Collection of the University Library have been digitized and published by the Internet Archive. City directories contain alphabetical lists of a town's residents, along with their addresses and occupations. Many directories published after 1900 also contain a street index, which lists the residents of each street in street-address order. In the southern U.S., city directories were racially coded until the 1950s, making it possible to identify African-American residents, businesses, and neighborhoods.
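To make the two directory views described above concrete, here is a minimal Python sketch that parses alphabetical listings into fields and regroups them into a street index. The listing layout, the `(c)` racial marker, and the names `parse_listing` and `street_index` are hypothetical assumptions for illustration only. Real directories vary by publisher and year, use many abbreviations, and would need OCR cleanup that this sketch ignores. It is not presented as the Lab's extraction method.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# HYPOTHETICAL simplified listing layout (an assumption, not a real directory's format):
#   "<Surname> <Given> [(c)] <occupation> h <house number> <street>"
# "(c)" stands in for whatever racial code a given directory used;
# "h" introduces the home address.
LISTING = re.compile(
    r"^(?P<surname>\S+)\s+(?P<given>\S+)\s+"
    r"(?P<coded>\(c\)\s+)?"
    r"(?P<occupation>.+?)\s+h\s+"
    r"(?P<number>\d+)\s+(?P<street>.+)$"
)


@dataclass
class Listing:
    name: str
    occupation: str
    number: int
    street: str
    coded: bool  # True if the entry carries the (assumed) racial marker


def parse_listing(line: str) -> Optional[Listing]:
    """Parse one alphabetical-list entry; return None if it does not fit the layout."""
    m = LISTING.match(line.strip())
    if m is None:
        return None
    return Listing(
        name=f"{m['given']} {m['surname']}",
        occupation=m["occupation"],
        number=int(m["number"]),
        street=m["street"],
        coded=m["coded"] is not None,
    )


def street_index(listings: Iterable[Optional[Listing]]) -> Dict[str, List[Tuple[int, str]]]:
    """Regroup parsed residents by street, each street sorted by house number."""
    by_street: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for item in listings:
        if item is None:  # skip lines that failed to parse
            continue
        by_street[item.street].append((item.number, item.name))
    return {street: sorted(rows) for street, rows in sorted(by_street.items())}
```

For example, feeding it the made-up lines `Allen Mary (c) cook h 412 Elm St` and `Clark Henry (c) porter h 406 Elm St` yields an `Elm St` entry that lists Henry Clark at 406 before Mary Allen at 412, mirroring the street-address order of a printed street index.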
P³ is predicated on the belief that the value of data sources such as city directories increases significantly when individual data points are spatialized (mapped), particularly when they can be visualized on maps produced at the same historical moment, and hence during the period in which the listed individuals lived and the listed businesses operated. More than 4000 Sanborn Fire Insurance maps have been digitized and published online by the North Carolina Collection. They show the built environment of more than 100 towns and cities in North Carolina as these urban spaces developed between 1880 and 1920. Produced at roughly five-year intervals, Sanborn maps represent every built structure in each city’s “downtown” business district, in the residential neighborhoods around the downtown, and in outlying industrial neighborhoods (including many mill villages in North Carolina).
The promise of “recombinant mashups” of historical data sources such as city directories and Sanborn maps was suggested by Charlotte, 1911, a Main Street, Carolina project developed in partnership with Dr. Tom Hanchett and the Levine Museum of the New South. More than 4000 residential and business listings from the 1911 Charlotte, North Carolina, city directory were manually extracted, georeferenced (assigned longitude/latitude coordinates), and tagged. They can be searched and displayed as place markers on the digitized, stitched, and georeferenced 1911 Sanborn Fire Insurance Map of Charlotte. Visualized this way, the directory clearly shows what Hanchett has called the “sorting out” of Charlotte by race and class by 1911. The combination also lets users see patterns within and across neighborhoods that would be difficult, if not impossible, to discern if the two data sources were examined separately.
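One common way to georeference a scanned map is to pick ground control points (pixel locations whose longitude/latitude are known) and fit a transform from pixel coordinates to geographic coordinates; any point marked on the image can then be assigned coordinates. The Python sketch below assumes the simplest case, an affine transform fitted from exactly three control points, and uses invented control points. It is an illustration of the general technique, not the procedure used for Charlotte, 1911; `fit_affine` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel pair or (lon, lat) coordinate pair


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 linear system m @ x = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; cannot fit a transform")
    solution = []
    for col in range(3):
        # Replace column `col` with the right-hand side; the ratio of determinants is x[col].
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(pixels: Sequence[Point], lonlats: Sequence[Point]) -> Callable[[float, float], Point]:
    """Fit lon = a*x + b*y + c and lat = d*x + e*y + f from exactly three control points."""
    if len(pixels) != 3 or len(lonlats) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [[float(x), float(y), 1.0] for x, y in pixels]
    a, b, c = _solve3(m, [lon for lon, _ in lonlats])
    d, e, f = _solve3(m, [lat for _, lat in lonlats])

    def to_lonlat(x: float, y: float) -> Point:
        """Map a pixel position on the scanned sheet to (lon, lat)."""
        return (a * x + b * y + c, d * x + e * y + f)

    return to_lonlat
```

With invented control points at pixels (0, 0), (1000, 0), and (0, 1000) mapped to (-80.85, 35.23), (-80.83, 35.23), and (-80.85, 35.21), a marker at pixel (500, 500) lands at roughly (-80.84, 35.22). Practical georeferencing tools usually take more than three control points and fit by least squares, or use higher-order or rubber-sheet transforms to absorb distortion in stitched map sheets.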
These data sources are available not only for cities and towns in North Carolina but also for tens of thousands of communities across the U.S. City directories were produced for nearly every “city” (in 1900 the U.S. Census Bureau defined a city as any incorporated settlement of 2,500 or more residents), and more than 500,000 Sanborn map pages covering more than 12,000 “cities” were published. For projects such as Charlotte, 1911 to scale, however, the process of harvesting, organizing, and georeferencing individual listings from city directories must be at least partially automated.
The increasing availability of historical newspapers in digital form offers an opportunity to add yet another data source to this recombinant mix. We digitized every 1911 issue of a Charlotte daily newspaper. In a two-part assignment, “Charlotte through 100 Households,” students in Robert Allen’s Spring Term 2012 offering of “Main Street, Carolina” (AMST 350) tested what happens when these three data sources are brought together in an integrated fashion. Once again, for the full potential of multiple, “interoperable” historical data sources to be realized at the local level, new techniques will need to be developed for automating data harvesting and relating different data types, along with supporting strategies for crowdsourcing those aspects of data collection that resist automation.
Census enumerations provide household-level data for every community in the U.S. at ten-year intervals between 1790 and 1930. Identifiable personal data from the census enumerations are embargoed for 72 years. The release of the 1940 census enumerations on April 2, 2012, presents a tantalizing opportunity to add yet another comprehensive, fine-grained data source. For the first time, the census enumerations will be released as digital images, accessible at National Archives and Records Administration (NARA) facilities nationwide as well as on personal computers via the internet. No microfilm copies will be made available to the public. Genealogical search sites such as Ancestry.com have revolutionized family history by making census enumerations searchable by name. To date, however, the millions of handwritten census enumerations have been processed manually, and on commercial sites, access to the digitized enumeration images and to the search engines needed to find one person among millions is limited to paid subscribers. The challenge here is to develop processes for automating and crowdsourcing the extraction and georeferencing of individual and household records from census enumerations, so that these public documents can be used by local organizations as public goods.