Richard Marciano’s CI-BER Project Included in White House “Big Data” Announcement

Original press release published on SILS website.

CHAPEL HILL – On March 29, 2012, the White House announced the “Big Data Research and Development Initiative.” According to news sources quoting Tom Kalil, deputy director for Policy at the Office of Science and Technology Policy, “Six Federal departments and agencies announced more than $200 million in new commitments that, together, promise to greatly improve the tools and techniques needed to access, organize and glean discoveries from huge volumes of digital data.”

The Cyberinfrastructure for Billions of Electronic Records (CI-BER) project was included in the White House fact sheet titled “Big Data Across the Federal Government,” which was distributed in conjunction with the announcement. CI-BER was listed as one of the leading projects in the country “that address the challenges of, and tap the opportunities afforded by, the big data revolution.”

The research and the underlying infrastructure of CI-BER are sponsored by multiple agencies of the Federal Government. The fact sheet also noted that the CI-BER testbed will evaluate technologies and approaches to support sustainable access to ultra-large data collections.

The CI-BER project is led by principal investigator Dr. Richard Marciano, a professor at the University of North Carolina at Chapel Hill’s School of Information and Library Science (SILS). It aims to advance the understanding of infrastructure that scales and to provide insights into the management of “big data” in general. Researchers on the project study ultra-high-scale collections and visual analytics techniques to enhance the value of government records, work that can lead to generalizable infrastructure and technology.

The CI-BER project is a collaboration between the National Archives and Records Administration (NARA) and UNC at Chapel Hill. Some 17 student interns working in NARA’s Applied Research lab in Rocket Center, WV, under the leadership of Mark Conrad, archives specialist, have assembled a collection of almost 100 million unique files containing electronic records of the Federal Government. Marciano and students at UNC at Chapel Hill have built a stand-alone collection at the Renaissance Computing Institute (RENCI), which is federated with the Rocket Center holdings. The UNC students working on the project include SILS doctoral student Chien-Yi Hou and SILS master’s student Pamella Lach, both of whom are also members of the College of Arts and Sciences Digital Innovation Lab (DIL).

CI-BER demonstrates the interplay between the sciences and the humanities and the potential educational impact of big data collaboratives.

UNC at Chapel Hill CI-BER members include: Richard Marciano, PI, SILS professor and co-founder of the DIL; Stan Ahalt, co-PI and director of RENCI; RENCI staff: Leesa Brieger, senior research software developer; Jeff Heard, senior research software developer; Joe Hope, interface developer and Web master; Erik Scott, manager, project engineer; and SILS students/DIL staff: Chien-Yi Hou and Pamella Lach.


The 1940 Census as Digital Data

Presented by the Digital Innovation Lab


Tuesday, April 10

University Room, Hyde Hall, UNC-Chapel Hill


On April 2, the National Archives will release the full 1940 U.S. Census to the public, following the required 72-year restriction of access to enumeration data. This census will provide a window into the lives of ordinary Americans, immigrants, and refugees during the Great Depression and on the eve of the country’s entry into World War II. For the first time, these records will be released solely in digital format. Though the records will be freely accessible, they will not be fully searchable until indexing is complete.

The Digital Innovation Lab at UNC-CH will host two events to mark the release and to explore the implications and applications of this digital dataset for librarians and archivists, historians, population researchers and genealogists, and those interested in “big data.” Beyond the 1940 census as a historical document, topics will include applying Optical Character Recognition (OCR) technology to the handwritten enumerations and expanding accessibility through the development of indexing, crowdsourcing, and search tools and platforms.

The 1940 Census: A Public Roundtable Discussion

12:30 pm – 2:00 pm in University Room, Hyde Hall

Join us for this lively discussion of the uses of the 1940 census from the perspective of genealogists, historians, and computer scientists. We’ll explore the challenges of handling millions of digital records, and how those records can be used with other types of historical data.


Constance Potter, National Archives and Records Administration

Kenton McHenry, National Center for Supercomputing Applications

Stephen Robertson, University of Sydney, co-author of “Digital Harlem”

Emily Stanford Schultz

Using the 1940 Census: A Hands-On Workshop

3:00 pm – 4:30 pm in University Room, Hyde Hall

Constance Potter, National Archives and Records Administration

Emily Stanford Schultz

Robert Allen, American Studies, UNC

In this afternoon workshop we’ll explore different approaches to accessing and using the 1940 census. Constance Potter will provide historical and interpretive context about the census, and Robert Allen will share some pedagogical applications for using census data alongside other sorts of historical data. Participants will also have the opportunity to try out FamilySearch’s crowdsourcing tool for indexing and transcribing census enumeration files (laptops encouraged).

For questions, visit or email Pam Lach:

To access the census after April 2, visit

More details are available at