DIL Publishes “Big Historical Data” Feature Extraction

Digital Innovation Lab members Richard Marciano, Bobby Allen, Chien-Yi Hou and Pam Lach have published “‘Big Historical Data’ Feature Extraction” in the most recent issue of the Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives (Volume 9, Issue 1-2, 2013).

Abstract

In the 1930s, the Home Owners’ Loan Corporation (HOLC), a New Deal federal agency, surveyed hundreds of U.S. cities, producing a national map collection that documented the demographic, economic, infrastructural, and ethnic status of tens of thousands of neighborhoods across the country. The resulting collection of so-called redlining maps is one of the preeminent urban and racial surveys conducted in the history of the United States. We at the Digital Innovation Lab of the University of North Carolina at Chapel Hill are building a national digital collection of these paper maps, currently housed at the National Archives in Washington D.C., and using this collection to explore the use of semiautomated feature extraction techniques on large historical content. Our methodology is based on supervised, classification image processing techniques. We use a commercial tool called ArcScan, an extension to the popular ESRI ArcGIS software, to extract tens of thousands of neighborhood boundaries that can then be saved as vector overlays and used to drive the development of new types of research interfaces. We conclude the paper with examples of these new types of interfaces. Finally, we describe the potential impact of linking vectorized national collections together and the need for further research in this area, including using hybrid approaches that involve large-scale crowdsourcing.

Access to the article is available through UNC Library’s ejournal subscription.

DH Press Beta Launch

I am thrilled to announce the release of DH Press Beta 1.0! Our toolkit is officially live and ready to use. Our showcase pilot project, “Mapping the Long Women’s Movement,” will come online shortly.

Beta 1.0 Overview

While we had to put some functionality on the backburner for this first release (including the non-geo-spatial visualizations and some backend customization), DH Press is nonetheless a robust, flexible, easy-to-use tool for visualizing humanistic data, one that will continue to grow over time.

DH Press makes creating a digital humanities project manageable, even for non-technical users and those new to DH. Indeed, early testers report that the biggest challenge to using the tool is building the dataset (i.e. the intellectual work of the project), which happens outside of the DH Press environment.

Project Creation Workflow

Once your data is created and formatted appropriately, it’s easy to import a CSV version into DH Press (see our documentation for more information about formatting and importing data).

DH Press Project Settings Interface

After importing your data into DH Press, configuring the project settings is relatively straightforward. First, an admin user configures the motes (see DH Press Terms and Definitions), which are data wrappers that add functionality to custom data fields and allow them to render on the frontend. Then, set up the map view (add map layers, set the map center and zoom level, create legends for filtering, and customize the look of the markers). Finally, set up the main project page and the marker modal (the window that pops up when a marker is clicked).

The end result is a clean map that pans and zooms, filters markers, and allows you to adjust the map size and transparency (even of the base map), all while supporting the sort of exploration and search that you’d want in a DH project. And because DH Press is based in WordPress, you can incorporate narrative and in-depth analysis, or even embed your own project blog on the site.

DH Press Test Map Visualization

The same map in Fullscreen Map mode

Visualizations and Other Features

Currently, the only visualization DH Press supports is a map. However, we hope to get the following visualizations up and running shortly (by this summer):

  • Timelines (standalone and integrated with the map to show the intersection of time and space)
  • Topic Cards (like a heat map; see this demo)
  • Animations

Additionally, we hope to add the following functionality to the tool very soon:

  • Customizable page views for data points
  • New data types (integers, full-size images, thumbnail images)

And we have started thinking about how to expand the audio/transcript tool (which we’ll release when the Long Women’s Movement project launches) to handle non-English oral histories, videos, and even manuscript images.

Using DHP

There are two ways to use DH Press. If you have your own installation of WordPress (that is, you are not using the wordpress.com hosting service), you can grab the plugin from GitHub and install it. Otherwise, you can sign up to play in our Sandbox. We’ll create a user account for you, and give you your own standalone DH Press site to work in. We’ve even prepared some sample data to get you started.

Coming Soon/Next Steps

In the coming weeks, we’ll be working hard to launch our first pilot project, “Mapping the Long Women’s Movement,” which has driven the development of Beta 1.0 from the start. We’ll also be working to get our four new pilot projects (created in Bobby Allen’s AMST 840: Digital Humanities/Digital American Studies graduate seminar) launched. We’re still debugging, so keep an eye out for updates to the code (GitHub users will have to install updates; Sandbox users will not). And we’ll keep working on the new visualizations and functions. We’ll also begin user testing and our security audit to see if DH Press can be used in UNC’s WordPress environment (web.unc.edu). So stay tuned for these developments.

Acknowledgments

DH Press Beta 1.0 would not be possible without the hard work and dedication of our project team.

Our lead developer, Joe Hope, has worked tirelessly to make DH Press as clean and intuitive as possible. Chien-Yi Hou handled our map functionality and Joe Ryan is taking the lead with our security audit.

Our graduate students, current and former, have played a critical role in project management, creating documentation and offering general project support: Stephanie Barnwell, Jade Davis, and Bryan Gaston. We’ve also relied heavily on Jessie Wilkerson and Liz Lundeen, History PhD students, who have led the data gathering for our Long Women’s Movement Pilot. Seth Kotch, Digital Humanities Coordinator at the SOHP, has proven to be a fearless client for this pilot. And our four DIL undergraduates contributed greatly to the Women’s Movement data gathering: Chris Breedlove, Beth Carter, Charlotte Fryar, and Lauren Stutts.

Finally, the DIL co-directors, Bobby Allen and Richard Marciano, have provided the team with support and guidance along the way.

Many thanks to you all!

Bobby Allen on DH and Cinema Studies

DIL Co-Director Bobby Allen delivered the final “Future Knowledge” lecture at the University of South Carolina on Monday, April 8, sponsored by the Center for Digital Humanities.  “Please Step Away From the Screen: How Digital Humanities Can Re-Write Cinema History” explored the implications of his work on Going to the Show for cinema history and the potential of digital humanities methods and materials to reshape the field of cinema studies.

View the talk and discussion

Position Announcement: CDHI Programs Coordinator

The Carolina Digital Humanities Initiative is looking for a Programs Coordinator!

This position will be responsible for administering and coordinating the diverse activities and programs of the Carolina Digital Humanities Initiative. These programs include the Digital Innovation Lab/Institute for Arts and Humanities Fellows Program, the CDHI Graduate Fellows Program, the CDHI Postdoctoral Fellows Program, and the Graduate Certificate Program in Digital Humanities. Duties associated with coordination of these programs include liaising with academic departments and support units in the College of Arts and Sciences and other university academic and support units; organizing and executing selection, recruitment, and evaluation for all programs; supporting the work of the CDHI Faculty Steering Committee Chair, serving ex officio on faculty sub-committees; and maintaining the CDHI website.

The position will also:

  • participate in and coordinate professional development and training activities in digital humanities for faculty and graduate students;
  • plan campus activities and events designed to increase interest and involvement in digital humanities across the campus in cooperation with other university units, other universities and digital humanities programs, and cultural heritage organizations;
  • develop and administer assessment and impact metrics for all CDHI programs and be responsible for documenting and reporting on them to the CDHI Faculty Steering Committee, College of Arts and Sciences, and external funding bodies;
  • share project management and supervision responsibilities for digital humanities projects undertaken under the auspices of the CDHI with the Manager of the Digital Innovation Lab;
  • administer the graduate certificate program in digital humanities, including advising graduate students, and coordinating digital humanities course offerings with academic units at UNC and at Duke and NCSU;
  • work with faculty to develop new digital humanities course offerings and to add digital humanities methods and approaches to existing courses;
  • develop teaching/learning platforms and content for online and in-person/online hybrid course offerings in digital humanities.

The position will work under the direction of the Co-PI of the CDHI. He/she may be assigned other related responsibilities and duties, including (but not limited to) supervising graduate research assistants and undergraduate student workers, and managing the development and submission of grant and gift proposals to external funding agencies.

More details and application instructions are available here.

Zephyr Frank on HGIS

On Tuesday, February 19, 2013, Zephyr Frank gave a public talk, “Layers, Flows, Intersections: Historical GIS for 19th-century Rio de Janeiro,” to an audience at UNC and King’s College London (KCL). The event was streamed and recorded via Microsoft Lync.

If you missed it, you can check out his PowerPoint presentation or watch the video of the entire event.

This videoconference seminar is part of a broader planned cooperation between the DIL at UNC and the Department of Digital Humanities at KCL. Zephyr’s visit was sponsored by the Triangle Digital Humanities Network (TDHN), a DH coordination effort between the National Humanities Center, Duke, NCSU, and UNC.

Zephyr Frank is Associate Professor of Latin American history at Stanford University, where he has taught since 2000.  His research interests include quantitative methods for social and economic history, the application of GIS techniques in historical analysis, and the study of literature in relation to social and cultural history.  His research has appeared in the pages of the Journal of Economic History, Comparative Studies in Society and History, the Journal of Social History, and the Journal of Latin American Geography, among other venues.  He is a founding member of the Spatial History Project and the current director of the Center for Spatial and Textual Analysis (CESTA) at Stanford University.

Marciano on Socializing ‘Big Data’

DIL Co-Director Richard Marciano gave a talk at Duke on January 15: “Socializing ‘Big Data’: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities.”

Harnessing the “data deluge” is promoting new conversations between disciplines.  Prof. Marciano and his collaborators have been pursuing research in a number of areas including: big cultural data, access to big heterogeneous data, records in the cloud, federated grid/cloud storage, visual interfaces to large collections, policy-based frameworks to automate content management, and distributed cyberinfrastructure to enable data sharing.  But more importantly, innovative technical approaches require the convergence of creative insights across computer science, the social sciences, and the humanities.  This talk touched on these topics and highlighted a new collaboration with partners at Duke.

The talk was co-sponsored by the Franklin Humanities Institute and Social Science Research Institute.

Meet the Newest Member of the diPH Team

The Digital Innovation Lab welcomes the newest member of our team: Jade Davis. Jade is a PhD student in the department of Communication Studies, with a Media Studies and Performance focus. Her research interests explore how diasporic people use digital technology, asking the question, “what does it mean to be Black online?” Jade was a HASTAC Scholar and Student Representative to the Steering Committee, has been involved in the Mozilla Labs Design Jam, and helped organize last fall’s THATCamp RTP. She comes from a web design and production background.

Jade will join the diPH technical team, assisting us with documentation and training materials for diPH Beta 1.0. She will also work on theme design this spring.

diPH Beta is Nearly Ready

We have been hard at work getting diPH beta 1.0 ready for release in early 2013. Our tech team has been finishing up the administrative backend to ensure that project creation is as intuitive as possible. Meanwhile, our pilot project, Mapping the Long Women’s Movement, is close to completion. We’ve prepared nearly all of the data, and are beginning to play with it in our development space to improve the visualizations and interactions.

Beta 1.0

In advance of the first beta release, here’s an overview of where we expect the beta version of the plugin will be by the end of January.

Project Layer

As I’ve discussed in a previous blog post, any material you upload into diPH will be organized and bundled into projects. Essentially, admin users will set up new projects, where they’ll define their project settings: the type of visualizations (primary and secondary), the custom fields they’ll use, the taxonomy for structuring those fields, and the categories which will drive the visualizations. Associating new content with a particular project—whether via bulk data ingest from a csv file or manually adding a single data point—will apply the project settings to the data, resulting in the proper structure and display of the data. Finally, you’ll be able to create and customize a project-level visualization legend—what fields will users be able to search and filter on?

Map Layer

The map layer should be ready by the end of January. This feature allows you to select one or more base maps for your spatial visualization (e.g., Google Satellite imagery or Bing Maps). From there, you can layer other maps over the base map. We’ll have a map library containing some historic maps (made available through the Carolina Digital Library and Archives Map API). Eventually, you’ll be able to add new maps to the Map Library: if the map was processed in a format compatible with OpenLayers, all you’ll need is the map’s URL to render it as a map layer in diPH. Map layers will be interactive, allowing end users to turn maps on/off and change the level of transparency.

Data Points

As I’ve noted in an earlier post, the visualization(s) an admin selects for a project will dictate how the data point will display on the front end. A map visualization, for instance, will result in a marker (or a polygon, a feature that will likely not be included in Beta 1.0 but should be available by the end of the Spring 2013 academic semester). A timeline visualization would produce events. The visualization can be rendered on the fly depending on what a site visitor chooses as the active visualization. We’ve also been developing an Icon Library to support icon customization. And we’re currently working on ways to display large numbers of data points at different zoom levels via on-the-fly customization. We’re not sure if that feature will be ready by the end of January.
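One common way to keep a map readable when there are many data points is to group nearby markers into clusters whose size depends on the zoom level. The sketch below is only an illustration of that general idea, not the approach diPH uses: the function name, the grid-cell scheme, and the `base_cell_deg` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_markers(points, zoom, base_cell_deg=45.0):
    """Group (lat, lon) points into grid cells that halve in size per zoom level.

    Hypothetical helper for illustration only; it is not part of diPH.
    Returns one dict per non-empty cell with the mean position and a count.
    """
    cell = base_cell_deg / (2 ** zoom)  # cell edge length in degrees
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division assigns each point to the grid cell that contains it.
        key = (int(lat // cell), int(lon // cell))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters


# Made-up sample coordinates: two points close together and one farther away.
sample = [(35.91, -79.05), (35.92, -79.06), (36.07, -79.79)]
zoomed_out = cluster_markers(sample, zoom=0)
zoomed_in = cluster_markers(sample, zoom=12)
```

Zoomed out, the three sample points collapse into a single cluster; zoomed in, each gets its own marker. A grid is the simplest scheme to reason about, though points that straddle a cell boundary can end up in separate clusters even when they are close together.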

Because diPH is built over WordPress, each data point will be a stand-alone post. Once you associate the point (currently called a “marker”) with a project, all of the pre-defined fields you’ve already established will show up on your “Add New Post” page. You’ll have the option to add new fields for any given data point, though you’ll want to be careful that your data is as standardized across the project as possible. Of course, you’ll be able to include the usual post content—free text, images, videos, and other multimedia.

Data Management/Interoperability

We’ll be creating documentation and training materials for managing bulk data in diPH—how to format data, import it, delete it, and bulk update it. For now, we strongly recommend users create their data outside of diPH and then import it in. We intend to provide a few csv templates for diPH users, but this won’t be ready for beta 1.0. Data will be exportable out of diPH via JSON feeds, as well as the normal WordPress export functions.
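Because the data is prepared outside of diPH, it can help to check a CSV file for obvious gaps before importing it. The following is a minimal sketch of such a check. The column names (`title`, `latitude`, `longitude`) are hypothetical placeholders, not the fields diPH requires; consult the project documentation for the actual format.

```python
import csv
import io

# Hypothetical column names, chosen only for this example.
REQUIRED_COLUMNS = ["title", "latitude", "longitude"]


def check_csv(text, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in required if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column in required:
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
    return problems


good = "title,latitude,longitude\nOld Well,35.912,-79.051\n"
bad = "title,latitude,longitude\nOld Well,,-79.051\n"
no_lon = "title,latitude\nOld Well,35.912\n"
```

Calling `check_csv(good)` returns an empty list, while `check_csv(bad)` reports the empty latitude in row 2 and `check_csv(no_lon)` reports the missing column.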

Another thing that won’t quite be ready, but that we’re keen to build into the tool, is a data formatting feature. To make your data as interoperable as possible, diPH will eventually be able to transform your data into an open standard, such as GeoNames or Well-Known Text (WKT). Rather than require admin users to know how to format their data, diPH will be able to read and convert data. This should apply to locations and dates for starters.
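To give a sense of what this kind of conversion involves, here is a minimal sketch that turns a latitude/longitude pair into a WKT point and normalizes a few date spellings to ISO 8601. The function names and the list of accepted date formats are assumptions made for illustration; they do not describe how diPH will implement the feature.

```python
from datetime import datetime

# Date spellings this sketch accepts; an assumption, not a diPH specification.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def point_to_wkt(lat, lon):
    """Return a WKT POINT string. WKT lists x before y: longitude, then latitude."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return f"POINT ({lon} {lat})"


def normalize_date(text):
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date: {text!r}")


wkt = point_to_wkt(35.9132, -79.0558)
iso = normalize_date("02/19/2013")
```

Here `wkt` is `POINT (-79.0558 35.9132)` and `iso` is `2013-02-19`. Note the coordinate order: spreadsheets often store latitude first, while WKT expects longitude first, which is an easy mistake to make when formatting data by hand.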

Other Visualizations

At the time of the initial beta release, we expect to have at least two types of visualizations: maps and timelines. We probably won’t be able to offer anything but marker point visualizations, but eventually we will incorporate polygon and line data on the map. Using the open source TimeLine JS program, we’ve begun playing around with timeline displays. Eventually, we hope to combine the two visualizations so that we can render space and time together.

We are also currently testing the capability of rendering more than one interactive map in a single project interface. This will facilitate side-by-side comparison of different locations. We hope to be able to support up to four unique maps in a single interface, assuming load time isn’t too adversely impacted.

Audio/Transcript Tool

See this previous post about our A/T tool to learn more about our work on this piece of diPH, which will be part of the 1.0 beta release. Transcript editing and data point creation will not be part of the initial plugin release.

End User Interface

In addition to user interactions with the visualizations, as well as search and browse capability, diPH beta should feature a help mode and hover display of additional information. Admin users who activate these features will be able to provide in-line instructions and explanations to their site visitors.

User-Generated Content

While WordPress allows unregistered users to post comments, diPH beta 1.0 will not allow any other user-generated content. We plan to let users tag and create their own data in a future release, but we will need to think through user account management (possibly relying on existing social media account management, thereby allowing people to log onto a project via Facebook or Twitter).

Administrative Interface

diPH beta will feature what we hope will be an intuitive administrative back end/dashboard for easy project and data creation.

Site Structure and Organization

Since WordPress allows a high degree of customization with respect to website structure, we expect diPH will allow that as well. We’re hoping to include some sort of breadcrumbs and wayfinding, including a way to either undo or reset visualizations. Another possible implementation would be to show site visitors their selection/click history.

OS Compatibility

diPH is currently compatible with iPads but not smartphones. We’d like to address this next semester if possible. We’ll also start testing diPH on different operating systems. While Internet Explorer is a particularly problematic browser for loading large amounts of data, we need to make sure diPH will work in IE, as we expect a great many of our public users rely on this browser.

Install and User Documentation

Finally, as we finalize diPH beta’s look and feel, we’ll begin creating modular training documentation and videos (expect this material to start coming online in February). This will help admin users create projects, format data, and customize the look of their projects. We’re also hoping to support multi-site instances, so that every time we update the diPH beta code, it will cascade down to all diPH sites (whether we’ll be able to deliver this by 1.0 release is still uncertain).

We don’t expect to release the plugin to the WordPress plugin directory until we’ve gone through several more development cycles. For now, we’ll make the code available on GitHub, and we’ll include a zip file along with install instructions on diph.org. Depending on demand, the DIL may be able to provide some support for individual installations.

A New Year

Look for our first beta pilot project to come online sometime in January. And, we’ll begin four new beta projects early next year as part of Robert Allen’s AMST 840 graduate seminar. Stay tuned!

UNC Graduate Student Wins Impact Award for Greensboro Digital History Project

The Digital Innovation Lab heartily congratulates Journalism PhD Candidate Lorraine Ahearn. Lorraine is one of this year’s recipients of the Graduate Education Advancement Board Impact Award for her work on “Windows to the Past: People, Places & Memory in Downtown Greensboro.”

The award competition, organized by UNC’s Graduate School, recognizes outstanding graduate student research of particular benefit to North Carolina.

Homepage of “Windows to the Past” digital history project.

Lorraine’s project was developed in the Digital Innovation Lab’s AMST 890: “Digital Humanities/Digital History: Recovering and Representing the Past” (Fall 2011) in collaboration with the Public History graduate program at the University of North Carolina-Greensboro under the direction of Benjamin Filene. Lorraine, along with SILS Masters student Kami LaBerge, teamed up with six graduate students at UNC-G to develop a virtual walking tour of downtown Greensboro.

The site was built with our “Main Street, Carolina” software, developed in collaboration with the Carolina Digital Library and Archives (the DIL is currently working on a new WordPress-based tool, inspired by MSC, called diPH: Digital Public Humanities Toolkit).

The digital project includes an interactive historical map, overlaid onto contemporary Google satellite imagery, with informational markers pinned to the map.

The virtual tour was part of a three-tier project to document downtown Greensboro storefronts consisting of 1) a six-month exhibit of storefront displays in sixteen buildings on Elm Street, 2) a self-guided walking tour, and 3) a virtual walking tour. The MSC site maps thirty-six historic buildings, tracing their changing uses over the years to uncover untold stories of the city’s past. Drawing on oral histories, public records, and photographs, the site represents change over time, the power of personal memory, and the links between lived experience and the built environment. As Lorraine explains, “the site enabled users, like Bennie P. Harden, Jr. to contribute to the story of the downtown. Harden, for example, recalled a performance by little-known singer, Elvis Presley, in the mid-1950s at the theater Harden’s father managed downtown.”

Lorraine and Kami worked largely virtually with the UNC-G students, providing technical guidance and consultation while helping conceptualize, organize, and create the site. Lorraine and Kami also developed a mobile version of the site that can be accessed via QR codes included on each storefront display.

Greensboro residents at the December 2011 opening reception. The storefront panels included QR codes linking to a mobile site.

The site premiered on December 2, 2011 at Greensboro’s First Friday event. At its first unveiling, “Windows to the Past” was shown and demonstrated to hundreds of visitors who came through the opening reception. Outside the Elm Street Center, in the crowded streets closed to traffic, potentially many more pedestrians got their first look at the storefront panels. The main website received nearly 1,000 visitors in its first year, significantly more than other MSC project sites, and this figure does not include traffic to the mobile sites from the QR application.

“Windows to the Past” was made possible in part by a grant from the North Carolina Humanities Council, a statewide nonprofit and affiliate of the National Endowment for the Humanities.

As Lorraine reflects, “As exciting as the public unveiling of the website was at Greensboro’s Festival of Lights in December 2011, with storefront “Windows” panels bearing QR tags potentially seen by thousands of downtown visitors, what is still more promising is the idea that the site persists after the physical exhibit has gone.” She added, “It was a stroke of serendipity to be a small part of the interdisciplinary team which launched this website on downtown Greensboro public memory, completed in the span of a semester.”