Over 5.2 million pages strong... and counting
- Interior of the Library of Congress
- Chris, Shawn & Robin with "batch_wa_lacamas"
Pulling all the teams, awardees, conversion specialists, NEH contacts, and LC resources together is the NDNP Coordinator, Deborah Thomas. Deb has a long history of working with digital collections in our national library, most notably the American Memory project, a multimedia collection of American history and culture with over nine million items. In my short interview with the team, she really helped put the national project into context for me. One of the most significant challenges is managing "a sustainable collection of significant scale produced by many organizations," which requires careful planning for maintaining access and managing the data and processes over the long term. She reminds us that "Digital objects are not just pictures. For newspapers, they are pictures of pages and machine-readable text from those pages and metadata that describes the pages and the relationships between pages." To help people find what they're looking for, we need to figure out "how to make the cream rise to the top." These millions of newspaper pages would be pretty overwhelming to wade through without text search at the page level. Creating standards for metadata and for optical character recognition (OCR) is only one piece of making these pages accessible. Each state brings its own workflow, software vendors, page- or article-level OCR, file storage systems, and even multiple languages, all of which need to be filtered and standardized.
When I asked the team what they enjoy most about their work, Robin admitted she loves how "something wacky pops up every day," referring to the many cartoon series, entertaining articles, and sometimes sensational headlines. Chris agreed, mentioning that his favorites are the illustrations of the future, which led to a discussion of Deb's favorite article, "Public Library of the Future," from the December 20, 1908, New-York Tribune.
Unlike the library envisioned in the article, we may not be sending facsimiles of our newspapers and important manuscripts through pneumatic tubes to our Congressional Library, but we will be sending a dozen or so hard drives holding thousands of newspaper page files to real people, the people I met in the James Madison Building. These are the people who will be helping us create the new digital libraries of a very real future, where we can still have "a library in every hotel, train, trolley car and steamship!"