Volume 21, Number 9/10
The Value of Flexibility on Long-term Value of Grant Funded Projects
Lesley Parilla and Julia Blase
The Field Book Project is an initiative to increase accessibility to field book content that documents natural history. Field books are primary source documents that describe the events leading up to and including the collection of specimens or observations during field research. The Project is a partnership between Smithsonian Institution Archives, National Museum of Natural History, and Smithsonian Libraries. It began in 2010 with a grant from the Council on Library and Information Resources (CLIR) to identify, locate, and catalog field books across the Smithsonian Institution. Since then, the Project has cataloged more than 7,500 field books across 8 departments and divisions of the Institution. Field book catalog records were made available to the public for the first time in December 2012 on the Smithsonian's Collections Search Center. The Project is now digitizing the cataloged field books, which are made available on the Collections Search Center and the Biodiversity Heritage Library.
1 Project Description
From its inception, the Field Book Project has been a small entity in a big institution. Permanent, full-time staff members are few and funding is derived solely from grant proposals, though the project goals are considerable and are supported by institutional technological resources and permanent staff who are able to act in an advisory capacity. Rather than treating staff and funding challenges as limits, the project has seen them as opportunities. Full-time project staff have developed and maintained a robust system of communications and workflows that keeps the project flexible and able to take advantage of current and future institutional efforts as they arise. When the project started, staff anticipated what it might look like in five years, and the project's current iteration varies in important ways from those original expectations. Yet in most cases, by adapting to new opportunities and aligning itself with institutional goals, the project has met or exceeded original expectations in terms of public access to field book content and how records are utilized.
Five years ago, at inception, the Field Book Project set out to solve the problem of discovering, cataloging, and preserving an unknown (but known to be large) quantity of archival items across the institution (see note 1). Some of the identified challenges were:
- Finding a balance between cataloging quantity and cataloging quality.
- Providing essential metadata while acknowledging and planning for the time needed to catalog at identified levels (e.g. up to one hour to catalog one item).
- Cataloging with multiple access points such as taxonomic coverage, collection numbers, vessel names and expedition names.
- Developing a consistent and logical method for describing content, such as geographic location, that is not fully governed by an authority source. The Project uses multiple authorities for several descriptive elements. Established authorities such as Library of Congress Subject Headings (LCSH), Getty Art & Architecture Thesaurus (AAT), and Getty Thesaurus of Geographic Names (TGN) are not universal in their coverage for this type of content. Such content is described by different authors at varying levels of detail, and by different names, spellings, and languages over time.
- Developing new workflows with an awareness of other institutional needs.
- Creating workflows for the Smithsonian project that would also be as useful as possible for partner organizations both large and small.
Five years later, with insights from consulting colleagues, the Project has developed a flexible approach in data structure and has leveraged that flexibility to gain a leadership role in initiatives across the institution. That flexibility has allowed the project to be utilized in ways that were not part of the original goals but were made possible because:
- Project data has been gathered and stored in a robust and flexible manner.
- The collections described are of a manageable size and sufficiently uniform format to use as a test bed for new initiatives and approaches, such as digitization and transcription.
- Project staff have been able to develop a broader understanding of institution-wide goals and directions through their work across unit divisions. This understanding contributes to their capacity to proactively adjust project language, activities, and goals so that they are always in line with the most recent institutional goals and more easily supported and funded.
By collecting robust and flexible data, staying in tune with long-term institutional goals, and volunteering project data as a test bed for new initiatives, the Project has been able to lead in developing workflows and standards when experiments succeeded, and to build knowledge and capacity for its own goals even when experiments were not continued.
To date, we have produced:
- Flexible, efficient, replicable workflows relating to cataloging and digitization.
- Documentation of business processes now utilized by other Smithsonian units as well as non-Smithsonian partner institutions.
- Flexible, robust data that is contributed via a variety of online platforms.
- The first contributions to Smithsonian's new transcription center and, through that and similar initiatives, expanded input into discussions of new approaches and tools for digitization, transcription, and online publication of archival materials.
2 Field Book Project Workflows
The workflows developed by the Field Book Project were for cataloging, conservation, and digitization of archival materials at the item-level.
Cataloging: Project staff were able to develop a cataloging workflow in which archival items are described on average in an hour or less, a fraction of the time required for cataloging archival items in traditional MARC format or Encoded Archival Description (EAD) format, increasing potential throughput from two items to eight items per day. Items are described in what might seem like an overly complicated blend of Metadata Object Description Schema (MODS), Encoded Archival Context (EAC-CPF), and Natural Collections Description (NCD) schemas. However, the custom schema addresses all internal institutional needs and, should MARC or EAD records be desired, all records can be and are regularly converted into the desired formats using metadata maps and standard workflows developed by Project staff for ease of integration into the workflows of other units. Furthermore, the cataloging workflows include a place for the input of conservation and digitization information into the database as well, proactively supporting those dependent workflows. Cataloging workflows have been used to train partner institutions on efficient approaches to cataloging similar materials in their own archives.
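The metadata maps mentioned above can be pictured as a simple crosswalk from internal field names to target-format fields. The following is a minimal sketch only, not the Project's actual map: the internal field names (`title`, `creator`, `abstract`, `place`) and the sample record are invented for illustration, and the MARC tags shown (100, 245, 520, 651) are a common choice rather than the mapping the Project uses.

```python
# Hypothetical crosswalk: internal field name -> (MARC tag, subfield code).
# The field names on the left are assumptions, not the Project's schema.
CROSSWALK = {
    "title": ("245", "a"),     # title statement
    "creator": ("100", "a"),   # main entry, personal name
    "abstract": ("520", "a"),  # summary note
    "place": ("651", "a"),     # geographic subject heading
}


def to_marc_like(record):
    """Return (tag, subfield, value) tuples for a record, ordered by tag.

    Empty or missing fields are skipped; list values yield one
    tuple per entry (repeatable fields).
    """
    fields = []
    for name, (tag, code) in CROSSWALK.items():
        value = record.get(name)
        if not value:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            fields.append((tag, code, v))
    # sorted() is stable, so repeated fields keep their original order.
    return sorted(fields, key=lambda f: f[0])


# Invented sample record, for illustration only.
sample_record = {
    "title": "Field notes, Panama, 1912",
    "creator": "Example, Collector",
    "place": ["Panama", "Canal Zone"],
}

for tag, code, value in to_marc_like(sample_record):
    print(f"{tag} ${code} {value}")
```

Keeping the map as data rather than logic is one way a single set of item records could feed several target formats: a second dictionary with different targets would reuse the same function.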
Conservation: The Field Book Project cataloging process flows naturally into the conservation process. A separate database module is triggered and populates when an item is marked for conservation during cataloging. Conservation staff may then access this database and use it to better organize their own work for maximum efficiency. The seven thousand cataloged items provide a test bed for communication within the Archives unit and have resulted in better communication between processing and conservation staff when, for instance, processing staff identify unexpected condition issues in the course of their regular work.
Digitization: The Project was formulating its digitization workflows at the same time the Smithsonian Institution Archives was designing its overall digitization approach. Because the Project relied on a close partnership with the Archives for its technical digitization capacity, the units developed digitization workflows in a close, iterative, and highly cooperative process, the Archives staff providing technical knowledge and the Field Book Project providing metadata, throughput, and testing of data capacity.
The Field Book Project digitization workflow included a mandate to digitize complete folders and, when possible (exceptions were made for items in need of conservation), complete boxes from a series identified for digitization both from a broad pick list and from occasional reference requests. Project workflows also called for steady digitization throughput, providing complete item-level metadata and adding page-level technical and administrative metadata during digitization. At the time, the Institution Archives was doing item or even page-level digitization for reference upon request, and item-level description was often based on information available from the requestor. Archival description had been completed only at the series level, as is common in the "More Product, Less Process" approach (see note 2), which had complicated the process of providing expedient and complex metadata for materials as they were digitized at a more granular level. Field Book Project records provided, for the first time, a consistent folder-level description that could be used as a model for the Archives staff, enabling the Archives to change their approach to digitization without extensive use of staff time and resources. The Archives also followed the lead of the Project in deciding that, when an item was requested for digitization, the entire folder or box would also be digitized and, when possible, made available online.
Project social media about field book content and online record availability translated to an increase in digitization requests for field notes. On top of the regular Project digitization pick list, this increase meant a regular enough flow of digitization for the Project and Archives staff to test multiple workflows and baseline digitization activities, developing standard guidelines for how to digitize archival items. These items offer a wide range of challenges such as materials that are over- and under-sized; materials in poor condition; materials with inserted photos, negatives, and specimens; materials that include previously unidentified contributing authors; and materials with potentially sensitive material (such as breeding ground or migration pattern data), all of which are addressed and expedited by the new workflows.
The time required to digitize has markedly decreased even as requests have increased. The resulting guidelines and workflows are now being made available to other Smithsonian departments and research institutions as they begin to take on the challenges of digitizing archival materials.
Figure 1: Digitization Workflow
Figure 1 shows the digitization workflow developed with Smithsonian Institution Archives, with additional workflows for contribution to Biodiversity Heritage Library (BHL) and the Transcription Center. Smithsonian Institution Archives (SIA) maintains images on the Institution Digital Assets Management System (SI DAMS) which are then made available to the public through the Smithsonian Collections Search Center (CSC). In order to contribute to additional systems like Internet Archive (IA) and BHL, an additional workflow was developed to send images and metadata to BHL's Metadata Collection and Workflow System (MACAW). The workflow has been of use to other BHL institution partners looking to contribute primary source materials.
3 Flexible Data Records
As mentioned briefly in the overview of the cataloging workflows, the Field Book Project cataloged items in a custom schema that combined MODS, EAC-CPF, and NCD. The Project also developed a custom installation of a Filemaker Pro database. Normally, using a custom metadata schema and database is problematic. However, the custom schema and database have allowed the Project to create item level records that are detailed yet flexible enough to be exported in a variety of formats for many archival and library systems. At the item level, the extensive descriptions in controlled subject fields, multiple access points, and metadata maps, which allow records to be exported in an Anglo-American Cataloguing Rules (AACR2) MARC-compliant format, mean that records can be published by the Biodiversity Heritage Library (BHL).
Collection level records provide the basis of collection level description in Archives online records for newly accessioned field book collections. Item level abstracts are utilized by SI archives and now provide a template for description of non-field-book materials that are digitized and require description. Full item, collection, and EAC-CPF records with abstract and controlled subject fields can be, and are, made available through a simple XML export to the Smithsonian's Collections Search Center (CSC) and, through there, on Digital Public Library of America (DPLA) and Europeana. Field Book Project records can not only be easily exported for use by more traditional systems like the MARC- and XML-based BHL, CSC, and DPLA, but are also flexible enough to contribute to new systems, as the Project is currently a pilot partner with Social Networks and Archival Context (SNAC), developing new exports of EAC-CPF records for SNAC's shared search database.
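As a rough illustration of what "a simple XML export" of an item record might look like, the sketch below serializes one record with Python's standard library. The element names, the `level` attribute, and the sample record are all hypothetical; they are not the Collections Search Center's ingest schema, which the article does not describe.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize one catalog record as an XML string.

    Element and attribute names are assumptions made for this
    sketch, not an actual ingest schema.
    """
    root = ET.Element("record", {"level": record["level"]})
    ET.SubElement(root, "title").text = record["title"]
    ET.SubElement(root, "abstract").text = record["abstract"]
    subjects = ET.SubElement(root, "subjects")
    for term in record["subjects"]:
        # One element per controlled subject term.
        ET.SubElement(subjects, "subject").text = term
    return ET.tostring(root, encoding="unicode")


# Invented sample record, for illustration only.
item_record = {
    "level": "item",
    "title": "Field notes, Panama, 1912",
    "abstract": "Daily entries describing collecting localities.",
    "subjects": ["Ornithology", "Panama"],
}

xml_text = record_to_xml(item_record)
print(xml_text)
```

The same function would serve collection-level records by passing a different `level` value, which is the kind of reuse the flexible record structure is meant to allow.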
4 Early Contribution to Systems
As discussed earlier, the Field Book Project was an early contributor to the development of the Smithsonian Institution Archives' digitization processes. The Field Book Project has also been an early contributor to the Smithsonian Transcription Center. The Project's strong description, well-developed workflows, large selection of catalog data, and expert content knowledge enabled the Project to serve both efforts. When the Transcription Center first came into being, staff could easily sift through Field Book Project content to select those materials most likely to engage volunteers and offer those materials quickly. Digitized field books could be easily loaded to the Transcription Center and helped the staff learn what appeals to their audience of "volunpeers," tailor what they make available and when, and offer guidelines and expertise to other units based on what they learned in their experiments with the field book content. Because of this early availability and willingness to serve as a testing base, even today field books make up a substantial portion of the materials fully transcribed, despite the Project's smaller size and limited funding.
Furthermore, the full text of completed transcriptions is searchable through the Smithsonian's online catalog. More than 100 field books are now text searchable. This is a large enough group that both Project and Archives staff have been able to develop search methods to find content, which has in turn provided Transcription Center staff with solid feedback about the utilization of crowd sourced transcription for accessibility. The result has been a positive feedback loop, where new users share their ideas and experiences with the collection based on the transcriptions, and the project staff can then use that input to manipulate their data in other ways and address concerns or create services to further increase discoverability and usability.
5 Workflows Offered to Other Institutions: Unexpected Results
The project planned to develop all workflows with the idea that they could be used and re-used by other units and other institutions. Originally, the Project anticipated that smaller partner institutions might be more interested in our workflows relating to collection level description. This has not been the case. The greatest expressed interest from both small and large partner institutions has been for the item level workflows, especially as the Project has demonstrated the capability to contribute to well-known and established systems like Biodiversity Heritage Library. Contributing to BHL, with its strong online presence and wide consortium of libraries and research institutions, has demonstrated that even though the Project uses an innovative, hybrid cataloging structure, Project methods and data can easily contribute to current systems. This contribution to BHL has been especially important to the long term Project goal of reconnecting all primary source field book documentation with the resulting publications. To date, the Project has contributed more than 543 digitized field books, using commonly available software (Excel) and file formats (CSV, PDF, TIFF), to BHL, where they can be searched for and discovered alongside their sibling publications.
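The appeal of CSV as a contribution format is that it needs no special tooling. The sketch below shows one way item metadata could be written to a CSV file with Python's standard library. The column names and the sample row are hypothetical and are not BHL's required columns, which the article does not list.

```python
import csv
import io

# Hypothetical column set; not BHL's actual upload specification.
COLUMNS = ["identifier", "title", "creator", "year", "image_files"]


def records_to_csv(records):
    """Write item records to CSV text, one row per field book.

    The list of image file names is joined with ';' so that it
    fits in a single cell.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["image_files"] = ";".join(rec["image_files"])
        writer.writerow(row)
    return buffer.getvalue()


# Invented sample data, for illustration only.
field_books = [
    {
        "identifier": "example-0001",
        "title": "Field notes, Panama, 1912",
        "creator": "Example, Collector",
        "year": 1912,
        "image_files": ["example-0001_001.tif", "example-0001_002.tif"],
    }
]

csv_text = records_to_csv(field_books)
print(csv_text)
```

Because the `csv` module quotes values that contain commas, titles and inverted names survive a round trip through a spreadsheet program such as Excel.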
6 Social Media: Not to Be Underestimated
Many institutions have questioned the effect and value of social media, but the Field Book Project has benefited markedly from a consistent, strong output of online materials and interactions. The Field Book Project maintains a website, blog, and Twitter account. Early in the Project, catalogers began to write blog posts that included collection highlights and descriptions of unexpected finds in materials. They have also written blog posts that have been important for explaining Project methods, providing examples of the type of work the Project does, and providing informal access points to field book collections for researchers.
As staffing numbers have changed, the Project has expanded and maintained a consistent social media presence. Despite the Project's diminutive size, this has meant significant numbers for blog visits (60 unique hits per day) and Twitter followers (750), as well as for Flickr and the website. Field book content has been used for Wikipediathons, Flickr sets, blog posts, and transcription projects. Field books have been a consistent source of new information about the history of women in science at SI and citizen science contributions. Project blog posts and other media are important tools for sharing stories and showing connections that often inspire new research interest and, on the Transcription Center, new "volunpeer" participation. The Project has also begun to coordinate its social media output with that of other Smithsonian units, increasing visibility while decreasing the time that full time Project staff spend creating social media content.
7 Still to Accomplish
The Field Book Project has come a long way, from hardly knowing what content might be held at the Smithsonian to having:
- over 7,300 items cataloged, 550 items digitized, and 105 items in the Smithsonian Transcription Center,
- catalog records published in the Collections Search Center, DPLA, and BHL, and
- robust and flexible workflows for accomplishing all of those tasks with fewer than three full time staff.
Even so, the Project is still investigating how to strengthen the relationship between the field notes, the specimens they describe, and related published literature. Contributing field books to BHL answers part of this challenge. Social media may also play a role. Transcription and other crowd sourcing appear to be imperative for finding a connection between specimen and publication, because there are simply no financial resources currently available for such a huge task.
Furthermore, while the Project has methods to manage historic field books, it is also concerned about what the upcoming challenges might be for dealing with field books from current creators. The Project has been opening dialogues with natural history departments and divisions not only about their closed collections of field notes, but also about how they have used and stored field documentation for the last five and ten years. This effort has already produced unique information on what the Project is likely to encounter (such as digital "field notes" kept in Word, Google Docs, hard drives, and other cloud sources) and also unique partnerships for describing and using historical field notes, as demonstrated by the February 2015 #FWTrueLove Transcription Center Challenge.
Finally, how does the Project continue to accomplish its goals in a way that is consistent with current and future trends? The Project has had to be flexible with field book data in online systems in the past. Furthermore, even as project throughput and upload have increased, staff have been able to demonstrate enough value both to the Project and to future users to ask those online systems to adapt to the Project rather than the other way around. For instance, Smithsonian's Collections Search Center now utilizes image galleries in order to show high resolution images of field book pages. BHL originally required MARC records for upload and could not display transcribed content or item level abstracts. Seeing the value of Project requests for other BHL users, it has adapted to Project requirements: it is now able to accept CSV files and transform them into MARC, it has worked with Project staff to find a place for transcriptions, and it is currently working on a display for item level abstracts.
Project staff cannot predict the next system requirement, user service, or functional capacity the Project will be asked to serve, nor can they anticipate where the changes will originate and what the Project resources will be at that time. The Field Book Project has been agile enough to stay on top of institutional developments for the first five years of its existence. It hopes to continue to adapt and remain relevant in the shifting digital archival landscape.
1 This article is an update to Nakasone, Sonoe, and Carolyn Sheffield (2013). "Descriptive Metadata for Field Books: Methods and Practices of the Field Book Project," D-Lib Magazine, Vol. 19. http://doi.org/10.1045/november2013-nakasone, which describes the project cataloging structure in detail.
2 Greene, Mark A., and Dennis Meissner (2005). "More Product, Less Process: Revamping Traditional Archival Processing," American Archivist 68: 208-263.
About the Authors
Lesley Parilla is the database manager and principal cataloger for the Field Book Project. She coordinates Project outreach efforts in conjunction with Smithsonian Institution Archives and NMNH staff and manages the Project's social media content. She began as a contract cataloger with the Project in 2011.
Julia Blase is the Project Manager for the Smithsonian Field Books Project. She manages day-to-day project operations and coordinates communications between project partners. Julia Blase comes to the Biodiversity Heritage Library from the National Digital Stewardship Residency, a fellowship program with the Library of Congress, where she spent the last year completing a digital asset management analysis, needs assessment, and strategic plan for the National Security Archive. Prior to that she managed the Denali '13 Centennial Exhibition project at the American Alpine Club Library in Golden, CO.