Archives around the world are filled with handwritten letters and typed memos. But what about correspondence of a later vintage? How should governments, universities, businesses, and archives ensure that future generations can access and render email?
In 2018, this problem led the archives and records community to assess options. With support from the Andrew W. Mellon Foundation and the Digital Preservation Coalition, a working group authored a comprehensive report, The Future of Email Archives, looking at the many ways that messages can be captured, preserved, and rendered.
The working group noted that some archives and libraries choose to preserve and represent email within platforms that use email-specific formats such as MBOX, EML or PST. Others maintain or emulate old email environments. A few store messages in XML formats. These approaches require a relatively high level of technical development or support.
Archives, libraries, and other memory institutions have experimented with these approaches, but have not widely implemented them as production services. As a result, many organizations are simply storing format-specific email archives as unprocessed holdings. For this group, email to PDF offers a relatively straightforward migration pathway, with demonstrated downstream benefits.
One may ask: Why should PDF be considered as a potential target format for archival-quality, preservation-enabled emails? That’s a good question. Answers can be grouped under two headings:
PDF addresses gaps and risks inherent to current email formats and migration pathways
Email to PDF migration leverages existing standards and a diverse vendor community
In short, the "email archiving in PDF" concept seeks to build on widely implemented standards and technologies. It would give individuals and institutions a pathway to migrate email into the most widely used format for the distribution of text documents.
PDF is, of course, a marketplace leader in universal document presentation. But there is a catch.
While PDF is integrated into many email systems, current output typically amounts to little more than a digital printout. Attachments, metadata, context, and sometimes even searchable text are missing. Simply "printing to PDF" fails to meet the specific needs of institutions archiving volumes of complex email messages, at least as currently implemented. When most header metadata and attachments are lost in the conversion, how can institutions ensure authenticity, completeness, privacy, security, and other needs, especially when working with thousands or millions of messages?
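To make the gap concrete, here is a minimal sketch of the information a plain printout tends to drop. It uses only Python's standard `email` package to list the headers, body, and attachments carried by a single message. The sample message, the `inventory` function name, and the shape of its return value are all invented for illustration; they are assumptions, not part of the working group's requirements or of any existing conversion tool.

```python
from email import message_from_string, policy

# Hypothetical sample message, invented purely for illustration.
RAW_MESSAGE = """\
From: alice@example.org
To: bob@example.org
Subject: Quarterly report
Date: Mon, 04 Mar 2019 10:15:00 -0500
Message-ID: <1234@example.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Please find the report attached.

--BOUNDARY
Content-Type: text/csv; name="report.csv"
Content-Disposition: attachment; filename="report.csv"

quarter,total
Q1,100
--BOUNDARY--
"""


def inventory(raw: str) -> dict:
    """List what one message carries: headers, body text, attachments.

    Hypothetical helper: the name and return shape are assumptions.
    """
    msg = message_from_string(raw, policy=policy.default)

    # Keep headers as ordered (name, value) pairs so that repeated
    # headers, such as multiple Received lines, are not collapsed.
    headers = [(name, str(value)) for name, value in msg.items()]

    # Prefer the plain-text body; fall back to an empty string.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    # Record each attachment's filename, media type, and decoded size.
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(part.get_payload(decode=True) or b""),
        }
        for part in msg.iter_attachments()
    ]
    return {"headers": headers, "body": body, "attachments": attachments}


result = inventory(RAW_MESSAGE)
```

A "print to PDF" of this message would usually show a few display headers and the body text. The full header list and the attached file, both recovered above, are the kind of content an archival conversion would need to carry along.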
In 2019, the Mellon Foundation funded additional work to begin developing a solution. We assembled a small group of experts, some in email archiving and others in PDF. Members included representatives of the Library of Congress, the National Archives and Records Administration, university and state archival institutions, and several PDF technical experts. The group identified and documented the essential characteristics and technical requirements for converting email into PDF. The work will soon be published as a set of fundamental requirements for archiving email. The recommendations set out an approach that treats ISO 32000 Portable Document Format (PDF) technology as a model for capturing email for long-term archival purposes using open, ISO-standardized technologies.
Following the publication of Requirements for Archiving Email using PDF, the working group developing these recommendations will seek additional funding to extend the exploration into a superset specification for PDF, oriented towards the specific needs of email archiving. At present, we are exploring many options, and we would welcome thoughts, suggestions, and feedback. I’m only an email message away: email@example.com!
Chris Prom is Associate Dean for Digital Strategies at the University of Illinois at Urbana-Champaign, where he coordinates the University Library’s efforts related to resource discovery, information technology, digital preservation, and user experience. He holds a Ph.D. in History from the University of Illinois, with an emphasis in British social policy and working class movements. Chris is currently the Principal …