Digital Preservation

Digital preservation is an umbrella concept that refers to the diverse range of activities needed to ensure that authentic digital assets survive and remain accessible and usable over long periods of time. While the roots of the digital preservation problem are chiefly technological, it has been clear for many years that solutions will also need to address significant questions relating to business models and organisation, as well as legal and policy frameworks. This broad understanding of the digital preservation problem space is reflected in many places, including Margaret Hedstrom's well-known definition of it as "the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable" ("Digital preservation: a time bomb for digital libraries," Computers and the Humanities, 31(3), 1997, pp. 189-202).

Author: Michael Day
Last updated on 2 May 2013 - 3:19pm

Concerns about the long-term sustainability of digital content first emerged in the 1970s and 1980s as information began to migrate into computerised storage. Following traditional preservation practice, much of the initial focus was on the relatively short life-spans of digital storage media, but it gradually became clear that media longevity was just one component of a wider set of preservation challenges related to the rapid evolution of hardware and software. It began to be recognised that both hardware (including peripherals like disk drives) and software become obsolete or unusable over relatively short time-scales. It was also realised that much digital content is interactive or dynamic, time-dependent or otherwise more complex than earlier generations of content. This is as true of simple formats like e-mail as it is of interactive Web pages generated by ever-evolving databases. Critical to the acceptance of these more 'sophisticated' understandings of the digital preservation problem was the publication in 1996 of the report of a US Task Force on Archiving of Digital Information. This highlighted the emerging threats and problems from digital technology and argued for the development of a 'deep infrastructure' capable of dealing with factors unknown at that time: "the effort to meet the cultural imperative of digital preservation [...] requires a complex iteration and reiteration of exploration, development and solution as the relevant factors and their interrelationships emerge and become clearer and more tractable" (John Garrett and Donald Waters, Preserving Digital Information (Commission on Preservation and Access, 1996), p. 7).

At around the same time, several different conceptual approaches to the digital preservation challenge began to emerge. Perhaps the most influential of these has been the Reference Model for an Open Archival Information System (OAIS), an ISO standard (ISO 14721) initially developed by the space data community but since adopted fairly widely by the cultural heritage sector. As its name might suggest, the OAIS model provides a useful terminological framework for discussing digital preservation concepts and processes, as well as an outline of the main activity types undertaken by a typical repository (the functional model) and the main types of information - including metadata - that would be required to ensure continued access to content over time (the information model). Alternative approaches were developed by the archives sector, where research projects initially focused on the definition of requirements for computer systems that would be able to support the ongoing integrity and authenticity of archival records. This research has underpinned the ongoing development of what are now known as Electronic Document and Records Management (EDRM) systems (e.g. through the Model Requirements for the Management of Electronic Records (MoReq2) specification) and the emerging series of ISO standards on records management (e.g., ISO 15489, ISO 23081). The debates over these approaches to digital preservation led to a consensus on three points:

  • Digital preservation involves the active management of content, defined by a JISC briefing paper as "the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value." (http://www.jisc.ac.uk/media/documents/publications/digitalpreservationbp...).
  • Successful preservation is dependent upon the capture of appropriate metadata documenting both the identity and characteristics of content (e.g., its provenance and context) and the preservation actions that have been carried out on it.
  • While the heart of the digital preservation challenge is technical, solving it cannot be divorced from its underlying social, economic, legal and organisational context. Digital preservation is essentially a societal challenge, not least because continued access to information of value is so vital to many sectors of society, including research and education and the cultural heritage domain. Indeed, the wide range of potential stakeholders and beneficiaries has made it very difficult to work out exactly who should be responsible for solving the digital preservation challenge.
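
The second point above, on capturing metadata about both the content and the actions carried out on it, can be made concrete with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they are not taken from PREMIS, ISO 23081 or any other standard, and a real repository would follow its chosen metadata framework instead.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class PreservationEvent:
    """One action carried out on an object (hypothetical structure)."""
    event_type: str   # e.g. "ingest", "fixity check", "migration"
    outcome: str      # e.g. "success", "failure"
    agent: str        # the person or software that performed the action
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class PreservedObject:
    """Descriptive details of the content plus its history of actions."""
    identifier: str
    provenance: str   # where the content came from
    context: str      # the circumstances in which it was created or used
    events: list = field(default_factory=list)

    def record(self, event_type, outcome, agent):
        """Append a new event to the object's history and return it."""
        event = PreservationEvent(event_type, outcome, agent)
        self.events.append(event)
        return event


# Illustrative values only.
obj = PreservedObject(
    identifier="example-0001",
    provenance="Deposited by an example research group",
    context="Survey data collected for an example project",
)
obj.record("ingest", "success", "example-ingest-script")

# Serialise the record so it can be stored alongside the content.
print(json.dumps(asdict(obj), indent=2))
```

Keeping the event history as an append-only list means the record of what was done to an object grows over time without overwriting earlier entries.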

With a level of agreement on these general principles, attention soon began to turn to the development of the infrastructure and tools needed to support digital preservation in real-world contexts. Given the wide range of stakeholders and usage scenarios, these initiatives tended to focus on particular institutional requirements or content types, e.g. scholarly communication (electronic journals and institutional repository content), research data, Web-based content, sound recordings or video. The resulting initiatives have driven forward the digital preservation agenda in three main areas:

  • Gaining practical experience of digital preservation through the development of active preservation programmes.
  • The development of the infrastructure and tools necessary to support sustainable preservation activities.
  • Identifying the organisational and policy frameworks needed to underpin sustainable long-term preservation.

Digital preservation programmes and exemplars

Over the past fifteen years, a large number of organisations have begun active digital preservation programmes. One of the very first to do so was the Internet Archive, which started collecting Web content in 1996 and has since extended its reach to many other digital content types, e.g. digitised texts, video and images. The Internet Archive has demonstrated the power of a flexible, independently-funded approach to dealing with the digital preservation challenge, rapidly creating what the historian Roy Rosenzweig has described as "the world's largest database and library in just five years" ("Scarcity or Abundance? Preserving the past in a digital era," American Historical Review 108(3), 2003, pp. 735-762, here p. 750). By contrast, cultural heritage organisations have tended to focus their preservation activities on particular content types in response to existing organisational mandates. An early example of this was the e-Depot of the National Library of the Netherlands (Koninklijke Bibliotheek), which developed a preservation workflow for electronic publications (specifically e-journals) based on IBM’s DIAS (Digital Information and Archiving System). Many other cultural heritage organisations have followed suit, implementing EDRMS for archives or other kinds of digital repository for audiovisual content or the products of scholarly communication. In the UK, organisations like the British Library and The National Archives have invested significant sums in digital preservation activity. In addition, higher education funding bodies like the JISC have funded many studies and initiatives on digital preservation topics, mainly focused on the stewardship and curation of research data and the other outputs of scholarly communication. The Digital Curation Centre (DCC) plays a key part in promoting this agenda.

In the US, a number of preservation initiatives have been supported by the National Digital Information Infrastructure and Preservation Program (NDIIPP), with partner projects developing active approaches to preserving many different types of digital content, e.g. images, recorded sound, government information, electronic journals (e.g. Portico), multimedia content, geospatial data, Web content and virtual worlds.

Digital preservation infrastructure and tools

A major focus of many recent digital preservation initiatives has been the development of infrastructure and tools. Some digital preservation tools have emerged from institutions undertaking their own preservation activities; others have emerged from research projects funded as part of national initiatives - e.g., the NDIIPP or the DCC - or through the research programmes funded by organisations like the JISC or the European Commission.

The 'tools' currently available vary widely in nature and scope, ranging from fully-functional commercial products, through online registries of technical information about file formats and other kinds of 'representation information,' to decision-support processes intended to inform the planning, development or audit of preservation services. The following list - compiled in 2011 - is intended to give a flavour of the wide range of tools currently available. [Please note that this list is not meant to be exhaustive and that the inclusion of a tool here does not constitute an endorsement.]

Planning and audit tools:
  • Organisational preparedness and planning: DAF (Data Asset Framework), AIDA (Assessing Institutional Digital Assets), CARDIO (Collaborative Assessment of Research Data Infrastructure and Objectives), IDMP (Integrated Data Management Planning Toolkit & Support)
  • Requirements gathering: IBM Component Business Modelling (CBM)
  • Repository audit: DRAMBORA, CCSDS 652.0-R-1 Audit and Certification of Trustworthy Digital Repositories (TDR), Nestor Catalogue of Criteria for Trusted Digital Repositories
Ingest, storage and migration tools:
  • Workflow tools: TAVERNA, Digital Preservation Recorder (DPR)
  • Ingest: File Information Tool Set (FITS), Xena (XML Electronic Normalising of Archives), NLNZ Metadata Extraction Tool, Web Curator Tool, Heritrix, SOAPI (Service Oriented Architecture for Preservation and Ingest of Digital Objects), Library of Congress BagIt Transfer Utilities, SWORD (Simple Web-service Offering Repository Deposit), Archive-It, Embedded Metadata Extraction Tool (EMET), Web Archives Workbench
  • Preservation planning and content characterisation: Plato
  • Format identification and characterisation: JHOVE2 (JSTOR/Harvard Object Validation Environment), DROID
  • Registries of format or representation information: PRONOM, UDFR (Unified Digital Format Registry), Library of Congress Digital Formats Website, CASPAR/DCC Representation Information Repository
  • Storage interfaces: SRB (Storage Resource Broker), integrated Rule-Oriented Data System (iRODS), Federated Archive Cyberinfrastructure Testbed (FACIT), L-Store (Logistical Storage)
  • Format migration: CRiB (Conversion and Recommendation of Digital Object Formats)
  • Fixity checking: Checksum Checker, ACE (Audit Control Environment)
  • Integration with cloud services: DuraCloud
  • Content packaging and exchange: METS, MPEG-21 DIDL, EchoDep Hub and Spoke Framework
Preservation management tools:
  • Repository systems: DSpace, Fedora, EPrints, LOCKSS, DAITSS (Dark Archive in the Sunshine State)
  • Third-party services: Internet Archive, OCLC Digital Archive, Iron Mountain content archiving, Portico (for e-journal content)
  • Commercial products: IBM DIAS, Ex Libris Rosetta, Tessella Safety Deposit Box
  • Preservation metadata frameworks: PREMIS Data Dictionary, ISO 23081, ContextMiner
  • Packaging formats: METS, MPEG-21 DIDL
  • Preservation research and development: PLANETS Testbed
  • Preservation watch: IBM Preservation Manager

Much of the success of a digital preservation initiative will depend on matching organisational requirements to the wide range of tools and products becoming available.
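
To illustrate one of the categories listed above, fixity checking, the following is a minimal sketch of the general idea: a checksum is recorded for each file when it enters the repository and is later recomputed to detect loss or change. It uses only the Python standard library and is not a description of how Checksum Checker or ACE work. The function names, the choice of SHA-256 and the manifest layout (a mapping from file path to expected checksum) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Compare each file against its recorded checksum.

    `manifest` maps a file path to the hex digest recorded earlier.
    Returns a mapping from path to "ok", "changed" or "missing".
    """
    results = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            results[path] = "missing"
        elif sha256_of(path) == expected:
            results[path] = "ok"
        else:
            results[path] = "changed"
    return results
```

In use, a repository would build the manifest once at ingest by calling `sha256_of` on each file, store it separately from the content, and run `check_fixity` on a schedule, treating any "changed" or "missing" result as a prompt to restore the file from another copy.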

Organisational and policy frameworks

Ensuring the long-term preservation of digital content is both a significant technical challenge and an organisational one. In fact, Rosenzweig considers the social, economic, legal and organisational problems to be worse, in part because digital content has "disrupted long-evolved systems of trust and authenticity, ownership, and preservation" ("Scarcity or abundance?" American Historical Review 108(3), 2003, pp. 735-762, here p. 743). A major component of digital preservation initiatives to date has, therefore, focused on the development of organisational and policy frameworks, including policy development and the selection (or appraisal) of content. A major recent focus has been on the long-term economic sustainability of preservation activities as well as on more specific topics like training requirements.

Digital preservation policies:

Many of the organisations involved with digital preservation have developed specific policies or strategies. Examples include the British Library Digital Preservation Strategy and the Digital Preservation Policy for Parliament developed by the UK Parliamentary Archives (2009). The National Archives has recently developed draft guidance for archives on the development of digital preservation policies, and model policies have been outlined in the JISC-funded Digital Preservation Policies Study (2008).

The economics of digital preservation:

One of the key principles of digital preservation audit frameworks like the emerging Audit and Certification of Trustworthy Digital Repositories (TDR) standard is that digital repositories need to ensure their financial sustainability over time. The economic lifecycles of digital information have been explored by three phases of the JISC-funded LIFE (Life Cycle Information for E-Literature) project, which has developed a methodology to model lifecycles and to calculate the costs of preserving certain types of digital information over five, ten and twenty years. The wider context of the economics of digital preservation has also been analysed by the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (2010).

Educational and training requirements for digital preservation:

A topic of growing importance is the need to integrate digital preservation concerns into the professional education and training of cultural heritage professionals and researchers. There are a number of collaborative approaches to determining educational requirements emerging in this area. One example is the Closing the Digital Curation Gap (CDCG) project, a US-UK initiative focused on building a common information environment to help bridge the gap between cultural heritage professionals, researchers and educators. US initiatives like DigCCurr (Preserving Access to Our Digital Future: Building an International Digital Curation Curriculum) and ESOP-21 (Educating Stewards of Public Information in the 21st Century) have helped to provide a deeper understanding of some of the competencies required.

In addition, digital preservation training programmes have been developed over the past decade by initiatives like the German Nestor network of expertise and by European Union-funded projects like the DELOS Network of Excellence on Digital Libraries and the Digital Preservation Europe (DPE) and PLANETS (Preservation and Long-term Access through Networked Services) projects. In the UK, the University of London Computer Centre and the Digital Preservation Coalition (DPC) have built on Cornell University's Digital Preservation Management Workshop to create a residential Digital Preservation Training Programme that has been run since 2005. Since 2008, the Digital Curation Centre has also offered various iterations of an introductory 'Digital Curation 101' course, and many of the materials from that are available online (http://www.dcc.ac.uk/training/). Other collections of digital preservation training materials are available from DPE and the PLANETS and JISC KeepIt projects.


