Ngā aratohu mahi pai rawa mō te pupuri, tiaki rawa matihiko
Best practice guidance on digital storage and preservation
Guidance on digital storage and digital preservation for public sector organisations.
Why digital storage is important
How and where digital information and records (digital records) are stored will affect their viability over time. Public offices and local authorities (public sector organisations) need to manage their digital records to meet the requirements of the Public Records Act 2005 and the principles of the Information and Records Management Standard (16/S1). This helps to ensure digital records remain:
authentic
reliable
discoverable
accessible
usable
protected
preserved
for as long as they are required.
This also enables public sector organisations to meet their business needs and legal requirements, particularly for digital records identified as high-risk or of archival value.
This guide is for organisations considering how and where they’ll store digital records and provides best practice criteria for those organisations intending to use or provide digital preservation storage. Given the rapidly changing and evolving nature of digital storage and preservation, this guidance is deliberately generic and high level. You should first consult with your organisation’s IT specialists to gain an understanding of the organisation’s technical environment before contacting us for advice.
Please note that this guidance does not cover contractual, jurisdictional or funding issues that may also be part of any decisions on digital storage and preservation solutions.
What is digital storage
Previously, your organisation may have used discrete media, such as individual CDs or tapes, to store digital records, which must then be migrated periodically to address media degradation and obsolescence.
Nowadays it's becoming more common for organisations to use resilient IT storage systems for the growing volume of digital records that need to be preserved and, more importantly, that need to be easily and quickly retrievable in a culture of online access.
According to the Digital Preservation Coalition’s Digital Preservation Handbook, a resilient IT storage system consists of storage media contained within a server that provides built in resilience to various failure modes by using inbuilt redundancy and recovery. For example, it might be data tapes in a tape library, remote cloud storage, or automated replication of digital records across multiple sites and systems.
In this way, management of the digital content can be decoupled from the mechanism of its storage, that is, the media or technologies and the supporting IT infrastructure. This has the added benefit of allowing you to handle different preservation activities independently.
How and where your organisation stores its digital records is key to ensuring they remain accessible and trustworthy for the entire time they need to be retained.
How and where can you store digital records
There are several ways in which you can store your digital records.
Storing digital records online
You can do this locally on your organisation’s server infrastructure, or by hosted storage through the internet, for example in cloud storage. Networked online storage is where data is stored on multiple virtual servers that are hosted by a third party, which may be offshore. Digital records held in online storage devices are immediately accessible to users and are more likely to be identified and included in changes such as system wide migration processes, and in regular integrity checks and back-ups.
Storing digital records offline
This allows your digital records to be relatively mobile, for example, on removable storage media such as magnetic tapes, CDs, DVDs, memory cards, flash drives (USB sticks). You need to be aware of the risks with using removable media, for example, data security, malware infections, loss and hardware failures. Also, removable media is often overlooked when systems are upgraded, and digital records migrated to new formats.
Storing digital records near-line
This is where your digital records are stored separately from, and are not directly accessible by, your organisation’s systems but can be quickly retrieved and brought online for access, for example, from a local tape library or a cloud storage service.
You should consult with your IT specialists about your organisation’s specific digital storage requirements.
Storage approach considerations
You should consider the following to guide your selection of digital storage systems.
Security
If your digital records have privacy and/or security requirements, consider how specific storage systems will allow these to be managed and enforced.
Access and availability
If you need to access your digital records often and/or quickly, consider selecting storage media with fast retrieval times. Also consider what specific combinations of hardware and software you need.
Longevity
If you need to keep your digital records long term, consider selecting storage media with a proven lifespan that is appropriate for the period you want to retain them. Longer lifespans will reduce the need to migrate or refresh the storage media or undertake other preservation activities to reduce the risk of data loss.
Viability
Consider what error detection and integrity checks you have in place to monitor for, and guard against, inadvertent change, deterioration or loss of your digital records over time and when storage media is refreshed or data is migrated.
Obsolescence
Most digital storage media will last only 5 to 7 years before they need to be refreshed or updated. Consider when your storage media and their technical infrastructure are likely to become obsolete or unsupported. Select storage systems that are robust, with a regular, clearly defined migration path and widespread industry support.
Principles from the Digital Preservation Coalition’s Digital Preservation Handbook
When selecting or designing storage systems for preservation storage, you should also consider the following principles from the Digital Preservation Coalition’s Digital Preservation Handbook.
Redundancy and diversity
Make lots of copies, stored in different locations. Use a combination of online storage systems and offline media; use different types of storage technology to spread risk and balance data safety with easy access.
Fixity, monitoring and repair
Use fixity measures such as checksums to record and regularly monitor the integrity of each digital record and each copy. Store fixity information alongside the digital records as well as in separate systems. If you find out that data is corrupted or lost, then use one of the copies to create a replacement.
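The fixity approach described above can be sketched in a few lines of Python. This is a minimal illustration only, not a recommended tool: it assumes SHA-256 as the checksum algorithm and a simple in-memory manifest keyed by relative file path, and the function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical. Your IT specialists may prefer a different algorithm or an established fixity tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier but no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Still present but the checksum no longer matches.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Present now but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be written out and kept both alongside the digital records and in a separate system, and `check_fixity` would be run on a regular schedule against each copy.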
Technology and vendor watch, risk assessment and proactive migrations
Understand that storage technologies, products and services all have a short lifetime. Keep an eye on new and changed technology and the viability of vendors or classes of storage solutions. Be proactive: migrate storage before your digital records are at risk.
Consolidation, simplicity, documentation, provenance and audit trails
Minimise the proliferation of legacy media types and consolidate your digital records onto a minimum number of storage systems. Document how digital records have been acquired and transferred into the storage system(s) as well as how these are set up and operated. Use this to provide audit information on data authenticity.
Digital preservation storage categories
Preservation storage supports digital preservation, which is defined in the Digital Preservation Coalition Glossary as “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary…beyond the limits of media failure or technological and organisational change.”
The following nine categories or characteristics of preservation storage are based on international best practice. They are intended not only to help your organisation with developing requirements for digital preservation storage systems or solutions, but also to help with evaluating digital storage options and services and informing your IT infrastructure design and planning. You should adapt the criteria to suit your organisation’s individual requirements, practices, legislation and environment.
Content security
The digital preservation storage solution (the solution) provides and/or supports features and methods that prevent harm (intentional or unintentional) to your digital records. For example, the solution:
provides remediation actions for content found to have malware (for example, quarantine, notification)
supports permanent deletion by authorised users in a way that prevents recovery, in accordance with your organisation’s policies and rules
provides the additional level of security required for personal, sensitive or confidential content according to your legal or organisational needs.
Flexibility
The solution is adaptable, interoperable and customisable to your organisation’s preservation requirements or preferences. For example, the solution:
is able to adjust storage infrastructure in response to changing requirements (for example, legal requirements, audit results)
includes storage components that can be easily integrated with other systems and applications, that is, plug and play (for example, uses standard file access protocols and file system semantics).
Infrastructure security
The solution has controls and safeguards that protect the storage system’s infrastructure from interference or intrusion. For example, the solution:
provides role-based access controls to ensure that your digital records cannot be easily altered or inappropriately accessed
includes software that regularly conducts checks to identify malware.
Preservation actions
The solution supports your organisation’s preservation requirements by allowing you to take an active role in monitoring, managing, and performing interventions on your digital records. For example, the solution:
performs verifiable and/or auditable integrity checks to detect changes or loss in or across copies (for example, checksum recalculation, fixity checking, missing files) at regular intervals and during transfers
allows or supports the use of tools to perform preservation actions both at the individual object level and in bulk.
Resilience
The ability of the solution to resist, remediate or recover quickly from threats, errors or other difficulties. For example, the solution:
provides sufficient backup and disaster recovery functionality to ensure continuity of repository functions
has failure tolerance measures in place to enable continuous operation for a long period of time (for example, by eliminating single points of failure with effective monitoring)
replaces or repairs missing or corrupt files in acceptable timeframes or provides the ability and tools for your organisation to perform these actions independently.
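As an illustration of the last point, the following Python sketch shows one way a missing or corrupt file could be replaced from another copy. It is a simplified example under stated assumptions: each copy lives in an ordinary directory, the expected SHA-256 checksum is already known from earlier fixity information, and the function names (`sha256_of`, `repair_from_replicas`) are hypothetical. A real preservation storage solution would normally handle this internally or through its own tools.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_from_replicas(
    relative_path: str, expected: str, primary: Path, replicas: list[Path]
) -> str:
    """Restore primary/relative_path from the first replica whose checksum matches.

    Returns "ok" if the primary copy is already intact, "repaired" if it was
    replaced from a replica, or "unrecoverable" if no intact copy was found.
    """
    target = primary / relative_path
    if target.is_file() and sha256_of(target) == expected:
        return "ok"
    for replica in replicas:
        candidate = replica / relative_path
        if candidate.is_file() and sha256_of(candidate) == expected:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(candidate, target)
            # Verify the new copy before reporting success.
            if sha256_of(target) == expected:
                return "repaired"
    return "unrecoverable"
```

Checking the candidate before copying, and the target again afterwards, means a damaged replica is never used as the source and a failed copy is never reported as a repair.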
Scalability and performance
The ability of the solution to meet the diverse and changing storage, architectural and computational needs of your organisation. For example, the solution:
supports the export of all your content and metadata for any reason, within an acceptable timeframe (for example, as part of an exit strategy)
is able to support long file, path or directory names and diverse character encodings.
Support
The assistance provided to your organisation related to its use of the solution. For example, the solution:
supports periodic performance reviews, assessments, validations and audits required by your organisation (for example, reports, technical documentation, transaction history, performance data, and continuity practices)
supports contingency plans and strategies to stop using the solution (for example, the ability to transfer content to another solution without loss)
provides appropriate training to your staff across all relevant operational and maintenance tasks.
Sustainability
The financial, environmental and/or other impacts on your organisation or its broader context (society, the natural environment). For example, the solution:
costs relatively less overall than other comparable solutions, by being designed with cost efficiencies (for example, has resource pooling and sharing, or multi-tenancy, that is, multiple users share the same applications)
takes advantage of energy conservation principles and techniques (for example, prefers green computing options that require less cooling, consume less power, or use less rack space).
Transparency
This characteristic refers to the assurance, evidence and visibility that your organisation has into the activities and status of the solution and its content. For example, the solution:
provides reports about content (for example, number of objects/files/formats, average file size, types of objects) as well as custom configurable and on-demand reporting of content or activity
captures and documents all actions relating to the content (for example, information about integrity check failures, deletions, modifications, additions, preservation actions) and who or what performed the actions
provides full, complete, current and available documentation of key processes, services, systems, procedures, known limitations and functions, and changes that have been made to them.
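To give a sense of what a basic content report might contain, the Python sketch below counts files, totals their size, and groups them by file extension. It is an illustration only: the function name `content_report` is hypothetical, the content is assumed to sit in an ordinary directory, and a file extension is only a rough proxy for format, so a real solution would be expected to identify formats more reliably.

```python
from collections import Counter
from pathlib import Path


def content_report(root: Path) -> dict:
    """Summarise the files under root: count, total and average size, extensions."""
    files = [p for p in root.rglob("*") if p.is_file()]
    sizes = [p.stat().st_size for p in files]
    # Lower-case the extension so ".TXT" and ".txt" are counted together.
    by_extension = Counter(p.suffix.lower() or "(none)" for p in files)
    return {
        "file_count": len(files),
        "total_bytes": sum(sizes),
        "average_bytes": sum(sizes) / len(files) if files else 0,
        "files_by_extension": dict(by_extension),
    }
```

A report like this could be generated on demand or on a schedule and compared over time to spot unexpected changes in the number or size of stored records.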