Checksums overview (17/F25 v2 April 2023)
What is a checksum?
A checksum is a short string of numbers and letters, calculated by a computer from the contents of a digital object, that acts as a digital fingerprint for that object. Even the smallest change to a digital object will cause its checksum to change completely.
For example, here is the checksum of a digital object:
f75d91cdd36b85cc4a8dfeca4f24fa14
Here is the checksum of the same digital object after one letter in its content was changed:
7aca5ec618f7317328dcd7014cf9bdcf
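The same behaviour can be reproduced with Python’s standard hashlib module. In this minimal sketch the two sample sentences are hypothetical (they are not the object behind the checksums above), and MD5 is used only because it produces 32-character values of the same form.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 checksum of a UTF-8 encoded string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical content: the second string differs from the first by one letter.
original = "The quick brown fox jumps over the lazy dog"
edited = "The quick brown fox jumps over the lazy cog"

checksum_original = md5_hex(original)
checksum_edited = md5_hex(edited)

print(checksum_original)
print(checksum_edited)
print("Checksums match:", checksum_original == checksum_edited)
```

Running it prints two checksums that look unrelated, even though the inputs differ by a single letter.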
Why are they used?
Checksums are a tool for ensuring the integrity of digital objects. A change in a digital object’s checksum indicates that the object’s data has changed (for example, through edits to the content, data loss or corruption). An unchanged checksum indicates that the object’s data has not changed since the checksum was created.
What can they be used for?
Checksums allow a ‘chain of custody’ to be established between those who create, preserve, and access digital objects. In data management there are generally at least three uses for checksums:
to confirm that a digital object has been transferred from a source and received without change
to confirm that the integrity of a digital object has been maintained while in storage (i.e., it has not been changed or corrupted)
to confirm to users that the digital object they are accessing has been retrieved, stored and delivered to them without any changes to its data.
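All three uses follow the same pattern: a checksum is recorded once, then recomputed at a later point and compared with the recorded value. The Python sketch below illustrates that pattern under stated assumptions: the record content, the stage names and the choice of SHA-1 are illustrative, and the “corruption” is simulated by editing the bytes.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 checksum of some bytes as hexadecimal text."""
    return hashlib.sha1(data).hexdigest()


def is_unchanged(data: bytes, recorded_checksum: str) -> bool:
    """Recompute the checksum and compare it with the value recorded at creation."""
    return sha1_hex(data) == recorded_checksum


# Hypothetical record; its checksum is recorded once, by its creator.
record = b"Example record: board minutes"
recorded_checksum = sha1_hex(record)

received = bytes(record)                          # after transfer: an identical copy
stored = received                                 # after storage: still unchanged
delivered = stored.replace(b"board", b"beard")    # simulated corruption before access

results = {
    "transfer": is_unchanged(received, recorded_checksum),
    "storage": is_unchanged(stored, recorded_checksum),
    "access": is_unchanged(delivered, recorded_checksum),
}
print(results)  # {'transfer': True, 'storage': True, 'access': False}
```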
When are checksums required?
Public offices must provide checksums when they transfer digital information and records (digital records) to us, and when they need to demonstrate that born-digital and digitised records have remained unaltered. The ability of public offices to produce a checksum for each digital record is an important indicator of their readiness for digital transfer to us.
Checksums can also be a useful tool for public sector organisations to ensure the continuity and integrity of born-digital and digitised information and records in their control during migration or while in storage.
Generating and validating checksums
Checksums can be generated (created) and validated in several ways using a range of software tools, many of them free, such as FreeCommander (Windows) and the sha1sum and md5sum command-line tools (Linux). DROID (Digital Record Object Identification) can also generate checksums. DROID is free file format identification software developed by The National Archives (UK) and can be downloaded from its website.
Checksums are generated using standard algorithms known as hash functions, of which there are many types. Archives can work with any of them; most organisations provide either MD5 or SHA-1 checksums.
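As a rough illustration of what tools such as md5sum and sha1sum do, this Python sketch hashes a file in fixed-size chunks (so large files need not fit in memory) and formats each result in the “checksum, two spaces, file name” layout those tools print. The file and its content are hypothetical, and the helper names are illustrative rather than part of any tool.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str, chunk_size: int = 65536) -> str:
    """Hash a file's bytes chunk by chunk with a hashlib algorithm such as "md5" or "sha1"."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path: Path, algorithm: str) -> str:
    """Format a result like md5sum/sha1sum text-mode output: checksum, two spaces, name."""
    return f"{file_checksum(path, algorithm)}  {path.name}"


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "record.txt"  # hypothetical digital record
    record.write_bytes(b"Example record content\n")
    md5_line = checksum_line(record, "md5")
    sha1_line = checksum_line(record, "sha1")

print(md5_line)
print(sha1_line)
```

The same bytes produce a 32-character MD5 value and a 40-character SHA-1 value, so a checksum is only meaningful when the algorithm that produced it is known.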
It is important to remember that while checksums can detect changes to digital objects, they do not document where the change occurred or what the change is.
How we use checksums
We will not accept physical or digital records for transfer without information that describes what those records are. This information is metadata, and a metadata file or list covering all the records must accompany the transfer. At a minimum, we expect organisations to provide the mandatory metadata elements required by the Information and records management standard (16/S1). In addition to descriptive metadata, we require metadata that allows us to confirm the fixity of digital records. Fixity is the assurance that a record has not been altered or corrupted during migration, transfer or while in storage. It is usually confirmed by recording, for each record, its file path, its checksum and the checksum algorithm used to generate it.
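Fixity metadata of this kind could be produced with a short script. The Python sketch below walks a folder and writes one CSV row per file containing its relative path, the algorithm name and its SHA-1 checksum. The column names, the CSV layout and the choice of SHA-1 are assumptions for illustration only; they are not a format specified by Archives or by 16/S1.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

FIELDS = ["file_path", "algorithm", "checksum"]  # hypothetical column names


def sha1_of(path: Path) -> str:
    """Compute the SHA-1 checksum of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_fixity_manifest(root: Path, manifest_path: Path) -> list:
    """Write one row per file under root: path relative to root, algorithm, checksum."""
    # Keep manifest_path outside root, or a later run would list the old manifest.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    rows = [
        {"file_path": p.relative_to(root).as_posix(), "algorithm": "SHA-1", "checksum": sha1_of(p)}
        for p in files
    ]
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    records = Path(tmp) / "records"  # hypothetical folder of records to transfer
    (records / "sub").mkdir(parents=True)
    (records / "a.txt").write_bytes(b"first record")
    (records / "sub" / "b.txt").write_bytes(b"second record")
    rows = write_fixity_manifest(records, Path(tmp) / "manifest.csv")
    manifest_text = (Path(tmp) / "manifest.csv").read_text(encoding="utf-8")

print(manifest_text)
```

Recording paths relative to the top-level folder keeps the manifest valid after the folder is copied elsewhere.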
We use checksums supplied by transferring public offices to ensure that transferred digital records are not altered or corrupted between the time they are copied or exported from the system(s) in which they are stored and the time we receive them. A public office must generate checksums before transfer, either within the original storage system(s) or immediately after the records are copied or exported from them. The checksum values must be provided to Archives as part of the metadata describing each digital record, or in a separate file (ideally both). Providing the checksums allows us to validate each digital record and confirm that all the records have been transferred successfully and that no changes or errors were introduced during the transfer.
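On receipt, validation amounts to recomputing each checksum with the algorithm named in the metadata and comparing it with the supplied value. The Python sketch below assumes a CSV manifest with hypothetical file_path, algorithm and checksum columns, and reports files that are missing, files whose checksum differs, and files that arrived but are not listed. The algorithm labels it accepts are assumptions; a real manifest may name algorithms differently.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

# Assumed manifest labels mapped to hashlib names; unknown labels raise KeyError.
ALGORITHMS = {"MD5": "md5", "SHA-1": "sha1", "SHA1": "sha1"}


def checksum(path: Path, label: str) -> str:
    """Hash a file in 64 KiB chunks using the algorithm named by a manifest label."""
    digest = hashlib.new(ALGORITHMS[label.upper()])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_transfer(root: Path, manifest_path: Path) -> dict:
    """Compare received files under root with the checksums listed in the manifest."""
    problems = {"missing": [], "mismatched": [], "unlisted": []}
    listed = set()
    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rel = row["file_path"]
            listed.add(rel)
            path = root / rel
            if not path.is_file():
                problems["missing"].append(rel)
            elif checksum(path, row["algorithm"]) != row["checksum"].lower():
                problems["mismatched"].append(rel)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel not in listed:
            problems["unlisted"].append(rel)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    received = Path(tmp) / "received"
    received.mkdir()
    manifest = Path(tmp) / "manifest.csv"
    # Manifest as supplied by a hypothetical transferring office.
    with open(manifest, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_path", "algorithm", "checksum"])
        writer.writerow(["a.txt", "SHA-1", hashlib.sha1(b"alpha").hexdigest()])
        writer.writerow(["b.txt", "MD5", hashlib.md5(b"beta").hexdigest()])
        writer.writerow(["c.txt", "SHA-1", hashlib.sha1(b"gamma").hexdigest()])
    # Files as received: b.txt has changed, c.txt never arrived, extra.txt is unlisted.
    (received / "a.txt").write_bytes(b"alpha")
    (received / "b.txt").write_bytes(b"beta!")
    (received / "extra.txt").write_bytes(b"delta")
    report = validate_transfer(received, manifest)

print(report)  # {'missing': ['c.txt'], 'mismatched': ['b.txt'], 'unlisted': ['extra.txt']}
```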
We also use checksums to monitor integrity in the Government Digital Archive (GDA). Rosetta, the long-term preservation system used to manage the GDA, generates checksums with three different hash algorithms for every digital record when it is uploaded to the GDA. The GDA system then monitors and validates those checksums continuously so that any compromise to the integrity of the digital archive is detected.
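Computing several checksums for a record does not require reading it several times: every hash can be updated from the same chunk of data. The Python sketch below does this for three algorithms and then re-checks a file against the stored values, as a periodic fixity check might. The choice of MD5, SHA-1 and CRC32 is an assumption for illustration and may not match the algorithms Rosetta is configured to use; the functions are hypothetical and are not part of Rosetta.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def fixity_values(path: Path, chunk_size: int = 65536) -> dict:
    """Compute MD5, SHA-1 and CRC32 for a file in a single pass over its bytes."""
    md5, sha1, crc = hashlib.md5(), hashlib.sha1(), 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC32 carried across chunks
    return {"MD5": md5.hexdigest(), "SHA-1": sha1.hexdigest(), "CRC32": f"{crc:08x}"}


def failed_checks(path: Path, stored: dict) -> list:
    """Return the algorithms whose recomputed value no longer matches the stored one."""
    current = fixity_values(path)
    return [name for name, value in stored.items() if current[name] != value]


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "record.bin"  # hypothetical archived record
    record.write_bytes(b"Archived record v1")
    stored = fixity_values(record)  # values captured at upload
    check_before = failed_checks(record, stored)
    record.write_bytes(b"Archived record v2")  # simulate a one-byte change in storage
    check_after = failed_checks(record, stored)

print(check_before)  # []
print(check_after)
```

Keeping more than one value per record means a fixity failure can be confirmed by an independent algorithm before the record is treated as damaged.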