In light of the recent interest in checksums, this is a revised version of the blog post originally published in 2016.

A checksum is a string of numbers and letters that acts as a fingerprint for a file. Later checksums of the same file can be compared against it to detect errors in the data. Checksums are important because we use them to check files for integrity.
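
To make the "fingerprint" idea concrete, here is a minimal sketch assuming Python 3 and its standard hashlib module. The function name file_checksum, the 64 KiB chunk size and the default of SHA-256 are our own illustrative choices, not part of any tool mentioned in this post.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hexadecimal checksum of the file at `path`.

    The file is read in fixed-size chunks so that large files do not
    have to fit in memory. `algorithm` is any name accepted by
    hashlib.new(), for example "md5", "sha1" or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Print "<checksum> <file name>" for each file given on the command line
    for name in sys.argv[1:]:
        print(file_checksum(name), name)
```

The result is a string of hexadecimal digits (0 to 9 and a to f), which is the "string of numbers and letters" described above; a SHA-256 checksum is always 64 characters long, whatever the size of the file.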

Our digital preservation policy uses the UNESCO definition of integrity.

“Digital content is information encapsulated in one or more digital objects. Within this context, integrity of a digital object is the quality of its content remaining ‘uncorrupted and free of unauthorized and undocumented changes’.”

National Library of Australia/UNESCO. (2003). Guidelines for the Preservation of Digital Heritage.

Checksums are useful in three situations: when moving files from one environment to another, for example to validate files after a migration; when regularly checking the integrity of files managed in a system where you expect the file content to remain unchanged over time; and when working with files, to identify uniquely what we are working with.
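
Because identical content always produces the same checksum, checksums can also be used to spot duplicate files. The sketch below assumes Python 3 and SHA-256; the helper names group_by_checksum and find_duplicates are hypothetical, chosen for illustration only. Two different files could in principle share a checksum, but for SHA-256 this is vanishingly unlikely to happen by accident.

```python
import hashlib
from collections import defaultdict


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(paths):
    """Map each checksum to the list of paths whose content produces it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(path)
    return dict(groups)


def find_duplicates(paths):
    """Return only the checksum groups that contain more than one file.

    Files in the same group have (with overwhelming likelihood) identical
    content, even if their names or locations differ.
    """
    return {c: ps for c, ps in group_by_checksum(paths).items() if len(ps) > 1}
```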

Checksums bridge the gap between your organisation and permanent preservation in our archive during transfer or deposit. A file you extract from your Content Management System must remain identical to the copy held in that system. When we store the file in the digital repository, we will use checksums to verify that it is still unchanged. If anything unexpected has happened, an exception procedure is triggered. Checksums are also relevant for local authorities managing digital protected records.
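
The "extract, then prove it is unchanged" step can be sketched as follows, assuming Python 3 and SHA-256. IntegrityError, verify_copy and copy_and_verify are hypothetical names; raising IntegrityError simply stands in for whatever exception procedure your organisation or ours would actually follow.

```python
import hashlib
import shutil


class IntegrityError(Exception):
    """Hypothetical exception raised when a copy does not match its source."""


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, destination):
    """Raise IntegrityError unless both files have the same checksum.

    Returns the shared checksum when the copy is unchanged.
    """
    expected = sha256_of_file(source)
    actual = sha256_of_file(destination)
    if expected != actual:
        raise IntegrityError(f"{destination}: expected {expected}, got {actual}")
    return expected


def copy_and_verify(source, destination):
    """Copy `source` to `destination`, then confirm the copy is unchanged."""
    shutil.copyfile(source, destination)
    return verify_copy(source, destination)
```

Keeping the returned checksum alongside the extracted file means the same value can be checked again later, for example when the file arrives at the archive.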

The procedure that produces a checksum is called checksum generation. Each generation uses one of a number of checksum functions, or algorithms. These algorithms usually output a significantly different value for even the tiniest change to the data. So, by comparing checksums taken before and after a transfer, we can detect whether the data was corrupted along the way. Checksums can also indicate when a file has been tampered with; an important byproduct of integrity is security.
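
The claim that a tiny change produces a very different checksum can be checked directly. This sketch assumes Python 3's hashlib and uses three common algorithms (MD5, SHA-1 and SHA-256); the sample text and the helper names are made up for illustration.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def checksums(data: bytes) -> dict:
    """Return the hex digest of `data` for each algorithm in ALGORITHMS."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def differing_positions(a: str, b: str) -> int:
    """Count the hex characters that differ between two equal-length digests."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = b"Council minutes, item 1"
altered = b"Council minutes, item 2"  # a single character changed

before = checksums(original)
after = checksums(altered)
for name in ALGORITHMS:
    print(name, before[name])
    print(name, after[name])
    print("  differing hex characters:",
          differing_positions(before[name], after[name]),
          "of", len(before[name]))
```

Typically most of the characters differ, even though only one byte of input changed. Note that MD5 and SHA-1 have known collision attacks, so they are weaker evidence against deliberate tampering than SHA-256, although they remain widely used for detecting accidental corruption.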

We need to monitor checksums throughout the transfer or deposit lifecycle. There are two important points where we must verify integrity. The first is when we receive the files, together with their checksums, from your organisation: we generate new checksums and compare them with the ones you supplied. The second is when we deposit the files into the permanent repository, where we check them against the original transfer sent to us by your organisation. Once the files are in our repository, we will continue to monitor their checksums to ensure the files remain unchanged in perpetuity.
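
The first check point can be sketched as a comparison between the checksums you supply and checksums we generate on arrival. This is an illustrative sketch only, assuming Python 3, SHA-256, and that the supplied checksums arrive as a mapping from relative file name to hex digest; compare_with_supplied is a hypothetical name, not a description of our actual system.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_supplied(directory, supplied):
    """Compare checksums supplied by a depositor with freshly generated ones.

    `supplied` maps file names (relative to `directory`, using "/") to hex
    digests. Returns a dict of four sorted lists:
      ok         - file present and checksum matches
      mismatched - file present but checksum differs
      missing    - listed in `supplied` but not found on disk
      unexpected - found on disk but not listed in `supplied`
    """
    root = Path(directory)
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    report = {"ok": [], "mismatched": [], "missing": [], "unexpected": []}
    for name, expected in supplied.items():
        if name not in on_disk:
            report["missing"].append(name)
        elif sha256_of_file(root / name) == expected.lower():
            report["ok"].append(name)
        else:
            report["mismatched"].append(name)
    report["unexpected"] = sorted(on_disk - set(supplied))
    for key in ("ok", "mismatched", "missing"):
        report[key].sort()
    return report
```

Anything outside the "ok" list would be a reason to start the exception procedure. The same comparison can be repeated at the second check point, and periodically afterwards, against the checksums recorded at transfer.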

Open source tools

Checksums can be generated and validated with many tools. Below is a list of some free and open source tools for your convenience:

TOOL: Free Commander

Operating System: Win

Generate: Yes

Validate: Yes


TOOL: Double Commander

Operating System: Win, Linux, MacOS

Generate: Yes

Validate: Yes


TOOL: DROID

Operating System: Win, Linux, MacOS

Generate: Yes

Validate: No


TOOL: AVPreserve Fixity

Operating System: Win, MacOS

Generate: Yes

Validate: Yes


TOOL: Checksum-comparator

Operating System: Win, Linux

Generate: No

Validate: Yes


TOOL: Spreadsheet (LibreOffice)

Operating System: Win, Linux, MacOS

Generate: No

Validate: Yes


TOOL: sha1sum and md5sum commands

Operating System: Linux

Generate: Yes

Validate: Yes

Note: used from the command line


TOOL: Online MD5 generator

Operating System: Win, Linux, MacOS

Generate: Yes

Validate: No
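
Several of the tools above, including the sha1sum and md5sum commands, work with a plain-text list of checksums and file names. As a sketch, assuming Python 3 and the GNU coreutils layout (the digest, two spaces, then the file name), the hypothetical helpers write_manifest and read_manifest below produce and read such a list. They are illustrations, not replacements for the tools listed.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Return the hex digest of a file using the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_path, algorithm="sha1"):
    """Write '<digest>  <relative path>' lines for every file under `directory`.

    Returns the number of files listed. The manifest itself is skipped if
    it lives inside `directory`.
    """
    root = Path(directory)
    manifest = Path(manifest_path).resolve()
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.resolve() != manifest:
            name = path.relative_to(root).as_posix()
            lines.append(f"{file_digest(path, algorithm)}  {name}")
    text = "\n".join(lines) + "\n" if lines else ""
    Path(manifest_path).write_text(text, encoding="utf-8")
    return len(lines)


def read_manifest(manifest_path):
    """Parse a sha1sum/md5sum-style manifest into {relative path: digest}.

    Escaped names (containing newlines or backslashes) are not handled.
    """
    entries = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, name = line.split(" ", 1)
        # The name is preceded by a second space (text mode) or by '*'
        # (binary mode); either way, drop that one character.
        entries[name[1:]] = digest.lower()
    return entries
```

Running `sha1sum -c manifest.sha1` from inside the directory should then report each listed file as OK, which is one way to produce a checksum list that both sides of a transfer can check.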

Further reading

Digital Preservation Coalition (UK) – Fixity and Checksums. This resource contains further reading and links to other tools; many more tools and online resources are available beyond those listed here.

Capability assessment

To assess your own capability, here are some questions for you and/or your organisation:

  • Does your organisation use checksums and, if so, which type (for example MD5 or SHA-1)?

  • Has your organisation used checksums in any other scenario e.g. for de-duplication?

  • Would your organisation be able to supply a list of files with their checksums, for comparison on transfer as described above?

  • We are very interested to hear any questions you have about checksums, or about your practices in working with them, and will use these to produce further relevant information.


Originally published on the Records Toolkit blog on 22 June 2017