The Guide to Managing Web Records is a good practice guide for public offices and local authorities managing web records and developing web information management frameworks in line with their organisation-wide recordkeeping policies and systems.
The Public Records Act 2005 requires public offices and local authorities to create full and accurate records of their affairs in accordance with normal prudent business practice. Records associated with websites, intranets and extranets form part of these records.
To ensure this guide is applicable to many different processes and systems, its focus is on assisting analysis and decision-making rather than describing process. The information given is not a prescription for implementation. It aims to provide direction and present options for organisations to consider when applying recordkeeping policies to websites. The guide will assist planning for the management of web records, including existing and legacy web records.
The Guide to Managing Web Records is intended for:
This guide applies to all records of the activities, decision-making processes, actions, and transactions of organisations covered by the Public Records Act 2005 as they apply to web records. It is applicable to websites, intranets, secured extranets, and discontinued websites maintained by a public office or local authority, and includes records:
The following key terms relate specifically to the web and web records and are used throughout the guide:
Content Management System (CMS)
A system or software package that allows authoring and publishing of web pages or content.
Deep web
Websites or web pages that cannot be discovered by internet search engines and are often not accessible by the general public (e.g. password protected sites, or sites restricted to users on specified networks). Publicly available web pages that are created dynamically (e.g. as the result of a specific search) are also classed as deep web, as these pages exist briefly, on demand, and thus do not exist when a search engine spider indexes the site. A spider is a very simple form of an automated browser. When a spider visits a web page it first reads the text on the page and then tries to follow links from that page off to other web pages and websites.
Dynamic website
A website that uses databases and/or logical programming to deliver pages, look and feel or content based on variables such as date, user, or randomly selected elements.
Extranet
A private, secured website that shares part of an organisation's information or operations with suppliers, vendors, partners, customers or other businesses.
Intranet
A private website used to distribute information within an organisation.
Static website
A website where pages are ‘served’ with the same information to every user. Nothing on the page changes unless the page is edited. The website does not use a database, and does not use scripting to perform background logic related to user, page content, layout or structure.
Website
A collection of related web pages, images, videos, or other digital assets, written in a mark-up language such as HTML or XML, accessed through an IP address or domain name.
This guide has been informed by Archives New Zealand’s standards (mandatory and discretionary) and guides, and should be read in conjunction with them. They can be accessed in the Continuum Resource Kit.
Ease of managing web records may be assisted where they exist on websites that conform to the New Zealand Government Web Standards 2.0. These can be found on the web standards website at: http://webstandards.govt.nz/new-zealand-government-web-standards-2/
A web record may be any information, in whole or in part, that appears on a website and provides evidence of business activity. Websites include ‘deep web’ content such as information found on intranets, extranets, and decommissioned or secured websites. Web records also include records in informal web applications such as wikis, blogs, forums, and shared workspaces.
In some situations, an entire website may be managed as a record. However, in most instances a record will be an element of a website, for example a page, content item, image, document, submitted form, or log entry recording an action taken. The record may have multiple pages, items, or log entries, and will include information about changes made to the record over time.
Organisations must make a clear decision on how to manage entire websites and parts of a website. Consider the following:
Will capturing a web record include its context?
Capturing the context of a web record might require reproduction of the look and feel of the site, page or element. Capturing the look and feel of a website includes recording the graphical user interface (GUI) of a website or software package, including the colours, shapes, layout and typefaces (the look) as well as the behaviour of dynamic elements such as buttons, boxes, and menus (the feel).
Is content on the website re-published from another source?
Some content on websites is likely to have been re-published from another source, with the original managed as a record outside of the website, e.g. annual reports and strategic plans may be captured and managed in your corporate recordkeeping system. In situations like these, the document on the website is a duplicate which may not need to be kept unless there is good reason, e.g. where the additional functionality and context the website gives changes the interpretation of the information. You will still need to consider keeping change logs to record when the information was available to the public.
Do web records exist on third party web sites and applications?
Web records located on third party sites need to be managed as records and be accessible to your business and the public for reference, e.g. through an Official Information Act 1982 or Local Government Official Information and Meetings Act 1987 request. The creation of records on these sites should be informed by recordkeeping policies and procedures.
Examples of third party websites include:
Information about web projects and the management of a website are records
Information on the web can have multiple uses. Although public offices and local authorities need to manage web records for business and legislative purposes, entire public facing websites can also be managed as publications. The National Library of New Zealand harvests entire websites as part of its statutory legal deposit function. This includes the selecting, copying and harvesting of websites found on the internet (it does not cover intranets, deep web, or extranets). Most web harvesting is undertaken by the Alexander Turnbull Library on a selective basis and forms part of the New Zealand Web Archive, which is part of the Alexander Turnbull Library’s published collections. Harvesting a website does not replace managing web records but does provide a means of archiving websites. Managing websites no longer in current use, including archiving decommissioned websites, is discussed further in section 4 of this guide.
The creation and maintenance of web records should form part of organisation-wide recordkeeping processes and support good business practice. As web information often changes quickly, organisations must be aware of the risks posed by failing to capture changes made to web records. The level of risk will depend on the type of information being created and published on the web, and the nature and frequency of the changes. Organisations should undertake a risk assessment to inform decisions on how regularly changes made to a website must be captured.
Minimum requirements for the creation and maintenance of records are described in Archives New Zealand’s Create and Maintain Recordkeeping Standard. The principles described in this standard can be applied to web records.
Important factors to consider in choosing an approach to creating and maintaining web records are:
Approaches to developing a strategy for creating and maintaining web records are discussed in more detail in section 3 of this guide. When planning the implementation of systems to manage websites, procedures and training for all staff should also be developed to inform the creation and maintenance of web records.
Approaches to choosing a system to manage web records include:
The majority of websites today are dynamic, data-driven sites, published and managed by a CMS. While all references in this document relate to web publishing using a CMS, the techniques described also apply to static ‘hand-coded’ websites, because creating and publishing web pages by hand is in essence the same process that a CMS performs in an automated fashion.
Further information about choosing a system to manage web records can be found in section 3 of this guide.
Web records are subject to loss and inaccuracy as they change frequently over time. Web content may also be short-lived in its published form, and the process of publication may differ from other formats in the scrutiny it receives before being published.
Assessing potential risks to web records and finding ways to mitigate these through implementing systems, procedures and training will benefit organisations through improved access to, and confidence in, the quality of information. Good management of web records can mitigate business risk.
These risks can include:
Benefits of Managing Risks to Web Records
|Risk||Benefit of Risk Mitigation|
|Inability to discover and access records for business purposes. Records are not managed and are lost due to severe system failures or are subject to unauthorised disposal.||Improved discovery and retrieval of web records in a timely manner. Records disposed of in a managed way in accordance with approved disposal authorities.|
|Organisation cannot demonstrate good performance, increased efficiencies or levels of service delivery.||Enhanced visibility of business outputs and performance measures.|
|Records are not managed and are lost in system upgrades or migrations.||Improved ability to migrate, access and manage website information and records.|
|Inability to retrieve and interpret records in obsolete formats and systems.||Improved ability to migrate, access and manage website information and records.|
|Organisational embarrassment or loss of credibility.||Enhanced confidence in public sector records.|
|Inability to transfer or manage data across systems.||Increased opportunities for automation and process efficiency.|
|Data in multiple repositories. Disconnection of workflow, business inefficiency.||Enhanced capability for organisational access to timely, complete and authoritative information. Improved business efficiency and use of staff time.|
|Inability to record and manage web records in a manner complying with regulatory or statutory requirements.||Good management of web records in line with relevant legislation (PRA, OIA, and Privacy Act).|
Appraisal of web records ensures consistency of high level decision making with regard to the management and disposal of public records and supports good records management. Web records should be appraised and disposed of either using an existing disposal authority or by undertaking a specific web appraisal resulting in the approval of an agency specific disposal authority. This will be determined by the way web records are created and maintained, the values inherent in their content and context, and the functions and activities they document.
The appraisal of web records requires all elements of a website be considered, including pages, content items, files, background processes, look and feel, automated or manual transactions, and evidence of business activity in creating and maintaining websites.
The appraisal process generally does not consider the format of the record as this does not usually have a bearing on its value. However, some web records may be defined by format. In this case, the appraisal process should identify where the format of the record provides specific recordkeeping requirements. For example, information that is part of the organisational record as a static document, but which is also offered on the website with interactive functions, such as user comments acting as consultation feedback, may need to be kept as a record because of the additional functionality provided by the website.
The appraisal process may also be driven by format where an organisation manages a decommissioned website. In this case it is possible all or part of the website may be archived for long term preservation through a web harvest undertaken by the National Library of New Zealand. This would be on a case-by-case basis, as technological and resource constraints may mean some sites cannot be archived or sit outside the scope of the National Library’s collection policy. Where a website can be harvested and archived in the National Digital Heritage Archive, a public office can use Archives New Zealand’s General Disposal Authority (GDA) 4 Administration and Corporate Services Records to destroy the whole site once it is satisfied that it no longer needs to keep the website for recordkeeping and business purposes.
For information on the use of GDA 4 and the appraisal and authorised disposal of public records and local authority protected records see section 3.2 of this guide and the Archives New Zealand website's Appraisal and Disposal section.
Records created by public offices and local authorities, including web records, should meet the access requirements of all relevant legislation. Key legislation that informs access to information includes, but is not limited to, the Public Records Act 2005, the Official Information Act 1982, the Local Government Official Information and Meetings Act 1987, and the Privacy Act 1993.
When establishing access requirements, ensure you also consider web records that are not freely accessible on public websites. This includes decommissioned websites that are no longer publicly accessible, and information on intranets and extranets. If an organisation chooses to maintain and archive web records in a separate system from other organisational records, it will need to ensure access obligations are met.
Electronic systems that control records should allow an administrator to set granular permissions, based on groups and roles, to control access to records. The Archives New Zealand S4 Access Standard contains a list of minimum access requirements for records managed in electronic systems and applies to web records and archived web sites.
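The group and role based permissions described above can be illustrated with a minimal sketch. The role names, group names and actions below are hypothetical and are not taken from the S4 Access Standard; a real system would use the roles defined by the organisation.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Maps groups to roles, and roles to the actions they may perform on records."""
    # Hypothetical example: {"records_manager": {"read", "edit", "dispose"}}
    role_permissions: dict[str, set[str]] = field(default_factory=dict)
    # Hypothetical example: {"web_team": {"editor"}}
    group_roles: dict[str, set[str]] = field(default_factory=dict)

    def allowed(self, user_groups: set[str], action: str) -> bool:
        """Return True if any of the user's groups holds a role that grants the action."""
        for group in user_groups:
            for role in self.group_roles.get(group, set()):
                if action in self.role_permissions.get(role, set()):
                    return True
        return False


# Illustrative policy only; all names are assumptions.
policy = AccessPolicy(
    role_permissions={
        "reader": {"read"},
        "editor": {"read", "edit"},
        "records_manager": {"read", "edit", "dispose"},
    },
    group_roles={
        "all_staff": {"reader"},
        "web_team": {"editor"},
        "records_unit": {"records_manager"},
    },
)

print(policy.allowed({"web_team"}, "edit"))
print(policy.allowed({"web_team"}, "dispose"))
```

Keeping permissions on roles rather than on individual users means an administrator changes access by moving a person between groups, not by editing each record.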
For further information on determining access to public records and local authority protected records see Archives New Zealand’s Making Decisions on the Access Status of Public Records.
Digital information must be proactively managed and cared for to ensure that it is accessible for as long as it is needed, both for current and ongoing business purposes and as archives in the future. Ongoing and managed processes need to be in place when maintaining records over time to ensure digital continuity. This manages the risks associated with unauthorised access to records as well as with events which may damage or destroy them.
Key principles to consider for ensuring continuity of digital information are:
Ensuring continuity of web records is particularly important when websites are migrated between content management systems (migration is discussed further in section 3.4 of this guide) or when a website is decommissioned. Planning and testing for continued access to records in these situations should be part of the migration or decommissioning process. Storage of records on carrier formats, such as tape or DVD, has a high risk of loss or inaccessibility, and this should be considered particularly when determining storage of decommissioned websites.
Developing an organisational approach to web records involves managing them as part of a recordkeeping framework. A recordkeeping framework is a combination of people, policies, procedures, methods, technology, institutional culture, data and knowledge. It is a strategy an organisation develops to assess needs, implant recordkeeping practices, manage change and implement software and technology support.
Consider the creation and maintenance of web records when developing recordkeeping policies to support the organisation’s objectives and main functions. This involves stating the principles of managing web records, and identifying the strategies used by the organisation to manage those records. Define responsibility for their capture and management, and communicate this to all staff. Also define monitoring and review processes to ensure the policy is kept current and continues to support business needs.
Minimum requirements for creating and maintaining records, including further information on recordkeeping frameworks, can be found in Archives New Zealand’s mandatory S7 Create and Maintain Recordkeeping Standard.
Document any specific systems, processes and tools used by your organisation to create, maintain and dispose of web records in relevant recordkeeping procedures. All staff should receive training on these policies and procedures.
Further information on developing a recordkeeping policy can be found in Archives New Zealand’s G6 Guide to Developing a Recordkeeping Policy.
It is good business practice to sentence records (implement authorised disposal requirements) as close as possible to the point of their creation so they can be managed according to their value and retained for no longer than is necessary. This is particularly important with electronic records such as web records, where adherence to retention periods can avoid unnecessary migration costs or retention of unnecessary data volume.
A CMS may be used to automate part of the sentencing and disposal process by utilising the content scheduling function to delete a record according to the assigned disposal class. Not all CMS will have scheduling functionality included. In some systems existing functionality may need to be modified to meet requirements. When organisations are considering using a CMS for web records management, consideration should be given to the disposal functionality required. In an electronic recordkeeping system, disposal can be linked to the classification structure, and the disposal class automatically applied to a record when it is captured in the system. For further guidance on sentencing records see Archives New Zealand’s G10 Guide to Implementing a Disposal Schedule.
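As a rough illustration of linking a disposal class to a record at capture, the sketch below works out when a record becomes due for disposal. The class codes, retention periods and actions are invented for the example and do not come from any approved disposal authority.

```python
from datetime import date

# Hypothetical disposal classes; real values come from an approved disposal authority.
DISPOSAL_CLASSES = {
    "WEB-01": {"retain_years": 7, "action": "destroy"},
    "WEB-02": {"retain_years": 25, "action": "transfer"},
}


def disposal_due(trigger_date: date, disposal_class: str) -> tuple[date, str]:
    """Return the date a record becomes due for disposal and the action to take."""
    rule = DISPOSAL_CLASSES[disposal_class]
    year = trigger_date.year + rule["retain_years"]
    try:
        due = trigger_date.replace(year=year)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        due = trigger_date.replace(year=year, day=28)
    return due, rule["action"]


def is_due(trigger_date: date, disposal_class: str, today: date) -> bool:
    """True once the retention period for the record has elapsed."""
    due, _action = disposal_due(trigger_date, disposal_class)
    return today >= due


print(disposal_due(date(2009, 9, 1), "WEB-01"))
```

A scheduling function in a CMS could run a check like `is_due` periodically and queue matching records for review, rather than deleting them without human sign-off.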
Sentencing of web records should be done using current approved disposal authorities to avoid unauthorised disposal of public records and local authority protected records.
The following types of disposal authorities can be used by public offices to dispose of web records:
The following types of disposal authorities can be used by local authorities to dispose of web records:
Web records, irrespective of the system in which they are managed, must have metadata. Applying metadata to information ensures that the information has meaning, can be found when needed, can be relied upon to be what it purports to be, and can be moved safely from one system to another.
Much of the metadata associated with web records will be generated by a CMS and should be captured as part of record creation and the ongoing management process. Consider what metadata is needed to adequately describe web records when a new CMS is being implemented or an existing CMS undergoes a major upgrade. Web records being managed in other electronic systems should also have metadata applied to them.
Recordkeeping metadata has two broad categories: point of capture metadata and recordkeeping process metadata.
Point of Capture Metadata
Point of capture metadata documents the initial context of a record’s creation. It is captured at the time of the record’s creation and ingested/registered into the system which manages it. For web records, this is likely to be within the CMS. Once registered, this metadata is fixed and should not be altered.
Recordkeeping Process Metadata
Recordkeeping process metadata is information about the process of managing records. It ensures the integrity and authenticity of the record so that alterations, linkages, and uses of the record can be authoritatively tracked over time. For web records, this information may be recorded by the CMS or by another specialist recordkeeping system.
Regardless of what systems are used to create and manage web records the metadata requirements should be assessed against Archives New Zealand’s mandatory Electronic Recordkeeping Metadata Standard and accompanying Technical Specifications for Electronic Recordkeeping Metadata Standard.
For web records the following requirements from the Electronic Recordkeeping Metadata Standard and the accompanying Technical Specifications for Electronic Recordkeeping Metadata Standard are particularly relevant:
|Metadata in all business critical systems/applications which create records must be mapped to the recordkeeping metadata schema in the technical specifications||See Electronic Recordkeeping Metadata Standard and Technical Specifications for Electronic Recordkeeping Metadata Standard|
|Recordkeeping metadata must be assigned to all record objects and aggregations||See G14 Technical Guide to Implementing Recordkeeping Metadata in EDRMS|
|At point of capture of a record object, minimum recordkeeping metadata must be attributed||Minimum point of capture recordkeeping metadata including a unique identifier, a name, date of creation, who created the record, what business is being conducted, and the creating application name and version|
|For each action undertaken on a record, the following minimum recordkeeping process metadata must be maintained||Minimum recordkeeping process metadata including the date of the action, identification of the person or system undertaking the action, and what action was undertaken|
|Recordkeeping metadata must be persistently linked with a record for its entire period of retention||Electronic Recordkeeping Metadata Standard|
|Recordkeeping metadata must accompany record objects being transferred from their original creating environment or systems||Electronic Recordkeeping Metadata Standard|
|Recordkeeping metadata must be protected from unauthorised disposal||Electronic Recordkeeping Metadata Standard|
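The minimum point of capture and recordkeeping process metadata listed above could be modelled along the following lines. This is a sketch only: the class and field names are assumptions made for illustration and are not the element names used in the Technical Specifications.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class PointOfCaptureMetadata:
    """Fixed at registration; frozen so it cannot be altered afterwards."""
    identifier: str
    name: str
    created: datetime
    creator: str
    business_activity: str
    application: str
    application_version: str


@dataclass(frozen=True)
class ProcessEvent:
    """One action taken on a record: when, by whom or what, and what was done."""
    when: datetime
    agent: str
    action: str


@dataclass
class WebRecord:
    capture: PointOfCaptureMetadata
    history: list[ProcessEvent] = field(default_factory=list)

    def log_action(self, agent: str, action: str) -> None:
        """Append process metadata; earlier events are never rewritten."""
        self.history.append(ProcessEvent(datetime.now(timezone.utc), agent, action))


def capture_record(name: str, creator: str, business_activity: str,
                   application: str, application_version: str) -> WebRecord:
    """Register a record with a generated unique identifier and creation time."""
    meta = PointOfCaptureMetadata(
        identifier=str(uuid4()),
        name=name,
        created=datetime.now(timezone.utc),
        creator=creator,
        business_activity=business_activity,
        application=application,
        application_version=application_version,
    )
    return WebRecord(capture=meta)


# Hypothetical values for illustration.
record = capture_record("Annual report page", "j.smith",
                        "Publishing corporate reports", "ExampleCMS", "1.0")
record.log_action("j.smith", "published")
```

Holding the process events in the same object as the point of capture metadata is one simple way to keep the two persistently linked when the record is exported or migrated.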
Migration means moving a web record from its current system (e.g. the current CMS) to a new system (system replacement) or as part of a significant version upgrade of a system (often called a technology refresh). A migration strategy ensures that archived material remains available within the currently used system. Successful migration strategies enable records to be maintained over time when they are moved from legacy systems to current systems.
Migration strategies are most useful when a CMS is being used as a records management system, as required metadata is maintained and access is ensured.
A common example of where a well implemented migration strategy is of benefit is when material archived in a CMS is migrated along with current material to a new CMS during a system replacement project.
Successful migration involves consideration of the following records management elements:
Website content management systems (CMS) may be used to manage web records by building in additional recordkeeping functionality. In order to use a CMS as a records management system for web records, the CMS must meet the minimum requirements of Archives New Zealand’s mandatory Electronic Recordkeeping Metadata Standard and Create and Maintain Recordkeeping Standard.
Management of web pages, content fragments, transactions, or resources (web elements) as a record within a CMS requires more than simply keeping versions of an element. The above standards require that the system can:
Most CMS products have the ability to be modified to suit end user requirements, and when modified for recordkeeping requirements, may be suitable for managing web records. Many recordkeeping processes are similar to web publishing processes, and modifications to existing features often allow for additional recordkeeping-specific information or functions. While modifications should be made as automated as the system allows, some may require manual end user input.
The following are examples of modifications that could be required:
|Modification to meet Recordkeeping Requirements||Example(s) of Modification|
|Recordkeeping specific metadata requirements||Adding additional metadata fields to web elements allows specific recordkeeping metadata to be added at the time of a record’s creation and meets requirements 8 and 9 of Archives New Zealand’s Electronic Recordkeeping Metadata Standard. The ability to control the metadata elements to conform to specified formats may also be possible in many systems and would be desirable to implement|
|Creation and persistent linking of metadata||Element creation process modified to include point of capture metadata to meet requirement 8 of Archives New Zealand’s Electronic Recordkeeping Metadata Standard. This can be achieved by adding specific recordkeeping metadata fields to page settings, content snippets, and resources|
|Recordkeeping process metadata||Versioning and rollback functionality modified to capture required recordkeeping process metadata to meet requirement 9 of Archives New Zealand’s Electronic Recordkeeping Metadata Standard. This can be achieved by adding a metadata field that allows a description of the change made to be recorded in the version log|
|Record identification||Generate and validate unique identifiers according to the organisation’s requirements for records, record extracts and aggregations (including volumes) to meet requirement 8 of Archives New Zealand’s Electronic Recordkeeping Metadata Standard|
|Security||Have appropriate roles and permissions added to elements for recordkeeping purposes to protect against unauthorised access and tampering. Ensures records are what they purport to be|
|Record aggregation, management, and reporting||Creation of reporting, analysis, and management interfaces. These allow aggregation, retention and disposal activities, review and management of records, metadata (including reclassification), and volume management|
|Retrieval and rendering of records||Implementation of search or reporting to allow retrieval and rendering of records, recording of transactions, and recordkeeping metadata|
|Workflow||Implement recordkeeping specific workflows to enable records management, such as notifications, creation/capture of a record by multiple users|
|Persistent linking of recordkeeping metadata to records||Ensure persistent linking of all record elements by creating permanent pointers. The processes to maintain these links must remain over time. Meets requirement 11 of Archives New Zealand’s Electronic Recordkeeping Metadata Standard|
Web records may be managed within an existing electronic recordkeeping system. Records transferred from the website into the electronic recordkeeping system must retain the characteristics of the record. For example, if time span is a characteristic, this information must be part of the record saved in the electronic recordkeeping system. The format of the record is relatively unimportant unless format has been identified as a characteristic of the record (e.g. an interactive characteristic must be maintained).
Simple manual processes such as creating a static PDF or HTML file of a web page, and adding this file plus required recordkeeping metadata to the electronic recordkeeping system may well suffice provided changes made to the record over time are also identified and added to the system.
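One way to make sure changes to a page are identified over time is to compare a checksum of each static copy with the last one captured, and add a new version only when the content differs. The sketch below assumes the HTML has already been saved or fetched; the function name and the fields of each version entry are illustrative, not a prescribed format.

```python
import hashlib
from datetime import datetime, timezone


def capture_if_changed(versions: list[dict], url: str, html: str) -> bool:
    """Append a new version entry only if the content differs from the last capture.

    `versions` stands in for the version history held in a recordkeeping system.
    Returns True when a new version was added.
    """
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if versions and versions[-1]["sha256"] == digest:
        return False  # Unchanged since the previous capture; nothing to add.
    versions.append({
        "url": url,
        "captured": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        "content": html,
    })
    return True


history: list[dict] = []
capture_if_changed(history, "http://example.test/policy", "<p>Version one</p>")
capture_if_changed(history, "http://example.test/policy", "<p>Version one</p>")
capture_if_changed(history, "http://example.test/policy", "<p>Version two</p>")
print(len(history))
```

The stored checksum also gives a simple integrity check later: recomputing it over the saved content should produce the same value.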
More automated processes, such as using records within an electronic recordkeeping system as the content source for publishing web records, would help in the capture and management of records prior to their publication on the web. However, care must be taken to ensure any records created or modified by the website, such as transactions, are then captured and added to the electronic recordkeeping system.
Electronic recordkeeping systems are typically structured towards managing documents whereas web records may be a variety or combination of file types, data or presentation layers. The format and dynamic quality of some web records may create implementation issues depending on the capabilities of the electronic recordkeeping system.
Using a CMS and electronic recordkeeping system in an integrated manner to manage web records relies on functionality existing in both systems that can be developed to meet integration requirements.
Several CMS products offer document management capabilities, and some electronic recordkeeping products offer web publishing capabilities. In practice most organisations implement ‘best of breed’ products for the primary purpose of the system (e.g. a CMS is assessed and purchased primarily for its suitability for the process of web publishing). The use of separate products tends to require development to connect these systems to meet organisational requirements for systems integration.
Connectors such as Application Programming Interfaces (APIs), which allow code to be developed that interfaces with a system or software package and communicates with other systems or software packages, can be used to integrate CMS and electronic recordkeeping systems. APIs have made integration strategies for web records management more feasible than in the past; however, the development work required to implement such a strategy is still significant.
The advantage of a successful integration strategy is that records are available and managed seamlessly, and the required recordkeeping metadata and records management processes are applied throughout the life of the record. Disadvantages can include the high cost of implementation, and the relative difficulty of ensuring seamless records management processes are working. This is a result of visibility of processes being lower when more than one system is involved.
When a whole website or part of a website is decommissioned it may be necessary to keep a copy of the website to ensure the records associated with it can be referred to for recordkeeping and business purposes. Using the web archiving methods described below may also form part of a recordkeeping strategy for web records.
Some archiving processes can capture web records that might not be captured by other systems and processes available. For example, if web records are managed by a CMS which does not provide rollback functionality it could be necessary to take a snapshot of the website at determined intervals to capture the look and feel. Determine these intervals by how often the look and feel of a website is refreshed and any risks associated with the information on the website.
Websites may also be archived in order to preserve them long term for cultural history purposes. Websites captured for this reason might not always be managed by the creating public office or local authority. In some cases the website may be harvested through the National Library of New Zealand’s web harvesting programme. For more information about this programme see the National Library’s website.
Although having a website harvested by the National Library of New Zealand does not meet the recordkeeping requirements for an individual organisation, it may inform decisions around how to manage web records that are required to be kept as public archives. Websites captured by the National Library’s web harvesting programme may not need to be kept as an archive elsewhere. See sections 2.5 and 3.2 of this guide for further discussions on the principles of the appraisal and disposal of web records and instances when copies of websites harvested by the National Library can be destroyed.
Harvesting is the process of capturing a whole website or specified portions of a website, usually achieved by utilising site crawler tools. Crawlers are software packages or pieces of code that index or copy websites in a methodical manner, then save the selected elements as static pages to disk. The resultant data is a snapshot of the site at a known point in time.
In order to examine or search the content of the site, a curatorial tool must be used. A curatorial tool is a piece of software designed to manage digital archives. At a minimum, it allows access to and searching of a digital archive.
The scope of a harvest is an important factor when utilising a crawler to manage web records, and crawlers may be directed or given parameters for operation. The elements ‘time’, ‘extent’ and ‘functionality of information’, identified as part of the appraisal process, ensure the right amount of data is collected, at the appropriate intervals and with the required metadata and functionality remaining intact, to meet recordkeeping requirements. Harvest results should be quality tested and appropriately managed using a curatorial tool.
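The idea of a crawler working within a defined scope can be sketched in a few lines. The example below is not a production harvesting tool: it limits the ‘extent’ of the harvest to a single host and a maximum link depth, and it takes the page-fetching function as a parameter so that the sketch can be run against an in-memory sample site (the `SITE` dictionary and its URLs are invented).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(start_url, fetch, max_depth=2):
    """Breadth-first crawl limited to the start URL's host and to max_depth links.

    `fetch` is any callable that returns the HTML for a URL, or None if unavailable.
    Returns a dict of URL to HTML: a snapshot of the pages that were in scope.
    """
    host = urlparse(start_url).netloc
    snapshot = {}
    seen = set()
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth > max_depth or url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        snapshot[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            if urlparse(target).netloc == host:  # stay within the harvest scope
                queue.append((target, depth + 1))
    return snapshot


# Invented sample site used in place of real network requests.
SITE = {
    "http://example.test/": '<a href="/about">About</a> <a href="http://other.test/">Other</a>',
    "http://example.test/about": '<a href="/">Home</a> <a href="/deep">Deep</a>',
    "http://example.test/deep": "<p>Deep page</p>",
}

pages = harvest("http://example.test/", SITE.get, max_depth=1)
print(sorted(pages))
```

Changing `max_depth` or the host check is how the extent of a harvest would be widened or narrowed; the time element would be handled by how often the harvest is scheduled to run.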
Transactional logging is the recording of actions that occur to a web page, information or artefact. Almost all CMS products can record such actions, each of which is known as a transaction. Actions may be the creation of a new page, the publishing of a new content item, or the submission of a form. Collated lists of transactions are the transaction logs. They are often saved to a database table or text file within the application that generates the transactions.
An individual transaction or group of transactions may themselves be records where they are required to be kept for legal or evidentiary reasons. The transaction may generate some form of information (e.g. automated email responses), which may in turn need to be captured out of the process and stored in a place where it can be retrieved and accessed.
Transaction logs are typically system dependent, so logs managed as records are generally only accessible as long as the application that created them is still available. The risk of obsolescence or dependencies on content or artefacts that are not available requires that transaction log archiving is regularly tested to ensure accessibility and availability of the records over time.
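A transaction log saved to a text file, as described above, might look like the following sketch. It writes one JSON object per line, a plain-text layout chosen here as an assumption because it can be read without the application that produced it; the field names are illustrative and do not reflect any particular CMS.

```python
import io
import json
from datetime import datetime, timezone


def log_transaction(stream, actor: str, action: str, target: str) -> dict:
    """Append one transaction to the log as a single line of JSON."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
    }
    stream.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_log(stream) -> list[dict]:
    """Read every transaction back, as a test that the log remains accessible."""
    return [json.loads(line) for line in stream if line.strip()]


# An in-memory stream stands in for a log file in this example.
log = io.StringIO()
log_transaction(log, "j.smith", "page_created", "/consultation")
log_transaction(log, "cms", "form_submitted", "/consultation/feedback")
log.seek(0)
entries = read_log(log)
print(len(entries))
```

Reading the log back with an independent routine such as `read_log` is a small-scale version of the regular accessibility testing recommended above.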
Normalisation (or format migration) means accessing a web record in a different format from that in which it is currently held. A normalisation or migration strategy ensures archived material remains available independent of the system that created it, yet important elements of the record, such as look and feel and context are retained.
Normalisation usually involves converting a record to another format (either a static document format such as PDF, or an HTML document) or using an emulation tool to read an obsolete or uncommon format. Format migration should consider issues such as Unicode normalisation or data normalisation to ensure continuity.
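Unicode normalisation matters because the same visible text can be stored as different sequences of code points, which can break searching and comparison after a format migration. The short sketch below uses Python's standard `unicodedata` module to convert text to the composed form (NFC); choosing NFC as the target form is an assumption for the example, not a requirement stated in this guide.

```python
import unicodedata


def normalise_text(text: str) -> str:
    """Return the text in Unicode Normalization Form C (composed characters)."""
    return unicodedata.normalize("NFC", text)


# 'a' followed by a combining macron, as some systems store a macronised vowel.
decomposed = "Ma\u0304ori"
composed = normalise_text(decomposed)

print(len(decomposed), len(composed))
print(composed == "M\u0101ori")
```

Both strings display identically, but only after normalisation do they compare as equal, so applying one form consistently at the point of migration keeps search and matching reliable.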
Normalisation strategies are most useful when using an electronic recordkeeping system to manage web records. They allow records to be stored as files with associated metadata, a process for which an EDRMS is most suited. Most records managed with normalisation strategies are treated as final when committed to the recordkeeping system, although this does not preclude updating the record as long as these changes are recorded.
Successful digital preservation requires the following elements of records management be considered:
The Digital Continuity Team at Archives New Zealand can provide further advice about digital preservation approaches.
For further recordkeeping advice and guidance contact Archives New Zealand:
Phone: (04) 499 5595
Issued September 2009