Archivematica AIP Structure

This readme file describes the basic structure of an Archival Information Package (AIP) generated by Archivematica.

Acronyms

AIP = Archival Information Package

METS = Metadata Encoding and Transmission Standard

OAIS = Open Archival Information System

PDI = Preservation Description Information

PREMIS = Preservation Metadata: Implementation Strategies

UUID = Universally Unique Identifier

Introduction

Archivematica is an open-source suite of tools designed to ingest diverse digital content and prepare AIPs for long-term storage. Once an AIP is generated it is not dependent on Archivematica for retrieval, and can be opened using any standard file browser. The concept of an AIP is derived from the ISO 14721:2012 Reference Model for an Open Archival Information System (OAIS), which defines it as "[a]n Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS."

Content Information

In an Archivematica AIP, the Content Information consists primarily of the originally ingested digital objects and any preservation versions of the objects created to mitigate the risk of format obsolescence over time. The preservation copies typically have the same filenames as the original objects but with different file extensions and with UUIDs appended to the filename. For example, for an original file named BBhelmet.ai the preservation version may be named BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf.
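The naming pattern described above can be illustrated with a minimal Python sketch that splits a preservation filename back into its original stem, the appended UUID and the file extension. The function name split_preservation_name is hypothetical, and the sketch assumes the UUID is written in the canonical lowercase 8-4-4-4-12 hexadecimal form shown in the example.

```python
import re
from pathlib import PurePosixPath

# Assumption: the appended UUID uses the canonical lowercase 8-4-4-4-12 form
# and is joined to the original stem by a single hyphen.
UUID_SUFFIX = re.compile(
    r"^(?P<stem>.+)-"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def split_preservation_name(filename):
    """Return (original_stem, uuid, extension), or None if no UUID suffix is present."""
    path = PurePosixPath(filename)
    match = UUID_SUFFIX.match(path.stem)
    if match is None:
        return None
    return match.group("stem"), match.group("uuid"), path.suffix


print(split_preservation_name("BBhelmet-e3a3988d-8149-49ea-adc5-c255fb68d4f9.pdf"))
# ('BBhelmet', 'e3a3988d-8149-49ea-adc5-c255fb68d4f9', '.pdf')
```

An original file such as BBhelmet.ai carries no UUID suffix, so the function returns None for it.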

The originally ingested digital objects and any preservation versions are located in the objects directory of the AIP. There will be nested subdirectories in the objects directory if these subdirectories were included in the original transfer or added during SIP arrangement. The objects directory also includes a submissionDocumentation folder and a metadata folder. The submissionDocumentation folder contains documentation such as donor agreements and transfer forms, if these are included in the AIP, as well as a METS file that records the contents of the original transfer(s) from which the AIP was created. The metadata folder holds any metadata files included in the original transfer(s), and any OCR text files generated during processing.

Preservation Description Information (PDI)

The PDI in an Archivematica AIP is recorded in a METS XML file. METS is maintained by the Library of Congress, which defines it as "a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium." In the Archivematica AIP, the METS filename consists of the name METS followed by a UUID and the .xml file extension, with the parts separated by periods; for example, METS.0ad8cdab-dbbf-4863-8a4d-9a675c227216.xml. The METS file typically consists of the following standard METS sections:

<mets:metsHdr> (METS header): basic information about the METS file;

<mets:dmdSec> (descriptive metadata section): descriptive metadata about the digital objects;

<mets:amdSec> (administrative metadata section): technical and provenance information about the digital objects;

<mets:fileSec> (file section): a list of the digital objects and an indication of their role in the AIP (original, preservation, metadata, submission documentation, license etc.);

<mets:structMap> (structural map): a physical or logical ordering of the digital objects. The physical structural map documents the final directory structure of the AIP. As of Archivematica version 1.8, all AIP METS files also contain a logical structural map, labelled "Normative Directory Structure", which lists all files and directories, including empty directories. Because AIPs conform to the BagIt specification, and bags cannot contain empty directories, the logical structural map indicates how the AIP should be structured when the empty directories are reconstructed.
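As a rough illustration of the sections listed above, the following Python sketch counts the top-level sections of a METS document and reports the TYPE and LABEL of each structural map. The sample document is a hypothetical, heavily trimmed stand-in for a real Archivematica METS file, and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, trimmed sample; a real METS file contains far more detail.
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr/>
  <mets:amdSec ID="amdSec_1"/>
  <mets:fileSec/>
  <mets:structMap TYPE="physical"/>
  <mets:structMap TYPE="logical" LABEL="Normative Directory Structure"/>
</mets:mets>
""".strip()


def count_sections(root):
    """Count the direct children of the METS root by element name, ignoring namespaces."""
    counts = {}
    for child in root:
        name = child.tag.split("}", 1)[-1]
        counts[name] = counts.get(name, 0) + 1
    return counts


def struct_map_labels(root):
    """Return a (TYPE, LABEL) pair for each structMap; LABEL is None when absent."""
    return [
        (sm.get("TYPE"), sm.get("LABEL"))
        for sm in root.findall(f"{{{METS_NS}}}structMap")
    ]


root = ET.fromstring(SAMPLE_METS)
print(count_sections(root))
print(struct_map_labels(root))
```

To inspect a real AIP, ET.parse(path).getroot() can be used in place of the sample string.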

The technical and provenance information in the METS amdSec is recorded as PREMIS metadata. PREMIS is also a Library of Congress standard, and is described as "the international standard for metadata to support the preservation of digital objects and ensure their long-term usability." The PREMIS entities are wrapped in the METS file as follows:

<mets:amdSec>

--<mets:techMD> (technical metadata)

----<premis:object> e.g. UUID, size, checksum, format, original name, extracted technical metadata

--<mets:digiprovMD> (digital provenance metadata)

----<premis:event> e.g. ingestion, message digest calculation, virus scan, format identification, validation, normalization, fixity check

----<premis:agent> for each PREMIS Event there are three associated Agents: the organization, the digital preservation system (e.g. Archivematica 1.x) and the logged-in user

--<mets:rightsMD> (rights metadata)

----<premis:rights> Rights pertaining to the preservation, reproduction and use of the preserved digital objects (only included if the user added rights metadata prior to or during ingest)
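The wrapping shown above can be explored with a short Python sketch that reports which PREMIS entity sits inside which METS wrapper and collects the event types. The sample amdSec is hypothetical and trimmed, and the PREMIS namespace URI in it is an assumption (it differs between PREMIS versions), so the code matches elements by local name rather than by namespace.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, trimmed amdSec. The PREMIS namespace URI is an assumption.
SAMPLE_AMDSEC = """
<mets:amdSec ID="amdSec_1"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:premis="http://www.loc.gov/premis/v3">
  <mets:techMD ID="techMD_1"><mets:mdWrap MDTYPE="PREMIS:OBJECT"><mets:xmlData>
    <premis:object><premis:originalName>BBhelmet.ai</premis:originalName></premis:object>
  </mets:xmlData></mets:mdWrap></mets:techMD>
  <mets:digiprovMD ID="digiprovMD_1"><mets:mdWrap MDTYPE="PREMIS:EVENT"><mets:xmlData>
    <premis:event><premis:eventType>ingestion</premis:eventType></premis:event>
  </mets:xmlData></mets:mdWrap></mets:digiprovMD>
  <mets:digiprovMD ID="digiprovMD_2"><mets:mdWrap MDTYPE="PREMIS:EVENT"><mets:xmlData>
    <premis:event><premis:eventType>fixity check</premis:eventType></premis:event>
  </mets:xmlData></mets:mdWrap></mets:digiprovMD>
  <mets:digiprovMD ID="digiprovMD_3"><mets:mdWrap MDTYPE="PREMIS:AGENT"><mets:xmlData>
    <premis:agent><premis:agentType>software</premis:agentType></premis:agent>
  </mets:xmlData></mets:mdWrap></mets:digiprovMD>
</mets:amdSec>
""".strip()


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[-1]


def wrapped_entities(amd_sec):
    """Return (METS wrapper, PREMIS entity) pairs in document order."""
    pairs = []
    for section in amd_sec:
        for xml_data in section.iter(f"{{{METS_NS}}}xmlData"):
            for entity in xml_data:
                pairs.append((local_name(section.tag), local_name(entity.tag)))
    return pairs


def event_types(amd_sec):
    """Collect the text of every eventType element, whatever its namespace."""
    return [
        el.text.strip()
        for el in amd_sec.iter()
        if local_name(el.tag) == "eventType" and el.text
    ]


amd_sec = ET.fromstring(SAMPLE_AMDSEC)
print(wrapped_entities(amd_sec))
print(event_types(amd_sec))
```

Matching on local names keeps the sketch usable regardless of which PREMIS version a given AIP was written with.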

The fileSec and structMap use identifier attributes to link a digital object to its amdSec and (if used) dmdSec. For example, if a file entry in the fileSec has the attribute ADMID="amdSec_1", the amdSec with the identifier amdSec_1 contains the administrative (i.e. technical and provenance) metadata for that file. The fileSec also uses a group identifier attribute (GROUPID) to indicate relationships between digital objects. For example, if file A in the fileGrp with USE="original" and file B in the fileGrp with USE="preservation" both have the group identifier "Group-269b494d-01cb-451b-8d5e-590d57126d3d", then file B is a preservation version generated from file A.
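The group identifier relationship described above can be sketched in Python as follows. The sample fileSec is hypothetical: the file IDs file-A and file-B and the ADMID values are placeholders, and the function name pair_by_group is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, trimmed fileSec; IDs and ADMID values are placeholders.
SAMPLE_FILESEC = """
<mets:fileSec xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileGrp USE="original">
    <mets:file ID="file-A" ADMID="amdSec_1"
        GROUPID="Group-269b494d-01cb-451b-8d5e-590d57126d3d"/>
  </mets:fileGrp>
  <mets:fileGrp USE="preservation">
    <mets:file ID="file-B" ADMID="amdSec_2"
        GROUPID="Group-269b494d-01cb-451b-8d5e-590d57126d3d"/>
  </mets:fileGrp>
</mets:fileSec>
""".strip()


def pair_by_group(file_sec):
    """Return (original file ID, preservation file ID) pairs that share a GROUPID."""
    groups = {}
    for file_grp in file_sec.findall(f"{{{METS_NS}}}fileGrp"):
        use = file_grp.get("USE")
        for file_el in file_grp.findall(f"{{{METS_NS}}}file"):
            groups.setdefault(file_el.get("GROUPID"), {})[use] = file_el.get("ID")
    return [
        (members["original"], members["preservation"])
        for members in groups.values()
        if "original" in members and "preservation" in members
    ]


print(pair_by_group(ET.fromstring(SAMPLE_FILESEC)))
# [('file-A', 'file-B')]
```

The sketch keeps only one file per USE value in each group, which is enough to show the linkage; the ADMID attribute on each file can be followed in the same way to reach the matching amdSec.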

AIP structure

An Archivematica AIP is packaged into a bag in accordance with the IETF Trust BagIt File Packaging Format, and contains some contents not described in the sections above. This tree structure depicts a typical Archivematica AIP:

(1) AIP-name-e3a3988d-8149-49ea-adc5-c255fb68d4f9 
(2)  ├── bag-info.txt 
(3)  ├── bagit.txt 
(4)  ├── manifest-sha512.txt 
(5)  ├── tagmanifest-md5.txt 
(6)  └── data 
(7)        ├── logs 
(8)        ├── objects 
(9)        ├── thumbnails 
(10)       ├── METS.0ad8cdab-dbbf-4863-8a4d-9a675c227216.xml 
(11)       └── README.html

(1) AIP root directory, with an appended UUID

(2)-(5) Standard packaging files produced in accordance with the BagIt specification.

(6) data directory - this is also a standard directory required by the BagIt specification. The data directory contains the AIP Content Information and PDI.

(7) logs directory - contains log outputs of some of the tools that Archivematica uses in generating the AIP

(8) objects directory - contains the original digital objects as well as normalized versions

(9) thumbnails directory - contains thumbnails generated from the original object for use in the Archivematica user interface

(10) the Archivematica METS file

(11) this README file
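Because the AIP is a bag, the files under the data directory can be checked against the payload manifest without Archivematica. The following Python sketch assumes each manifest line holds a hexadecimal checksum, whitespace and a path relative to the AIP root; it does not handle the percent-encoding that BagIt applies to some characters in paths, and the function name verify_manifest is hypothetical.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_root, manifest_name="manifest-sha512.txt", algorithm="sha512"):
    """Return the manifest paths whose file is missing or whose checksum differs."""
    bag_root = Path(bag_root)
    problems = []
    manifest = (bag_root / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Assumed line layout: "<hex digest><whitespace><relative path>".
        expected, rel_path = line.split(None, 1)
        target = bag_root / rel_path
        if not target.is_file():
            problems.append(rel_path)
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as handle:
            # Read in 1 MiB chunks so large objects are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append(rel_path)
    return problems
```

Calling verify_manifest on the AIP root directory returns an empty list when every payload file matches its recorded checksum. Passing manifest_name="tagmanifest-md5.txt" and algorithm="md5" would check the tag files in the same way.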