This readme file describes the basic structure of a Transfer package generated by Archivematica.
Archivematica is an open-source suite of tools designed to ingest diverse digital content and prepare archival information packages (AIPs) for long-term storage and preservation. The contents of this Archivematica Transfer Package have undergone some initial processing before being placed into backlog, where they await further processing. At some point in the future, this package will be retrieved from its backlog location and undergo final processing actions before being converted by Archivematica into an AIP and placed in long-term archival storage.
An Archivematica transfer package is organized into a bag in accordance with the IETF Trust BagIt File Packaging Format, using the following tree structure:
(1)  archivematica_transfer-name-e3a3988d-8149-49ea-adc5-c255fb68d4f9
(2)  ├── bag-info.txt
(3)  ├── bagit.txt
(4)  ├── manifest-sha512.txt
(5)  ├── tagmanifest-md5.txt
(6)  └── data
(7)      ├── logs
(8)      ├── metadata
(9)      │   └── submissionDocumentation
(10)     │       └── METS.XML
(11)     ├── objects
(12)     ├── processingMCP.xml
(13)     └── README.html
(1) transfer package root directory, with an appended UUID
(2)-(5) standard packaging files produced in accordance with the BagIt specification.
(6) data directory - this is also a standard directory specified by the BagIt specification. The data directory contains the original content transferred into Archivematica along with additional files produced by Archivematica during transfer processing.
(7) logs directory - contains log outputs of some of the tools that Archivematica uses in generating the transfer package.
(8) metadata directory - contains information about the transferred objects, including pre-transfer checksums and, where supplied, descriptive metadata and rights metadata in CSV format.
(9) submissionDocumentation - a subfolder of the metadata directory that contains documentation describing the context within which content has been transferred to Archivematica (e.g. a donor's agreement). This directory also contains the METS.XML file.
(10) the transfer package METS.XML file
(11) objects directory - contains the original digital objects transferred to Archivematica
(12) processingMCP.xml file - a configuration file used by Archivematica to determine which features or options to enable during processing.
(13) this README file
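As an illustration of how the packaging files in (2)-(5) relate to the data directory in (6), the following is a minimal sketch of checking the payload against manifest-sha512.txt. It assumes each manifest line has the form "checksum, whitespace, path relative to the bag root", as the BagIt specification describes. The function name verify_manifest is hypothetical, and the sketch does not handle percent-encoded characters in paths; it is not the routine Archivematica itself uses.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_root, manifest_name="manifest-sha512.txt", algorithm="sha512"):
    """Return the payload paths that are missing or whose checksum does not match.

    Hypothetical helper for illustration; assumes plain (unencoded) paths.
    """
    bag_root = Path(bag_root)
    failures = []
    manifest_text = (bag_root / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag_root / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        digest = hashlib.new(algorithm)
        with target.open("rb") as handle:
            # Read in 1 MiB chunks so large objects are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling verify_manifest on the transfer package root directory returns an empty list when every file listed in the manifest is present and unchanged.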
The METS.XML file captures information about the processing actions taken on the content during Archivematica's transfer processing. Archivematica uses this information to restore transfer packages from backlog and continue processing them, first as submission information packages (SIPs) and then as archival information packages (AIPs).
METS is maintained by the Library of Congress, which defines it as "a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium." The METS file typically consists of the following standard METS sections:
<mets:metsHdr> (METS header): basic information about the METS file;
<mets:dmdSec> (descriptive metadata section): descriptive metadata about the digital objects;
<mets:amdSec> (administrative metadata section): technical and provenance information about the digital objects;
<mets:fileSec> (file section): a list of the digital objects and an indication of their role in the AIP (original, preservation, metadata, submission documentation, license etc.);
<mets:structMap> (structural map): a physical or logical ordering of the digital objects. The physical structural map documents the final directory structure of the AIP. As of Archivematica version 1.8, all AIP METS files also contain a logical structural map, labelled "Normative Directory Structure", which lists all files and directories, including empty directories. Because AIPs conform to the BagIt specification and bags cannot contain empty directories, this logical structural map indicates how the AIP should be structured when the empty directories are reconstructed.
The technical and provenance information in the METS amdSec is recorded as PREMIS metadata. PREMIS is also a Library of Congress standard, and is described as "the international standard for metadata to support the preservation of digital objects and ensure their long-term usability." The PREMIS entities are wrapped in the METS file as follows:
<mets:amdSec>
--<mets:techMD> (technical metadata)
----<premis:object> e.g. UUID, size, checksum, format, original name, extracted technical metadata
--<mets:digiprovMD> (digital provenance metadata)
----<premis:event> e.g. ingestion, message digest calculation, virus scan, format identification, validation, normalization, fixity check
----<premis:agent> for each PREMIS Event there are three associated Agents: the organization, the digital preservation system (e.g. Archivematica 1.x) and the logged-in user
--<mets:rightsMD> (rights metadata)
----<premis:rights> Rights pertaining to the preservation, reproduction and use of the preserved digital objects (only included if the user added rights metadata prior to or during ingest)
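The nesting above can be explored with a short script. The following is a minimal sketch that lists the PREMIS event types recorded under each amdSec, using only the Python standard library. It assumes the standard METS namespace (http://www.loc.gov/METS/) and matches PREMIS elements by local name, because the PREMIS namespace differs between PREMIS versions. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def list_premis_events(source):
    """Return (amdSec ID, event type) pairs found in digiprovMD sections.

    `source` is a path or file object for a METS file. Hypothetical helper
    for illustration only.
    """
    root = ET.parse(source).getroot()
    events = []
    for amd_sec in root.iter("{%s}amdSec" % METS_NS):
        for prov in amd_sec.findall("{%s}digiprovMD" % METS_NS):
            # PREMIS elements are matched by local name so the sketch does
            # not depend on a particular PREMIS namespace version.
            for element in prov.iter():
                if local_name(element.tag) == "eventType":
                    events.append((amd_sec.get("ID"), (element.text or "").strip()))
    return events
```

Pointing list_premis_events at the METS.XML file in the submissionDocumentation directory would give one pair per recorded event, which is a quick way to see which processing actions have been applied to each object.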
The fileSec and structMap use identifier attributes to link a digital object to its amdSec and (if used) dmdSec. For example, if a file entry in the fileSec has the attribute ADMID="amdSec_1" this means that the amdSec with the identifier amdSec_1 contains the administrative (i.e. technical and provenance) metadata for that file.
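The ADMID linkage described above can be followed programmatically. The following is a minimal sketch that maps each file entry in the fileSec to the amdSec identifiers it references. It assumes the standard METS namespace and relies on the fact that ADMID is a whitespace-separated list of identifiers; the function name map_files_to_amdsecs is hypothetical.

```python
import xml.etree.ElementTree as ET

METS = {"mets": "http://www.loc.gov/METS/"}


def map_files_to_amdsecs(source):
    """Return {file ID: [amdSec IDs]} for every mets:file in the fileSec.

    `source` is a path or file object for a METS file. Identifiers in ADMID
    that do not match an amdSec in the document are left out.
    """
    root = ET.parse(source).getroot()
    known = {amd.get("ID") for amd in root.findall("mets:amdSec", METS)}
    links = {}
    for file_el in root.findall(".//mets:fileSec//mets:file", METS):
        # ADMID may hold several identifiers separated by whitespace.
        refs = (file_el.get("ADMID") or "").split()
        links[file_el.get("ID")] = [ref for ref in refs if ref in known]
    return links
```

For a file entry with ADMID="amdSec_1", the returned mapping contains ["amdSec_1"] for that file, provided an amdSec with that identifier exists in the same METS file.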