Future-proofing XMP identification schema

A developer and researcher working on PDF technologies for more than 20 years, Peter is the PDF Association’s CTO and an independent technology consultant.


All ISO standardized subsets of PDF use XMP-based identification metadata, enabling files to claim conformance to a specific part of one or more ISO standards. Due to changes in ISO policy, the defined XMP field names have evolved over time.
The presence of PDF standardized subset identification data in a document’s XMP metadata never guarantees that the file conforms. Any such guarantee requires specialized validation software to perform a detailed set of checks and, in the case of PDF/UA, human confirmation of the so-called “human checks”. Together, these typically produce a compliance report and an overall pass/fail result. Validation software that fails to account for future ISO publications can give users inaccurate results.
PDF/UA
To ensure that the correct edition of ISO 14289 is being referenced, PDF/UA validation software must process all currently defined PDF/UA identification metadata fields. If an unknown (unsupported) ISO publication is detected, whether a new part, a new amendment, or an unrecognized year, the validation software must inform the user, because validation results may be incorrect due to the lack of support. Although unlikely, the metadata itself might also be incorrect, so offering the user options may be appropriate for interactive software.
XMP Metadata Field | XMP Field Type | PDF/UA-1 ISO 14289-1:2012 (withdrawn) | PDF/UA-1 ISO 14289-1:2014 (PDF 1.7) | PDF/UA-1 Amd 1 ISO 14289-1:2014 Amd1:202x (proposed, PDF 1.7) | PDF/UA-2 ISO 14289-2:2024 (PDF 2.0) | Comment |
---|---|---|---|---|---|---|
pdfuaid:part | Integer | 1 | 1 | 1 | 2 | Required. Part number: 1, 2, … |
pdfuaid:amd | Text |  |  | 1:202x | Not defined | Optional. Amendment number, COLON, 4-digit year of publication. |
pdfuaid:corr | Text |  |  |  | Not defined | Effectively deprecated, as ISO no longer publishes technical corrigenda. |
pdfuaid:rev | Integer | Not defined | Not defined | Not defined | 2024 | Required for PDF/UA-2. 4-digit year of publication. |
All PDF/UA ISO publications require the schema namespace prefix pdfuaid with the namespace URI “http://www.aiim.org/pdfua/ns/id/”, as well as the ISO 14289 part number (pdfuaid:part).
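To illustrate how these fields might be read, the Python sketch below collects every pdfuaid property from an XMP packet using only the standard library. It assumes the packet has already been extracted from the PDF’s metadata stream (not shown), and the function name read_pdfuaid is hypothetical. XMP permits simple properties to be serialized either as child elements of rdf:Description or as attributes on it, so the sketch checks both forms.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# RDF syntax namespace used by XMP, and the pdfuaid namespace URI given above.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def read_pdfuaid(xmp: str) -> Dict[str, str]:
    """Return pdfuaid:* properties (local name -> stripped text) from an XMP packet."""
    root = ET.fromstring(xmp)
    prefix = "{" + PDFUAID_NS + "}"
    fields: Dict[str, str] = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        # Attribute form: <rdf:Description pdfuaid:part="2" ...>
        for name, value in desc.attrib.items():
            if name.startswith(prefix):
                fields[name[len(prefix):]] = value.strip()
        # Element form: <pdfuaid:rev>2024</pdfuaid:rev>
        for child in desc:
            if child.tag.startswith(prefix):
                fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields


# Hypothetical PDF/UA-2 packet mixing both serializations.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/" pdfuaid:part="2">
      <pdfuaid:rev>2024</pdfuaid:rev>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfuaid(SAMPLE))  # {'part': '2', 'rev': '2024'}
```

Matching on the namespace URI rather than on the literal prefix text keeps the lookup correct even if a producer binds the namespace under a different prefix.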
With PDF/UA-2, support for ISO dated revisions was added by requiring the year of publication (pdfuaid:rev). The current 2024 edition is identified by the value 2024.
ISO currently allows dated revisions (indicated by pdfuaid:rev containing the 4-digit year of publication of the dated revision, which will be later than 2024) and a maximum of two amendments (indicated by pdfuaid:amd containing the amendment number, a COLON (:), and the 4-digit year of publication).
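The amendment format described above lends itself to a strict pattern check. In this sketch (function names hypothetical), a value that does not exactly match “number:year” is treated as malformed rather than guessed at. The two-amendment limit is applied only as a plausibility check, because an implausible value is exactly the kind of case where the user should be informed.

```python
import re
from typing import Optional, Tuple

# pdfuaid:amd / pdfaid:amd: amendment number, COLON, 4-digit year, e.g. "1:2099".
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
AMD_PATTERN = re.compile(r"([0-9]+):([0-9]{4})")
MAX_AMENDMENTS = 2  # ISO's current limit, per the text above


def parse_amd(value: str) -> Optional[Tuple[int, int]]:
    """Return (amendment number, year of publication), or None if malformed."""
    match = AMD_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def amd_is_plausible(amd: Tuple[int, int]) -> bool:
    """True if the amendment number is within ISO's current 1..2 range."""
    number, _year = amd
    return 1 <= number <= MAX_AMENDMENTS


print(parse_amd("1:2099"))          # (1, 2099)
print(parse_amd("1-2099"))          # None
print(amd_is_plausible((3, 2030)))  # False
```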
ISO no longer allows Technical Corrigenda, so pdfuaid:corr will never be used, and no Technical Corrigenda exist for any version of the PDF/UA standard.
Missing metadata fields are equivalent to blank (empty) metadata (e.g., <pdfuaid:rev></pdfuaid:rev>).
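In code, one simple way to honor this rule is to normalize both cases to a single “absent” value before any comparison. The helper below is a hypothetical sketch. It also strips surrounding whitespace, which is an assumption that goes beyond the rule as stated.

```python
from typing import Mapping, Optional


def field_value(fields: Mapping[str, str], name: str) -> Optional[str]:
    """Return the field's value, or None if it is missing or blank.

    Missing and empty (e.g. <pdfuaid:rev></pdfuaid:rev>) are treated alike.
    Whitespace-only values are also treated as blank (an assumption).
    """
    value = fields.get(name, "").strip()
    return value or None


print(field_value({}, "rev"))               # None
print(field_value({"rev": ""}, "rev"))      # None
print(field_value({"rev": "2024"}, "rev"))  # 2024
```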
To assist developers, we are providing three sample PDF/UA files (ZIP) containing unexpected XMP metadata values:
- PDF-UA-1-pdfuaid-amd.pdf is a PDF/UA-1 file where pdfuaid:amd is "1:2099".
- PDF-UA-2-pdfuaid-rev.pdf is a PDF/UA-2 file where pdfuaid:rev is "2099".
- PDF-UA-9-pdfuaid-part.pdf is a file where pdfuaid:part is "9".
PDF/A
Let's review the equivalent options for PDF/A XMP metadata, as defined in the ISO 19005 series of standards.
XMP Metadata Field | XMP Field Type | PDF/A-1 ISO 19005-1:2005 (PDF 1.4, incl. 2 corrigenda) | PDF/A-2 ISO 19005-2:2011 (PDF 1.7) | PDF/A-3 ISO 19005-3:2012 (PDF 1.7) | PDF/A-4 ISO 19005-4:2020 (PDF 2.0) | PDF/A-4 dated revision | Comment |
---|---|---|---|---|---|---|---|
pdfaid:part | Integer | 1 | 2 | 3 | 4 | 4 | Required. |
pdfaid:rev | Integer | Not defined | Not defined | Not defined | 2020 | 202x | Only defined for, and required by, PDF/A-4. 4-digit year of publication. |
pdfaid:amd | Text |  |  |  | Not defined | Not defined | Optional. Amendment number, COLON, 4-digit year of publication. |
pdfaid:corr | Text | Not defined |  |  | Not defined | Not defined | Effectively deprecated, as ISO no longer publishes technical corrigenda. |
pdfaid:conformance | Closed text | A, B | A, B, U | A, B, U | none, F, E | none, F, E | Required by PDF/A-1, PDF/A-2, and PDF/A-3. Optional in PDF/A-4. |
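The conformance column of this table can be encoded directly as data, which makes an unexpected combination easy to detect. The sketch below is hypothetical and deliberately limited to pdfaid:part and pdfaid:conformance. pdfaid:rev and pdfaid:amd would need checks of their own. An empty string stands for “no conformance value”, which the table permits only for PDF/A-4.

```python
from typing import List, Mapping

# Allowed pdfaid:conformance values per pdfaid:part, from the table above.
# "" means no conformance value (allowed only for PDF/A-4).
PDFA_CONFORMANCE = {
    1: {"A", "B"},
    2: {"A", "B", "U"},
    3: {"A", "B", "U"},
    4: {"", "F", "E"},
}


def check_pdfa_conformance(fields: Mapping[str, str]) -> List[str]:
    """Return warnings for unknown parts or unexpected conformance values."""
    part_s = fields.get("part", "").strip()
    conformance = fields.get("conformance", "").strip()
    # isascii() guards against non-ASCII digits that isdigit() would accept.
    if not (part_s.isascii() and part_s.isdigit()) or int(part_s) not in PDFA_CONFORMANCE:
        return [f"missing or unknown pdfaid:part {part_s!r}"]
    part = int(part_s)
    if conformance not in PDFA_CONFORMANCE[part]:
        return [f"unexpected pdfaid:conformance {conformance!r} for PDF/A-{part}"]
    return []


print(check_pdfa_conformance({"part": "2", "conformance": "U"}))  # []
print(check_pdfa_conformance({"part": "1", "conformance": "U"}))
```

Because pdfaid:conformance is a closed choice, the comparison is case-sensitive: a lowercase "b" is reported rather than silently accepted.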
The same need to anticipate new parts, new amendments, or dated revisions (i.e., unrecognized years) in the corresponding XMP identification metadata applies to all ISO standardized subsets of PDF, including PDF/UA (ISO 14289), PDF/A (ISO 19005), and PDF/X (ISO 15930). See each ISO publication for details.
Staying informed
ISO documents become publicly available at the Draft International Standard (DIS) stage, giving all stakeholders (including implementers) insight into forthcoming changes. Through the PDF Association’s Category A Liaison with ISO TC 171 SC 2, its members may also access and comment on all ISO drafts prior to the DIS stage. PDF Association Technical Working Groups (TWGs) provide forums for all stakeholders to discuss and provide feedback to ISO work groups.
Implementers should never complete their implementations against ISO drafts, since further changes can occur before final publication. Software should only be released against the final ISO publications. Final publications are not automatically available to either ISO TC 171 SC 2 experts or PDF Association members; however, through a special agreement with ISO and the generosity of some Association members, sponsored final PDF/UA publications are available at no cost.
PDF/A-4 and PDF/X-6 are both currently being revised in their respective ISO working groups, with new dated revisions of ISO 19005-4 and ISO 15930-9 (respectively) likely to be published soon. Files that conform to these dated revisions will be required to specify a different year in their XMP metadata (i.e., it will no longer be the 2020 edition!), so validation software should be prepared to detect these updates and validate accordingly.
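To prepare for such updates, a validator could classify the year it finds against the years it actually implements. This distinguishes a probable newer dated revision from a malformed or unrecognized value. The sketch is hypothetical: the category names are illustrative, and the supported set (for example, {2020} for PDF/A-4) is whatever a given validator implements.

```python
import re
from typing import AbstractSet


def classify_rev(value: str, supported: AbstractSet[int]) -> str:
    """Classify a pdfaid:rev or pdfuaid:rev value.

    Returns "missing", "malformed", "supported", "newer" (probably a dated
    revision published after the validator was built), or "unknown".
    `supported` must contain at least one year.
    """
    text = value.strip()
    if not text:
        return "missing"
    if re.fullmatch(r"[0-9]{4}", text) is None:
        return "malformed"
    year = int(text)
    if year in supported:
        return "supported"
    if year > max(supported):
        return "newer"
    return "unknown"


for rev in ("2020", "2026", "2019", "", "20x5"):
    print(repr(rev), classify_rev(rev, {2020}))
```

Separating "newer" from "unknown" lets the warning tell the user that the validator, rather than the file, may be out of date.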
Conclusion
When performing validation, software must always check every XMP metadata field defined in any of the ISO standards for each ISO standardized subset. This includes XMP fields that were originally defined as reserved “for future use”, such as pdfuaid:rev and pdfuaid:amd for PDF/UA, or pdfaid:rev and pdfaid:amd for PDF/A.
If any of the defined XMP fields contain unexpected values (i.e., values that do not match the validator’s implementation), the software should generate a warning or otherwise inform the user. For PDF files “from the future” (e.g., files created against new dated revisions of, or amendments to, existing ISO standards), this ensures that software does not incorrectly pass non-conforming files (a false positive) or incorrectly fail conforming files (a false negative).
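As a final illustration, the hypothetical Report class below shows one way to attach identification warnings to the overall result. This prevents a pass or fail for a file “from the future” from being presented as authoritative. The class and its wording are assumptions and are not taken from any ISO standard or existing validator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    passed: bool  # outcome of the checks the validator actually ran
    id_warnings: List[str] = field(default_factory=list)

    def summary(self) -> str:
        result = "PASS" if self.passed else "FAIL"
        if self.id_warnings:
            # Unrecognized identification metadata: the checks that ran may
            # not be the right ones, so qualify the result instead of stating it plainly.
            return f"{result} (unverified: {'; '.join(self.id_warnings)})"
        return result


print(Report(True).summary())
print(Report(True, ["unknown pdfuaid:part '9'"]).summary())
```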