
Discovering PDF metadata
Peter Wyatt shares his custom metadata panels – very useful for discovering ISO standards and PDF Declarations metadata! (until vendors catch up!)
BUSINESS NOTE
As the use of XMP metadata expands, presenting this information to users in a clear and meaningful way is an unmet challenge, both now and for the future of PDF.
Recent announcements include the LibreOffice 24.2 release, which supports dual conformance levels for PDF/A and PDF/UA-1, and Well-Tagged PDF, which uses PDF Declarations and comes with example PDFs (including a file conforming to both PDF/A-4 and PDF/UA-2). With these, easy access to XMP metadata in PDF files is more important than ever. Soon, ISO will also publish dated revisions to both PDF/A-4 and PDF/X-6, which will be indicated via new values in existing ISO-defined XMP metadata fields.
NOTE (18 April 2024): due to PDF Errata #395, the Acrobat custom file info panels for PDF Declarations have been updated in GitHub and new versions are now available. Just overwrite existing files with the newer versions.
It is common practice for many PDF applications to provide banner-style indicators when PDF files declare conformance with certain ISO standards such as PDF/A or PDF/X. In addition, these same applications may decide to protect these PDFs by opening the files in a read-only manner to help users avoid accidental edits that may invalidate the file’s conformance.
However, some PDF applications have not yet generalized their support to detect new versions or dated revisions of ISO standards! This means that such software does not protect these files until the vendor releases updates to their software.
The design of XMP metadata for each existing family of ISO standards for subsets of PDF is both forward- and backward-compatible. This means that even if old software is used to open a newer PDF, that software can still tell that the file declares conformance with a standard, even if it is unaware of the specifics.
| PDF ISO Standard | XMP namespace | XMP field |
|---|---|---|
| PDF/A (ISO 19005) | pdfaid | pdfaid:part |
| | | pdfaid:rev |
| | | pdfaid:amd |
| | | pdfaid:corr |
| | | pdfaid:conformance |
| PDF/UA (ISO 14289) | pdfuaid | pdfuaid:part |
| | | pdfuaid:rev |
| | | pdfuaid:amd |
| | | pdfuaid:corr |
| PDF/X (ISO 15930) | pdfxid | pdfxid:GTS_PDFXVersion |
| PDF/VT (ISO 16612) | pdfvtid | pdfvtid:GTS_PDFVTVersion |
| | | pdfvtid:GTS_PDFVTModDate |
| | | pdfvtid:rev |
| PDF/E (ISO 24517) | pdfe | pdfe:ISO_PDFEVersion |
| PDF/VCR (ISO 16613) | pdfvcrid | pdfvcrid:ISO_PDFVCRVersion |
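To illustrate the forward-compatibility point, here is a minimal Python sketch that reports whatever identification fields a document declares, without checking whether the values are ones the software already knows. It is not how any particular product works. The function name `declared_standards`, the sample packet, and the two namespace URIs (covering only PDF/A and PDF/UA) are assumptions made for this example and should be checked against the relevant ISO standards.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for two of the families in the table (assumed here; verify
# against the relevant ISO standard before relying on them).
ISO_ID_NAMESPACES = {
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
    "pdfuaid": "http://www.aiim.org/pdfua/ns/id/",
}


def declared_standards(xmp_packet: str) -> dict[str, dict[str, str]]:
    """Collect every field found in a known ISO identification namespace.

    Values are returned raw: an unknown part or revision is still reported.
    """
    uri_to_prefix = {uri: prefix for prefix, uri in ISO_ID_NAMESPACES.items()}
    found: dict[str, dict[str, str]] = {}

    def record(qualified_name: str, value: str) -> None:
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if not qualified_name.startswith("{"):
            return
        uri, local = qualified_name[1:].split("}", 1)
        prefix = uri_to_prefix.get(uri)
        if prefix is not None:
            found.setdefault(prefix, {})[local] = value.strip()

    for element in ET.fromstring(xmp_packet).iter():
        record(element.tag, element.text or "")      # field written as an element
        for name, value in element.attrib.items():   # field written as an attribute
            record(name, value)
    return found


# Hypothetical packet declaring both PDF/A-4 and PDF/UA-2.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/"
      pdfuaid:part="2">
   <pdfaid:part>4</pdfaid:part>
   <pdfaid:rev>2020</pdfaid:rev>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""
```

Calling `declared_standards(SAMPLE)` yields one entry per family, keyed by the field names from the table. Because matching is done on the namespace rather than on specific values, a future part number or dated revision would be reported in the same way.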
Acrobat’s custom file info panels
Viewing XMP metadata is not for the uninitiated - it is complex XML data that can be voluminous and hard to navigate. Luckily at least one vendor (Adobe) provides a basic mechanism that allows for extension of their XMP dialog with custom panels, making it possible for 3rd parties to provide enhanced access to a PDF file’s XMP data.
The format for these custom XMP panels is somewhat poorly documented in Adobe's "XMP Custom Panels", dated October 2003, which I can no longer find on the Adobe website but which still works in Adobe Acrobat on both Windows and Mac. Here is a link to that documentation, which describes the custom XML and syntax that defines additional dialog panels accessible from within Acrobat’s XMP “Additional Metadata” dialog.
Based on this documentation (such as it is), I have created a custom file info panel (see this GitHub repository) that will display all ISO-defined XMP metadata for PDF subset standards (as listed in the table above), but only when the document XMP metadata uses the ISO-recommended XMP namespaces. This works even with PDF files that are not automatically recognized by Adobe Acrobat and do not present a banner when initially opened. The data shown is the raw data from the XMP and is not checked or validated by the panel.
PDF Declarations
PDF Declarations are an industry-defined specification allowing PDF files to declare their conformance to external standards or specifications, such as WCAG or HIPAA, in the document-level XMP metadata. By using more complex sets-within-sets of XMP fields, a single PDF file can contain multiple PDF Declaration claims for multiple external standards.
I have prototyped a custom file info panel for PDF Declarations that’s limited to 3 declarations, each with 2 claims, in order to reasonably fit on most displays. The PDF Declaration specification does not impose these limits. The data shown is the raw data from the XMP and is not checked or validated by the panel.
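The effect of those display limits on the sets-within-sets structure can be sketched in Python. This is only a model of the idea, not the panel's implementation (the panels themselves are XML files). The class names, the field names `standard` and `claimed_by`, and the function `panel_view` are all hypothetical and do not come from the PDF Declarations specification.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    claimed_by: str  # hypothetical field name, for illustration only


@dataclass
class Declaration:
    standard: str  # the external standard being declared, e.g. "WCAG"
    claims: list[Claim] = field(default_factory=list)


def panel_view(declarations, max_declarations=3, max_claims=2):
    """Truncate to what a fixed-size panel could show.

    Returns (visible, hidden_declarations, hidden_claims), where the last
    two count what was dropped so a caller could warn the user.
    """
    shown = declarations[:max_declarations]
    visible = [Declaration(d.standard, d.claims[:max_claims]) for d in shown]
    hidden_declarations = max(0, len(declarations) - max_declarations)
    hidden_claims = sum(max(0, len(d.claims) - max_claims) for d in shown)
    return visible, hidden_declarations, hidden_claims
```

For example, a file with four declarations, the first carrying three claims, would show three declarations and hide one declaration and one claim. A fixed-size display therefore cannot be treated as a complete view of the file's declarations.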
Note that due to differences in the underlying Acrobat technologies between Windows and Mac platforms, there are 2 versions of this custom panel. See the README for more information.
To install the panels, simply follow the instructions to copy the appropriate XML files to specific folders that Acrobat uses. These custom file info panels are not easily accessed (unlike the Standards navigation pane, where some ISO standards are recognized today) and need to be accessed by navigating via the menus: File | Properties… | Additional Metadata… - clunky, but still a far better experience than attempting to navigate the raw XMP!
Conclusion
As the PDF industry moves ahead, we hope to see far more intuitive and user-friendly displays of PDF files’ XMP metadata, especially for conformance levels against ISO standards and PDF Declarations. There is no need for PDF software to restrict display to only explicitly known versions, parts, or conformance levels, as document-level XMP metadata was designed to be both forward- and backward-compatible. With better means of presenting XMP metadata, users can be appropriately informed about any special capabilities of their PDF documents.
In the meantime, customization of other PDF applications may be possible - please speak to your PDF software vendor.
Vendors own their respective copyrights wherever they are mentioned. Any mention of companies or products does not imply endorsement or support of any of the mentioned information, services, products, or providers.