Using TDMRep to license AI mining of PDFs
Article, March 17, 2026
About Peter Wyatt, PDF Association
BUSINESS NOTE
We clarify the use of TDMRep’s reservation of rights in XMP metadata for PDF and PDF/A.
The W3C’s TDMRep protocol self-describes as “expressing the reservation of rights relative to text & data mining (TDM) applied to lawfully accessible Web content, and to ease the discovery of TDM licensing policies associated with such content.” Published in May 2024, this protocol also includes specific instructions for inclusion in HTML, PDF, and EPUB files.
However, the discussion of PDF support in the W3C publication is both vague and inaccurate in several respects.
PDF/A support for TDMRep
PDF/A-1 (ISO 19005-1), PDF/A-2 (ISO 19005-2), and PDF/A-3 (ISO 19005-3) files can all support W3C TDMRep simply by including a PDF/A XMP Extension Schema in the document’s XMP metadata stream.
To support TDMRep in PDF/A files, the PDF Association provides a free PDF/A XMP Extension Schema template for TDMRep. Including this schema alongside the TDMRep rights declaration ensures that adding TDMRep to a PDF/A file does not cause a validation failure.
Note that, as of March 2025, clause 6.6 of W3C’s May 2024 Final Community Group Report on the use of TDMRep provides incorrect guidance on this point: it fails to acknowledge that TDMRep is perfectly valid in a PDF/A file when the necessary XMP extension schema is present.

As we have reported to the W3C, the simple inclusion of the TDMRep PDF/A XMP Extension Schema along with TDMRep metadata creates valid PDF/A files, as demonstrated by this PDF/A-1a sample.
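The underlying PDF/A-1 to PDF/A-3 requirement is that every non-predefined XMP property in use is described by an extension schema in the same packet. The Python sketch below illustrates that idea only: it lists any `tdm:` properties that are used but not described. The TDMRep namespace URI, the property names, and the sample schema (including its value types) are assumptions for illustration, not the PDF Association’s template, and this check is no substitute for a real PDF/A validator.

```python
import xml.etree.ElementTree as ET

TDM_NS = "http://www.w3.org/ns/tdmrep/"  # assumed TDMRep namespace URI
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#"      # pdfaSchema
PROPERTY_NS = "http://www.aiim.org/pdfa/ns/property#"  # pdfaProperty
TDM_TAG = "{" + TDM_NS + "}"


def undeclared_tdm_properties(xmp: str) -> set:
    """Return the tdm: property names used but not described in an extension schema."""
    root = ET.fromstring(xmp)

    # Properties in use, written either as elements or as attributes.
    used = set()
    for el in root.iter():
        if el.tag.startswith(TDM_TAG):
            used.add(el.tag[len(TDM_TAG):])
        used.update(a[len(TDM_TAG):] for a in el.attrib if a.startswith(TDM_TAG))

    # Properties described by an extension schema entry for the TDMRep namespace.
    declared = set()
    for li in root.iter("{" + RDF_NS + "}li"):
        ns = li.find("{" + SCHEMA_NS + "}namespaceURI")
        if ns is None or (ns.text or "").strip() != TDM_NS:
            continue
        for name in li.iter("{" + PROPERTY_NS + "}name"):
            declared.add((name.text or "").strip())

    return used - declared


def _schema_property(name: str, value_type: str, description: str) -> str:
    """One illustrative pdfaProperty entry (value types here are guesses)."""
    return f"""
        <rdf:li rdf:parseType="Resource">
         <pdfaProperty:name>{name}</pdfaProperty:name>
         <pdfaProperty:valueType>{value_type}</pdfaProperty:valueType>
         <pdfaProperty:category>external</pdfaProperty:category>
         <pdfaProperty:description>{description}</pdfaProperty:description>
        </rdf:li>"""


def sample_xmp(with_schema: bool) -> str:
    """Hypothetical XMP carrying TDMRep, with or without an extension schema."""
    schema = f"""
   <pdfaExtension:schemas>
    <rdf:Bag>
     <rdf:li rdf:parseType="Resource">
      <pdfaSchema:schema>TDMRep</pdfaSchema:schema>
      <pdfaSchema:namespaceURI>{TDM_NS}</pdfaSchema:namespaceURI>
      <pdfaSchema:prefix>tdm</pdfaSchema:prefix>
      <pdfaSchema:property>
       <rdf:Seq>{_schema_property("reservation", "Integer", "TDM rights reserved")}
        {_schema_property("policy", "URL", "Location of the TDM policy")}
       </rdf:Seq>
      </pdfaSchema:property>
     </rdf:li>
    </rdf:Bag>
   </pdfaExtension:schemas>""" if with_schema else ""
    return f"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="" xmlns:tdm="{TDM_NS}"
    xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
    xmlns:pdfaSchema="{SCHEMA_NS}" xmlns:pdfaProperty="{PROPERTY_NS}">
   <tdm:reservation>1</tdm:reservation>
   <tdm:policy>https://example.com/tdm-policy.json</tdm:policy>{schema}
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


if __name__ == "__main__":
    print(sorted(undeclared_tdm_properties(sample_xmp(with_schema=False))))
    print(sorted(undeclared_tdm_properties(sample_xmp(with_schema=True))))
```

An empty result only means that each `tdm:` property in use has a matching schema entry; the PDF Association’s template remains the reference for the actual schema content.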

PDF/A-4 (ISO 19005-4) no longer requires PDF/A XMP Extension Schemas, so in this context, it is possible to simply include TDMRep in the file’s XMP metadata without any further additions and not fail PDF/A-4 validation. The same is true for “general-purpose” PDF.
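For PDF/A-4 and general-purpose PDF, the declaration therefore amounts to a couple of properties in the document-level XMP packet. As a minimal sketch, the Python below builds and re-reads such a packet. It assumes the namespace URI `http://www.w3.org/ns/tdmrep/` and the property names `tdm:reservation` and `tdm:policy`; check these against the W3C report. The function names are hypothetical, and writing the packet into a PDF’s metadata stream is left to whichever PDF library is in use.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape

# Assumed namespace URI and property names; verify against the W3C TDMRep report.
TDM_NS = "http://www.w3.org/ns/tdmrep/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TDM_TAG = "{" + TDM_NS + "}"


def build_tdmrep_xmp(reservation: int, policy_url: Optional[str] = None) -> str:
    """Return an XMP packet (as text) carrying a TDMRep declaration."""
    if reservation not in (0, 1):
        raise ValueError("reservation must be 0 (not reserved) or 1 (reserved)")
    props = [f"   <tdm:reservation>{reservation}</tdm:reservation>"]
    if policy_url is not None:
        props.append(f"   <tdm:policy>{escape(policy_url)}</tdm:policy>")
    return "\n".join([
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>',
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">',
        f'  <rdf:Description rdf:about="" xmlns:tdm="{TDM_NS}">',
        *props,
        '  </rdf:Description>',
        ' </rdf:RDF>',
        '</x:xmpmeta>',
        '<?xpacket end="w"?>',
    ])


def read_tdmrep(xmp: str) -> dict:
    """Return {property name: value} for TDMRep properties written as elements."""
    root = ET.fromstring(xmp)
    return {
        el.tag[len(TDM_TAG):]: (el.text or "").strip()
        for el in root.iter()
        if el.tag.startswith(TDM_TAG)
    }


if __name__ == "__main__":
    packet = build_tdmrep_xmp(1, "https://example.com/tdm-policy.json")
    print(packet)
    print(read_tdmrep(packet))
```

The policy URL shown is a placeholder. A real packet would normally carry other document metadata alongside the TDMRep properties.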
The W3C TDMRep specification is vague as to what the TDM claim actually covers, since a PDF document can contain much more than pages. W3C ambiguously states, “These properties cover TDM rights for every page in the document”, which does not account for embedded files, collections, or other resources that may be present in PDF files. As the global industry organization and SDO for PDF, the PDF Association has reported this to the W3C and publicly clarified “that the TDMRep properties in the document catalog XMP metadata apply to all pages and all resources in a PDF file”.
Encrypted XMP metadata
As detailed in ISO 32000, XMP metadata streams may also be encrypted. For documents including TDMRep, the PDF Association recommends not encrypting the document-level XMP metadata. Please refer to the EncryptMetadata boolean entries in Tables 21 and 27 of ISO 32000, as well as this errata correction.
If it is not feasible to leave the main document metadata unencrypted, encapsulating the PDF document with encrypted metadata in an unencrypted wrapper document (see ISO 32000-2, §7.6.7) allows the TDMRep rights to be expressed unencrypted. As noted in the PDF Association’s clarification, the unencrypted wrapper document’s TDMRep rights cover all resources within the unencrypted wrapper PDF, including the PDF document containing encrypted metadata.
Per-object XMP metadata
Both PDF 1.7 (ISO 32000-1:2008) and PDF 2.0 (ISO 32000-2) define support for per-object XMP metadata streams, so a TDMRep declaration could also be associated with specific PDF objects, such as images, content streams, embedded files, or annotations. This may occur when documents, pages, or content are merged from one PDF into another. W3C’s current (May 2024) TDMRep specification does not address the possibility of object-level TDMRep declarations, and therefore does not say how they interact with the document-wide declaration: do they “override” it, or vice versa?
To overcome this ambiguity, the PDF Association strongly recommends that TDMRep metadata only be present in the document-level XMP and that PDF software should detect and remove all other TDMRep metadata from any other XMP metadata streams present in a document. This reinforces our clarification “that the TDMRep properties in the document catalog XMP metadata apply to all pages and all resources in a PDF file”.
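A minimal sketch of that clean-up step follows, in Python. It models a document’s metadata streams as a plain dictionary of XMP text keyed by a hypothetical object label, with `"catalog"` standing for the document-level stream; a real tool would enumerate the streams through its PDF library. The TDMRep namespace URI is an assumption, and re-serializing with `xml.etree` drops any `xpacket` wrapper, which a real tool would need to restore.

```python
import xml.etree.ElementTree as ET

TDM_NS = "http://www.w3.org/ns/tdmrep/"  # assumed TDMRep namespace URI
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
TDM_TAG = "{" + TDM_NS + "}"

# Keep readable prefixes when the XML is written back out.
ET.register_namespace("x", "adobe:ns:meta/")
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("tdm", TDM_NS)


def strip_tdmrep(xmp: str):
    """Remove TDMRep elements and attributes; return (new_xmp, number_removed)."""
    root = ET.fromstring(xmp)
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith(TDM_TAG):
                parent.remove(child)
                removed += 1
        for attr in [a for a in parent.attrib if a.startswith(TDM_TAG)]:
            del parent.attrib[attr]
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


def enforce_document_level_only(streams: dict, document_key: str = "catalog"):
    """Leave the document-level stream untouched and strip TDMRep elsewhere."""
    cleaned, report = {}, {}
    for key, xmp in streams.items():
        if key == document_key:
            cleaned[key] = xmp
            continue
        new_xmp, removed = strip_tdmrep(xmp)
        cleaned[key] = new_xmp if removed else xmp
        if removed:
            report[key] = removed
    return cleaned, report


def xmp_with(description: str) -> str:
    """Wrap an rdf:Description fragment in a hypothetical minimal XMP document."""
    return (
        f'<x:xmpmeta xmlns:x="adobe:ns:meta/" xmlns:rdf="{RDF_NS}" '
        f'xmlns:tdm="{TDM_NS}" xmlns:dc="{DC_NS}">'
        f"<rdf:RDF>{description}</rdf:RDF></x:xmpmeta>"
    )


if __name__ == "__main__":
    streams = {
        "catalog": xmp_with('<rdf:Description rdf:about="">'
                            "<tdm:reservation>1</tdm:reservation></rdf:Description>"),
        "image-7": xmp_with('<rdf:Description rdf:about="">'
                            "<tdm:reservation>0</tdm:reservation>"
                            "<dc:format>image/jpeg</dc:format></rdf:Description>"),
        "page-2": xmp_with('<rdf:Description rdf:about="" tdm:reservation="0"/>'),
    }
    cleaned, report = enforce_document_level_only(streams)
    print(report)
```

Only the TDMRep properties are removed; other per-object metadata, such as the `dc:format` value in the sample, is preserved.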
Embedded files
Although embedded files in a PDF are represented by the enclosing document’s TDMRep declaration, it is strongly recommended that each embedded document contain its own metadata and a matching TDMRep declaration. This practice ensures that if an embedded document is extracted and thereby is disassociated from the main document, the expression of TDMRep rights is retained.
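As a sketch of how a tool might keep an embedded PDF’s declaration in step with its container, the Python below copies the TDMRep properties from the enclosing document’s XMP into an embedded document’s XMP, replacing any stale values. It works on XMP text only and assumes the TDMRep namespace URI shown; the function names are hypothetical, only properties written as XML elements are handled, and extracting and rewriting the embedded file is left to the PDF library in use.

```python
import xml.etree.ElementTree as ET

TDM_NS = "http://www.w3.org/ns/tdmrep/"  # assumed TDMRep namespace URI
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TDM_TAG = "{" + TDM_NS + "}"
RDF_TAG = "{" + RDF_NS + "}"

# Keep readable prefixes when the XML is written back out.
ET.register_namespace("x", "adobe:ns:meta/")
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("tdm", TDM_NS)


def read_tdmrep(xmp: str) -> dict:
    """Return {property name: value} for TDMRep properties written as elements."""
    root = ET.fromstring(xmp)
    return {
        el.tag[len(TDM_TAG):]: (el.text or "").strip()
        for el in root.iter()
        if el.tag.startswith(TDM_TAG)
    }


def propagate_tdmrep(document_xmp: str, embedded_xmp: str) -> str:
    """Return embedded_xmp with its TDMRep properties matching document_xmp."""
    props = read_tdmrep(document_xmp)
    root = ET.fromstring(embedded_xmp)
    rdf = root if root.tag == RDF_TAG + "RDF" else root.find(RDF_TAG + "RDF")
    if rdf is None:
        raise ValueError("embedded XMP has no rdf:RDF element")

    # Drop any existing (possibly stale) TDMRep elements first.
    for parent in list(rdf.iter()):
        for child in list(parent):
            if child.tag.startswith(TDM_TAG):
                parent.remove(child)

    # Add a fresh rdf:Description holding the enclosing document's values.
    if props:
        desc = ET.SubElement(rdf, RDF_TAG + "Description", {RDF_TAG + "about": ""})
        for name, value in props.items():
            ET.SubElement(desc, TDM_TAG + name).text = value
    return ET.tostring(root, encoding="unicode")


def wrap(inner: str) -> str:
    """Wrap property elements in a hypothetical minimal XMP document."""
    return (
        f'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF xmlns:rdf="{RDF_NS}">'
        f'<rdf:Description rdf:about="" xmlns:tdm="{TDM_NS}">{inner}'
        "</rdf:Description></rdf:RDF></x:xmpmeta>"
    )


if __name__ == "__main__":
    document = wrap("<tdm:reservation>1</tdm:reservation>"
                    "<tdm:policy>https://example.com/tdm-policy.json</tdm:policy>")
    embedded = wrap("<tdm:reservation>0</tdm:reservation>")
    print(propagate_tdmrep(document, embedded))
```

If the embedded file is a PDF/A-1, PDF/A-2, or PDF/A-3 document, the PDF/A XMP Extension Schema discussed earlier would need to accompany the copied properties.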
Enabling PDF software
With at least one vendor providing built-in TDMRep support, PDF viewing and editing software developers may wish to extend their XMP editing and visualization capabilities to surface TDMRep rights to end users.


As mentioned above, PDF editing software may additionally wish to check for and remove TDMRep declarations in per-object metadata streams, ensuring that only a single TDMRep declaration is present, in the document-level metadata. PDF editing software may also wish to support propagating the document’s TDMRep information into embedded files so that TDM rights remain present if those files are extracted.
Other ways to express rights
In the fast-moving world of AI and text & data mining, other formats and protocols for expressing similar rights have been proposed, many of which are summarized in this OpenFuture 2025 report, “A vocabulary for opting out of AI training and other forms of TDM.”
As PDF’s standards development organization, the PDF Association is happy to work alongside government, regulators, publishers, and other industry associations to ensure that alternative methods for expressing TDM rights in PDF are defined in an optimal manner. Please contact us.
Conclusion
W3C TDMRep is an established, easy-to-understand, and easy-to-use means to express reservation of rights related to text & data mining (TDM).
However, the W3C’s specification is insufficient for the complexities of PDF:
- TDMRep can be used in any PDF file, including PDF 1.x, PDF 2.0, and PDF/A-4.
- TDMRep can be used in PDF/A-1, PDF/A-2, and PDF/A-3 by additionally including the necessary PDF/A XMP Extension Schema.
- The TDMRep declaration in a document’s XMP metadata applies to all pages and all resources in a PDF file.
- Other XMP metadata streams in the PDF should not include TDMRep declarations; any such declarations should be ignored if present.
- Embedded files should include their own TDMRep metadata so that TDM rights remain present after extraction.


