Expressing Text and Data Mining Rights with Datalogics PDF Optimizer + TDMRep

Lindsey is a content marketing professional with 15 years of experience working with small and large companies alike. She is passionate about telling stories and connecting with others through digital channels.


As machine learning, AI, and text and data mining (TDM) become more universal, PDF developers and users face two new challenges:
- Clarifying what rights are reserved over content: is it permissible to mine or extract data?
- Expressing contact information or licensing policy for those who wish to use the content for TDM.
To help with this, the TDM Reservation Protocol (TDMRep), developed within a W3C Community Group, provides a standard, machine-readable way to indicate whether TDM rights are reserved (or not) and, optionally, a policy or URL where permission/licensing information can be obtained.
We recently published a blog article called Defending Your PDFs: Blocking AI Models from Scraping Your Data, which examines the importance of protecting your PDF documents from AI scraping using TDMRep. That article discussed how to use TDMRep, what a TDM policy looks like, and how to use Adobe PDF Library (APDFL) to express TDMRep in a PDF. PDF Optimizer profiles now support two new keys, `tdm-reservation` and `tdm-policy`, so users can express TDMRep with the PDF Optimizer command-line utility instead of, or in addition to, APDFL. Let's take a look at what that means for developers working with PDF Optimizer.
What’s New: TDMRep in PDF Optimizer
PDF Optimizer can now include TDMRep metadata in the PDFs it optimizes. As part of a PDF optimization workflow, authors or publishers can now:
- Insert TDMRep properties into a PDF’s XMP metadata, specifically the `tdm-reservation` (boolean) to signal whether TDM use is reserved or not; and
- Include a `tdm-policy` URL, pointing to licensing or contact info (or policy text) for someone who wants to request TDM usage.
TL;DR: TDMRep metadata can now be added automatically or manually during the optimization, compression, and clean-up stages, so PDF files are both efficient in size and clearly labelled for TDM purposes.
What PDF Optimizer Already Does
Before this addition, PDF Optimizer already supported a broad set of features for improving performance, accessibility, compatibility, and file size. These include image compression and downsampling, removing redundant metadata, flattening transparencies, stripping unused fonts or embedded attachments, removing private or unused object data, and more.
With TDMRep support, metadata about rights should no longer be an afterthought; it can be part of the finalized PDF output, alongside the other optimizations.
Read Maximizing PDF Performance with PDF Optimizer: Solutions Gate Case Study to see how our customers use PDF Optimizer.
How to Use TDMRep with PDF Optimizer
A typical workflow with PDF Optimizer looks like this:
- Decide on your TDM policy: Do you want to reserve all TDM rights (a.k.a. deny mining), allow them under certain conditions, or explicitly permit them? If reserved, will you provide contact or licensing information via a policy URL?
- Prepare or locate your metadata: For instance, you might write a JSON-LD policy document, or have a webpage describing licensing terms.
- Run PDF Optimizer when finalizing the PDF: Set the `tdm-reservation` and `tdm-policy` keys in your PDF Optimizer profile so that the TDMRep fields are embedded in the XMP metadata of the PDF.
- Distribute the optimized PDF: Users or automated agents who inspect metadata (or follow TDM-aware tools) can detect the reservation status and find policy info.
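For the "prepare your metadata" step, a policy document can be drafted as JSON-LD before it is published at the URL you plan to reference. The snippet below is a minimal sketch, not an official template: it builds an ODRL-style offer in Python and prints it as JSON. The `build_tdm_policy` helper, the example.com URLs, and the `tdm:mine` action name are assumptions made for illustration; check the exact terms against the TDMRep specification before publishing a real policy.

```python
import json


def build_tdm_policy(policy_uid, publisher_uid, target_url):
    """Return an ODRL-style offer as a plain dict, ready to serialize as JSON-LD."""
    return {
        # ODRL's published JSON-LD context.
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Offer",
        "uid": policy_uid,
        "assigner": publisher_uid,
        "permission": [
            {
                "target": target_url,
                # Assumption: placeholder action name; take the real TDM
                # action term from the TDMRep specification.
                "action": "tdm:mine",
            }
        ],
    }


policy = build_tdm_policy(
    policy_uid="https://example.com/policies/tdm-policy",  # hypothetical
    publisher_uid="https://example.com",                   # hypothetical
    target_url="https://example.com/reports/",             # hypothetical
)
print(json.dumps(policy, indent=2))
```

The printed JSON is what you would host at the address later used as the `tdm-policy` value.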
Why This Matters
- Legal & Ethical Clarity: Because TDMRep is a published W3C Community Group specification, having the metadata makes it clear to AI agents, researchers, publishers, and the public where your content stands with regard to text & data mining rights.
- Better Compliance: In jurisdictions with laws related to text and data mining (e.g. the EU’s Directive on Copyright in the Digital Single Market, or emerging policies around AI training), having an embedded opt-out or policy address helps enforce reservation or licensing obligations in a way that is machine-readable.
- Streamlined Workflow: Including the metadata during optimization means you won't need a separate tool or manual insertion of metadata later. This is a big win for those
Example: What TDMRep Metadata Might Look Like
In an optimized PDF, after using PDF Optimizer with TDMRep support, the XMP metadata might include entries like:
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:tdmrep="http://www.w3.org/ns/tdmrep/">
      <tdmrep:tdm-reservation>true</tdmrep:tdm-reservation>
      <tdmrep:tdm-policy>https://example.com/my-tdm-policy.html</tdmrep:tdm-policy>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
If `tdm-reservation` is false, that indicates TDM rights are not reserved, i.e. you are explicitly allowing TDM. The `tdm-policy` field would link to details, licensing, or contact info.
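To illustrate how an automated agent might read these fields, here is a minimal Python sketch that parses an XMP packet and extracts the reservation flag and the policy URL. It assumes the XMP has already been pulled out of the PDF as text. The sample packet follows the example above, with the namespace declarations a strict XML parser needs; the `read_tdmrep` helper is hypothetical, and the namespace URI is taken from this article, so verify both against the XMP in your own files.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the article's example; confirm it against the
# TDMRep specification and the XMP your own files actually contain.
TDMREP_NS = "http://www.w3.org/ns/tdmrep/"

SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:tdmrep="http://www.w3.org/ns/tdmrep/">
      <tdmrep:tdm-reservation>true</tdmrep:tdm-reservation>
      <tdmrep:tdm-policy>https://example.com/my-tdm-policy.html</tdmrep:tdm-policy>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_tdmrep(xmp_text):
    """Return (reserved, policy_url); each is None when its field is absent."""
    root = ET.fromstring(xmp_text)
    reservation = root.find(f".//{{{TDMREP_NS}}}tdm-reservation")
    policy = root.find(f".//{{{TDMREP_NS}}}tdm-policy")

    reserved = None
    if reservation is not None and reservation.text:
        # Accept both boolean-style and 1/0-style values.
        reserved = reservation.text.strip().lower() in ("true", "1")

    policy_url = None
    if policy is not None and policy.text:
        policy_url = policy.text.strip()
    return reserved, policy_url


reserved, policy_url = read_tdmrep(SAMPLE_XMP)
print(reserved, policy_url)
```

A crawler built along these lines would skip or license-check any document for which `reserved` is true, and treat a missing field (`None`) according to its own default policy.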
Best Practices & Considerations
- Be clear about what “reserved” means: It should align with your legal or institutional policy. Using “reserved” without providing a policy may be too ambiguous.
- Maintain the policy URL: If you embed a URL in TDMRep metadata, make sure it's valid, as dead links undermine the effectiveness of the metadata.
- Leverage standard vocabularies: If possible, use ODRL or other recognized rights expression languages for your policy documents so they can be parsed and understood automatically.
- Balance optimization with fidelity: While compression and removing metadata help file size, make sure you don’t inadvertently eliminate the parts you need (e.g. accessibility tags) when optimizing.
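As a small aid for the "maintain the policy URL" point, the sketch below shows a syntactic sanity check you could run before embedding a URL. It only confirms that the value is an absolute http(s) URL with a host; it does not detect dead links, which would need a periodic HTTP request against the live address. The function name `is_plausible_policy_url` is hypothetical.

```python
from urllib.parse import urlparse


def is_plausible_policy_url(url):
    """Syntactic check only: an absolute http(s) URL that names a host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in (
    "https://example.com/my-tdm-policy.html",
    "example.com/my-tdm-policy.html",  # missing scheme
    "ftp://example.com/policy",        # not http(s)
):
    print(candidate, is_plausible_policy_url(candidate))
```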
In conclusion, the addition of TDMRep support to PDF Optimizer is a significant step for PDF developers and users who want to both manage file size and performance *and* assert or clarify their TDM rights. Embedding machine-readable reservation of TDM rights directly into PDFs means policies are transparent to both human users and automated agents. If you’re distributing or archiving PDF content, consider using these features proactively as it's a way to ensure your intentions around text and data mining remain clear, legally defensible, and integrated into your PDF document workflows.