New large-scale PDF corpus now publicly available

This new corpus – nearly 8 million PDFs totaling about 8 TB – was gathered from across the web in July/August of 2021.



Industry drives Tagged PDF forward

An overview of industry working-group activities pertaining to tagged PDF.


Bottom-up: how standards should be made

Standards development works best when the expert community is open to diverse and dissenting views. The door is already open to the government to listen, participate and propose. If it isn’t broken, don’t try to fix it.


Perfecting PDF Lexical Analysis

This PDF file tests the parsing of valid combinations of adjacent PDF tokens, both in the body of a PDF and in PDF content streams.


How PDF contributes to greater sustainability

Thomas Zellmann, evangelist from the PDF Association, explains why PDF is not only “digital paper” but also one of the greenest office technologies around.


PDF: The document format for everything

The ISO-standardized PDF format and subset formats facilitate digital document solutions for today and tomorrow.


Packaging email archives using PDF

EA-PDF establishes high-level requirements for using PDF technology to package email for long-term preservation.


PDF redaction – AstraZeneca EU contracts – s**t happens

As the AstraZeneca vaccine contract debacle makes clear, redacting a PDF involves more than just the page; other objects in the file have to be checked as well.


Digitizing permanent records: the case for PDF/A-4

PDF/A-4 is essential to losslessly archiving PDF files that use current-generation PDF 2.0 technology… even including scanned documents. From modern Unicode support to interoperability with other specifications, PDF/A-4 is the only way to archive PDF files conforming to PDF 2.0.

The National Archives, PDF/A-4

Process Automation in Customer Communication: More Flexibility Through API

How do companies stay agile enough in their document and output management to meet increasing customer expectations for speed and quality?

2020: the year in PDF

27 years after Adobe shipped the first PDF viewer, the Portable Document Format has replaced paper as the final-format document medium of choice. For many organizations, 2020 – the year of COVID-19 – has become an “acid test” for …
