Presented at OctoberPDFest online (October 2020)

Examining PDFs as an aggregate

What can thousands of PDFs tell us?

Session description

PDF is not a single application’s file format; it is a file format shared by thousands of applications, and each PDF producer seemingly has its own quirks when generating PDFs. What can we learn by examining a couple of largish collections of PDFs as an aggregate sample?

Within these file sets: Which are the most common PDF producers? Which PDF version is most common? How common are errors, and which errors are most common? How common is PDF tagging? How common is PDF/A? What do these answers mean?
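
As a rough illustration of the kind of aggregate question posed above, the following is a minimal sketch that tallies the version declared in each file's header across a folder of PDFs. It is not the method used in the session; the function names `header_version` and `tally_versions` are hypothetical, and it assumes the header appears within the first 1024 bytes. A document catalog may also override the header version with its own Version entry, which this sketch does not inspect.

```python
import re
from collections import Counter
from pathlib import Path

# Matches a "%PDF-1.7" style header. The search covers the first 1024 bytes
# (an assumption of this sketch) so that files with leading junk still count.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data: bytes) -> str:
    """Return the version declared in the header bytes, or "unknown"."""
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else "unknown"


def tally_versions(folder: str) -> Counter:
    """Count header versions over every *.pdf file below `folder`."""
    counts = Counter()
    for path in Path(folder).rglob("*.pdf"):
        with path.open("rb") as handle:
            counts[header_version(handle.read(1024))] += 1
    return counts
```

Calling `tally_versions("some/folder").most_common()` would list the header versions from most to least frequent. Questions about producers, tagging, or PDF/A conformance need a real PDF parser, since those answers live in the document's metadata and structure rather than in the header line.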

Patrick Gallot
Datalogics

Slides download: https://pdfa.org/wp-content/uploads/2020/06/PDFs_Aggregate_OctoberPDFest_FinalReviewDraft.pdf
