What can thousands of PDFs tell us?
PDF is not a single application’s file format; it is a format shared by thousands of applications, and each PDF producer seemingly has its own quirks when generating files. What can we learn by examining a couple of largish collections of PDFs as an aggregate sample?
Within these file sets, which are the most common PDF producers? What PDF version is most common? How common are errors? What are the most common errors? How common is PDF tagging? How common is PDF/A? What do these answers mean?