The Arlington PDF Model
A specification-derived, machine-readable definition of PDF
About the presenter(s)
Peter Wyatt is the PDF Association’s CTO and an independent technology consultant with deep file format and parsing expertise. A developer and researcher working on PDF technologies for more than …
Description
This talk will present the Arlington PDF Model as the first open-access, vendor-neutral, comprehensive, specification-derived, machine-readable definition of all formally defined PDF objects and their intra- and inter-object relationships. The model captures the bulk of the latest 1,000-page ISO PDF 2.0 specification as a machine-readable, text-based definition of the entire PDF DOM, establishing a state-of-the-art “ground truth” for future PDF research efforts and implementers. Whether through trivial Linux commands, simple scripts, or more advanced programs, the model supports a multitude of potential use cases, including test case generation, validation of extant data, parser generation, modelling, and rapid forensic analysis of PDF syntax fragments.
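To illustrate the "simple scripts" use case, here is a minimal Python sketch that checks a parsed PDF dictionary against a text-based object definition. It assumes, for illustration only, that each object is described by a tab-separated file whose header row includes Key, Type, and Required columns. The sample rows and the helper names (load_definition, required_keys, missing_keys) are hypothetical, and a real definition may express requirements as conditions rather than a plain TRUE/FALSE value.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated object definition for the
# document catalog; the column names and layout are assumptions.
CATALOG_TSV = (
    "Key\tType\tRequired\n"
    "Type\tname\tTRUE\n"
    "Version\tname\tFALSE\n"
    "Pages\tdictionary\tTRUE\n"
)


def load_definition(text):
    """Parse a tab-separated object definition into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def required_keys(rows):
    """Return the keys whose Required column is literally TRUE."""
    return [row["Key"] for row in rows if row["Required"] == "TRUE"]


def missing_keys(rows, pdf_dict):
    """Return required keys absent from a parsed PDF dictionary (a plain dict here)."""
    return [key for key in required_keys(rows) if key not in pdf_dict]


rows = load_definition(CATALOG_TSV)
print(required_keys(rows))                      # ['Type', 'Pages']
print(missing_keys(rows, {"Type": "Catalog"}))  # ['Pages']
```

Because the definition is plain text, the same check could be reproduced with standard command-line tools. The script form simply makes the validation logic easier to extend.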