

Presented at PDF Days Europe 2022 (September 2022)

What’s stopping PDF from ubiquitous acceptance?

How the inclusion of explicit information will empower the PDF standard



Description

PDF has been around for nearly 30 years, and its ability to ensure perfect representation of graphic content has made it a powerful ISO standard. The comprehensive specification of fonts, device-independent color spaces, transparency, and numerous pre-press features guarantees faithful reproduction of authored content. Coupled with standards for compression, encryption, and external files, and with various ISO formats (e.g., PDF/X, PDF/A, and PDF/E), PDF is a powerful tool for capturing documents with perfect visual fidelity.

However, the introduction of PDF/UA in 2012 exposed a serious shortcoming in how PDFs are authored: authoring applications often include very little explicit information despite their graphically rich output. Much information is often lost in a PDF: font encoding information, structural information, table data, mathematical formulae, original resources, and layout schemas, to name just a few. Even today’s best AI solutions often struggle to synthesize document information from implicit clues.

This talk discusses the hidden gems in the PDF format to show how authoring applications should create semantically rich PDFs containing explicit information that allows for better readability, repurposing, accessibility, information extraction, and document understanding. Examples from current authoring applications show exemplary creation of valuable PDFs with explicit information that will carry the PDF standard into the next 25 years.

Shawn Gaither has been working on PDF structure for 25 years at Adobe and has led efforts in OCR, eBooks, accessibility, document comparison, form field detection, and, most recently, structure detection for Liquid Mode on Acrobat mobile.

