Presented at PDF Days Europe 2021
(September 2021)

Deriving HTML from PDF – lessons learned

Implementation challenges in reusing PDF content

Session description

Two years after introducing the Deriving HTML from PDF document and implementing its core concept, and after processing countless authored and un-authored PDF files, we will share our experiences. To adopt the idea successfully, developers need to understand the implementation challenges, and authors have to change their habits in producing PDF files.

We will discuss gaps in the design, in the nature of the whole process, and in the lack of authoring tools, along with ways to overcome them. The knowledge gathered resulted in updates to the PDF specification and additional work at the standardisation level. We will talk briefly about this work and how it will improve the reusability of PDF content in general.

Roman Toda
Foxit Corporation

Slides download: https://pdfa.org/wp-content/uploads/2021/06/PDFDays-2021-Derivation.pdf
