Deriving HTML from PDF – lessons learned
Implementation challenges in reusing PDF content
About the presenter(s)
Roman is first and foremost a software developer, a C++ expert with more than 20 years of experience with PDF. He’s been developing all major PDF features in high-quality PDF …
Description
Two years after the Deriving HTML from PDF document was introduced and its core concept implemented, and after processing countless authored and un-authored PDF files, we will share our experiences. To adopt the idea successfully, developers need to understand the implementation challenges, and authors have to change their habits in producing PDF files. We will discuss gaps in the design, in the nature of the whole process, and in the available authoring tools, and how to overcome them. The knowledge gathered resulted in updates to the PDF specification and additional work at the standardisation level. We will talk briefly about these and how they will improve the reusability of PDF content in general.