Roman Toda

Interview with Roman Toda, CTO at Normex, about PDF Days Online 2021

Roman’s PDF Days Online presentation “Deriving HTML from PDF – lessons learned” will focus on best practices and lessons for deriving HTML from tagged PDF and cover his recommended improvements to the PDF specification.
About the author: The PDF Association staff delivers a vendor-neutral platform for PDF’s stakeholders, facilitating the development of open specifications and ISO standards for PDF technology. The staff are located in Germany, the …

PDF Association: At the PDF Days Online 2021, you will be hosting a presentation titled “Deriving HTML from PDF – lessons learned” – what’s that about?

Roman Toda: In recent years, PDF interoperability has become a real concern. Everyone wants to access content, and we are not talking about simple text extraction but about the whole document hierarchy, including more complex structures such as tables and forms.

Two years ago, the PDF Association published a document, “Deriving HTML from PDF”, an algorithm for producing HTML from tagged PDF. After authoring many PDFs and implementing the algorithm in various scenarios, from PDF consumption and annotation in HTML environments to data mining, we decided to share our experience with the current state. We will identify where there are gaps in tooling and where the PDF specification needs updates.
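
To make the idea of deriving HTML from tagged PDF more concrete, here is a minimal sketch of the general approach: walk a structure tree and emit one HTML element per structure element. It is not the published algorithm. The `StructElem` class, the `TAG_MAP` table and the `to_html` function are hypothetical names invented for this illustration, the mapping covers only a handful of standard structure types, and reading the tree out of a real PDF file is left out entirely.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List

# Assumed, simplified mapping from PDF standard structure types to HTML
# element names. Illustrative only; a real derivation handles many more
# types, attributes and edge cases.
TAG_MAP = {
    "Document": "div",
    "P": "p",
    "H1": "h1",
    "H2": "h2",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
}


@dataclass
class StructElem:
    """Hypothetical in-memory stand-in for a tagged PDF structure element."""
    type: str
    text: str = ""
    kids: List["StructElem"] = field(default_factory=list)


def to_html(elem: StructElem) -> str:
    """Recursively serialize a structure element and its children as HTML."""
    # Unknown structure types fall back to a generic <div> in this sketch.
    tag = TAG_MAP.get(elem.type, "div")
    inner = escape(elem.text) + "".join(to_html(kid) for kid in elem.kids)
    return f"<{tag}>{inner}</{tag}>"


doc = StructElem("Document", kids=[
    StructElem("H1", "Title"),
    StructElem("Table", kids=[
        StructElem("TR", kids=[StructElem("TD", "a & b")]),
    ]),
])
print(to_html(doc))
```

Even this toy version shows why the quality of tagging matters: the HTML can only be as well structured as the structure tree it is derived from.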

PDF Association: Who is your presentation aimed at?

Roman Toda: Mainly software architects, integrators and developers who want to design their systems in an interoperable way. But in principle, everyone interested in how real-world use cases change our file format specifications.

PDF Association: What will the people who attend your presentation be able to take away from it?

Roman Toda: Hopefully people already know that PDF can be interoperable. I believe they will learn many details, including best practices for authoring PDFs and lessons learned from implementing the derivation algorithm.

PDF Association: The PDF Days Online 2021 has become the leading PDF event. What makes the PDF Days so unique in your mind?

Roman Toda: The unique combination of very business-oriented and very technically oriented talks, and people who never fail to impress. The openness and friendliness of all participants make this event so special.

PDF Association: Thank you! We look forward to seeing you at the PDF Days Online 2021.

Check out the overall PDF Days agenda and register for Roman’s session.
