
Deriving HTML from PDF Explored in Paris
The Deriving HTML from PDF TWG’s specification is an important part of its program of work, as it supports one of the most critical use cases for well-tagged PDFs: deriving HTML from PDF.
We are happy to invite you all to participate in the in-person meeting of our Deriving HTML from PDF Technical Working Group at PDF Weeks 2023 Paris on May 2 at 9:15 CEST.
We want to introduce and discuss a new draft, version 1.1, of our document describing the algorithm for deriving HTML from PDF. In recent years, we’ve seen a growing demand for interoperable solutions, and various vendors and groups are adopting solutions that rely on reusing PDF content. As you all know, the document is an important part of our work and supports one of the most critical use cases for well-tagged PDFs: deriving HTML from PDF.
Therefore, it’s essential to make sure the algorithm is up to date, accurate, and aligned with the changes in other relevant standards and industry-approved techniques. Some of the proposed changes were already discussed during our regular calls, but we hope to draw on the expertise of other relevant groups, like PDF/UA and PDF/UA Processor, to make sure that the algorithm also addresses all accessibility needs according to the latest PDF/UA-2 standard.
We will also save some time to collect feedback from groups focusing on the reusability of PDF, like the LaTeX TWG and the Form TWG. We hope we can help each other improve both the tagging techniques and the requirements for the derivation algorithm, so that their work achieves the best user experience through HTML derivation.
The draft will be distributed before our meeting together with a list of issues we hope to resolve at the meeting, so you’ll have the opportunity to come prepared.
We look forward to meeting you and hearing your opinions, new ideas, and additional topics that are interesting to you.
If you are not able to come to Paris, there will be an option to attend remotely.