Presented at PDF Days Europe 2018
(May 2018)

Structure recognition for information retrieval and layout

Mining for structure in a reliable scalable way

Session description

Tables, lists, and other structural elements are found in many digital articles. These elements let authors present information in a structured manner and communicate and summarize key results and main facts. They allow readers to get a quick overview of the presented information, to compare items, and to put them into context. Knowing the physical boundaries of paragraphs can aid screen readers for visually impaired users. Having a concept of tables helps any document-processing flow. And, aside from serving as pure input, structure is a key component when performing conversion. This talk is about bridging the gap between high-level concepts and low-level document formats.

Check out the detailed programme: https://pdfa.org/pdf-days-europe-2018-schedule-of-sessions/

Presenter

Joris is a 29-year-old software developer at iText, a global IT firm with a leadership position in PDF creation. Joris’ background is in machine learning, NLP, mathematics, graphs and NP-complete problems. After having worked in the supply-chain industry, he set his sights on document processing and workflow automation. At iText he focuses mostly on innovative research projects.

Joris Schellekens
borb

Slides download: https://pdfa.org/wp-content/uploads/2018/05/1615_schellekens.pdf

