
Presented at PDF Days Europe 2022 (September 2022)

How document understanding can leverage your PDF workflow

How to overcome the different challenges - fields of application

Session description

Document understanding is a long-standing topic that has gained prominence in recent years with advances in deep learning and natural language processing (NLP). The PDF format is by nature unstructured, so extracting and qualifying information from PDF documents requires sophisticated processing.

In this presentation, we will discuss four ways to address the challenges that PDF poses, namely understanding layout and text, and recovering the hierarchy and relationships between the different structures:

Layout analysis,
OCR,
Textual content key-value association,
Natural language processing.

We will then discuss the many fields of application of these technologies, including OCR, automatic indexing, tagging and labeling, structured layout conversion, and automatic redaction.

Elodie Tellier
Calico France

Slides download: https://pdfa.org/wp-content/uploads/2022/05/1330-Tellier.pdf
