How document understanding can leverage your PDF workflow
How to overcome the different challenges - fields of application
About the presenter(s)
Elodie Tellier is the former COO of ORPALIS Imaging Technologies, an innovative French company producing imaging software, PDF processing tools, and large-scale document flow management solutions for professionals of all …
Description
Document understanding is a long-standing topic that has moved to the forefront in recent years with advances in deep learning and NLP. The PDF format is by nature unstructured, so extracting and qualifying information from such documents requires sophisticated processing.
In this presentation, we will discuss four ways to address the challenges posed by PDF (understanding layout and text, and the hierarchy and relationships between the different structures):
- Layout analysis,
- OCR,
- Textual content key-value association,
- Natural language processing.
We will then discuss the many fields of application of these technologies, including OCR, automatic indexing, tagging and labeling, structured layout conversion, and automatic redaction.
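To give a rough idea of what textual key-value association can look like in its simplest form, here is a minimal sketch in Python. It assumes the text has already been extracted from the PDF as plain lines and that each label and its value sit on the same line, separated by a colon. The function name, the pattern, and the sample lines are all hypothetical and are not taken from the presentation; real documents usually also need layout information to pair labels with values.

```python
import re

# Hypothetical pattern: a short label, a colon, then a value on the same line.
KEY_VALUE = re.compile(r"^\s*(?P<key>[^:]{1,40}?)\s*:\s*(?P<value>\S.*?)\s*$")


def associate_key_values(lines):
    """Return a dict mapping each label to the value found on the same line."""
    pairs = {}
    for line in lines:
        match = KEY_VALUE.match(line)
        if match:
            pairs[match.group("key")] = match.group("value")
    return pairs


# Made-up sample lines standing in for text extracted from a PDF page.
sample = [
    "Invoice number: INV-0042",
    "Date : 2021-03-15",
    "Thank you for your business",
]
print(associate_key_values(sample))
```

Lines without a colon are simply skipped, which is why the closing sentence of the sample produces no pair.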