Digging for information by extracting data from a PDF document

Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information such as images, metadata and more? It can be simple, but it can also be tricky.
Nadine Schuppisser
December 8, 2016

Metadata is among the easiest things to extract. The document metadata can usually be extracted as a short XMP stream. Even if the document contains only an old-fashioned document information dictionary, extracting its key/value pairs is not a big deal. The same goes for outlines (bookmarks) and for navigation aids such as named destinations and links.
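
To illustrate how light the XMP case can be, here is a minimal, standard-library-only Python sketch. It assumes the metadata stream is stored unfiltered (as PDF/A requires), so the XMP packet can be found by scanning the raw file bytes for its `<?xpacket ...?>` delimiters. A real extractor would instead resolve the `/Metadata` entry of the document catalog with a PDF parser, decode any filters, and pick the packet the catalog actually references (an incrementally updated file may contain several). The function names `extract_xmp_packets` and `xmp_properties` and the sample bytes are hypothetical, and nested XMP structures are flattened in a simplified way.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Namespace URIs from the RDF, Dublin Core and Adobe PDF XMP schemas
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
PREFIX = {uri: p for p, uri in NS.items()}
RDF = "{%s}" % NS["rdf"]

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>
XPACKET = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL)


def extract_xmp_packets(pdf_bytes: bytes) -> list[bytes]:
    """Return the bodies of all unfiltered XMP packets found by a byte scan."""
    return XPACKET.findall(pdf_bytes)


def short(name: str) -> str:
    """Turn ElementTree's '{uri}local' form into 'prefix:local'."""
    if not name.startswith("{"):
        return name
    uri, _, local = name[1:].partition("}")
    return f"{PREFIX.get(uri, uri)}:{local}"


def xmp_properties(xmp: bytes) -> dict[str, str]:
    """Flatten the simple properties of every rdf:Description into a dict."""
    root = ET.fromstring(xmp.strip())
    props: dict[str, str] = {}
    for desc in root.iter(RDF + "Description"):
        # Simple properties may be written as attributes (e.g. pdf:Producer)
        for name, value in desc.attrib.items():
            if not name.startswith(RDF):
                props[short(name)] = value
        # ...or as child elements; arrays (rdf:Alt/Seq/Bag) hold rdf:li items
        for child in desc:
            items = [(li.text or "").strip() for li in child.iter(RDF + "li")]
            props[short(child.tag)] = "; ".join(items) if items else (child.text or "").strip()
    return props


# Hypothetical file fragment containing one unfiltered XMP packet
sample = (
    b"%PDF-1.7\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/" '
    b'xmlns:pdf="http://ns.adobe.com/pdf/1.3/" pdf:Producer="Example">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Report</rdf:li></rdf:Alt></dc:title>'
    b"</rdf:Description></rdf:RDF></x:xmpmeta>\n"
    b'<?xpacket end="w"?>\n%%EOF\n'
)

if __name__ == "__main__":
    for packet in extract_xmp_packets(sample):
        print(xmp_properties(packet))
```

Run on the sample, this prints the producer and title as `pdf:Producer` and `dc:title`. The byte scan is what makes the sketch short; it stops working as soon as the metadata stream is compressed, which is one place where the "simple" task becomes tricky.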

Read more on how to extract information from a PDF document in our PDF expert blog


Pdftools counts more than 5,000 companies and organizations in 70 countries among its customers, making it one of the world’s leading producers of software solutions and developer components for PDF and PDF/A products. The product range supports the entire document flow, from raw materials to scanning processes through to signing…