Digging for information by extracting data from a PDF document
Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information, such as images, metadata and more? It can be simple, but also tricky.
Metadata is among the easiest things to extract. Document metadata can usually be extracted as a short XMP stream. Even if the document contains an old-fashioned information dictionary, extracting its key/value pairs is not a big deal. Outlines (bookmarks) and navigation aids such as named destinations, links and the like are similarly straightforward.
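To illustrate why XMP metadata counts as easy, here is a minimal sketch in Python using only the standard library. It assumes the XMP stream is stored uncompressed in the file, so the packet can be found by scanning the raw bytes for its `<?xpacket ...?>` markers. The helper names `find_xmp_packet` and `read_dc_title` are made up for this example and are not part of any product API.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# An XMP packet is wrapped in <?xpacket begin=...?> and <?xpacket end=...?>
# processing instructions; capture whatever lies between them.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Namespaces needed to reach the Dublin Core title inside the RDF payload.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packet(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the first XMP packet body found in the raw bytes, or None.

    Assumption: the metadata stream is not compressed or encrypted.
    """
    match = XPACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def read_dc_title(xmp_xml: bytes) -> Optional[str]:
    """Return the first dc:title entry of an XMP packet, or None."""
    root = ET.fromstring(xmp_xml)
    node = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return node.text if node is not None else None


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        packet = find_xmp_packet(handle.read())
    print(read_dc_title(packet) if packet else "no uncompressed XMP packet found")
```

This byte scan will miss metadata that is compressed, encrypted, or present only in the information dictionary; a full PDF library is the more robust choice for those cases.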
Read more on how to extract information from a PDF document in our PDF expert blog
Pdftools counts more than 5,000 companies and organizations in 70 countries among its customers, making it one of the world’s leading producers of software solutions and developer components for PDF and PDF/A products. The product range supports the entire document flow, from raw materials to scanning processes through to signing…