
Datalogics PDF Alchemist Now Offers JSON Support For PDF Data Extraction

About the author: Lindsey is a marketing professional with over 10 years of experience working with small and large companies alike. She is passionate about telling stories and connecting with others through digital …

We have some good news for developers who need PDF data extraction with JSON support: as of our latest version (3.0), our PDF Alchemist tool supports JSON output!

So, what does this mean for the end-user?

JavaScript Object Notation, or JSON, is an open-standard data interchange format that uses human-readable text to transmit data objects made up of attribute-value pairs and arrays. If you work with JSON, you probably already know it has a diverse range of applications; for example, it can replace XML in AJAX systems. Our latest PDF Alchemist release can convert PDF elements into key-value pairs in JSON, in addition to the existing HTML, XML, and EPUB output options.
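To make the key-value idea concrete, here is a minimal Python sketch. The element layout below (an `elements` array whose entries carry a `type` key plus content keys such as `text` or `items`) is a hypothetical example invented for illustration, not PDF Alchemist's documented schema. It only shows how JSON objects and arrays map onto native dictionaries and lists once parsed.

```python
import json

# Hypothetical JSON output for a short document. The keys
# ("elements", "type", "text", "items") are illustrative assumptions,
# not PDF Alchemist's actual schema.
SAMPLE_OUTPUT = """
{
  "elements": [
    {"type": "heading", "text": "Quarterly Report"},
    {"type": "paragraph", "text": "Revenue grew in every region."},
    {"type": "list", "items": ["North", "South", "East"]}
  ]
}
"""


def load_elements(raw: str) -> list[dict]:
    """Parse JSON text; JSON objects become dicts and JSON arrays become lists."""
    document = json.loads(raw)
    return document["elements"]


if __name__ == "__main__":
    for element in load_elements(SAMPLE_OUTPUT):
        # Each element is an ordinary dict of key-value pairs.
        content = {key: value for key, value in element.items() if key != "type"}
        print(element["type"], "->", content)
```

Once loaded, no custom parser is needed: the standard library hands back plain Python data structures.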

The major benefits of this new feature are that you can:
• Extract your data from PDF into a format that, compared with XML, is relatively lightweight and readable.
• Parse your extracted data efficiently using JavaScript or another programming language.
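The readability and parsing benefits can be illustrated with Python's standard library. In this sketch the same hypothetical two-row table is encoded once as JSON and once as XML (both layouts are invented for illustration and are not PDF Alchemist's actual output formats). The JSON version loads straight into nested lists, while the XML version has to be walked element by element.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical table in two encodings. Key and element names
# are illustrative assumptions only.
TABLE_JSON = '{"type": "table", "rows": [["Item", "Qty"], ["Widget", "4"]]}'
TABLE_XML = (
    "<table>"
    "<row><cell>Item</cell><cell>Qty</cell></row>"
    "<row><cell>Widget</cell><cell>4</cell></row>"
    "</table>"
)


def rows_from_json(raw: str) -> list[list[str]]:
    # json.loads returns the nested arrays directly as Python lists.
    return json.loads(raw)["rows"]


def rows_from_xml(raw: str) -> list[list[str]]:
    # With XML, each <row> and <cell> element must be visited explicitly.
    root = ET.fromstring(raw)
    return [
        [cell.text or "" for cell in row.findall("cell")]
        for row in root.findall("row")
    ]


if __name__ == "__main__":
    assert rows_from_json(TABLE_JSON) == rows_from_xml(TABLE_XML)
    print(f"JSON: {len(TABLE_JSON)} chars, XML: {len(TABLE_XML)} chars")
```

For this small sample the JSON text is also shorter, though the actual size difference depends on the data and on how each format is laid out.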

Here’s a look at how these new features work with PDF Alchemist:

When JSON is chosen as the output format, PDF Alchemist identifies and parses a variety of PDF data. Each item is labeled by type and kept in its order of appearance in the document. As a result, tables, lists, and paragraphs are identified and ready for further processing.
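Because each element is labeled by type and kept in reading order, downstream code can pick out one kind of content without losing track of where it appeared. A minimal sketch, again assuming a hypothetical `elements` array with a `type` key on each entry (not a documented PDF Alchemist schema):

```python
import json
from collections import defaultdict

# Hypothetical output whose element order mirrors the order of appearance
# in the PDF. The "elements"/"type"/"rows"/"items" keys are assumptions.
RAW = json.dumps({
    "elements": [
        {"type": "paragraph", "text": "Intro"},
        {"type": "table", "rows": [["A", "1"]]},
        {"type": "list", "items": ["x", "y"]},
        {"type": "table", "rows": [["B", "2"]]},
        {"type": "paragraph", "text": "Summary"},
    ]
})


def group_by_type(raw: str) -> dict[str, list[tuple[int, dict]]]:
    """Group elements by type, recording each element's position in the document."""
    groups: dict[str, list[tuple[int, dict]]] = defaultdict(list)
    for position, element in enumerate(json.loads(raw)["elements"]):
        groups[element["type"]].append((position, element))
    return dict(groups)


if __name__ == "__main__":
    for kind, entries in group_by_type(RAW).items():
        print(kind, [position for position, _ in entries])
```

Keeping the position alongside each element means a table can later be reported together with the paragraph that precedes it.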

The new JSON output option supports the existing data partition parameters in PDF Alchemist. For example, you can use the “tablesOnly” option to extract only table data, set the “reflowText” option to false to preserve line break information within paragraphs, or use the “ocrMode” option to extract and identify character data in images.
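When line breaks are preserved, a consumer can choose between the original lines and a reflowed version of the same paragraph. The sketch below assumes, purely for illustration, that a paragraph extracted with “reflowText” set to false carries its line breaks as `\n` characters inside a hypothetical `text` field; the actual encoding in PDF Alchemist's output may differ.

```python
# Hypothetical paragraph element as it might look after json.loads(),
# extracted with reflowText set to false. The "text" key and the use of
# "\n" for preserved line breaks are assumptions for illustration.
paragraph = {"type": "paragraph", "text": "First line of the\nparagraph continues\nhere."}


def original_lines(element: dict) -> list[str]:
    """Return the paragraph's lines as they were broken in the PDF."""
    return element["text"].split("\n")


def reflowed(element: dict) -> str:
    """Join the preserved lines into one flowing string."""
    return " ".join(line.strip() for line in original_lines(element))


if __name__ == "__main__":
    print(original_lines(paragraph))
    print(reflowed(paragraph))
```

This naive join does not handle words hyphenated across line breaks; a real consumer would need to decide how to treat those.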

In addition to JSON output, PDF Alchemist now accepts XSLT stylesheets via the xsltStylesheetPath parameter. These stylesheets are applied to the XML output, so you can control the way PDF Alchemist writes output by providing your own custom stylesheet as input.

Ready to get started with PDF Alchemist 3.0 with JSON support? Request a free evaluation on our website.


Datalogics, Inc. provides a complete SDK for PDF creation, manipulation and management for companies around the globe. Built on Adobe source code, our flagship product Adobe® PDF Library offers a choice of programming platforms and languages along with unsurpassed customer service, proven by our 94% customer retention rate. Datalogics offers…
