OpenDataLoader PDF v2.0 Tops Open-Source PDF Benchmarks in PDF Data Loading

Hancom releases OpenDataLoader PDF v2.0 with benchmark-leading PDF extraction, a hybrid AI engine for on-premise security, four free AI add-ons (OCR, Table, Formula, Chart), and a switch to Apache 2.0 licensing.

Member News, March 17, 2026

About Hancom Inc.

Hancom, the South Korean software company behind the widely used Hangul word processor, has released OpenDataLoader PDF v2.0. In the company's own internal testing, OpenDataLoader PDF outperformed competing open-source tools in reading-order recognition, table extraction, and heading inference. The full benchmark dataset and reproducible code are published in the official OpenDataLoader PDF GitHub repository, so developers can verify the results independently.

Note: Benchmark results are self-reported by Hancom based on internal testing.

Hybrid Engine for Local, Secure Extraction

The headline engineering change in v2.0 is a hybrid extraction engine that pairs AI-based parsing with heuristic-based direct extraction. According to Hancom, the result is high-accuracy PDF data extraction that runs entirely on-premise, with no data transmitted outside the local environment. That is a significant consideration for organizations handling sensitive legal, financial, or medical documents.

Four Free AI Add-ons

OpenDataLoader PDF v2.0 ships with four AI capabilities as free add-ons:

  • OCR — improves text recognition on image-based and scanned PDFs
  • Table Extraction — handles merged cells and complex table structures
  • Formula Extraction — recognizes mathematical and scientific notation locally, without a cloud call
  • Chart Analysis — converts chart visuals into structured natural-language descriptions

All four are built for compatibility with third-party open-source models and tools, including Docling. No formal partnership or sponsorship is in place; the compatibility is purely technical, letting developers integrate OpenDataLoader PDF into existing pipelines without rebuilding their stack.

Apache 2.0 Licensing

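OpenDataLoader PDF v2.0 moves from MPL-2.0 to Apache 2.0, a widely used permissive open-source license. MPL-2.0 is a file-level copyleft license: when MPL-covered files are distributed, their source, including any modifications, must remain available under MPL-2.0. Apache 2.0 carries no such requirement, which reduces friction for commercial adoption and makes it easier for developers and enterprises to build downstream applications, including SaaS products and web apps, with fewer licensing obligations.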

Ecosystem Integrations

LangChain integration shipped in 2025. The 2026 roadmap targets Langflow, LlamaIndex, and Gemini CLI, alongside MCP (Model Context Protocol) support for agentic AI workflows. The roadmap positions OpenDataLoader PDF as infrastructure for the autonomous AI agent era, not just a standalone parsing tool.

Hancom also plans a commercial AI add-on for later in 2026, which the company describes as concentrating its proprietary document AI technology.

Accessibility Roadmap: AI-Generated Tagging for PDF/UA

With the European Accessibility Act now in effect and accessibility regulations expanding globally, Hancom plans to release an AI-based auto-tagging feature targeting Tagged PDF and PDF/UA compliance. Hancom says it will be the first open-source implementation of AI-generated accessibility tagging for PDF.

What Hancom's CTO Said

"OpenDataLoader PDF v2.0 has evolved into an open PDF data platform that anyone can freely use and build upon, through its AI hybrid engine and transition to Apache 2.0. With upcoming commercial AI add-ons and accessibility solutions, we aim to lead the global ecosystem — making PDF documents not only AI-ready, but accessible to everyone."

— Jihwan Jeong, CTO, Hancom

Availability

OpenDataLoader PDF v2.0 is available now.

Source code, benchmark datasets, and documentation are published at the OpenDataLoader PDF GitHub repository.

Also, you can get more information in our OpenDataLoader PDF 


This post was drafted with the assistance of AI.


Founded in 1990, Hancom is a leader in creating innovative ecosystems that will lead the world through the convergence of technology. Hancom is part of Hancom Group, whose 26 affiliated companies cover AI, metaverse, data analysis, robotics, drones, satellites, social security, healthcare, and digital finance. The mission of Hancom…
