iText Suite 9.3: Smarter Validation, Enhanced OCR, Smaller Files!
Apryse, formerly PDFTron, is a global leader in document processing technology, committed to making work better and life simpler. Apryse gives developers, enterprise customers, and small businesses proven tools to reach their document goals faster and easier. Our product portfolio includes Apryse SDK, WebViewer, iText, and XODO, offering solutions across … Read more


This iText Suite release from Apryse includes iText Core 9.3.0, which introduces support for the official EU List of Trusted Lists (LOTL) in digital signature validation, along with improvements to thread safety and further work on the .NET MAUI support we introduced in version 9.2.0.
Elsewhere in the iText Suite, we have significant updates for the pdfOCR and pdfOptimizer add-ons. For pdfOCR we've extended its capabilities beyond the existing Tesseract 4.x implementation by introducing support for Machine Learning-based OCR pre-trained models which follow the Open Neural Network Exchange (ONNX) standard.
For pdfOptimizer, optimization of embedded fonts has been upgraded, leading to significantly smaller file sizes with no loss in document quality.
EU Trusted Lists Support
Digital signing is a key use case for iText, and recent releases have focused on simplifying the digital signing and signature validation process for the open-source iText Core library. The introduction of support for the retrieval, validation, and usage of the EU’s List of Trusted Lists (LOTL) is another big step towards this goal.
The LOTL is an official resource which identifies the organizations authorized by European governments to provide safe and reliable digital identity and electronic signature services. It consists of an XML file of country-specific trusted lists, which in turn contain various certificates considered to be trustworthy for use during PDF signature validation.
Using these lists, iText can confirm whether digital signatures or electronic seals are legitimately trustworthy and legally valid for the relevant country or region under EU laws. You can be sure that signatures validated with iText will meet the stringent European standards for trust and authenticity defined by the eIDAS regulations.
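To make the two-level structure described above more concrete, here is a minimal sketch in plain Java. It is not the iText API: the class `TrustedListSketch`, the method `findTrustingCountry`, and the idea of matching on a certificate fingerprint string are all assumptions made for illustration. Real validation involves building and checking certificate chains, not a simple lookup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of a "list of lists": each country code maps to the
// set of certificate fingerprints found in that country's trusted list.
// None of these names come from iText.
class TrustedListSketch {

    // Returns the country whose trusted list contains the fingerprint, if any.
    static Optional<String> findTrustingCountry(Map<String, Set<String>> lotl,
                                                String fingerprint) {
        for (Map.Entry<String, Set<String>> entry : lotl.entrySet()) {
            if (entry.getValue().contains(fingerprint)) {
                return Optional.of(entry.getKey());
            }
        }
        // No country-specific list vouches for this certificate.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up fingerprints, for demonstration only.
        Map<String, Set<String>> lotl = Map.of(
                "BE", Set.of("aa11", "bb22"),
                "DE", Set.of("cc33"));
        System.out.println(findTrustingCountry(lotl, "cc33").isPresent());
        System.out.println(findTrustingCountry(lotl, "zz99").isPresent());
    }
}
```

The point of the sketch is only that trust is resolved in two steps: first the top-level list yields the country lists, then a country list yields the certificates that can anchor a signature.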
New ONNX-Based OCR Engine
pdfOCR is our add-on for iText Core to perform OCR on documents and images. This release adds the pdfocr-onnxtr module, which provides a high-performance machine learning-based OCR implementation.
Note that this module is not a replacement for the existing Tesseract 4 module. The pdfOCR add-on was designed to make it easy to plug in alternative OCR engines, and this release demonstrates that aim. Tesseract remains a well-established OCR engine with excellent support for global languages, and you can easily switch between engines if you run into issues with certain documents.
The new module is based on the open-source OnnxTR library and is powered by ONNX pre-trained models. ONNX (Open Neural Network Exchange) is a relatively new but widely used open format for machine learning models which defines both a common set of operators and a common file format, enabling AI developers to use models with various frameworks, tools, runtimes, and compilers. A big advantage of ONNX is that anyone can get access to high-quality OCR without spending a ton of time and money on AI hardware and finding datasets to train their own models.
Supporting ONNX means a big upgrade in iText’s OCR capabilities. Our internal testing shows many ML-based models deliver faster and more accurate recognition than traditional OCR engines, such as Tesseract. This initial release uses an ONNX implementation of the popular and high-performance docTR library, which comes with ONNX versions of pre-trained docTR models for fast and efficient text detection and recognition.
Future releases will further leverage the ONNX standard, enabling iText not only to use additional ML-based OCR engines, but also to take full advantage of the growing number of pre-trained models in the community.
Additionally, pdfOCR now supports PDF documents as input, not just image files. This enables direct text recognition for scanned PDFs, with no external conversion required.
Font Subset Consolidation for pdfOptimizer
New for the pdfOptimizer add-on is the consolidation of duplicate font subsets for horizontal fonts, preserving the visual fidelity of documents while significantly reducing file sizes in many cases.
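As a rough illustration of what consolidating duplicate font subsets means, the sketch below groups subsets by their base font name and merges the glyphs each one uses. It relies on the PDF convention that a subset font name starts with a tag of six uppercase letters followed by a plus sign (for example, ABCDEF+OpenSans). Everything else is an assumption for illustration: the class `FontSubsetSketch`, its methods, and the use of integer glyph IDs are hypothetical, and this is not how pdfOptimizer is implemented. A real implementation must also merge the embedded font programs and check that the subsets are actually compatible.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not the pdfOptimizer API.
class FontSubsetSketch {

    // Strips a subset tag of six uppercase letters plus '+', if present.
    static String baseFontName(String name) {
        if (name.length() > 7 && name.charAt(6) == '+') {
            for (int i = 0; i < 6; i++) {
                char c = name.charAt(i);
                if (c < 'A' || c > 'Z') {
                    return name; // not a subset tag, leave the name alone
                }
            }
            return name.substring(7);
        }
        return name;
    }

    // Merges subsets that share a base font into a single entry whose
    // glyph set is the union of the glyphs used by each subset.
    static Map<String, Set<Integer>> consolidate(Map<String, Set<Integer>> subsets) {
        Map<String, Set<Integer>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : subsets.entrySet()) {
            merged.computeIfAbsent(baseFontName(e.getKey()), k -> new TreeSet<>())
                  .addAll(e.getValue());
        }
        return merged;
    }
}
```

In this toy model, two subsets named ABCDEF+OpenSans and GHIJKL+OpenSans collapse into one OpenSans entry, which is where the file size saving would come from when many pages or merged documents each embed their own subset of the same font.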
Bug Fixes and Miscellaneous Improvements
As mentioned, we’ve improved thread safety across key iText components, making it more robust in multithreaded environments.
In .NET-specific changes, we improved compatibility with .NET MAUI for cross-platform mobile and desktop development. Another nice change is in XML-based metadata parsing, which will significantly speed up the processing of large XMP collections.
A bug in PDF 2.0 structure destinations has been fixed, improving how tagged content is linked and navigated when converting from HTML. This is now more in line with the PDF 2.0 and PDF/UA-2 specifications and could be particularly useful for accessibility and structured document workflows.
Speaking of HTML conversion, we’ve implemented some nice fixes for the pdfHTML add-on when it comes to layout calculations and tagging. Also improved is the handling of anchor tags with object references for better accessibility and structure, which can be handy when generating accessible PDF from HTML/XML and CSS templates.
We fixed an issue related to color depth support in PDF image data streams, which would result in an IO exception. We also resolved a stack overflow exception which could occur in invalid PDFs with cyclic references in the trailer dictionary.
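For readers wondering how cyclic references can lead to a stack overflow, the general defence is to track which objects have already been visited and to walk the references iteratively rather than recursively. The following is a minimal sketch of that technique under stated assumptions: object references are modelled as a simple map from one object number to the next, and the class `CycleSafeTraversalSketch` and method `followChain` are hypothetical names. It is not the actual iText fix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of cycle-safe reference following; not iText code.
class CycleSafeTraversalSketch {

    // Follows a chain of references starting at 'start'. The map 'next'
    // gives, for each object number, the object number it refers to.
    // Throws if the chain loops back on itself instead of recursing forever.
    static List<Integer> followChain(Map<Integer, Integer> next, int start) {
        List<Integer> visitedOrder = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Integer current = start;
        while (current != null) {
            if (!seen.add(current)) {
                throw new IllegalStateException(
                        "Cyclic reference detected at object " + current);
            }
            visitedOrder.add(current);
            current = next.get(current); // null when the chain ends
        }
        return visitedOrder;
    }
}
```

Because the loop keeps its state in a set rather than on the call stack, a malformed file produces a clear, catchable error instead of exhausting the stack.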
Showcase PDF
Continuing the tradition of previous releases, the iText Core release notes are also available as a PDF document on the iText Knowledge Base. This showcase document not only conforms to the latest PDF/UA-2 and PDF/A-4 standards, but is also digitally signed.
It also acts as a demonstration of the high-level HTML-to-PDF/UA API we introduced in the previous release, making the creation of accessible PDF documents easier than ever before.
You can find full details of what’s new in iText Suite 9.3 in the release announcement. Alternatively, check the release notes on the iText Knowledge Base for more technical information, download links and related documentation/examples.