New large-scale PDF corpus now publicly available

This new corpus – nearly 8 million PDFs totaling about 8 TB – was gathered from across the web in July/August of 2021.



PDF/raster: An Overview

Curious about how PDF and PDF/raster are related? Here’s a quick overview from Datalogics CTO and Chairman of the PDF Association, Matt Kuznicki.

ActivePDF Releases DocSight™ OCR 2017 R2.0

ActivePDF announced major enhancements to their DocSight OCR product. This update to ActivePDF’s premier optical character recognition (OCR) tool gives organizations an expanded ability to convert files into searchable and editable text PDFs with ease, speed and accuracy.

Summer Release of Java PDF Library with Optimizer Audit, PAdES Signatures, Page Resize

The v2017R1 summer release of Qoppa’s Java PDF library suite adds improvements to the PDF manipulation, PDF optimizer, electronic signature and preflight features. Qoppa’s advanced Java PDF optimizer library significantly reduces the file size of PDF documents by removing unused objects and compressing images and streams. …

Qoppa Summer Release of Java PDF Component adds 4k Support, Annotation Unicode & Page Labels

This summer release of Qoppa’s Java PDF Component suite (v2017R1) adds 4k high-resolution display support and tons of improvements to the PDF annotation feature. The user interface of Qoppa’s PDF components was revised to scale automatically depending on the user’s screen resolution, …

Redaction with Overlay Text using Datalogics PDF Java Toolkit

When publishing documents online, you have to operate under the assumption that someone, somewhere, has made a copy and that it will exist forever. Because of that, we need to take extra care to remove sensitive data from …

VIP Event to provide expert print & publishing advice

callas software is inviting all interested parties to a VIP Event for the print and publishing sector. The event will be held on November 6-8, 2017 in Vienna. The first two days of the event will consist of a number …

Synchronizing Documents with PSPDFKit Instant

Nowadays, people work from multiple devices and expect their data to be available everywhere. Teams need to be able to collaborate no matter where their members are located and what devices they are using. App developers need to provide their users …

PSPDFKit for Web 2017.5

With the release of PSPDFKit for Web 2017.5, we are introducing an all-new standalone mode. Previously, PSPDFKit for Web only supported server-backed deployments. As of today, you have the option of deploying PSPDFKit for Web without a server, …

PDF and PDF/A functionality in a document management system

Rottal-Metzg AG needed to add a document management system to its myBica ERP solution.

pdfToolbox update 9.3 is now available!

callas software, market leader for automated PDF quality control and archival solutions, today releases version 9.3 of its showpiece pdfToolbox. This update contains small but significant new features to make your life easier.