Understanding Brotli PDF Compression

Patrick Gallot has been working with software developers and PDFs since 2000. At Datalogics, he is the lead technical support engineer for the Adobe PDF Library and Datalogics PDF Java Toolkit, helping our customers resolve their complex questions and challenges. He is also active in the PDF community.


In today's world of instant global document sharing, PDF file size directly impacts performance. Whether you need faster downloads, reduced storage costs, or improved user experience, smaller PDF files can deliver big benefits.
PDF has long relied on Flate compression for general-purpose data compression, but Brotli has emerged as a promising new approach that could significantly improve compression ratios.
Originally developed by Google for web content compression, Brotli has proven its effectiveness in reducing web page load times. Now, as outlined in the PDF Association's recent article on Brotli compression, this advanced compression algorithm is making its way into the PDF specification, promising to squeeze PDF files more tightly than ever before.
PDF's Current Compression Landscape
PDF currently supports multiple compression methods, each optimized for different types of data. Among the general-purpose options, which also include run-length encoding (RunLengthDecode) and LZW (LZWDecode), Flate compression (FlateDecode) is the most reliable for consistent results across various content types. Flate, also known as Deflate, combines two lossless techniques: LZ77, the Lempel-Ziv algorithm published in 1977, and Huffman coding, published in 1952.
Flate works as a dictionary coder: it scans the data for byte sequences that repeat and replaces each repeat with a short reference to an earlier occurrence. The "dictionary" is not a separate table; it is the sliding window of recently processed data, built up from scratch for each stream. Huffman coding then assigns shorter codes to the most frequent symbols, and the tables describing those codes are usually embedded in the compressed output. When decompressing, the algorithm reverses the process, rebuilding the same window as it goes.
This approach works well because it adapts to the specific characteristics of each data stream, but it also means that every compressed stream starts with an empty dictionary and must carry its own coding information.
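As a rough illustration of the Flate behavior described above, the following sketch uses Python's standard `zlib` module, which produces the zlib-wrapped Deflate format used behind PDF's FlateDecode filter. The sample content stream is made up for the example; it is not taken from a real PDF.

```python
import zlib

# A small, repetitive sample resembling a PDF content stream (illustrative only).
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 50

# Compress at the highest level; the repeated operators become short
# back-references to earlier occurrences in the stream.
compressed = zlib.compress(content_stream, 9)

# Decompression needs nothing but the compressed bytes themselves:
# the stream is self-contained.
restored = zlib.decompress(compressed)

print(len(content_stream), "->", len(compressed), "bytes")
```

Because the compressor starts from an empty window, the first occurrence of each pattern is stored in full; only the later repeats are cheap.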
Brotli: A Smarter Approach to Compression
Brotli builds on the same foundation as Flate but adds something new: a predefined dictionary. Rather than building a dictionary from scratch for each stream, Brotli starts with a dictionary that was synthesized using context modeling techniques to handle the most common data patterns found in web content (based on analysis conducted between 2013 and 2016).
This predefined dictionary is Brotli's secret weapon. Since it already contains patterns commonly found in text, HTML, CSS, JavaScript, and other web content, Brotli can recognize and compress these patterns immediately, without needing to "learn" them first. For patterns not covered by the predefined dictionary, Brotli also uses a sliding window that detects other repeating patterns, similar to the way Flate operates.
The key advantage is efficiency: the more of a data stream that can be expressed as references into the predefined dictionary, the less information has to be stored in the output, resulting in better compression ratios.
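The effect of a dictionary that both sides know in advance can be sketched with the standard library alone. The example below is an analogy, not Brotli itself: it uses zlib's preset-dictionary option (`zdict`) with a tiny, hypothetical dictionary (`PRESET`) to show how a short input shrinks when its patterns are already known before compression starts.

```python
import zlib

# Hypothetical stand-in for a predefined dictionary: fragments that the
# compressor and decompressor are both assumed to know in advance.
PRESET = (b'<!DOCTYPE html><html><head><meta charset="utf-8">'
          b'<title></title></head><body><div class=""></div></body></html>')

sample = (b'<!DOCTYPE html><html><head><meta charset="utf-8">'
          b'<title>Hi</title></head><body><div class="x"></div></body></html>')

def deflate(data, zdict=None):
    """Compress with Deflate, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()

def inflate(blob, zdict=None):
    """Decompress; the same preset dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

plain = deflate(sample)           # dictionary starts empty
primed = deflate(sample, PRESET)  # dictionary starts pre-filled

print("without preset:", len(plain), "bytes")
print("with preset:   ", len(primed), "bytes")
```

The dictionary itself is never written to the output; the decompressor has to hold an identical copy, which is why a predefined dictionary only works when it is fixed as part of the format.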
What This Means for PDF Compression
Brotli's effectiveness will largely depend on the type of content being compressed and how well it aligns with the patterns in its predefined dictionary.
Content streams and font data are likely to be the biggest winners. Text content, font definitions, and structured data within PDFs often contain patterns similar to web content: repeated keywords, common phrases, standardized formatting codes, and similar structural elements.
Image data is a harder case. Losslessly compressed images contain data that is unlikely to match patterns in Brotli's web-optimized dictionary. For this type of content, Brotli may perform similarly to Flate, or potentially slightly worse, since the predefined dictionary offers little for the image data to match against.
Strategic Implementation Approaches
Given these characteristics, two implementation strategies are possible:
- Use Brotli as a drop-in replacement for Flate across all general-purpose compression needs. This approach banks on Brotli's average performance being better than Flate’s. The simplicity of this approach makes it ideal for many use cases.
- Implement logic to choose the optimal compression method for each data stream based on content type. This approach requires more sophisticated implementation but could yield maximum compression benefits by using Brotli where it excels and falling back to Flate or other methods where they perform better.
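The second strategy can be sketched as a "try each candidate and keep the smallest" selector. This is a minimal sketch under stated assumptions: `choose_filter` is a hypothetical helper, the filter name `BrotliDecode` is assumed here for illustration, and Brotli is only tried if the third-party `brotli` Python package happens to be installed.

```python
import zlib

try:
    import brotli  # third-party package; optional in this sketch
except ImportError:
    brotli = None

def choose_filter(data: bytes):
    """Return (filter_name, encoded_bytes) for the smallest candidate encoding."""
    candidates = {"FlateDecode": zlib.compress(data, 9)}
    if brotli is not None:
        # "BrotliDecode" is an assumed filter name, used for illustration only.
        candidates["BrotliDecode"] = brotli.compress(data, quality=11)
    # Pick the encoding that produced the fewest bytes.
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]

# Illustrative, made-up content stream.
stream = b"BT /F1 12 Tf (Hello) Tj ET\n" * 100
name, encoded = choose_filter(stream)
print(name, len(stream), "->", len(encoded), "bytes")
```

Compressing every stream more than once costs extra CPU time at write time, so a production implementation might instead choose by content type (for example, text versus image data) and only compare sizes when the type is ambiguous.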
Looking to the Future
Brotli compression represents an exciting evolution in PDF optimization technology. By leveraging a predefined dictionary optimized for common content patterns, it offers the potential for significantly improved compression ratios, especially for text-heavy and structured content within PDF files. To deliver its full benefit in PDFs, however, Brotli first has to be verified as an interoperable extension of PDF, formally standardized, and widely adopted.