PDF Association logo

Discover pdfa.org

Key resources

Get involved

How do you find the right PDF technology vendor?
Use the Solution Agent to ask the entire PDF communuity!
The PDF Association celebrates its members’ public statements
of support
for ISO-standardized PDF technology.

Member Area

Brotli compression coming to PDF

March 25, 2025
Brotli compression is coming soon to PDF 2.0, enabling smaller file sizes. Sample PDF files are available to kickstart development!
Robin Watts
About Robin Watts

Long-time PDF developer. Works on Ghostscript and MuPDF.

Peter Wyatt
About Peter Wyatt

A developer and researcher working on PDF technologies for more than 20 years, Peter is the PDF Association’s CTO and an independent technology consultant.


Background

The PDF format is over 30 years old. All things considered, it’s standing up pretty well.

Of course, technology has advanced massively since those early days. Today, the PDF community is working to incorporate some of those advancements into the PDF world.

At the Symposium on Advancing the PDF Imaging Model in October 2023, the PDF Association launched a public request for stakeholder feedback regarding what aspects of PDF could be usefully updated. At that event various subjects were explored, including HDR, new color capabilities, fonts, animation, accessibility, new 2D and 3D graphical formats, and improved compression methods.

While improvements in several areas are the subject of active work in the PDF Association’s Imaging Model TWG, this article deals with one method for optimizing the size of stored PDF files.

BUSINESS NOTE

A new Brotli compression filter is coming soon to PDF 2.0, enabling smaller file sizes. Sample PDF files are now available to kickstart development!

Compression in PDF today

The PDF standard allows various methods for compressing data streams in PDF files, which are known as ‘filters’ (ISO 32000-2, §7.4).

PDF already includes several 2D image-specific compression filters; JPEG (the DCTDecode filter), JPEG 2000 (the JPXDecode filter), and JBIG2 (the JBIG2Decode filter). Other filters provide lossless general-purpose compression for any kind of data. As PDF originated in PostScript, it inherited the two general-purpose compression schemes defined for that language: LZW (LZWDecode) and Deflate (FlateDecode).

Old as they are, LZW and Deflate remain reasonably fast and effective for a wide range of data. Indeed, the Deflate algorithm is the basis of the zip and gzip utilities, the most popular compression tools today. Nonetheless, compression technology has advanced, and there are now better alternatives for specific cases, such as PDF.

Does size still matter?

Wait, why do we care about PDF file size? Hard drives are cheap. Network connections are fast. Cloud storage is everywhere. Memory sizes are huge compared to 30 years ago. Isn’t this all less important now?

All that can be true, but PDF is nonetheless used in various resource constrained environments. Typical consumer devices such as scanners, home printers, e-readers, and kiosks still benefit from small file sizes. Processing a PDF file directly from memory provides significant performance benefits over processing from secondary storage or mechanical hard disks.

In addition, the size of PDF files is increasing. As camera and computer display resolutions and fidelity improve, illustration and authoring tools become more capable, and 3D and rich media content become more prevalent, the size of content in PDF documents increases.

The graph shows files trending larger over time.
Scatter plot of the 8M file CC-MAIN-2021-31-PDF-UNTRUNCATED corpus illustrating growth in PDF file size over time.

Many organizations have already accumulated huge “data lakes” containing millions and even billions of PDF files, as illustrated in the slide below from the PDF Days Europe 2016 keynote address “PDF State of the Union” (YouTube video). Thus, even a small reduction in file size can yield enormous savings. The world produces even more data today than back in 2016, so reducing the space required to preserve documents securely in the future has important sustainability benefits.

A slide from the deck referenced in the caption providing statistics about the number of PDF files held by large organizations.
Credit: PDF Days Europe 2016, “PDF State of the Union” keynote address by Leonard Rosenthol, (YouTube video)

Requirements for a new PDF compression scheme

In the decades since the introduction of PDF, many new compression schemes have been invented, but sadly, not all of them are suitable for use in PDF. To be successful in PDF, any new compression scheme must be:

  • Modest memory and hardware requirements, especially for reader software. Any algorithm that requires large amounts of memory or high-performance CPUs for decompression is instantly unsuitable. Ideally, compression should be similarly cheap, but creation typically happens just once, whereas consumption happens multiple times across all devices and platforms.
  • No licensing or IP requirements. As PDF is an open international standard implemented across all platforms and devices without any licensing requirements, a patent- and license-free compression scheme is critical.
  • A well-specified, open specification. Any algorithm chosen for PDF must be well-defined and not based only on a single implementation. To ensure that PDF can operate as a reliable archival format, it’s unacceptable that implementations might differ in how they decompress data!

With these requirements in mind various compression schemes, including ZStd, XZ, and Brotli. Experts in the PDF Association’s PDF TWG undertook theoretical and experimental analysis of these schemes, reviewing decompression speed, compression speed, compression ratio achieved, memory usage, code size, standardisation, IP, interoperability, prototyping, sample file creation, and other due diligence tasks.

The following table shows the results of using Brotli compression with the Adobe PDF 1.7 specification, a large 30.95MB, 1,310 page PDF file.

Table 1: Brotli compression on a single file
Method * File size (bytes) Peak Memory Usage  (for compression) Compression time (seconds)
Original PDF (Flate) 32,453,129 n/a n/a
Uncompressed 65,991,269 216,326,567 1.7
Uncompressed + ObjStm 58,260,962 229,985,606 1.7
Flate (Default) + ObjStm 16,836,472 216,326,567 2.3
Flate (Max.) + ObjStm 16,632,995 231,985,129 5.1
Brotli (4) + ObjStm 16,282,222 252,000,242 2.2
Brotli (5) + ObjStm 14,910,533 254,018,508 2.5
Brotli (7) + ObjStm 14,855,977 260,320,401 2.9
Brotli (8) + ObjStm 14,846,516 268,698,321 3.0
Brotli (9) + ObjStm 14,830,857 287,540,729 9.5
Brotli (10) + ObjStm 13,184,281 291,439,441 17.2
Brotli (11) + ObjStm 13,073,110 296,751,433 47.9

* “+ ObjStm” means the output PDF utilizes compressed cross-reference streams and compressed object streams to maximize benefits of compression. Numbers in brackets indicate the compression level setting used – the larger the number, the higher the compression level.

While it might be tempting to consider adding support for all of these various compression models to PDF, this is a bad idea. Forcing every PDF reader to cope with one new compression format is enough of an imposition — asking them to cope with multiple formats is unreasonable.

In the event, Brotli emerged as the clear winner.

What is Brotli?

Brotli is a compression algorithm developed by Google. It was initially intended for web use and is now a web standard defined by RFC 7932. Brotli is a development of the LZ77 compression scheme, which underlies Deflate, ZStd, and XZ. Since support for Brotli is now required for various web standards (including web fonts), many systems already contain a decoder, minimizing the risks and incremental cost of adding Brotli to PDF.

Brotli decompression speed and memory use are about the same as Deflate, the de facto standard compression used with PDF. Compression speed tends to be slightly better for the same achieved compression ratio, but extreme compression settings (where the output size is significantly smaller) can be noticeably slower. Thanks to adoption by the web, various implementations exist in multiple programming languages.

As such, Brotli can be seen as “a better Deflate”.

Public prototype implementation

As part of researching alternatives and conducting due diligence, several PDF Association members prototyped Brotli support. This work enabled the creation of sample files, proof of interoperability between vendors, and experimental data that supported the benefits of Brotli compression.

As of March 2025, the current development version of MuPDF now supports reading PDF files with Brotli compression. The source is available from github.com/ArtifexSoftware/mupdf, and will be included as an experimental feature in the upcoming 1.26.0 release.

Similarly, the latest development version of Ghostscript can now read PDF files with Brotli compression. File creation functionality is underway. The next official Ghostscript release is scheduled for August this year, but the source is available now from github.com/ArtifexSoftware/Ghostpdl.

These prototypes, and the example files (ZIP file download), represent the current version of the forthcoming specification, which may evolve further before release. As such, they should be used as indications of the potential benefits, not guarantees of a final specification.

Those interested in PDF internals and the potential benefits of Brotli compression may be interested in the new Mutool audit facility. This tool analyzes PDFs and produces a report on how space within them is used.

Concrete benefits

To assess the concrete benefits and establish interoperability, two independent implementations were prototyped to support the Brotli compression filter for both file creation and reading.

File size optimization often involves other changes that do not affect the PDF’s visual appearance, but contribute to reducing file size. These changes may include the use of cross-reference streams, the use of object streams, removing whitespace, enabling object reuse, inlining objects, image recompression, choice of compression filter for each stream, etc. These processes are often based on proprietary algorithms. Accordingly, a side-by-side comparison between an exact feature set of optimizations across prototypes was not feasible.

For this comparison, Prototype A’s implementation was modified such that the side-by-side benefit of Brotli versus Flate compression schemes could be assessed. Both implementations successfully processed all PDF files with Brotli compression, demonstrating full interoperability and compatibility.

Table 2: Prototype A compression results
Test file description Original size Prototype A Brotli size Percentage Reduction Prototype A Flate size
Single A4 page color detailed town map. 325,723 294,377 9.6% 310,718
A multipage monochrome document containing vector art and text. 2,660,844 1,305,278 50.9% 2,059,294
A graphically rich multipage color document with transparency, advanced compositing and graphics. 1,086,400 871,896 19.7% 929,112
A 1,310-page book represented by the Adobe PDF 1.7 reference manual. 32,453,129 13,075,019 59.7% 16,645,395
Table 3: Prototype B compression results
Test file description Original size Prototype B Brotli size Percentage Reduction
Colored 52 A4 page technical specification with tables and various fonts. Tagged PDF. 455,840 405,213 11.1%
Colored 13 A4 page PDF/UA-1 and PDF/A-2A compliant document with lots of large tables. 348,147 335,827 3.5%
Colorful 8 page spreadsheet-like document with tables and graphics. Tagged PDF 802,818 785,257 2.2%
Colored 70 A4 page PDF/UA-1 technical specification with tables and various fonts. 468,031 424,382 9.3%
Colored 57 A4 page technical specification with tables and various fonts. Tagged PDF 769,457 583,899 24.1%
A dense full-page A3 spreadsheet with shaded cells. Tagged PDF. 368,189 189,723 48.5%

Note: some PDF files cannot be shared due to licensing and/or copyright.

Prototype files

A ZIP file containing 3 prototype PDFs that use Brotli compression via the BrotliDecode filter are provided (ZIP file download). These PDFs are representative samples created by the current prototype implementations and can be used to ensure that current PDF implementations robustly handle unexpected filters on different kinds of stream objects.

Table 4: Representative example files
Example PDF version Number of Brotli compressed streams Total number of streams
Brotli-Prototype-FileA.pdf 2.0 30 30
Brotli-Prototype-FileB.pdf 2.0 70 150
Brotli-Prototype-FileC.pdf 2.0 111 258

Conclusion: a breaking change

Introducing a new compression filter to PDF is a significant breaking change as there is no backward compatible fallback - if a PDF processor does not support BroltiDecode then data such as compressed cross-reference streams, compressed object streams, content streams, or any other type of resource will not be readable and the document will fail to be processed. Thus, it is critically important to communicate our intentions to the marketplace so that developers may prepare - “Preparing for PDF file “from the future”” and “Don’t risk losing users’ trust: future-proof your PDF implementations” both spoke in general terms, while this announcement is far more specific.

The formal technical specification for the new Brotli filter (BroltiDecode) in PDF 2.0 is still under development but will be completed soon. However, a selection of very simple PDF sample files using the new Brotli filter is provided now to help developers get a head start (ZIP file download).

Please join the PDF Association if you would like to get involved in helping shape the future of PDF.

WordPress Cookie Notice by Real Cookie Banner