
PDF-UX: Collections

Triggered by our own experience in preparing the cheat sheet collection, this article explores the capabilities of PDF collections and strengthens the case for all PDF implementations to do a better job at supporting collection creation and viewing.
About the author: Peter Wyatt is the PDF Association’s CTO and an independent technology consultant with deep file format and parsing expertise. A developer and researcher working on PDF technologies for more than …
Peter Wyatt
January 21, 2025




A PDF portable collection (or, more simply, a “collection”) is a special container-like PDF file that packages multiple embedded files. Commonly referred to by terms such as “portfolios”, “packages” or “binders” in PDF applications, collections are a more user-friendly solution than ZIP or similar archives of files as the author can:

  • Define a cover document with custom content
  • Define a preferred layout and initial view for optimal user navigation
  • Define additional custom attributes of each file in the collection
  • Define the sort order of files

The files contained in a PDF collection can be of any type. They do not have to be PDF files, although this is a common scenario, as in the PDF Association’s recent cheat sheet collection. The collection feature is designed in a backwards-compatible manner so that if a given PDF viewer doesn’t support collections, the author can have some assurance that a file list will be presented to end users.

Even with these clear end-user benefits over simple ZIP or similar archives, PDF collections are under-utilized. Today, users are often forced to perform multiple independent downloads of closely related PDFs instead of a single PDF collection to download, use, manage, share, and archive.

Triggered by our own experience in preparing the cheat sheet collection, this article explores the capabilities of PDF collections and strengthens the case for all PDF implementations to do a better job at supporting collection creation and viewing. PDF collections are already common in applications such as AEC (Architecture, Engineering and Construction), but the broader use of collections is growing. Just to take two examples:

PDF collections are built on existing capabilities

PDF collections rely on the inherent ability of PDF (starting with PDF 1.3 in 1998) to include embedded files of any type.

Although this feature made it possible to include a large group of embedded files in a PDF, there was no way to support or navigate a hierarchical folder structure of files.

The collections feature (introduced with Adobe PDF 1.7 and Acrobat 8 in 2006) effectively introduced a special kind of PDF file in which a “container” file encapsulates the intended content: a hierarchical structure of folders and files. The page(s) of the container PDF allow the author to provide any content or contextual information they wish. They also provide a simple, backward-compatible mechanism for software that does not support collections, so that all users enjoy a clean experience and good UX no matter what PDF software they have. Today, however, some PDF software does not even provide this most basic of experiences in supporting embedded files!

Thus, when a PDF collection is opened by an interactive viewer that supports collections, a different display, determined by the collection author, is presented. Typically, this display is some form of file list, for example, a detailed view or thumbnail images. Software that doesn’t understand the collections feature will not recognise the collection and thus fall back to displaying the container PDF’s pages, with the embedded files often made available via a flat file list in a dedicated navigation pane. This fallback drops support for all collection-specific features such as folders, custom schema attributes, navigators, advanced sort capabilities, etc.

A messy history

As mentioned above, PDF collections were first defined in 2006 (Adobe’s PDF 1.7 specification) and therefore were included in the first ISO standard defining PDF, ISO 32000-1:2008. Adobe subsequently defined additional proprietary extensions in Adobe Supplement to ISO 32000-1 BaseLevel 1.7, Extension Level 3 (available via our PDF Specification Archive), some of which were later adopted by the responsible ISO working group into ISO 32000-2 (PDF 2.0).

The Adobe PDF 1.7 extensions also included FLASH (SWF-based) navigators – a proprietary technology allowing authors to define powerful, highly customised user interfaces for their collections. With the proprietary nature and complete demise of FLASH technology, PDF collections created using this feature will no longer display the navigator, but will still be usable via the embedded file list fallback presentation described above.

Screenshot from an old SWF-based navigator showing custom navigation elements.
Carousel navigator (credit: Kelly Media Group)
Screenshot from an old SWF-based filmstrip-style navigator.
Filmstrip navigator (credit: Kelly Media Group)
Animated GIF of the SWF navigator options, including Click-through, Freeform, Grid, Linear and Wave.
The Acrobat 8 Wizard interface shows the different Flash-based navigators (credit: Adobe Acrobat promo video from 2011).

There are three standard collection viewing modes (defined in ISO 32000-2:2020, Table 153, View entry):

  • D (for details mode): all details about each embedded file in the collection, including any custom attributes, are presented in a multi-column (spreadsheet-like) format. This mode provides the most detailed information to the user but may be very crowded if there are lots of files or many attributes.
  • T (for tile mode): each file in the collection is represented by a small icon and a subset of the detailed file schema information. This mode provides top-level information about the file attachments to the user.
  • H (for hidden mode): the collection view is initially hidden, but can be revealed via an explicit action. If a default document is also defined (either the container PDF or one of the embedded files), the capable viewer displays the default document.
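As a minimal sketch of how the three modes above appear in a file, the snippet below serialises a bare collection dictionary in PDF syntax. The helper name `collection_dict` is hypothetical, the mode descriptions are paraphrased from the list above, and a real collection dictionary would normally carry further entries (schema, sort order, and so on).

```python
# Viewing modes from the list above; the descriptions are paraphrased.
VIEW_MODES = {
    "D": "details",
    "T": "tile",
    "H": "hidden",
}


def collection_dict(view: str = "D") -> str:
    """Serialise a minimal collection dictionary in PDF syntax (hypothetical helper)."""
    if view not in VIEW_MODES:
        raise ValueError(f"unknown view mode: {view!r}")
    return f"<< /Type /Collection /View /{view} >>"


print(collection_dict("T"))
```

Rejecting unknown names early keeps an authoring tool from writing a View value that no viewer would recognise.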

Collections in PDF 2.0

PDF 2.0 introduced a new concept of navigators designed to avoid any external technology dependency. In addition to the previously defined hidden, detailed, and tile views, PDF 2.0 defines a list of standard navigator appearances (referred to as “named layouts”) that PDF viewers supporting the navigator feature in PDF 2.0 need to implement (see ISO 32000-2:2020, Table 160, Layout entry):

FilmStrip

A layout that displays a strip of thumbnails, providing an index to the file attachments within the collection. The selected attachment should be previewed alongside the index.

Illustration of a filmstrip interface.

FreeForm

A layout that places thumbnails of the file attachments within the collection randomly in the view.

Illustration of a freeform interface.

Linear

A layout that provides a large-size preview of one file attachment in the collection and displays alongside the preview the metadata for the file attachment, including the name, description and other collection schema entries.

Illustration of a linear interface.

Tree

A layout presenting the contents of the collection in a tree view, showing the folder structure and the files as leaf nodes of the tree, akin to a traditional file system folder view.

Illustration of a Tree interface.

The PDF specification avoids more precise definitions of these layouts to allow for and encourage differentiation and innovation (e.g. to support small screens, to aid accessibility/AT, etc). Additionally, PDF 2.0’s navigator Layout entry supports an array format, which lets authors list their preferences in order: a PDF application works through the named layouts or viewing modes in the array until it reaches one that it supports.
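The fallback through a Layout array can be sketched as a simple "first supported entry wins" search. This is an illustration only: the function name `choose_layout`, the set of layouts a viewer supports, and the final fallback to details mode are assumptions, not behaviour mandated by the text above.

```python
from typing import Iterable, Sequence, Union

# Names taken from the text above; which ones a viewer supports is an assumption.
VIEW_MODES = ("D", "T", "H")
NAMED_LAYOUTS = ("FilmStrip", "FreeForm", "Linear", "Tree")


def choose_layout(
    layout: Union[str, Sequence[str]],
    supported: Iterable[str],
    fallback: str = "D",
) -> str:
    """Return the first author preference this viewer supports, else the fallback."""
    preferences = [layout] if isinstance(layout, str) else list(layout)
    supported_set = set(supported)
    for name in preferences:
        if name in supported_set:
            return name
    return fallback


# A hypothetical author preference: tree view first, then filmstrip, then tiles.
author_layout = ["Tree", "FilmStrip", "T"]
print(choose_layout(author_layout, supported=VIEW_MODES + ("FilmStrip",)))
print(choose_layout(author_layout, supported=VIEW_MODES))
```

Ending the array with a PDF 1.7 viewing mode, as in this example, gives viewers without named layout support something they can still honour.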

The collection views defined in PDF 1.7 are still available with PDF 2.0 collections. Using either details mode (D) or tile mode (T) provides the best compatibility with legacy (PDF 1.7) software. Hidden mode (H) is defined as implementation-dependent, and thus may not provide a consistent experience if that is important.

Authoring Collections

When to use collections

With some exceptions, any closely related set of files is ideal for a PDF collection, regardless of the file format. Embedded files in PDF collections can always be compressed, so collections will often be smaller than the same set of independent files. In addition, interlinked sets of PDF files in a collection can be made to work seamlessly thanks to Embedded Go-To actions (ISO 32000-2:2020, clause 12.6.4.4).
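To illustrate the compression point, here is a rough sketch that wraps some bytes in a Flate-compressed embedded file stream using only Python's standard `zlib` module. The helper name `embedded_file_stream` and the sample data are hypothetical, and the output is a fragment in PDF syntax, not a complete PDF file.

```python
import zlib


def embedded_file_stream(data: bytes) -> bytes:
    """Wrap data in a Flate-compressed embedded file stream (PDF syntax fragment)."""
    compressed = zlib.compress(data, 9)
    header = (
        f"<< /Type /EmbeddedFile /Filter /FlateDecode "
        f"/Length {len(compressed)} /Params << /Size {len(data)} >> >>\n"
    ).encode("ascii")
    return header + b"stream\n" + compressed + b"\nendstream"


# Highly repetitive sample data, chosen so that the size difference is obvious.
sample = b"PDF collections package related files together.\n" * 200
stream = embedded_file_stream(sample)
print(len(sample), "bytes before,", len(stream), "bytes after")
```

Files that are already compressed (many image and archive formats, for instance) will gain little from this step.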

If the anticipated end users of the collection may not have up-to-date software, then using a PDF collection with appropriate encapsulated cover page content may be easier than providing independent instructions.

In one typical use case, such as in the PDF Association’s cheat sheets collection, all the embedded files are PDF documents. When every file contained in a given PDF collection is a PDF, users get a unified application-like experience. All-PDF collections can also support convenient inter-document bookmarking and linking (links in one embedded PDF referencing content in another embedded PDF) without a live internet connection.
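A link from one embedded PDF to another can be sketched as an Embedded Go-To action whose target first steps up to the container and then down to the sibling file. The helper names, the file name `fonts.pdf` and the named destination `intro` are hypothetical, and the sketch assumes ASCII-only strings.

```python
def pdf_string(text: str) -> str:
    """Write a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def goto_sibling(file_key: str, destination: str) -> str:
    """Embedded Go-To action from one embedded file to a sibling in the same container."""
    child = f"<< /R /C /N {pdf_string(file_key)} >>"
    # /R /P steps up to the container; the nested target steps down to the sibling.
    target = f"<< /R /P /T {child} >>"
    return f"<< /S /GoToE /D {pdf_string(destination)} /T {target} >>"


print(goto_sibling("fonts.pdf", "intro"))
```

Because the target is expressed relative to the container, such links keep working wherever the collection is copied, with no network access involved.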

PDF 2.0 also introduced a specialised use of collections known as an unencrypted file wrapper (ISO 32000-2:2020, clause 7.6.7). This feature mitigates an unpleasant user experience (e.g., generic errors such as “Cannot open PDF”) when encountering proprietary encryption algorithms. By encapsulating such encrypted PDFs within an unencrypted “wrapper” PDF, the content in the wrapper can be presented to users to provide guidance without impacting the integrity or security of the encrypted payload document(s). In addition, the unencrypted wrapper PDF can also contain different metadata to avoid PII or other information leakage, if the encrypted document metadata needs to stay protected.

Adding files to a collection

Adding files to an arrangement of hierarchical folders is the principal task of collection authoring. All PDF applications that we tested that supported the creation of collections provided this support, with many providing a simple interface for the additional step of adding file descriptions and viewing file details, such as file size, creation dates, compressed size, etc.

PDF has supported file descriptions since the introduction of the embedded files feature in 1998, so any legacy software that supports embedded files is very likely to display file descriptions even if collections are unsupported. This is especially important to a good fallback user experience if files in the collection have the same name but are in different collection folders since the fallback presentation is always a flat file list without folder names.
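The following sketch shows why descriptions matter in the fallback case. It models collection entries as plain Python tuples (a hypothetical model, not a PDF parser) and renders them the way a viewer without collection support might: folder names are dropped, so only the description tells two identically named files apart.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    folder: str       # shown only by collection-aware viewers
    name: str
    description: str  # the file description, shown in the flat list


def flat_fallback_list(entries: List[Entry]) -> List[str]:
    """Render entries without folders: file name plus description, if any."""
    return [f"{e.name}: {e.description}" if e.description else e.name for e in entries]


# Hypothetical collection with the same file name in two folders.
entries = [
    Entry("2023", "report.pdf", "Annual report for 2023"),
    Entry("2024", "report.pdf", "Annual report for 2024"),
]
for line in flat_fallback_list(entries):
    print(line)
```

Without the descriptions, both rows would read `report.pdf` and the user would have to open each file to tell them apart.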

Cover content

Screenshot of the PDF Association's Cheat Sheets collection cover page.

Like any other PDF document, the container PDF file that houses the collection has its own pages. The content of this page (or pages) is typically a single cover page telling end-users that the PDF is a collection and that software that supports collections may be required.

In our experience, PDF applications automatically create a single generic cover page for new collections that includes the software vendor’s identity and brand. However, collection authors are free to replace or supplement this generic content with any content appropriate to their collection. For accessibility, all cover content should always be Tagged PDF, but unfortunately, as of our testing, every vendor’s generic cover page seemed to ignore this very important aspect!

For example, the PDF Association’s collection of cheat sheets includes a custom cover page with our branding (not that of the tooling we used) with additional instructions relevant to our content, in a vendor-neutral manner. This cover page is not intended to be printed, so a custom page size was used with the container PDF configured to display the full page to reduce the need for scrolling. Additionally, we ensured that this custom cover page is Tagged PDF.

Navigators

Our experience with configuring navigators was disappointingly limited to the PDF 1.7 options (details mode, tile mode, or hidden mode that displayed the cover page). However, we fared better when defining a custom schema to capture details about embedded files, altering the detailed view columns, configuring sorting, etc.

Defining schema

The ability to use custom schema in PDF collections provides capabilities that are not matched by ZIP or similar archives. Without a custom schema, files added to a collection from an operating system will include only file name, modification date, and file size. Custom schemas allow additional custom attributes to be added to all embedded files in the collection, which then allows users to sort or filter in certain viewing modes, depending on the precise capabilities of their collection-aware PDF software.

In our cheat sheet collection, we defined a custom schema with a new attribute called “Topics”. The value of this attribute is a text string that lists the individual topics included in each cheat sheet, since some cheat sheets include topics that may not be immediately apparent from the file name.
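As a rough sketch of what such a schema might look like inside the file, the snippet below serialises a schema with one text field and a matching collection item dictionary for a single embedded file. The helper names and the topic values are hypothetical, the actual objects in our collection may differ, and the sketch assumes the field key is a valid PDF name (no spaces) and that strings are ASCII.

```python
def pdf_string(text: str) -> str:
    """Write a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def schema_field(label: str, order: int) -> str:
    """A text-valued collection field: /N is the column label, /O its order."""
    return f"<< /Type /CollectionField /Subtype /S /N {pdf_string(label)} /O {order} >>"


def collection_item(values: dict) -> str:
    """A collection item dictionary holding one file's custom attribute values."""
    body = " ".join(f"/{key} {pdf_string(value)}" for key, value in values.items())
    return f"<< /Type /CollectionItem {body} >>"


schema = f"<< /Type /CollectionSchema /Topics {schema_field('Topics', 1)} >>"
item = collection_item({"Topics": "Fonts, encodings, glyph widths"})
print(schema)
print(item)
```

The schema is defined once for the whole collection, while each embedded file carries its own item dictionary with values for the schema's keys.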

Screenshot of the PDF Association's collection of cheat sheets.

Authoring other PDF features

Although PDF applications make file addition to collections trivially easy (via drag’n’drop), the same cannot be said for the pages and other features of the container PDF. Unfortunately, based on our recent experience across a range of applications, a lot of standard PDF functionality is disabled as soon as a collection is opened so, today, authoring cover pages for collections is far more difficult than it should be.

The container PDF is, after all, just a regular PDF with some additional information. PDF editor software should therefore make PDF features and functionality available for use with it, including tagging, accessibility support and checking, editing, the ability to digitally sign or password-protect the entire collection, etc.

When not to use collections

Because the embedded files in a collection are inside a container PDF file, collections may not be practical if users are expected to update or modify any of the contained files. Of course, collections can be used to distribute sets of editable files, but in-situ editing of the files inside the container is impractical.

For cyber-security reasons, PDF applications often constrain the embedded file types that can be accessed from within PDF files, including collections. Thus authors should avoid including zip files (let the PDF do the compression instead!), executable files, scripts, or other files posing potential security risks.

For encrypted documents with non-standard algorithms (such as DRM or eBooks), PDF 2.0’s “unencrypted wrapper file” feature, which was specifically designed for this use case, is likely preferable to generic collections.

End-user experience of Collections

Unfortunately, as of late 2024, comprehensive support for the full gamut of PDF 2.0 collection capabilities in popular end-user software appears limited, with support for the original PDF 1.7 collection feature also limited.

Fallback behaviour

If PDF applications do not support PDF collections, they will present users with the cover document from the container PDF and hopefully a simple flat (non-hierarchical) list of all the embedded files. For this reason, authors and applications should also set the container PDF’s document catalog PageMode entry to UseAttachments to best ensure that any embedded files navigation pane will be presented automatically.
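A pre-publication check along these lines can be sketched as follows. The document catalog and file specifications are modelled as plain Python dictionaries here (a hypothetical stand-in for whatever object model a real PDF library exposes), and the function name `fallback_warnings` is likewise an assumption.

```python
from typing import Dict, List


def fallback_warnings(catalog: Dict[str, str], file_specs: List[Dict[str, str]]) -> List[str]:
    """List issues that weaken the experience in viewers without collection support."""
    warnings = []
    if catalog.get("PageMode") != "UseAttachments":
        warnings.append("PageMode is not UseAttachments")
    for spec in file_specs:
        if not spec.get("Desc"):
            warnings.append(f"{spec.get('F', '?')} has no description")
    return warnings


# Hypothetical container: PageMode is missing and one file lacks a description.
catalog = {"Type": "Catalog"}
file_specs = [
    {"F": "overview.pdf", "Desc": "Overview of the collection"},
    {"F": "details.pdf"},
]
for warning in fallback_warnings(catalog, file_specs):
    print(warning)
```

Both checks target what a legacy viewer can still show: the attachments pane and the per-file descriptions.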

Potentially critical information such as folder names and custom schema attributes will not be visible, so understanding the context of larger collections, or collections that include multiple files sharing the same name, will not be possible. Some PDF viewing software may not even show the list of embedded files!

Conclusion

For vendors of collection creation software:

  • Ensure your generic cover page is correctly tagged!
  • Ensure that the container PDF document catalog PageMode is set to UseAttachments
  • Allow authors to set PDF 2.0 named layout navigators, as well as the PDF 1.7 viewing modes
  • Ensure that standard PDF functionality is available for collection cover pages so that collection authors can easily support accessibility, editing, annotations, digital signatures, password encryption, etc.

For vendors of interactive PDF viewers supporting collections:

  • Ensure your embedded file navigation pane is sufficiently resizable – authors may add many custom schema attributes with sorting!
  • Follow the collection author’s preference for the navigation viewing mode
  • Differentiate your software by supporting all the named layout navigators defined for PDF 2.0!

For authors:

  • Always add meaningful descriptions to each file in the collection – these should be visible even in legacy software that does not support collections;
  • Be sure that your cover page provides useful information to users with older software, since it’s your content and your brand;
  • Tag the cover page(s) of your collection to ensure it is accessible;
  • Add custom schema attributes relevant to your collection to make it more than “just a ZIP file with a cover page”

For users:

  • Choose capable software that supports PDF 2.0 collections;
  • Give constructive feedback to authors on how they can utilize PDF collections for convenient packaging of their publications;
  • Enjoy the advantages that PDF collections offer over ZIP and similar archives, including interlinked documents that don’t need internet connections and aren’t subject to web “link rot”.