
PDF Parallelism – why and how to modify bookmark structures?

Dietrich von Seggern // July 14, 2023

Member News


PDF is more than digital paper. It has many features that are not always used, and these features are based on optional entries in the internal PDF format. Two of them are designed to provide direct access to pages in a PDF file based on additional information related to these pages: bookmarks (called "outlines" in the internal PDF format) and document part metadata (DPart).

Bookmarks are usually used interactively and associate one or more (bookmark) names with pages.

Document part metadata (DPart) is more powerful and is typically used in automated processes. It is not limited to a single name, but uses metadata that consists of arbitrary key-value pairs. In addition, it can be associated with a range of pages instead of just a single page.

Bookmarks and DPart are both hierarchical, but the hierarchy means something completely different in each case. A bookmark only refers to a single page, and the bookmark hierarchy simply determines how the bookmarks are presented to the user, i.e. on which level an item is displayed. In document parts, the hierarchy describes nested page ranges: you may associate some DPart metadata with a range of pages and other metadata with a sub-range of these pages.
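The two kinds of hierarchy described above can be illustrated with a minimal Python sketch. The class names, field names and the function `effective_metadata` are hypothetical and only model the idea; they are not the internal PDF syntax. In particular, merging the metadata of a range with that of its sub-ranges is an assumption made for illustration, not a rule taken from the PDF specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One outline item: a title that points at a single page (0-based)."""
    title: str
    page: int
    # Children only affect how the item is displayed (its nesting level).
    children: list[Bookmark] = field(default_factory=list)


@dataclass
class DPartNode:
    """One document part: an inclusive page range plus key-value metadata."""
    first_page: int
    last_page: int
    metadata: dict[str, str] = field(default_factory=dict)
    # Children cover sub-ranges of this node's page range.
    children: list[DPartNode] = field(default_factory=list)


def effective_metadata(node: DPartNode, page: int) -> dict[str, str]:
    """Collect all metadata that applies to a page (illustrative merge rule)."""
    if not node.first_page <= page <= node.last_page:
        return {}
    merged = dict(node.metadata)
    for child in node.children:
        merged.update(effective_metadata(child, page))
    return merged


outline = Bookmark("Chapter 1", 0, [Bookmark("Section 1.1", 2)])
parts = DPartNode(0, 9, {"Job": "Mailing"}, [
    DPartNode(0, 3, {"Recipient": "A"}),
    DPartNode(4, 9, {"Recipient": "B"}),
])
print(effective_metadata(parts, 5))  # {'Job': 'Mailing', 'Recipient': 'B'}
```

Each bookmark carries exactly one page, whereas each document part carries a page range, so only the DPart tree can express "this metadata for pages 0 to 9, that metadata for pages 4 to 9".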

In the internal PDF format, both features use a node structure that is parallel to the page structure. That enables software to access pages via these "alternative" structures without having to analyze all pages in their "regular" order. For example, a PDF viewer can display the bookmark structure even if it has not yet read all the pages, and the page a bookmark refers to can be accessed immediately when the user clicks on the bookmark.

For document parts, this is even more useful, since they were designed for PDFs with several thousand pages, e.g. individualized documents such as postcards where each postcard has a different recipient. It is useful to encode the ZIP code in the document part metadata, which makes it easy for a processing program to select all postcards for a specified ZIP code, allowing you to print them in the right order for optimized mailing.
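The postcard scenario can be sketched in a few lines of Python. This is a simplified model, not a DPart reader: the `Part` tuple, the metadata key `"ZIP"` and both functions are assumptions made for illustration, and the document parts are represented as a flat list rather than a tree.

```python
from typing import NamedTuple


class Part(NamedTuple):
    first_page: int  # 0-based, inclusive
    last_page: int   # 0-based, inclusive
    metadata: dict   # arbitrary key-value pairs; "ZIP" is a hypothetical key


def pages_for_zip(parts, zip_code):
    """Return the page indices of all parts whose ZIP code matches."""
    pages = []
    for part in parts:
        if part.metadata.get("ZIP") == zip_code:
            pages.extend(range(part.first_page, part.last_page + 1))
    return pages


def pages_in_mailing_order(parts):
    """Return all page indices with the parts sorted by ZIP code."""
    ordered = sorted(parts, key=lambda p: p.metadata.get("ZIP", ""))
    return [
        page
        for part in ordered
        for page in range(part.first_page, part.last_page + 1)
    ]


# Three two-page postcards (front and back) for two ZIP codes.
postcards = [
    Part(0, 1, {"ZIP": "10115"}),
    Part(2, 3, {"ZIP": "80331"}),
    Part(4, 5, {"ZIP": "10115"}),
]

print(pages_for_zip(postcards, "10115"))  # [0, 1, 4, 5]
print(pages_in_mailing_order(postcards))  # [0, 1, 4, 5, 2, 3]
```

The selection only looks at the metadata and page ranges, so no page content has to be parsed to decide which pages belong together.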

While these parallel structures are very helpful when present, they create issues when the page order is modified or when pages are added or deleted. You then have to update the parallel structures too, and that is not always straightforward. If you merge PDF files, each of which has bookmarks, it is not even fully clear what should happen: do you want to create new top-level bookmarks that, for example, use the names of the original files, or would you rather keep the hierarchies as they are? In fact, many programs do not update these parallel structures when the page structure is modified.
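Both situations can be made concrete with a small Python sketch that models bookmarks as nested dictionaries. The dictionary layout (`title`, `page`, `children`) and the function names are hypothetical. Dropping a bookmark together with its children when its page was deleted is one possible policy among several, chosen here only to keep the example short.

```python
def remap_bookmarks(bookmarks, new_order):
    """Retarget bookmarks after pages were reordered or deleted.

    new_order[i] is the old index of the page that is now at position i.
    A bookmark whose page no longer exists is dropped with its children.
    """
    old_to_new = {old: new for new, old in enumerate(new_order)}
    result = []
    for item in bookmarks:
        if item["page"] not in old_to_new:
            continue
        result.append({
            "title": item["title"],
            "page": old_to_new[item["page"]],
            "children": remap_bookmarks(item.get("children", []), new_order),
        })
    return result


def shift_bookmarks(bookmarks, offset):
    """Return a copy of the bookmarks with every page moved by offset."""
    return [
        {
            "title": item["title"],
            "page": item["page"] + offset,
            "children": shift_bookmarks(item.get("children", []), offset),
        }
        for item in bookmarks
    ]


def merge_bookmarks(files, wrap_in_file_names):
    """Merge bookmarks of concatenated files.

    files is a list of (name, page_count, bookmarks) in merge order.
    With wrap_in_file_names=True each file gets a new top-level bookmark;
    otherwise the original hierarchies are kept side by side.
    """
    merged, offset = [], 0
    for name, page_count, bookmarks in files:
        shifted = shift_bookmarks(bookmarks, offset)
        if wrap_in_file_names:
            merged.append({"title": name, "page": offset, "children": shifted})
        else:
            merged.extend(shifted)
        offset += page_count
    return merged


outline = [
    {"title": "Intro", "page": 0, "children": []},
    {"title": "Chapter 1", "page": 2, "children": [
        {"title": "Section 1.1", "page": 3, "children": []},
    ]},
]

# Old page 1 was deleted and the intro was moved to the end.
remapped = remap_bookmarks(outline, [2, 3, 0])
print([(b["title"], b["page"]) for b in remapped])
# [('Intro', 2), ('Chapter 1', 0)]
```

The `wrap_in_file_names` flag corresponds to the open question in the text: both results are valid, and which one is wanted depends on the workflow.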

Is it possible for another application to "repair" these structures later? Yes, provided you understand what happened and in which way the structure has become invalid. You may well know this, for example because the page structure was modified programmatically in an automated or integrated workflow. In such cases, it would be helpful to have access to the parallel structures inside the PDF file in order to adjust them to the "new" page structure.
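A first step towards such a repair is finding out which entries are invalid. The following Python sketch assumes that bookmarks are available as nested dictionaries with a page index, which is a hypothetical layout chosen for illustration; the function `find_invalid_bookmarks` is likewise an assumption and not part of any existing tool.

```python
def find_invalid_bookmarks(bookmarks, page_count, path=()):
    """Yield (path, page) for every bookmark whose target page does not exist.

    path is the tuple of titles leading to the bookmark, so the caller can
    locate the entry in the hierarchy.
    """
    for item in bookmarks:
        here = path + (item["title"],)
        if not 0 <= item["page"] < page_count:
            yield here, item["page"]
        yield from find_invalid_bookmarks(
            item.get("children", []), page_count, here
        )


outline = [
    {"title": "Chapter 1", "page": 0, "children": []},
    {"title": "Chapter 2", "page": 7, "children": [
        {"title": "Section 2.1", "page": 12, "children": []},
    ]},
]

# The file was shortened to 10 pages without updating the bookmarks.
print(list(find_invalid_bookmarks(outline, 10)))
# [(('Chapter 2', 'Section 2.1'), 12)]
```

A check like this only detects targets that no longer exist. It cannot detect a bookmark that now points at an existing but wrong page, which is why knowing how the pages were modified remains necessary.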

To make this possible, we have updated our free Acrobat plug-in callas pdfDPartner 2 to not only export DPart structures to JSON, but also to import a (modified) structure back into the PDF file. The same is possible with pdfToolbox 14.3, which also supports importing and exporting "JSONized" bookmarks.


callas software finds simple ways to handle complex PDF challenges. As a technology innovator, callas software develops and markets PDF technology for publishing, print production, document exchange and document archiving. callas software helps agencies, publishing companies and printers to meet the challenges they face by providing software to preflight, correct…


ABOUT THE AUTHORS

Dietrich von Seggern

Dietrich von Seggern received his degree as a printing engineer, and in 1991 started his professional career as head of desktop prepress production in a reproduction house. He became involved in research projects for digital transmission of print files, and moved to the German Newspaper Marketing Organisation (ZMG). There Dietrich was responsible for a project to enable the digital transmission of …
