PDF-UX: Page Labels

August 26, 2024

Does your PDF viewer fully support page labels? Does your PDF authoring software? Support for page labels can be a key factor in choosing a PDF viewer, especially for users dealing with complex documents.

About Peter Wyatt

A developer and researcher working on PDF technologies for more than 20 years, Peter is the PDF Association’s CTO and an independent technology consultant.

“Any person viewing a PDF [filing] must be able to enter a page number in the page search field on the tool bar or ‘Go To Page’ field under ‘View’ and have that number match the page number displayed in the filer’s PDF.”

From the website of the United States Court of Appeals for the 2nd Circuit.

BUSINESS NOTE

Comprehensive page label support can be a distinguishing factor in selecting a PDF viewer, as it is a critical consideration for end-users navigating complex documents.

When navigating or discussing a document, users commonly want to “go to page ix” or are told “see page 12” on the assumption that everyone shares a common understanding of the document’s pagination. Although PDF has unmatched capabilities for reliably displaying pages, facilitating the navigation of long or complex documents is often overlooked, resulting in a frustrating end-user navigation experience when specific content is not easily locatable.

End users rightfully expect that entering a page identifier into their PDF viewer will show the corresponding page; locating a page should not be a guessing game! The ISO standards for PDF focus on the file format and in most cases provide no requirements, recommendations, or guidance to PDF software developers or document (content) authors/publishers. PDF’s Page Labels feature (ISO 32000-2:2020, 12.4.2) defines the means by which WYSIHYN (What You See is How You Navigate) can be easily achieved.

This article considers how direct PDF page navigation is intended to work across modern desktop and mobile platforms and a wide range of PDF viewers. By following this guide, authors/publishers can ensure that their documents do not frustrate end users, developers of PDF viewers can support descriptive page identifiers, and application developers can ensure that the necessary PDF navigation data structures match the author's content.

Background

Screenshot of a legal document showing "iv" as the printed page number while the viewer shows a page label of "5". — Credit: One Legal blog post by Richard Heinrich.

In the US, several states have changed regulations to require the use of Arabic numerals precisely because of the lack of support for PDF Page Labels! However, Roman numerals for front matter and other forms of page identification are still widespread and are commonly required in many fields, such as the publishing industry.

In longer or more complex documents with distinct sections, it is not uncommon to use Roman numerals for front matter page identifiers, descriptive page labels for back matter such as annexes or appendices, or even to start a document with a page identifier other than 1. However, if the author/publisher fails to align the visible page identifiers on each page with the navigation data, users who attempt to go to page “ix” or “12” may end up at entirely different locations! Is that the 12th logical page, or the page with “12” in the footer, or somewhere else?

Page Labels in PDF

PDF’s Page Labels feature (ISO 32000-2:2020, 12.4.2) supports Arabic numerals, upper- and lowercase Roman numerals, and upper- and lowercase alphabetic page sequences starting at any value greater than or equal to 1. An optional Unicode prefix string makes it possible to explicitly define a page label for each page, using any counting system representable in Unicode (such as Chinese numerals).
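As a rough illustration of how these pieces combine, the following Python sketch expands a list of labeling ranges into one label per page. It is not taken from any PDF library: the function names and the tuple layout `(first_page_index, style, prefix, start)` are assumptions made for this example, with the style letters borrowed from the PDF keys (`D`, `R`, `r`, `A`, `a`). The sample layout at the end is hypothetical.

```python
from typing import List, Optional, Tuple

# One labeling range: (first 0-based page index, style or None, prefix, start value).
# This tuple layout is an assumption for illustration, not a PDF data structure.
LabelRange = Tuple[int, Optional[str], str, int]

def to_roman(n: int) -> str:
    """Uppercase Roman numeral for n >= 1."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def to_alpha(n: int) -> str:
    """A..Z for 1..26, then AA..ZZ for 27..52, and so on."""
    repeat, offset = divmod(n - 1, 26)
    return chr(ord("A") + offset) * (repeat + 1)

def format_number(style: Optional[str], n: int) -> str:
    """Numeric part of a label; empty when the range has a prefix only."""
    if style is None:
        return ""
    if style == "D":
        return str(n)
    if style == "R":
        return to_roman(n)
    if style == "r":
        return to_roman(n).lower()
    if style == "A":
        return to_alpha(n)
    if style == "a":
        return to_alpha(n).lower()
    raise ValueError(f"unknown numbering style: {style!r}")

def expand_labels(ranges: List[LabelRange], page_count: int) -> List[str]:
    """Return one label per page; ranges must be sorted and start at index 0."""
    labels = []
    for i, (first, style, prefix, start) in enumerate(ranges):
        end = ranges[i + 1][0] if i + 1 < len(ranges) else page_count
        for page_index in range(first, end):
            labels.append(prefix + format_number(style, start + page_index - first))
    return labels

# Hypothetical 10-page layout: covers, front matter, body, appended errata.
sample = [(0, "A", "Cover-", 1), (2, "r", "", 1), (5, "D", "", 1), (8, "D", "Errata-", 1)]
print(expand_labels(sample, 10))
```

Each range applies until the next one begins, so only the first page of each numbering scheme needs to be described.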

Page labels in ISO publications

Typically, ISO’s front matter uses Roman numerals before switching to Arabic numerals. In preparing the no-cost sponsored edition of ISO 32000-2:2020 with Errata Collection 2, the PDF Association took less than a minute to add the necessary PDF page labels to improve the navigation experience of this 1,000+ page reference, as follows:

The two cover pages are labeled “Cover-A” and “Cover-B”;
The ISO front matter is identified with lowercase Roman numerals that match the page identifier visible in the footer of each page;
The main body of the ISO document has Arabic page labels that match the page number visible in the footer of each page;
The additional Errata pages appended to the end of the ISO document are labeled “Errata-1”, “Errata-2”, etc.

Best practice is to ensure that a PDF’s page labels always match the visible page identifiers, such as those that commonly appear in page headers or footers. This might seem obvious, but it is not always done, resulting in significant frustration for users, especially if other navigation aids are lacking.

When navigating this document in a capable viewer, these page labels now ensure WYSIHYN!

Authoring

Ensuring that users can easily navigate their way through a document begins with authoring, with authors deciding (among other things) the appropriate page identifiers for their content.

Office suite and authoring application developers supporting export to PDF should ensure that any author-chosen scheme for page identification exports equivalent PDF Page Label data. Cases include:

Starting at pages other than “1”
Including Unicode prefixes
Blending Arabic, Roman and/or alphabetic page sequences.

Ideally, authoring applications should also warn users against long prefixes and duplicate page identifiers.
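A minimal sketch of such a check, in Python, might look like the following. The function name `check_page_labels` and the length threshold are assumptions for illustration; no standard defines what counts as a "long" label, so the limit is left as a parameter.

```python
from collections import Counter
from typing import List

# Assumption: an arbitrary threshold, chosen only to make the example concrete.
MAX_LABEL_LENGTH = 8

def check_page_labels(labels: List[str], max_length: int = MAX_LABEL_LENGTH) -> List[str]:
    """Return human-readable warnings for duplicate or overly long labels."""
    warnings = []
    counts = Counter(labels)
    for label, count in counts.items():
        if count > 1:
            # Report 1-based page numbers, which is what authors see.
            pages = [i + 1 for i, other in enumerate(labels) if other == label]
            warnings.append(f"duplicate label {label!r} on pages {pages}")
    for number, label in enumerate(labels, start=1):
        if len(label) > max_length:
            warnings.append(
                f"label {label!r} on page {number} is longer than {max_length} characters"
            )
    return warnings

for warning in check_page_labels(["i", "ii", "1", "1", "Long Label - 11"]):
    print(warning)
```

Duplicates matter because a viewer that matches typed input against labels can only jump to one of the pages that share a label.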

LaTeX is an immensely powerful authoring system, so it is no surprise that it generates PDF Page Labels as necessary. This includes capabilities such as page labels for Chinese numerals and alternate text for non-Unicode systems such as Cistercian numerals, as demonstrated by the provided LaTeX file “PageLabels.tex” (this file can be loaded into the online Overleaf system).

\pagenumbering{roman}
\section{Lowercase Roman page numbers}
Lowercase Roman 1 \newpage
Lowercase Roman 2 \newpage
Lowercase Roman 3 \newpage

\pagenumbering{alph}
\section{Lowercase alphabetic indicators}
Lowercase alpha 1 \newpage
Lowercase alpha 2 \newpage
Lowercase alpha 3 \newpage

\setmainfont{FandolSong-Regular}
\pagenumbering{zhnum}
\section{Traditional Chinese page numbers starting from 7}
\setcounter{page}{7}
Chinese 1 \newpage
Chinese 2 \newpage
Chinese 3 \newpage

LaTeX code demonstrating its flexibility in handling page numbering systems.

Your authoring app might use page labels, but does it export them to your PDFs?

In our GitHub repository we’ve provided simple Microsoft Word (“PageLabels.docx”) and Apple Pages (“PageLabels.pages”) test documents for testing export-to-PDF capabilities. They use the built-in features of these word processing formats to apply different kinds of page numbering across different sections.

As shown in the dialogs below, both Word and Pages include capabilities matching those of PDF Page Labels. Unfortunately, all 5 of the 5 popular Microsoft Word-to-PDF exporting engines we tried, as well as LibreOffice and Apple Pages, failed to export any PDF Page Label information, resulting in complete WYSIHYN usability failure!

Microsoft Word page number format management dialog.

Apple Pages page number format management dialogs.

Post-creation editing

Various professional PDF applications provide means to apply PDF page labels to ranges of pages in existing PDF documents via menu options such as “Page Labels” or “Number Pages”. These applications make it quick and easy to update page labels on contiguous ranges of pages.

A different screenshot of a page-label editing UI. The vendor is not identified.

Screenshot of page-label editing UI showing various options. The vendor is not identified.

PDF viewers

While most PDF viewers support arbitrary page navigation by entering a page number into an edit box, this often only works for those PDF documents that use Arabic numerals and start at page “1”.

Some PDF viewers do not consider the possibility of page labels beyond a few Arabic digits (e.g., up to 9,999 pages). In many cases, the user is only allowed to enter Arabic digits (“0”-“9”). This is problematic for pages identified with Roman numerals, alphabetic identifiers, or descriptive page labels that have prefixes (such as “Annex C-1”, “Annex C-2”, “Annex C-3”, etc., where only “Anne” might be visible). Long page label prefixes may even block users from seeing the complete page identifier.

User-centered PDF viewers will accommodate page identifiers that include a prefix, may not start at “1”, and might use alternate numbering or alphabetic labeling instead. Capable viewers will allow users to input a broad range of page identifiers so that navigation naturally matches the visible page descriptors. If this matching fails, some PDF viewers will elect to fall back by treating the input as a PDF page number. This sort of behavior allows users to enter either Roman numerals or a prefixed page identifier, as both will then navigate to the expected page → WYSIHYN!

Although PDF page label prefixes can use Unicode, the ease of page identifier data entry must also be a consideration. Although a fancy prefix “§A-” (so pages are “§A-1”, “§A-2”, “§A-3”) may look good, this symbol is difficult for users to type (e.g., on mobile platforms), and with less capable software entering Unicode might not even be possible! Mobile viewers with limited screen space also face challenges in presenting page identifiers with long prefixes. However, users of Chinese documents with Chinese numeral page identifiers might be expected to know how to enter Chinese numerals on their systems, so considering the context of each document is important. The best practice is to keep page label prefixes short and appropriate for end users so that WYSIHYN is achieved across all platforms and as many viewers as possible.

Page index, number and label: assessing PDF viewers

Internally, PDF uses 0-based page indices, but humans (end users) expect page numbers to start at 1. Thus, when no page label data is present, viewers default to displaying the PDF page number. Fragment identifiers also rely on the 1-based page numbering scheme.
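The relationship between these three values can be sketched in a few lines of Python. The helper names are hypothetical, and the `#page=N` fragment form is a detail not spelled out in this article; it is the commonly used page fragment syntax for PDF URLs and is shown here only as an illustration.

```python
from typing import List, Optional

def page_number_from_index(page_index: int) -> int:
    """Convert PDF's internal 0-based page index to a 1-based page number."""
    return page_index + 1

def page_index_from_number(page_number: int) -> int:
    """Convert a 1-based page number back to a 0-based page index."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    return page_number - 1

def display_label(page_index: int, labels: Optional[List[str]]) -> str:
    """What a viewer might show: the page label if present, else the page number."""
    if labels:
        return labels[page_index]
    return str(page_number_from_index(page_index))

def page_fragment(page_number: int) -> str:
    """URL fragment addressing a page by its 1-based page number."""
    return f"#page={page_number}"

labels = ["i", "ii", "iii", "iv", "1", "2"]
print(display_label(4, labels))                  # label of the fifth page
print(display_label(4, None))                    # no label data: page number
print(page_fragment(page_number_from_index(4)))  # fragment uses the page number
```

Note that the fragment addresses the page by number, not by label, so a link to the page labeled “1” in this sample would use page number 5.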

PDF Page Index (0-based)	PDF Page Number (1-based)	PDF Page Label (not including quotes)	Description
0	1	“i”	Lowercase Roman numerals starting from 1 (directly supported by PDF: /S /r)
1	2	“ii”
2	3	“iii”
3	4	“iv”
4	5	“1”	Western Arabic numbers starting at 1 (directly supported by PDF: /S /D)
5	6	“2”
6	7	“3”
7	8	“4”
8	9	“٤” (U+0664)	Eastern Arabic numerals starting at 4 are not directly supported by PDF (thus each page is individually labeled with /St n /P (…))
9	10	“٥” (U+0665)
10	11	“٦” (U+0666)
11	12	“٧” (U+0667)
12	13	“ Long Label - 11”	Lengthy prefix text followed by a Western Arabic number starting at 11 (directly supported by PDF: /S /D /St 11 /P (…))
13	14	“ Long Label - 12”
14	15	“ Long Label - 13”
15	16	“ Long Label - 14”

The test file “PageLabelsTest.pdf” contains 16 pages with a combination of lowercase Roman numerals (simulating front matter), Western Arabic numerals (simulating the normal page numbering system used in the body of many documents and that is predefined by PDF), Eastern Arabic numerals (simulating a counting system some authors may use and that is not predefined by PDF), and an inconveniently long page label prefix, as shown in the table above.

A user interface demonstrating support for various page numbering schemes.

End users will naturally expect that entering the Roman numerals “i” to “iv” will navigate directly to PDF page numbers 1-4 respectively, while entering Western Arabic numerals “1” to “4” will navigate directly to the corresponding PDF page numbers 5-8. Entering a Western Arabic number larger than any labeled page (e.g., “9”) might also be expected to navigate to the PDF page with that page number.
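These expectations can be sketched as a two-step lookup: try the typed text as a page label first, then fall back to treating it as a 1-based PDF page number. The Python below is an assumption about how a viewer might do this, not the behavior of any particular product. The labels are transcribed from the table above with surrounding whitespace trimmed, and matching is deliberately exact (so “IV” would not match “iv”).

```python
from typing import List, Optional

# Labels of the 16-page test file, in page order.
TEST_LABELS = [
    "i", "ii", "iii", "iv",
    "1", "2", "3", "4",
    "٤", "٥", "٦", "٧",
    "Long Label - 11", "Long Label - 12", "Long Label - 13", "Long Label - 14",
]

def resolve_page(user_input: str, labels: List[str]) -> Optional[int]:
    """Return the 0-based page index for the typed text, or None if unresolved."""
    text = user_input.strip()
    # Step 1: exact page label match; the first matching page wins.
    for index, label in enumerate(labels):
        if label.strip() == text:
            return index
    # Step 2: fall back to a 1-based page number, Western digits only.
    if text.isascii() and text.isdigit():
        number = int(text)
        if 1 <= number <= len(labels):
            return number - 1
    return None

for typed in ["iv", "1", "9", "Long Label - 12", "99"]:
    index = resolve_page(typed, TEST_LABELS)
    shown = "not found" if index is None else f"PDF page {index + 1}"
    print(f"{typed!r} -> {shown}")
```

Trying labels before page numbers is what makes “1” land on PDF page 5 rather than PDF page 1 in this file; the fallback only applies when no label matches.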

Checking the test file’s overly-long page label prefix (in any viewer) helps to explain why such things should be avoided! Best practice is to keep page label prefixes short and simple!

Screenshot of thumbnail images of pages showing various types of page labelling.

Screenshot of a PDF forensic tool showing how page labels are encoded in PDF.

In July 2024, the PDF Association conducted an informal survey of popular desktop PDF viewers to assess their support for WYSIHYN navigation. Our assessment was based on a single test of whether entering the Roman numeral “iv” resulted in the page changing to PDF page 4.

The shortlist of viewers in our survey that support WYSIHYN navigation:

Adobe Acrobat
Apple Preview
Apryse PDF Studio Viewer
Apryse Xodo PDF Studio
Firefox (Mozilla PDF.js)
Foxit
PDFextra
PDF Reader Pro (Mac)
PDF-XChange

Conclusion

Although non-trivial page identifiers (meaning anything other than starting with a Western Arabic “1”) have been used in publishing for centuries and have been supported in PDF since PDF 1.3, there is still a long way to go to improve the user experience of arbitrary page navigation with PDF. This includes both office suite application developers adding the required PDF Page Label data when exporting to PDF and PDF viewers providing WYSIHYN support for descriptive page labels.
