PDF Fragment Identifiers
This article by Peter Wyatt is for web designers, content creators, webmasters, and web browser developers. It focuses on improving the user experience for website visitors who need to access specific content in longer PDF documents. Websites can add URI fragments to their PDF references so that visitors land on the precise content they need, giving a quick and helpful experience instead of a generic and unfriendly “it’s somewhere in this long PDF document - work it out for yourself”.
Contents
- Background
- What can you do about it?
- What is a URI fragment?
- Do I need special software?
- Where are PDF fragment identifiers defined?
- Browser support for PDF fragment identifiers
- How do today’s most popular browsers stack up?
- How can PDFs be authored for optimal URL referencing?
- Case study of PDF fragment identifier use in GitHub source code
- Conclusion
Background
When web pages reference PDF documents, they commonly link to large PDFs in which finding the referenced information can be difficult. This is typical across a variety of situations, including FAQs, reference manuals, product information catalogs, references to specific chapters in books, articles in collections, etc.
What can you do about it?
A poor user experience is easy to avoid: both the PDF specification and RFC 8118 define a standardized set of fragment identifiers that can be used to target specific content within a PDF, and PDF fragment identifiers have been documented for roughly two decades. This is the same concept used in HTML to link to targets within a webpage using anchor tags and IDs.
So why are PDF fragment identifiers not used more widely on user-facing websites? As we describe below, all the most popular browsers support them to various degrees.
What is a URI fragment?
BUSINESS NOTE
PDF fragment identifiers added to links to PDF documents allow the targeting of specific content (such as a page or heading), delivering users directly to the information they need. Most popular browsers support at least some fragment identifiers, including page and named destination referencing.
A URI fragment occurs after the URL and starts with a # character. Technically speaking, it refers to a subordinate resource of the primary resource identified by the URL. URI fragments are extremely common with HTML as this is how intra-page navigation works using the anchor tag and IDs. For example, the following HTML link references the above heading on this webpage:
https://pdfa.org/pdf-fragment-identifiers/#what-is-a-uri-fragment
In the case of PDF, the main resource is the PDF file itself, while subordinate resources can be specific pages, destinations, and other types of targets.
PDF URI fragments are typically key/value pairs separated by & (AMPERSAND) and with an = (EQUAL SIGN) between the key and the value. If the value requires multiple arguments then they are , (COMMA) separated. All keys and parameters also need to use percent-encoding. For example:
- https://site.org/book.pdf#page=6
- https://site.org/book.pdf#page=7&zoom=800,78,350
- https://site.org/book.pdf#search=%22multimedia%203D%22
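The syntax rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `pdf_fragment_url` is hypothetical, and it assumes that every key and argument is percent-encoded individually so that the `&`, `=`, and `,` separators stay literal.

```python
from urllib.parse import quote


def pdf_fragment_url(base_url: str, params: list[tuple[str, object]]) -> str:
    """Append PDF fragment parameters to base_url (hypothetical helper).

    Each value may be a single argument or a list/tuple of arguments;
    multiple arguments are joined with commas.
    """
    parts = []
    for key, value in params:
        args = value if isinstance(value, (list, tuple)) else [value]
        # Encode each piece on its own so only the data is escaped,
        # never the "&", "=" and "," separators.
        encoded_args = ",".join(quote(str(arg), safe="") for arg in args)
        parts.append(f"{quote(key, safe='')}={encoded_args}")
    return base_url + "#" + "&".join(parts)


print(pdf_fragment_url("https://site.org/book.pdf", [("page", 6)]))
print(pdf_fragment_url("https://site.org/book.pdf",
                       [("page", 7), ("zoom", [800, 78, 350])]))
print(pdf_fragment_url("https://site.org/book.pdf",
                       [("search", '"multimedia 3D"')]))
```

Passing the parameters as an ordered list rather than a dictionary keeps the order explicit, which is useful when a page reference should come before the parameters that refine its view.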
Do I need special software?
No - all web browsers already understand URI fragments because they are a core part of navigating the web. It is simply a matter of augmenting the URL to a PDF by appending the desired fragment identifiers.
As of early June 2024, the following browsers in their default configuration (without 3rd party extensions) each support PDF using a different core underlying technology, so it is disappointing (but not surprising) that their levels of support for PDF fragment identifiers differ:
- Google Chrome uses pdfium (fragment identifier source code)
- Apple Safari uses WebKit technology with a custom PDF plugin (fragment identifier source code)
- Microsoft Edge uses Adobe technology
- Mozilla Firefox uses pdf.js (fragment identifier source code).
Note that this pdf.js wiki documentation is also out-of-date!
The above only applies to browsers in a standard configuration, without any third-party PDF plugins or extensions that may alter browser behavior. For example:
- the Adobe plugin for Google Chrome appears not to support PDF fragment identifiers;
- if Mozilla Firefox is configured to “Always ask” how to open a PDF file then PDF fragment identifiers are not supported;
- if Microsoft Edge is configured to download first, then PDF fragment identifiers are not supported.
Where are PDF fragment identifiers defined?
All standardized fragment identifiers for PDF are formally defined in Annex O of ISO 32000-2 (available at no cost) and duplicated in RFC 8118, Section 3 (did you notice that this link to RFC 8118 as HTML uses a URI fragment? In this case, it points directly to the heading for Section 3!).
ISO 32000 defines two sets of fragment identifiers - the object identifiers (page, nameddest, structelem, comment, ef) and a set of open parameters (zoom, view, viewrect, highlight, search, fdf). The object identifiers define a specific object in a PDF file, while the open parameters refine the behavior or view that is presented.
Prior to ISO standardization of PDF 1.7 and widespread built-in browser-based PDF support, Adobe published a document called “PDF Open Parameters” (link to Web Archive) in April 2007 defining URI fragments for their technology. Most, but not all, of these fragments were adopted by ISO during the standardization process. In addition to the ISO-standardized fragment identifiers, Adobe also publicizes a fragment identifier consisting of only a named destination after the #, without any parameter name or = (EQUALS) sign.
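To make the two ISO 32000 groups concrete, here is a small Python sketch that splits a URL's fragment into its parameters and labels each one. The function name `parse_pdf_fragment` and the labels `"object"`, `"open"`, `"non-standard"`, and `"bare-nameddest"` are illustrative assumptions, not terms from the standard; treating any part without an `=` as Adobe's bare named destination is likewise a simplification.

```python
from urllib.parse import unquote, urlsplit

# The two groups of standardized keys listed above.
OBJECT_IDENTIFIERS = {"page", "nameddest", "structelem", "comment", "ef"}
OPEN_PARAMETERS = {"zoom", "view", "viewrect", "highlight", "search", "fdf"}


def parse_pdf_fragment(url: str) -> list[tuple[str, str, list[str]]]:
    """Return (kind, key, arguments) for each parameter in the fragment."""
    fragment = urlsplit(url).fragment
    result = []
    for part in fragment.split("&"):
        if not part:
            continue
        if "=" not in part:
            # Assumed to be Adobe's non-standardized form:
            # a named destination with no parameter name.
            result.append(("bare-nameddest", "", [unquote(part)]))
            continue
        key, _, raw_value = part.partition("=")
        key = unquote(key)
        # Split on commas before decoding so encoded commas survive.
        args = [unquote(arg) for arg in raw_value.split(",")]
        if key in OBJECT_IDENTIFIERS:
            kind = "object"
        elif key in OPEN_PARAMETERS:
            kind = "open"
        else:
            kind = "non-standard"
        result.append((kind, key, args))
    return result


print(parse_pdf_fragment("https://site.org/book.pdf#page=7&zoom=800,78,350"))
print(parse_pdf_fragment("https://site.org/book.pdf#Chapter1"))
```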
Browser support for PDF fragment identifiers
Today’s popular desktop web browsers all provide built-in PDF viewing capabilities used by many people. By examining the current level of support for PDF fragment identifiers across common web browsers we can help both web designers and content authors understand what browser-independent experiences are already available for referencing PDF content and what would be possible if web browsers fully implemented standardized PDF fragment identifiers.
Below, we examine PDF URI fragment support in the standard configuration of the four most popular desktop browsers according to the Wikipedia article on “Usage share of web browsers”. Each of these browsers uses a different PDF technology to provide its built-in PDF experience; however, configuration options and/or 3rd party extensions can be used to alter or replace this default behavior.
Note: this article does not attempt to evaluate the underlying PDF technologies used by these web browsers as some may be more capable and/or reconfigurable.
Navigation Feature | Google Chrome | Microsoft Edge | Mozilla Firefox | Apple Safari (macOS) |
---|---|---|---|---|
Jump to a page | Y | Y | Y | - |
Outline (bookmarks) navigation pane | Y | Y | Y | - |
Thumbnails navigation pane | Y | Y | Y | - |
Embedded file (attachments) navigation pane | - | - | Y- | - |
Layers (optional content) navigation pane | - | - | Y | - |
Shows annotations | Y | Y | Y | Y |
Zoom | Up to 500% | Y (% not stated) | Up to 1000% | Y (% not stated) |
Document properties | Y | Y | Y | - |
XMP metadata | - | - | - | - |
Rotate view | Y | Y | Y | - |
Two-page view | Y | Y | Y+ | - |
Full-screen view | Y | Y | Y | Y |
Text selection | Y | Y | Y | Y |
Basic editing & markup | - | Y | Y | - |
Table notes:
- Y- (embedded file navigation pane): as of early June 2024, Mozilla Firefox has limitations in showing all file attachments (as we reported) until all pages have been processed. The workaround is to navigate to the last page and force the list to be refreshed.
- Y+ (two-page view): Mozilla Firefox also provides additional options.
PDF fragment identifier tests
The following URLs reference PDFs with various features that can be used by, or influence, PDF fragment identifiers.
Note that for comprehensive testing, it’s important to test with different-sized browser windows, as some PDF viewers are limited in their zoom factors, or may alter their user interface layout based on browser window size, etc. Mobile devices, of course, present a similar set of complexities, but we didn’t review mobile viewers for this article.
With most of these tests, we’ve also provided an object path to a specific PDF object in the PDF document object model (DOM) referenced by the URL’s fragment identifier. This object path can be used by developers using PDF forensic tools such as the Apache PDFBox Debugger to directly navigate to the related PDF object when viewing the internal structure of the PDF.
For example, when using Apache PDFBox Debugger first switch to the “Internal structure” tree view via the View menu, paste the object path into the textbox as indicated in the screenshot below, and then press RETURN:
https://labs.pdfa.org/FragmentTest.pdf - This URL has no PDF fragment identifiers; it’s included merely as proof that a PDF exists at this URL. Note that this PDF has PageMode explicitly set to show file attachments, so the embedded file navigation pane should be shown by default (if the implementation provides such a navigation pane).
Object path: Root/PageMode
https://labs.pdfa.org/FragmentTest.pdf#page=2 - open to page 2. Note that fragment identifiers use 1-based numbering (i.e. the first page is 1, not 0); however, forensic tools such as Apache PDFBox Debugger use zero-based indexing.
Object path: Root/Pages/Kids/[1]
https://labs.pdfa.org/FragmentTest.pdf#nameddest=Table - open to the named destination “Table” (case sensitive) using the standardized ISO 32000 and RFC 8118 defined parameter.
Object path: Root/Names/Dests/Names/[4] (as shown in screenshot above)
https://labs.pdfa.org/FragmentTest.pdf#Table - open to the named destination “Table” (case sensitive) using Adobe’s non-standardized method (not using a parameter name).
Object path: Root/Names/Dests/Names/[4] (as shown in screenshot above)
https://labs.pdfa.org/FragmentTest.pdf#page=2&zoom=400 - open to page 2 and set the zoom to 400%. Try with different-sized windows to confirm the zoom level.
Object path: Root/Pages/Kids/[1]
https://labs.pdfa.org/FragmentTest.pdf#page=3&view=FitH - open to page 3 and fit the width of the page to the browser window. Try with different-sized windows to ensure that resizing occurs.
Object path: Root/Pages/Kids/[2]
https://labs.pdfa.org/FragmentTest-IDtest2.pdf#structelem=red - targets the structure element ID “red” on page 3. This fragment identifier should show page 1 if no structure element with this ID is present.
Object path: Root/StructTreeRoot/K/[0]/K/[23]/ID
https://labs.pdfa.org/FragmentTest.pdf#page=3&comment=7037660c-033e-41c9-a8cb-f974bb5fca55 - open a file attachment annotation on page 3. The page parameter is required as annotation NM entries are not guaranteed unique in a document.
Object path: Root/Pages/Kids/[2]/Annots/[0]/NM
https://labs.pdfa.org/FragmentTest.pdf#page=1&comment=379f6100-9002-4bc6-8b72-b1dda08e3548 - open a strikethrough annotation on page 1. The page parameter is required as annotation NM entries are not guaranteed unique in a document.
Object path: Root/Pages/Kids/[0]/Annots/[2]/NM
https://labs.pdfa.org/FragmentTest.pdf#ef=cat.pdf - reference the embedded PDF file called “cat.pdf” listed in the EmbeddedFiles name tree of the Document Catalog. Note that support for embedded files and the ef parameter is subject to cybersecurity concerns as documented in both ISO 32000 and RFC 8118.
Object path: Root/Names/EmbeddedFiles/Names/[0]
https://labs.pdfa.org/FragmentTest.pdf#navpanes=0 - do not show any navigation panes, such as bookmarks, thumbnails, layers (optional content) or file attachments. Note that this PDF has the Document Catalog PageMode key set to show attachments by default (since the PDF contains 2 embedded files) so this tests whether the PDF viewer prioritizes the PDF setting over the URL fragment.
Object path: Root/PageMode
https://labs.pdfa.org/FragmentTest.pdf#navpanes=0&toolbar=0 - do not show any navigation panes or toolbars. Note that this PDF has the Document Catalog PageMode key set to show file attachments by default (since the PDF contains 2 embedded files).
Object path: Root/PageMode
https://labs.pdfa.org/FragmentTest-NoPageMode.pdf - this PDF does not have a PageMode entry in the Document Catalog.
Object path: Root
https://labs.pdfa.org/FragmentTest-NoPageMode.pdf#pagemode=bookmarks - select the outline (bookmarks) view in the navigation pane (not overridden by any PageMode setting in the PDF).
Object path: Root/Outlines
https://labs.pdfa.org/FragmentTest-NoPageMode.pdf#pagemode=thumbs - select the thumbnail view in the navigation pane (not overridden by any PageMode setting in the PDF). The PDF file does not contain thumbnail images (Thumb entry on Page objects).
Object path: Root/Pages/Kids/[1]
https://labs.pdfa.org/FragmentTest-NoPageMode.pdf#pagemode=none - do not show any navigation panes, such as bookmarks, thumbnails, layers (optional content), or file attachments.
https://labs.pdfa.org/FragmentTest.pdf#search=%22pretium%22 - perform a search (there are 7 hits in this test PDF for the pretend word “pretium” with 4 hits on page 1).
https://labs.pdfa.org/FragmentTest.pdf#page=2&highlight=400,400,500,500 - highlight a specific rectangular area on page 2. Note that the nature of the highlighting is defined as implementation-dependent.
https://labs.pdfa.org/FragmentTest.pdf#page=2&viewrect=10,10,100,100 - view (zoom) a specific rectangular area on page 2
https://labs.pdfa.org/FragmentTest.fdf - proof that the FDF file exists at this URL. This FDF file contains a “sticky note” annotation on page 1 (the other annotations such as the highlight and strikethrough on page 1 and the file attachment annotation on page 3 are in the PDF).
https://labs.pdfa.org/FragmentTest.pdf#fdf=https%3A%2F%2Flabs.pdfa.org%2FFragmentTest.fdf - open the PDF and merge the FDF file from an absolute URL.
https://labs.pdfa.org/FragmentTest.pdf#fdf=FragmentTest.fdf - open the PDF and merge the FDF file from a relative URL (same as the PDF).
Notes:
- Caching by browsers can cause issues for both display and for processing URI fragments. Try closing the browser and reopening it before clicking the URLs above to ensure a fresh test each time. In some cases, clicking or interacting with the PDF in the browser is needed to force a screen refresh.
- Some browsers do not support navigation panes or have limited options as to which navigation panes they support.
- Some browsers do not support PDF annotations.
- We reported an issue that Mozilla Firefox (pdf.js) has with not listing all files in its file attachment navigation pane. This is an unfortunate consequence of their architecture and design decisions.
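The difference between the 1-based page parameter and the zero-based object paths shown with the page tests above is a simple off-by-one conversion. The sketch below assumes a flat page tree in which every page is a direct entry of Root/Pages/Kids, as the object paths listed for the test PDF suggest; PDFs with nested page trees would need a real traversal. The function name `page_object_path` is hypothetical.

```python
def page_object_path(page_number: int) -> str:
    """Map a 1-based fragment page number to a zero-based object path.

    Assumes a flat page tree (all pages directly under Root/Pages/Kids).
    """
    if page_number < 1:
        raise ValueError("PDF fragment page numbers start at 1")
    return f"Root/Pages/Kids/[{page_number - 1}]"


# "#page=2" corresponds to the second entry of the Kids array.
print(page_object_path(2))
```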
How do today’s most popular browsers stack up?
The following table summarizes the results of testing PDF fragment identifier support in the most popular browsers in their default configuration, without 3rd party plug-ins or extensions, across both Windows and macOS (no platform differences were identified for those browsers that are multi-platform). As mentioned above, mobile platforms were not evaluated.
Fragment Identifier | Where defined? | Google Chrome | Microsoft Edge | Mozilla Firefox | Apple Safari (macOS) |
---|---|---|---|---|---|
page | ISO | Y | Y | Y | Y |
nameddest | ISO | Y | - | Y | Y |
structelem | ISO | - | - | - | - |
comment | ISO | - | - | - | - |
ef | ISO | - | - | - | - |
zoom | ISO | Y | Y | Y | - |
view | ISO | Y | Y | - | - |
viewrect | ISO | - | - | - | - |
highlight | ISO | - | - | - | - |
search | ISO | - | - | Y | - |
fdf | ISO | - | - | - | - |
Only a named destination | [1] | Y | Y | Y | - |
navpanes | [2] | Y | - | - | - |
toolbar | [2], [3] | Y | - | - | - |
pagemode | [2], [3] | - | - | Y | - |
- [1] https://helpx.adobe.com/acrobat/kb/link-html-pdf-page-acrobat.html
- [2] Adobe “PDF Open Parameters” (web archive), April 2007 document from Adobe Acrobat SDK 8.1 that added the comment, collab, statusbar, messages, navpanes, and fdf parameters, and removed the help parameter.
- [3] Adobe “Technical Note #5438” (web archive), May 2003, the original document for Adobe Acrobat 6.0 that detailed PDF fragment parameters. The proprietary help parameter is only documented in this TechNote and was not carried forward into the 2007 update.
As described above, the level of basic PDF navigation features in these default web-browser viewing environments varies and, consequently, their support for some PDF fragment identifiers will also vary. For example, zoom magnification levels may be limited, FDF/XFDF import may not be supported (so logically the fdf fragment identifier would be unsupported), etc.
In addition, some of the browsers have open feature requests to complete their support for PDF fragment identifiers, such as this issue for the view parameter for Mozilla’s pdf.js.
As web browsers are constantly updated, support for PDF fragment identifiers may change (hopefully for the better!).
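For link authors, the results table above can be treated as a lookup. The following Python sketch transcribes the table as tested in early June 2024 and reports which fragment identifiers work in at least a given number of the four browsers. The names `SUPPORT` and `supported_by` are illustrative, and the data will go stale as browsers are updated.

```python
# Transcription of the results table (default configurations, June 2024).
SUPPORT = {
    "page": {"Chrome", "Edge", "Firefox", "Safari"},
    "nameddest": {"Chrome", "Firefox", "Safari"},
    "structelem": set(),
    "comment": set(),
    "ef": set(),
    "zoom": {"Chrome", "Edge", "Firefox"},
    "view": {"Chrome", "Edge"},
    "viewrect": set(),
    "highlight": set(),
    "search": {"Firefox"},
    "fdf": set(),
    "(bare named destination)": {"Chrome", "Edge", "Firefox"},
    "navpanes": {"Chrome"},
    "toolbar": {"Chrome"},
    "pagemode": {"Firefox"},
}


def supported_by(min_browsers: int) -> list[str]:
    """List identifiers supported by at least min_browsers browsers."""
    return [key for key, browsers in SUPPORT.items()
            if len(browsers) >= min_browsers]


print(supported_by(4))  # supported by all four tested browsers
print(supported_by(3))  # supported by at least three of them
```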
How can PDFs be authored for optimal URL referencing?
When authoring documents that will be directly referenced from URLs, authors might consider the following steps to position end users at the appropriate location in the PDF files they access over the web. Because unsupported fragment identifiers are ignored, there is never a downside to using PDF fragment identifiers to aid users with more capable software:
- For the most portable experience across today’s web browsers, separate core content by page so that page referencing is improved. PDF file size is typically not significantly impacted by increased page counts; however, if future updates to those documents change the pagination, the fragment identifiers will also require updating.
- Use meaningful named destinations. The use of consistently named destinations can ensure existing URLs will continue to work with future changes to a document’s layout, thus avoiding the churn of updating webpages every time the PDF changes. Adding such rules to authoring or house style guidelines, especially for longer technical documents, can ensure that technical writers (who write the PDF documents) and web developers (who write the web pages with the URLs) can share a common understanding, reducing the maintenance of links between web pages and PDF content.
- Keep content in a single column - this reduces the need to use zoom, view, or viewrect, which are not widely supported and also have limitations (such as zoom level and readability on small screen devices). Try to keep the use of URI fragments uncomplicated, since unsupported parameters are ignored and fallback behavior should just open to the correct PDF page.
- Set viewer preferences and page mode settings appropriately in all PDFs. This ensures that the majority of viewing environments can display the most appropriate user interface for your documents, even if the non-standard URI fragments are not supported.
- Ensure that PDF documents include an outline (bookmarks). This feature assists end-user navigation, and when combined with named destinations, can allow a URL to target a heading or other location in your document.
Remember also that unsupported URI parameters are ignored, so a URL containing parameters supported by only a single web browser will still work in less capable browsers. Thus, it is best to define the richest set of fragment identifiers to give each user the best experience their browser allows.
Case study of PDF fragment identifier use in GitHub source code
As software developers are both technically savvy and must often reference highly specific information in source code and related documentation, we were curious to know if code on GitHub reflected any existing awareness of PDF fragment identifiers.
Using GitHub’s basic search capabilities we found clear evidence that PDF fragment identifiers are known and used, but (so far) this use tends to be limited to page-level referencing:
- “.pdf#page=” → 88.1K code files
- “.pdf#search=” → 1.4K code files
- “.pdf#nameddest=” → 856 code files
- “.pdf#view” → 844 code files (note that this search term purposely does not include an EQUALS sign so that both view= and viewrect= parameters can be located)
- “.pdf#structelem=” → no hits!
Even when we search for the original Adobe-specific PDF fragment identifiers that were first documented in Adobe Technical Note #5438 for Acrobat 6.0, May 2003 (web archive) or the Adobe Acrobat SDK 8.1 “PDF Open Parameters”, April 2007 (web archive) but later dropped, usages are found (presumably in legacy code or documentation):
- “.pdf#toolbar=” → 3.5K code files
- “.pdf#pagemode=” → 114 code files
- “.pdf#scrollbar=” → 17 code files
- “.pdf#highlight=” → no hits!
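A similar survey can be run over your own code base or website sources. The Python sketch below is not how the GitHub figures above were produced: it counts occurrences rather than files, and, like the search terms above, it only looks at the first parameter directly after “.pdf#”. The names `PDF_FRAGMENT_KEY` and `count_fragment_keys` and the sample strings are made up for illustration.

```python
import re
from collections import Counter

# Matches ".pdf#key=" and captures the key, ignoring case.
PDF_FRAGMENT_KEY = re.compile(r"\.pdf#([A-Za-z]+)=", re.IGNORECASE)


def count_fragment_keys(texts: list[str]) -> Counter:
    """Count the first fragment key of every PDF link found in texts."""
    counts = Counter()
    for text in texts:
        counts.update(key.lower() for key in PDF_FRAGMENT_KEY.findall(text))
    return counts


# Made-up sample content standing in for source files.
samples = [
    "See spec.pdf#page=12 and spec.pdf#page=40 for details.",
    '<a href="manual.pdf#nameddest=intro">Introduction</a>',
    "guide.PDF#Page=3&zoom=200",
]
print(count_fragment_keys(samples))
```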
Conclusion
Adding appropriate PDF fragment identifiers to the end of URLs to target specific locations in longer PDF documents can provide a far better and more immediate user experience, including for users who are less savvy at navigating PDF files.
Given the rapid growth in applications generating PDF logical structure (content semantics), it makes long-term sense to define business rules for key content locations in documents that can persist across multiple updates to that document. Referencing by page number can change if content is added, deleted, or moved. But by using URLs with the nameddest parameter and a controlled value, URL maintenance can be reduced.
Example of a PDF fragment identifier that might change if the target PDF is altered and content moves:
https://site.org/documentation.pdf#page=37
Example of a content-specific PDF fragment identifier that is more robust against document iterations:
https://site.org/documentation.pdf#nameddest=section5.1.2
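One way to act on this is to migrate existing page-based links once the PDF has suitable named destinations. The Python sketch below assumes a hand-maintained mapping from old page numbers to destination names; the mapping `PAGE_TO_DEST` and the function `prefer_named_destination` are hypothetical, and links carrying anything other than a plain page reference are left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Hypothetical house-style table: the named destination that covers the
# content older links referenced by page number.
PAGE_TO_DEST = {37: "section5.1.2"}


def prefer_named_destination(url: str, page_to_dest: dict[int, str]) -> str:
    """Rewrite a '#page=N' link to '#nameddest=...' when a mapping exists."""
    parts = urlsplit(url)
    key, _, value = parts.fragment.partition("=")
    if key != "page" or not value.isdigit():
        return url  # not a plain page reference, leave it alone
    dest = page_to_dest.get(int(value))
    if dest is None:
        return url  # no known destination for this page
    new_fragment = "nameddest=" + quote(dest, safe="")
    return urlunsplit(parts._replace(fragment=new_fragment))


print(prefer_named_destination(
    "https://site.org/documentation.pdf#page=37", PAGE_TO_DEST))
```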
Given the richness of modern web technologies, it is not unusual for websites to indicate preferred browsers or browser versions for the best possible experiences - and this extends to PDF. Unfortunately, support for the full standardized set of PDF fragment identifiers is lacking across the most popular browsers as seen in the table above; today, only page-level targets are widely supported. However, the design of URI fragments means that you can still add refining URI fragment parameters for those users who proactively choose to use more capable technology - it doesn’t hurt to help!
If you can control the browser choice in your IT environment, then your ability to leverage PDF URI fragment identifiers can be vastly improved. Alternatively, your internet or intranet website might provide a hosted PDF viewing experience where you can more closely control and configure the end-user experience - speak to PDF vendors to learn what is possible.
The PDF Association’s free Color Cheat Sheet also includes a condensed summary of PDF fragment identifiers on page 2. (PDF fragment identifiers are not color-related - this placement is merely a consequence of an optimized layout across all cheat sheets!)
We hope that the browser development teams at Google, Microsoft, Mozilla, and Apple will pay closer attention to the needs of their end-users when accessing web-delivered PDF content. This includes fully supporting a broader set of ISO-standardized PDF fragment identifiers in their default configurations.