Glossary of PDF terms
This resource provides end users and non-technical readers with lay-person definitions of the acronyms and terms commonly encountered when discussing or describing the Portable Document Format (PDF). Technical readers should always refer directly to the appropriate ISO publication for precise, technically accurate definitions. Additional terms are defined in many PDF ISO standards, which can be previewed in the ISO Online Browsing Platform (OBP).
The PDF Association also maintains a glossary of accessibility terminology specific to PDF technology.
Action | An action refers to PDF features that enable automatic behaviours triggered by a user interaction or event, such as displaying a different page in a document when a bookmark is clicked, performing a calculation with form data, or playing a sound or video. For technical details see clause 12.6 in ISO 32000-2:2020. |
Annotation | Annotations are a special PDF feature most commonly associated with commenting and reviewing a document, such as highlighting text, text strikethrough, or sketching on top of a document. However PDF 2.0 defines 28 different kinds of annotations which provide a far richer feature set including URLs (Link annotations), watermarks, form widgets, interactive 3D content, redaction, sound, movies and other rich media. For technical details see clause 12.5 in ISO 32000-2:2020. |
Associated File (or AF) |
PDF/A-3 (ISO 19005-3:2012) introduced the concept of associated files to PDF 1.7. This associates one or more embedded file streams with any PDF object using an AF array entry, along with a semantic relationship defined by the AFRelationship key in the file specification dictionary. The semantic relationship supports concepts such as Source, Schema, FormData, etc. See PDF 2.0 Application Note 002: Associated Files. |
Attached file |
This term usually refers to files embedded in a PDF file, especially in the context of File Attachment annotations. See PDF 2.0 Application Note 002: Associated Files. |
AT | AT stands for assistive technology and is commonly referred to in the context of PDF/UA. Assistive technology, such as screen readers, color and contrast adjustment, and screen magnifiers, helps users with disabilities access and navigate PDF documents. |
Bookmarks | Bookmarks are an informal term for PDF’s Document Outline feature. Bookmarks are commonly displayed in a separate navigation pane to aid document navigation and are a technically distinct feature from headings in content. For technical details see clause 12.3.3 in ISO 32000-2:2020. |
Conformance Level | PDF Conformance Levels are represented by letter designators with a PDF ISO subset acronym, such as PDF/A-1b, PDF/A-4e, PDF/X-5pg, PDF/VT-2s. Each Conformance Level relates to a specialized definition in the corresponding PDF ISO subset with its own very specific set of rules and requirements. For example, PDF/A-4 is the PDF-for-archival standard supporting PDF 2.0, with PDF/A-4e being a highly specialized refinement of PDF/A-4 supporting engineering workflows with 3D content (hence the "E" designator) while PDF/A-2b (basic) and PDF/A-2u (Unicode) differ in requirements related to Unicode text extraction capabilities. Not all PDF ISO subsets use conformance levels. |
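As a rough illustration of how the designators above break down, the following sketch splits a name such as PDF/A-2b into its subset, part number and conformance level letters. The function name and the regular expression are assumptions made for this example; they cover only the simple forms quoted in this entry, not every designator used by the ISO subset standards.

```python
import re

# Assumed shape for this sketch: "PDF/" + subset letters + "-" + part number
# + optional lowercase conformance level letters (may be empty).
_DESIGNATOR = re.compile(r"^PDF/([A-Z]+)-([0-9]+)([a-z]*)$")


def split_designator(designator: str) -> tuple[str, int, str]:
    """Split e.g. 'PDF/A-2b' into ('A', 2, 'b')."""
    match = _DESIGNATOR.match(designator)
    if match is None:
        raise ValueError(f"not a recognised designator: {designator!r}")
    subset, part, level = match.groups()
    return subset, int(part), level


for name in ("PDF/A-1b", "PDF/A-4e", "PDF/X-5pg", "PDF/VT-2s"):
    print(name, split_designator(name))
```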
Collection |
PDF Collections (or "PDF portable collections") were introduced with PDF 1.7 and are also known by several informal terms such as "portfolio" or "package". These special PDF files contain a collection of embedded files in folders and allow the author basic control over the presentation of the collection of files - they can simplistically be thought of as a ZIP container with a user interface. |
Conventional PDF |
A colloquial term referring to PDF files that do not use cross-reference streams or object streams. Such PDFs will therefore always include the xref, trailer and startxref keywords. |
COS | COS is the acronym for "Carousel Object Syntax", which is the syntax used by PDF and FDF files and is fully described in ISO 32000-2:2020. It is what you see if you look inside a PDF file. "Carousel" was the codename for Acrobat 1.0 when this syntax was first introduced by Adobe. It has also been described as a recursive definition for "COS Object Syntax". |
Cross-reference table |
The common colloquial usage of the term "cross-reference table" refers to information stored in a PDF file below the xref keyword in a conventional PDF, or in a cross-reference stream (PDF 1.5 and later). However, ISO 32000 formally defines "cross-reference table" as a data structure comprising information from all cross-reference sections and cross-reference streams in a PDF file, which permits random access to all indirect objects within the PDF file (see 7.5.4 "Cross-reference table"). |
Cross-reference entry |
A "cross-reference entry" is a formal term of art defined in ISO 32000 for conventional PDFs. It refers to the fixed-length 20-byte lines that define in-use (n keyword) or free objects (f keyword) in each cross-reference subsection. |
Cross-reference section |
A "cross-reference section" is a formal term of art defined in ISO 32000 for conventional PDFs. It begins with a line containing the keyword xref followed by one or more cross-reference subsections. See 7.5.4 "Cross-reference table". In colloquial usage, the term "cross-reference table" is often applied incorrectly to what is technically a "cross-reference section". |
Cross-reference stream |
Cross-reference streams were introduced in PDF 1.5 as a more compact way to define the cross-reference data in PDF. Cross-reference streams use binary data and, because they are standard PDF stream objects, they can use filters and be compressed. Cross-reference streams are widely used with object streams. |
Cross-reference subsection |
One or more "Cross-reference subsections" exist within each cross-reference section of a conventional PDF. Each subsection starts with a line containing a pair of integers (defining the first object number and number of objects in the cross-reference subsection respectively), followed by zero or more lines containing the fixed length, 20-byte entries for a contiguous range of object numbers. See 7.5.4 "Cross-reference table". |
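The layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a robust parser: the function name is hypothetical, the header line is assumed to end with a line feed, and the sample bytes are made up for the example.

```python
def parse_subsection(data: bytes) -> dict[int, tuple[int, int, bool]]:
    """Parse one cross-reference subsection.

    Returns {object number: (first field, generation number, in_use)}.
    For in-use entries the first field is a byte offset.
    """
    # Header line: first object number and number of entries.
    header, _, body = data.partition(b"\n")
    first, count = (int(token) for token in header.split())

    entries = {}
    for i in range(count):
        entry = body[i * 20:(i + 1) * 20]  # every entry is exactly 20 bytes
        if len(entry) != 20:
            raise ValueError("truncated cross-reference entry")
        field, generation, keyword = entry.split()
        if keyword not in (b"n", b"f"):
            raise ValueError(f"unexpected keyword: {keyword!r}")
        entries[first + i] = (int(field), int(generation), keyword == b"n")
    return entries


# Made-up sample: object 0 is free, objects 1 and 2 are in use.
sample = (
    b"0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
)
print(parse_subsection(sample))
```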
Direct object |
A direct PDF object is an object that occurs inline where it is defined and that does not have its own object identifier. In contrast to indirect objects, direct objects cannot be directly referenced since they do not have their own object identifiers. |
DPM, DParts, Document Part Metadata |
Terms related to a new PDF feature called "Document Part Metadata" that was originally introduced by PDF/VT and later added to the core PDF 2.0 specification (ISO 32000-2). It provides non-rendering information about page ranges and is commonly used in the graphic arts industry with variable data printing (PDF/VT) and print product metadata (PPM) files. DPart and DPM are the names of specific PDF dictionaries but are often used to refer to document part metadata in general. |
Embedded file |
A file stream object that is embedded into a container PDF file. See PDF 2.0 Application Note 002: Associated Files. |
Fast web view | "Fast web view" is an informal term for the Linearized PDF feature that enables the first page of a PDF file to be available for rapid display before the rest of the PDF file is fully downloaded (such as while downloading from the internet). |
FDF Forms Data Format |
Forms Data Format (FDF, application/fdf) is a specialized file format, expressed in the same COS syntax that PDF uses, used for interactive form data that was introduced in PDF 1.2. FDF can be used when submitting form data to a server, receiving the response, and incorporating it into the interactive form. It can also be used to export form data to stand-alone files that can be stored, transmitted electronically, and imported back into the corresponding PDF interactive form. In addition, beginning in PDF 1.3, FDF can be used to define a container for annotations that are separate from the PDF document to which they apply. For technical details see clause 12.7.8 of ISO 32000-2:2020. |
File attachment |
This term usually refers to files embedded in a PDF file, especially in the context of File Attachment annotations. See PDF 2.0 Application Note 002: Associated Files. |
Form (AcroForm) | PDF supports both interactive and non-interactive forms. For technical details see clause 12.7 in ISO 32000-2:2020. Interactive forms were introduced in PDF 1.2 as a collection of fields for gathering information interactively from the user and are sometimes referred to as "AcroForms". A PDF document may contain any number of fields appearing on any combination of pages, all of which make up a single, global interactive form spanning the entire document. Arbitrary subsets of these fields can be imported or exported from the document as FDF or XFDF. Non-interactive forms (introduced in PDF 1.7) are a static representation of form fields. Such forms may have originally contained interactive fields such as text fields and radio buttons but were converted into non-interactive PDF files, they may represent form fields and/or data converted from external sources, or they may have been designed to be printed out and filled in manually. |
Fragment Identifier |
Annex O in ISO 32000-2 defines PDF-specific fragment identifiers that can be added to the end of URLs that provide anchors to specific content or influence the display of a linked PDF file. Fragment identifiers are defined by the W3C and appear after the # symbol in a URL. A simple example is that a URL can refer to a specific page in a PDF by appending page=n (where n starts from 1) to a URL: https://pdfa.org/wp-content/uploads/2019/09/PDF-Association-flyer-A4.pdf#page=2 opens to the 2nd page of this PDF. This article describes PDF Fragment Identifiers and provides test files. |
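As a small sketch of the page=n example above, the standard library's URL tools can pull the page number out of a fragment. The function name is hypothetical, and the sketch assumes fragment parameters are written as name=value pairs joined by "&"; it does not attempt to cover the other parameters defined in Annex O.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def page_from_url(url: str) -> Optional[int]:
    """Return the 1-based page number from a '#page=n' fragment, if present."""
    fragment = urlsplit(url).fragment  # the text after the '#' symbol
    values = parse_qs(fragment).get("page")
    if not values:
        return None
    return int(values[0])  # raises ValueError if the value is not an integer


url = "https://pdfa.org/wp-content/uploads/2019/09/PDF-Association-flyer-A4.pdf#page=2"
print(page_from_url(url))
```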
Generation Number |
A PDF generation number is a non-negative decimal integer: its syntax requirements are identical to those of a PDF Object Number except that the single digit "0" is also permitted. In a newly created file, all indirect objects will have generation numbers of 0. Non-zero generation numbers may be introduced when the file is later updated. The definition of Generation Numbers was clarified by an erratum. |
Hybrid-reference PDF file |
A Hybrid-reference PDF file is a PDF 1.5 (or later) file containing objects referenced by standard cross-reference tables in addition to objects in object streams that are referenced by cross-reference streams. Only PDF 1.5 and later files can be hybrid-reference PDFs because cross-reference streams were introduced in PDF 1.5. Refer to clause 7.5.8.4 Compatibility with applications that do not support compressed reference streams in ISO 32000-2:2020. |
Incremental update | A PDF file can be updated incrementally without rewriting the entire file. When updating a PDF file incrementally, changes are appended to the end of the file, leaving the original contents unchanged. For example, a PDF-based document review tool may write PDF annotations as incremental updates, ensuring that a digitally signed original document is not invalidated by the addition of comments. Such technical details are typically not visible to end-users. For technical details see ISO 32000-2:2020. |
Indirect object |
A PDF indirect object is an object that is defined in the body section of a PDF file with an object identifier (comprising an object number and generation number). It will be referenced elsewhere in the PDF file by using an indirect reference (keyword R) with its object identifier. |
Integer page index |
In PDF the integer page index is a 0-based index of the pages in a PDF file, with the first page having an integer page index of zero. It is commonly used by internal PDF data structures. In contrast, Fragment Identifiers use a 1-based counting system. |
Layers | Layers is an informal term for Optional Content Groups (OCGs) in PDF. Layers can typically be individually toggled on and off in interactive PDF viewers. Examples include architectural drawings where floors, plumbing, electrical wiring, foundations, etc. might each be represented on separate layers. |
Linearized PDF | Linearized PDF is the formally defined PDF feature that enables the first page of a PDF file to be available for rapid display before the rest of the PDF is fully downloaded. It is often referred to as "Fast web view". For technical details see Annex F in ISO 32000-2:2020. |
Object Identifier (Object ID) |
A PDF Object Identifier (or Object ID) is a pair of integers formed by an Object Number and a Generation Number separated by a single SPACE character. Object IDs are used with the R and obj keywords to unambiguously define an object. |
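To make the pairing of Object IDs with the obj and R keywords concrete, here is a deliberately simplified sketch that scans a made-up fragment of PDF syntax. The patterns and the sample bytes are assumptions for illustration only; a real PDF parser must also handle comments, strings and stream data, which plain regular expressions cannot do reliably.

```python
import re

# Simplified patterns: object number, SPACE, generation number, SPACE, keyword.
OBJ_DEFINITION = re.compile(rb"([0-9]+) ([0-9]+) obj\b")
INDIRECT_REFERENCE = re.compile(rb"([0-9]+) ([0-9]+) R\b")

# Made-up fragment: object "4 0" refers to objects "2 0" and "5 0".
snippet = b"4 0 obj\n<< /Type /Page /Parent 2 0 R /Contents 5 0 R >>\nendobj\n"

definitions = [(int(n), int(g)) for n, g in OBJ_DEFINITION.findall(snippet)]
references = [(int(n), int(g)) for n, g in INDIRECT_REFERENCE.findall(snippet)]

print("defined:", definitions)
print("referenced:", references)
```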
Object Number |
A PDF Object Number is a positive (non-zero) decimal integer consisting only of digits. It does not have a leading PLUS SIGN ("+", 2Bh) and does not start with leading zeros ("0"). PDF objects are not required to be numbered sequentially within a PDF file; object numbers may be assigned in any arbitrary order. The definition of an Object Number was clarified by an erratum. |
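The syntax rules above, together with those in the Generation Number entry of this glossary, can be expressed as a short check. The function names are hypothetical, and the sketch covers only the textual rules stated in this glossary; it does not check any upper limits on the values.

```python
import re

# Digits only, no sign, no leading zero, and therefore never zero itself.
_OBJECT_NUMBER = re.compile(r"[1-9][0-9]*")


def is_object_number(token: str) -> bool:
    """True if the token follows the Object Number syntax described above."""
    return _OBJECT_NUMBER.fullmatch(token) is not None


def is_generation_number(token: str) -> bool:
    """Same syntax as an object number, except the single digit "0" is allowed."""
    return token == "0" or is_object_number(token)


for token in ("12", "0", "+5", "007"):
    print(repr(token), is_object_number(token), is_generation_number(token))
```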
Object stream |
Object streams were introduced in PDF 1.5 as a more compact method to represent indirect objects. Object streams are standard PDF stream objects (and thus can use compression filters) and do not contain the keywords obj or endobj. Object streams are very commonly used with cross-reference streams. |
OCG | Optional Content Groups are the formally defined feature in PDF which enable selectable layers in interactive PDF viewers. For technical details see clause 8.11 of ISO 32000-2:2020. |
OCR | Optical Character Recognition is the process of recognizing text from an image (photo) of text. It is typically referenced in relation to scan-to-PDF functionality. The accuracy of OCR results can vary depending on the quality of the page image and other factors. PDF does not constrain or limit OCR accuracy in any way. |
Outline |
Refers to the PDF feature called Document Outline, which is commonly known as Bookmarks. |
Package |
An informal term used to refer to a PDF Collection. |
Packaged document |
Typically, a “packaged document” includes several embedded and/or associated files. See PDF 2.0 Application Note 002: Associated Files. |
Page labels |
As documents can be long and have many pages, humans have invented conventions to label pages more descriptively to assist with navigation. We are used to seeing front matter labelled with Roman numerals: i, ii, iii, iv, etc.; appendices prefixed with uppercase letters such as A.1, A.2, etc. or even chapter/page combinations such as 1-1, 1-2, 2-1, 2-2. In PDF terminology this is what is referred to as a page label - an optional descriptive label of a page that is commonly presented on-screen. This is in contrast to the integer page index used internally in PDF files. |
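The relationship between the integer page index and a page label can be illustrated with a toy mapping. Everything in this sketch is an assumption for illustration: the range table, the function names and the label styles are invented here and do not mirror how page labels are actually stored inside a PDF file.

```python
def to_roman(number: int) -> str:
    """Lowercase Roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    result = ""
    for value, symbol in numerals:
        count, number = divmod(number, value)
        result += symbol * count
    return result


# Hypothetical labelling ranges: (first 0-based page index, style, prefix).
RANGES = [(0, "roman", ""), (4, "decimal", ""), (10, "decimal", "A.")]


def page_label(index: int) -> str:
    """Label for a 0-based integer page index, numbering from 1 in each range."""
    start, style, prefix = next(r for r in reversed(RANGES) if r[0] <= index)
    number = index - start + 1
    text = to_roman(number) if style == "roman" else str(number)
    return prefix + text


print([page_label(i) for i in range(12)])
```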
PDF | The Portable Document Format is a random access, binary file format for device-independent, paginated documents that defines an accurate appearance model for rendering fully typeset text, images and vector graphics. Over time PDF has also expanded to include many interactive and specialized features supporting a wide variety of use cases and electronic documents with rich experiences beyond that of "digital paper". It is formally defined by the ISO 32000 family of international standards. |
PDF 2.0 | PDF 2.0 is the latest version of the PDF specification and is the first PDF specification entirely developed under the ISO consensus-based process. It is formally defined by ISO 32000-2:2020. |
PDF/A | PDF/A is an ISO-defined formal subset of PDF designed to support long-term preservation and digital archiving. PDF/A focuses on the accurate preservation of the static visual representation of page-based electronic documents over time and is defined by the ISO 19005 family of standards. "A" stands for archival. |
PDF/E | PDF/E is an ISO-defined formal subset of PDF 1.6 defined to support the engineering sectors with support for interactive 3D models. For technical details see the PDF/E ISO standard ISO 24517-1:2008 Document management — Engineering document format using PDF — Part 1: Use of PDF 1.6 (PDF/E-1). PDF 2.0 support for engineering workflows is now provided via the PDF/A-4e conformance level - see PDF/A. "E" stands for engineering. |
PDF/R | PDF/R is a small subset of PDF targeting multi-page raster image documents, such as scanned documents. It is based on the PDF Association's PDF/Raster 1.0 specification and is specifically designed to be easy to create in low-end, low-memory embedded devices such as scanners. It is defined by ISO 23504-1:2020 Document management applications — Raster image transport and storage — Part 1: Use of ISO 32000 (PDF/R-1). "R" stands for raster. |
PDF/UA | PDF/UA is the ISO-defined formal subset of PDF to support universal access, enabling high levels of accessibility for electronic documents. It is defined by the ISO 14289 family of standards. "UA" stands for universal access. |
PDF/VCR | PDF/VCR enables variable data printing applications using PDF template-based variable content substitution. A PDF template file containing pages with variable content substitution fields (placeholders) is delivered ahead of a print production run and may be reused across multiple print production runs. PDF-based variable data substitution content is then provided during print production and merged with the PDF template to produce final-form variable content page output. "VCR" stands for variable content replacement. It is defined by ISO 16613-1:2017 Graphic technology — Variable content replacement — Part 1: Using PDF/X for variable content replacement (PDF/VCR-1). |
PDF/VT |
PDF/VT is the ISO-defined formal subset of PDF supporting variable data printing and transactional documents, which builds on the capabilities of PDF/X. PDF/VT is defined by the ISO 16612 family of standards. "VT" stands for "Variable Transactional". |
PDF/X | PDF/X is defined by the ISO 15930 family of standards which supports the graphic arts and professional printing sectors. The "X" in "PDF/X" is for eXchange, indicating specialized support for the exchange of digital data targeting professionally printed products. |
PDF Declarations |
PDF Declarations is an industry-defined specification for declaring conformance to 3rd party standards in XMP metadata. A common use case is declaring conformance to a specific WCAG level in PDF/UA or Well-Tagged PDF documents. |
PDF version | PDF versions are 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0, with each version defined by its own PDF specification document. PDF files are generally backward- and forward-compatible, enabling modern software to reliably display old PDFs. Every PDF file identifies a version via the first line file header %PDF-x.y, but may also update the version via a special key in the Document Catalog dictionary (Version entry) when an incremental update is applied. Later PDF versions define additional features. |
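Reading the version from the file header can be sketched as follows. The function name is hypothetical and the sketch assumes the %PDF-x.y header sits at the very first byte of the data; it does not look at the Version entry in the Document Catalog, which may specify a later version.

```python
import re
from typing import Optional

# Header comment of the form %PDF-x.y, assumed here to start at byte 0.
_HEADER = re.compile(rb"%PDF-([0-9])\.([0-9])")


def header_version(data: bytes) -> Optional[tuple[int, int]]:
    """Return (major, minor) from the file header, or None if it is absent."""
    match = _HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(header_version(b"%PDF-1.7\n% rest of the file\n"))
print(header_version(b"not a PDF"))
```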
Portfolio |
An informal term for a PDF Collection. |
PPM |
Print Product Metadata as defined by ISO 21812-1:2019 Graphic technology - Print product metadata for PDF files - Part 1: Architecture and core requirements for metadata for use with PDF/VT and PDF/X-6 files, using DParts. |
Referenced file |
An external file referenced from a PDF file. An Associated File may be either embedded or referenced (external to the PDF). Note 1 in ISO 32000-2, 14.13.2 states: "A file specification dictionary allows for both embedded data and referenced/external data. Both types are allowed for associated files but the embedded form is recommended." See PDF 2.0 Application Note 002: Associated Files. |
Revision |
A term used to refer to an Incremental Update. |
Rolemap |
Rolemaps are a core Tagged PDF concept that allows any structure type to be conceptually mapped between namespaces in a manner that enables all PDF processors to understand the basic intention of structure types. For example, a custom structure type called Foo might be role-mapped to a paragraph in the standard structure namespace, indicating that semantically Foo is "best matched" as a paragraph. Rolemaps thus support the use of custom structure types in PDF. Rolemaps and their use are described in clause 14.7 Logical Structure and clause 14.8 Tagged PDF in ISO 32000. |
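The Foo example above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name is hypothetical, the set of standard structure types is a tiny subset chosen for the example, namespaces are ignored, and mappings are assumed to be chainable (one custom type mapped to another before reaching a standard type).

```python
from typing import Optional

# Tiny illustrative subset of standard structure types, not the full list.
STANDARD_TYPES = {"P", "H1", "L", "Table"}


def resolve_role(structure_type: str, role_map: dict[str, str]) -> Optional[str]:
    """Follow role mappings until a standard structure type is reached."""
    seen = set()
    current = structure_type
    while current not in STANDARD_TYPES:
        if current in seen or current not in role_map:
            return None  # circular mapping, or no mapping at all
        seen.add(current)
        current = role_map[current]
    return current


role_map = {"Foo": "P", "Bar": "Foo"}
print(resolve_role("Foo", role_map))  # Foo is "best matched" as a paragraph
print(resolve_role("Bar", role_map))
print(resolve_role("Baz", role_map))
```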
startxref |
startxref is a reserved PDF keyword that occurs just before the %%EOF end-of-file comment marker, followed by a byte offset expressed as a decimal integer in ASCII. In conventional PDF files this is the offset to the start of the last cross-reference section. In PDFs that use cross-reference streams, the offset instead locates the cross-reference stream. |
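Locating the value that follows the keyword can be sketched as below. The function name is hypothetical, the sample bytes and the offset in them are made up, and the sketch simply searches backwards from the last %%EOF marker without validating anything else about the file.

```python
def find_startxref(data: bytes) -> int:
    """Return the decimal byte offset that follows the last startxref keyword."""
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker found")
    keyword = data.rfind(b"startxref", 0, eof)
    if keyword == -1:
        raise ValueError("no startxref keyword found")
    # The offset sits between the keyword and the %%EOF marker.
    tokens = data[keyword + len(b"startxref"):eof].split()
    if not tokens:
        raise ValueError("no offset after startxref")
    return int(tokens[0])


# Made-up file tail; the offset value is for illustration only.
tail = b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n416\n%%EOF\n"
print(find_startxref(tail))
```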
Tag |
An informal term for a Structure Element as defined in clause 14.8 of ISO 32000-2:2020. Example tags include P (paragraph), Hn (heading), L (list), etc. |
Tagged PDF | PDF 1.4 introduced "Tagged PDF" to represent the logical reading order (structure) of a document. It defines a set of standard structure elements and attributes that allow page content (text, graphics and images, as well as annotations and form fields) to be extracted and reused for other purposes. PDF/UA uses Tagged PDF to ensure electronic documents are fully accessible. For technical details see clause 14.8 of ISO 32000-2:2020. |
trailer |
trailer is a reserved PDF keyword and defines the start of the trailer dictionary for conventional PDF files. The trailer enables a PDF processor to quickly find certain special objects and data, such as the largest object number in the PDF (Size entry), the Document Catalog (Root entry) and the optional encryption dictionary (if the PDF is encrypted, Encrypt entry). It is an essential part of every PDF file. In PDF 1.5 and later, when cross-reference streams are used, the trailer keyword does not exist and the trailer dictionary entries are merged into the cross-reference stream dictionary. |
Well-Tagged PDF (or WTPDF) |
Well-Tagged PDF is an industry-defined specification for PDF 2.0 that is also fully aligned with PDF/UA-2 (ISO 14289-2:2024). It allows tagged PDF files to be both reusable and accessible across a wide spectrum of possible use cases. |
Widgets | A PDF widget is a specialized type of PDF annotation used with interactive forms and represents the GUI widgets through which data entry by the user is done. |
XFA | XFA stands for "XML Forms Architecture" which is a family of proprietary XML specifications supporting both static and dynamic forms. As a proprietary format with limited support in PDF processors, XFA was deprecated in PDF 2.0 (ISO 32000-2:2020) but was permitted in PDF 1.5 - 1.7. |
XFDF | XFDF is the XML equivalent of FDF. It is defined by ISO 19444-1:2019 Document management - XML Forms Data Format — Part 1: Use of ISO 32000-2 (XFDF 3.0). |
XMP | XMP stands for the eXtensible Metadata Platform which is an XML-based standard for metadata used in PDF and required by all ISO PDF subset standards. XMP is defined by ISO 16684-1:2019 Graphic technology — Extensible metadata platform (XMP) — Part 1: Data model, serialization and core properties. |
xref |
xref is a reserved PDF keyword used to identify the start of conventional cross-reference sections. It is also commonly used colloquially in place of the phrase "cross-reference". PDF 1.5 and later files that only use cross-reference streams do not use this keyword. PDF files that have incremental updates may have multiple instances of this keyword. |