Understanding Techniques for Accessible PDF

This page provides additional information regarding the various types of Techniques for Accessible PDF. As of April 2025, this information is specific to ISO 32000-1 and PDF/UA-1.

Fundamental 1: Basic technical rules are followed

Software that creates a PDF must follow basic technical rules for tagged PDF, so that other software can process the PDF for accessibility.

The rules for PDF are defined in two specifications: ISO 32000, the core PDF specification, and ISO 14289 (PDF/UA), the specification for accessible PDF documents, which adds requirements to those of ISO 32000.

A PDF/UA-compliant file is also required to conform to ISO 32000. In general, basic technical rules can be unambiguously checked by software commonly known as a “validator”.

Test

Procedure

1. Check that the basic technical rules of ISO 32000 and PDF/UA are followed by running a PDF/UA validator.

Expected results

Check #1 is true.

Fundamental 2: Text content is machine-readable

Most information in typical documents is textual (visible text, alternative text or bookmarks). In order to understand the challenges related to machine-readable textual information in PDFs, it is necessary to distinguish between text content and non-text content intended to be consumed as text.

Requirements for text content:

Each character has a corresponding Unicode value
All text has a specified language

Non-text content intended to be consumed as text must have equivalent text content, provided by one of the following mechanisms:

Creating invisible text (by using OCR tools, for example)
Adding ActualText to a Span marked content sequence
Adding ActualText to a tag

Test

Procedure

1. Check that the related characters are present as extractable characters for content intended to be consumed as text.
2. Check that a Unicode value can be derived for all text content.
3. Check that the extractable characters match their visual appearance.
4. Check that the natural language is set for all text content.
5. If invisible text is used, check that the related images are marked as artifacts.
6. If ActualText on a marked-content sequence (also called a container, not to be confused with a tag) is used to make visual characters extractable, check that the container is a Span.
7. If ActualText on a marked-content sequence is used to make visual characters extractable, check that the container contains the related visual characters.
8. Check that extractable text within block elements includes space characters between words, even at line breaks.

Expected results

Checks #1 to #8, if applicable, are true.

Note: The Span tag is the recommended choice if ActualText is required, because some software interprets ActualText as removing the semantics expressed by tags. Even so, ActualText may also work when the tag is one of the following: Lbl (outside of List structures), P, BibEntry, BlockQuote, Caption, Code, Note, Quote. ActualText usually should not be assigned to other tags.
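Check #2 in the procedure above can be partly approximated on text that has already been extracted from a PDF. The following is a minimal sketch, assuming the extracted text is available as a Python string; the function name `find_unmapped_characters` is hypothetical, and the choice of warning signs (the replacement character, private-use code points and unassigned code points) is a heuristic, not a rule taken from ISO 32000 or PDF/UA.

```python
import unicodedata


def find_unmapped_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for characters that hint at a
    missing or unusable Unicode mapping in the extracted text."""
    suspects = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        # U+FFFD is the Unicode replacement character; "Co" is the
        # private-use category and "Cn" is unassigned. These are
        # heuristic warning signs chosen for this sketch.
        if char == "\ufffd" or category in ("Co", "Cn"):
            suspects.append((index, f"U+{ord(char):04X}"))
    return suspects


# Hypothetical extracted text containing one private-use character
# and one replacement character.
sample = "Caf\ue000 \ufffd"
print(find_unmapped_characters(sample))
```

A hit does not prove a failure, because private-use characters are occasionally intentional, and an empty result does not prove that the extracted characters match their visual appearance (check #3). That comparison still needs a human reviewer.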

Fundamental 3: Real content and Artifacts are distinguished

Not all content in a PDF document is essential for understanding. To enable software to distinguish between relevant content (e.g., headings and paragraphs) and non-relevant content (e.g., decorative lines or running headers), PDF files provide the option of distinguishing relevant content, known as real content, from the rest, known as artifacts. In accessible PDF documents, content is marked as either real content or artifact, and can never be both at the same time.

Test

Procedure

1. Check that all real content is tagged.
2. Check that all content that is not real content is marked as an artifact.
3. If artifact content requires a type and/or subtype, check that they are present.
4. If an artifact type and/or subtype is present, check that it is appropriate.

Expected results

Checks #1 to #4, if applicable, are true.
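The rule that content is either real content or an artifact, and never both, lends itself to a mechanical check. The sketch below assumes a hypothetical `ContentItem` record that some other tool has filled in for each piece of page content; it does not read PDF files itself, and it cannot decide whether an item ought to be real content or an artifact.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical stand-in for one piece of page content."""
    description: str
    tagged: bool    # True if the item is referenced from the structure tree
    artifact: bool  # True if the item is marked as an artifact


def find_marking_problems(items: list[ContentItem]) -> list[str]:
    """Report items that are marked as both, or as neither."""
    problems = []
    for item in items:
        if item.tagged and item.artifact:
            problems.append(f"{item.description}: both tagged and artifact")
        elif not item.tagged and not item.artifact:
            problems.append(f"{item.description}: neither tagged nor artifact")
    return problems


# Hypothetical page content for illustration.
page_items = [
    ContentItem("chapter heading", tagged=True, artifact=False),
    ContentItem("running header", tagged=False, artifact=True),
    ContentItem("decorative line", tagged=False, artifact=False),
]
print(find_marking_problems(page_items))
```

Whether the running header in this example is correctly treated as an artifact remains a judgment about the author's intent.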

Fundamental 4: Logical Content Order

In addition to appropriately tagging real content, the tags must appear in the structure tree in an appropriate order to enable correct presentation by software (including assistive technology). This is referred to as the logical content order. The logical content order must reflect the order of real content as intended by the author.

Test

Procedure

1. Check that the logical content order of the tags matches the author's intent.
2. Check that the order of the content within each tag matches the author's intent.
3. If the page contains annotations such as form fields, links or notes, check that the tab order is set to follow the logical content order.

Expected results

Checks #1 to #3, if applicable, are true.
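Logical content order is commonly read off the structure tree by visiting tags depth-first, in the order in which children are listed. The sketch below illustrates that traversal on a hypothetical in-memory `Tag` class; it is an assumption-laden model for reviewing order, not a PDF parser, and it does not cover the tab order check for annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical model of a structure element (tag)."""
    name: str
    # Children are either nested Tag objects or strings of content.
    children: list = field(default_factory=list)


def reading_order(tag: Tag):
    """Yield content strings depth-first, in child order."""
    for child in tag.children:
        if isinstance(child, str):
            yield child
        else:
            yield from reading_order(child)


# Hypothetical structure tree for illustration.
document = Tag("Document", [
    Tag("H1", ["Annual report"]),
    Tag("P", ["Summary of the year."]),
    Tag("Sect", [
        Tag("H2", ["Finances"]),
        Tag("P", ["Revenue grew."]),
    ]),
])
print(list(reading_order(document)))
```

Comparing the resulting sequence with the order the author intended is the manual part of checks #1 and #2.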

Fundamental 5: Appropriate Semantics

In order for real content to be correctly understood by a user, the most semantically appropriate tag must be used, and each unit of real content must be tagged with a single appropriate tag, even if parts of the real content are visually separated.

Examples of semantically appropriate tagging

A single paragraph spanning two pages or columns, or visually divided by an image, is nonetheless enclosed by a single P tag.
A level one heading is tagged using H1.
A data table is tagged using Table, TR, TH and TD tags.
A list is tagged using L, LI, Lbl and LBody tags.

Examples of semantically inappropriate tagging

A single paragraph is tagged using two or more P tags.
A heading is tagged using a P tag.
A single table that spans more than one page is tagged using two or more Table tags.
A single list with several list items is split into two or more lists.

Test

Procedure

1. Check that the role of all real content is reflected by its tag.
2. Check that each tag contains all the content that corresponds to the tag.
3. Check that each tag contains only the content that corresponds to the tag.

Expected results

Checks #1 to #3 are true.
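Whether a tag matches the role of its content is a human judgment, but the nesting of table and list tags named in the examples above can be screened mechanically. The following sketch assumes each tag is modelled as a hypothetical `(name, children)` tuple, and the `EXPECTED_CHILDREN` map is a simplified reading of the table and list structure in ISO 32000-1, not a complete rule set.

```python
# Simplified, assumed map of which child tags are expected under
# table and list tags. Tags not listed here are not checked.
EXPECTED_CHILDREN = {
    "Table": {"TR", "THead", "TBody", "TFoot", "Caption"},
    "THead": {"TR"},
    "TBody": {"TR"},
    "TFoot": {"TR"},
    "TR": {"TH", "TD"},
    "L": {"LI", "Caption"},
    "LI": {"Lbl", "LBody"},
}


def find_unexpected_children(node):
    """Return (parent, child) name pairs that break the map above.

    A node is a (name, children) tuple; children is a list of nodes.
    """
    name, children = node
    problems = []
    allowed = EXPECTED_CHILDREN.get(name)
    for child in children:
        if allowed is not None and child[0] not in allowed:
            problems.append((name, child[0]))
        problems.extend(find_unexpected_children(child))
    return problems


# Hypothetical table with a stray paragraph as a direct child.
table = ("Table", [
    ("TR", [("TH", []), ("TD", [])]),
    ("P", []),
])
print(find_unexpected_children(table))
```

A clean result only shows that the nesting is plausible. It does not detect a paragraph split across two P tags or a heading tagged as P, which are the kinds of errors the checks above target.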

Headings

Headings play a vital role in organizing document content for all users. In assistive technology, the correct use of headings allows what would otherwise be an undifferentiated stream of text to become a navigable document in which a reader can quickly find the content they want to read.

PDF files can be very long documents, possibly including deeply nested headings. PDF files can also contain multiple documents and/or subsections of documents. Accordingly, techniques for headings in PDF files differ somewhat from techniques that are more applicable to other technologies, especially regarding the document's title. Read Klaas Posselt's article for recommendations on tagging titles in PDF documents.

Test

Procedure

1. Check that all real content that the author intends as a heading is tagged as a heading.
2. Check that heading tags are not used when the author does not intend a heading.
3. Check that either:
   - only Hn tags are used, or
   - only appropriately nested H tags are used.
4. Check that the first Hn tag, if any, is an H1.
5. Check that the heading levels are appropriate to the content hierarchy of the document.

Expected results

Checks #1 to #5 are true.
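Two of the heading checks above, that H and Hn tags are not mixed and that the first Hn tag is an H1, can be screened automatically once the tag names are known. The sketch below assumes the tag names have already been collected in logical content order into a plain list; the function name `check_headings` is hypothetical, and the remaining checks depend on the author's intent and are not covered.

```python
import re


def check_headings(tags: list[str]) -> list[str]:
    """Return problems found in a list of tag names given in logical order."""
    problems = []
    # Numbered headings such as H1, H2, ... and the generic H tag.
    numbered = [tag for tag in tags if re.fullmatch(r"H[1-9][0-9]*", tag)]
    generic = [tag for tag in tags if tag == "H"]
    if numbered and generic:
        problems.append("H and Hn tags are mixed")
    if numbered and numbered[0] != "H1":
        problems.append(f"first heading is {numbered[0]}, expected H1")
    return problems


# Hypothetical tag sequences for illustration.
print(check_headings(["H1", "P", "H2", "P"]))
print(check_headings(["H2", "P", "H", "P"]))
```

The sketch does not verify that H tags are appropriately nested, since that requires the tree structure rather than a flat list.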
