Understanding Techniques for Accessible PDF

This page provides additional information on the various types of Techniques for Accessible PDF. As of November 2024, this information is specific to ISO 32000-1 and PDF/UA-1.

Fundamental 1: Basic technical rules are followed

Software that creates a PDF must follow basic technical rules for tagged PDF, so that other software can process the PDF for accessibility. 

PDF’s rules are defined in two specification documents: ISO 32000, the PDF specification, and ISO 14289 (PDF/UA), the specification for accessible PDF documents, which adds requirements to those of ISO 32000.

A PDF/UA-compliant file is also required to conform to ISO 32000. In general, basic technical rules can be unambiguously checked by software commonly known as a “validator”.

Test

Procedure

  1. Check that the basic technical rules of ISO 32000 and PDF/UA are followed by running a PDF/UA validator.

Expected results

Check #1 is true.

Fundamental 2: Text content is machine-readable

Most information in typical documents is textual (visible text, alternative text or bookmarks). In order to understand the challenges related to machine-readable textual information in PDFs, it is necessary to distinguish between text content and non-text content intended to be consumed as text.

Requirements for text content:

  • Each character has a corresponding Unicode value
  • All text has a specified language
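
The two requirements above can be expressed as a small check. The following is a minimal sketch in Python, assuming the text has already been extracted from the PDF by some other tool. The `TextRun` type and its fields are hypothetical and not part of any PDF library. Treating U+FFFD and Private Use Area code points as suspect is also an assumption of this sketch, not a rule quoted from ISO 32000 or PDF/UA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextRun:
    """Hypothetical record for one run of extracted text content."""
    text: str            # Unicode string derived by an extractor
    lang: Optional[str]  # effective natural language, e.g. "en-US"; None if unset


def is_suspect(ch: str) -> bool:
    """Flag code points that often indicate a missing Unicode mapping (assumption)."""
    cp = ord(ch)
    return cp == 0xFFFD or 0xE000 <= cp <= 0xF8FF


def check_text_run(run: TextRun) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not run.text:
        problems.append("no extractable characters")
    elif any(is_suspect(ch) for ch in run.text):
        problems.append("character without a usable Unicode value")
    if not run.lang:
        problems.append("no language specified")
    return problems
```

A run such as `TextRun("Hello", "en-US")` yields no problems, while a run with no language yields a "no language specified" entry.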

Non-text content intended to be consumed as text has equivalent text content provided through one of the following mechanisms:

  1. Creating invisible text (by using OCR tools, for example)
  2. Adding ActualText to a Span marked content sequence
  3. Adding ActualText to a tag

Test

Procedure

  1. Check that the related characters are present as extractable characters for content intended to be consumed as text.
  2. Check that Unicode can be derived for any text content.
  3. Check that the extractable characters match their visual appearance.
  4. Check that the Natural Language is set for any text content.
  5. If invisible text is used, check that the related images are artifacted.
  6. If ActualText on a marked-content sequence (also called a container, not to be confused with a tag) is used to make visual characters extractable, check that the container is a Span. 
  7. If ActualText on a marked-content sequence (also called a container, not to be confused with a tag) is used to make visual characters extractable, check that the container contains the related visual characters. 
  8. Check that extractable text within block elements includes space characters between words, even at line breaks.

Expected results

Checks #1 to #8, if applicable, are true.

Note: The Span tag is the recommended choice if ActualText is required, because some software interprets ActualText as removing the semantics expressed by tags. Even so, ActualText may also work when the tag is one of the following: Lbl (outside of List structures), P, BibEntry, BlockQuote, Caption, Code, Note, Quote. ActualText usually should not be assigned to other tags.
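
Check #8 of the procedure above can be partly automated. The following is a minimal sketch, assuming an extractor has already returned the text of one block element as a list of strings, one per visual line. The function name and this input shape are hypothetical. Words legitimately hyphenated across lines will also be flagged, so the result is a list of candidates for manual review, not a verdict.

```python
from typing import List


def missing_spaces_at_line_breaks(lines: List[str]) -> List[int]:
    """Return indices i where no whitespace separates lines[i] and lines[i + 1]."""
    flagged = []
    for i in range(len(lines) - 1):
        left, right = lines[i], lines[i + 1]
        if not left or not right:
            continue  # empty lines carry no words to run together
        # A space may sit at the end of one line or the start of the next.
        if not left[-1].isspace() and not right[0].isspace():
            flagged.append(i)
    return flagged


block = ["Text content is ", "machine-readable", "and tagged."]
candidates = missing_spaces_at_line_breaks(block)  # boundary after line 1 is flagged
```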

Fundamental 3: Real content and Artifacts are distinguished

Not all content in a PDF document is essential for understanding. To enable software to distinguish between relevant content (e.g., headings and paragraphs) and non-relevant content (e.g., decorative lines or running headers), PDF files provide the option of distinguishing relevant content, known as real content, from the rest, known as artifacts. In accessible PDF documents, content is marked as either real content or artifact, and can never be both at the same time.

Test

Procedure

  1. Check that all real content is tagged.
  2. Check that all content that is not real content is marked as artifact.
  3. If artifact content requires a type and/or subtype, check that they are present.
  4. If an artifact type and/or subtype is present, check that it is appropriate.

Expected results

Checks #1 to #4, if applicable, are true.
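
The checks above lend themselves to a simple rule set. The following is a minimal sketch, assuming each piece of page content has already been classified by some other tool. The `ContentItem` type is hypothetical. The value sets reflect the standard artifact types and subtypes listed in ISO 32000-1 and should be verified against the specification. Whether a given type or subtype is appropriate for the content still requires human judgment.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard values per ISO 32000-1 (verify against the specification).
ARTIFACT_TYPES = {"Pagination", "Layout", "Page", "Background"}
PAGINATION_SUBTYPES = {"Header", "Footer", "Watermark"}


@dataclass
class ContentItem:
    """Hypothetical record for one piece of page content."""
    name: str
    tagged: bool                      # referenced from the structure tree
    artifact: bool                    # marked as artifact
    artifact_type: Optional[str] = None
    artifact_subtype: Optional[str] = None


def check_item(item: ContentItem) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if item.tagged and item.artifact:
        problems.append("both real content and artifact")
    elif not item.tagged and not item.artifact:
        problems.append("neither tagged nor artifact")
    if item.artifact:
        if item.artifact_type is not None and item.artifact_type not in ARTIFACT_TYPES:
            problems.append("unknown artifact type")
        if item.artifact_subtype is not None:
            if item.artifact_type != "Pagination":
                problems.append("subtype without Pagination type")
            elif item.artifact_subtype not in PAGINATION_SUBTYPES:
                # Non-standard subtypes may be permitted; flag for review only.
                problems.append("review non-standard subtype")
    return problems
```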

Fundamental 4: Logical Content Order

In addition to appropriately tagging real content, the tags must appear in the structure tree in an appropriate order to enable correct presentation by software (including assistive technology). This is referred to as the logical content order. The logical content order must reflect the order of real content as intended by the author.

Test

Procedure

  1. Check that the logical content order of the tags matches the author's intent.
  2. Check that the order of the content within each tag matches the author's intent.
  3. If the page contains annotations such as form fields, links or notes, check that the tab order is set to follow the logical content order.

Expected results

Checks #1 to #3, if applicable, are true.
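
Software derives the logical content order by walking the structure tree depth-first. The following is a minimal sketch, assuming the structure tree and pages have been loaded into plain dictionaries. The dictionary keys (`kids`, `annots`, `tabs`) are hypothetical stand-ins, not a real library's API. The value `S` for a page's Tabs entry denotes structure order in ISO 32000-1. Comparing the derived order with the author's intent remains a manual step.

```python
from typing import List


def logical_order(node: dict) -> List[str]:
    """Walk a structure tree depth-first and return content ids in logical order."""
    order = []
    for kid in node.get("kids", []):
        if isinstance(kid, str):
            order.append(kid)                 # a leaf: a piece of real content
        else:
            order.extend(logical_order(kid))  # a nested tag
    return order


def pages_with_wrong_tab_order(pages: List[dict]) -> List[int]:
    """Return indices of pages that have annotations but no structure tab order."""
    return [
        i for i, page in enumerate(pages)
        if page.get("annots") and page.get("tabs") != "S"
    ]


tree = {"tag": "Document", "kids": [
    {"tag": "H1", "kids": ["c1"]},
    {"tag": "P", "kids": ["c2", "c3"]},
]}
derived = logical_order(tree)  # compare this list with the intended reading order
```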

Fundamental 5: Appropriate Semantics

In order for real content to be correctly understood by a user, the most semantically appropriate tag must be used, and each unit of real content must be tagged with a single appropriate tag, even if parts of the real content are visually separated.

Examples of semantically appropriate tagging

  • A single paragraph spanning two pages or columns, or visually divided by an image, is nonetheless enclosed by a single P tag.
  • A level one heading is tagged using H1.
  • A data table is tagged using Table, TR, TH and TD tags.
  • A list is tagged using L, LI, Lbl and LBody tags.

Examples of semantically inappropriate tagging

  • A single paragraph is tagged using two or more P tags.
  • A heading is tagged using a P tag.
  • A single table that spans more than one page is tagged using two or more Table tags.
  • A single list with several list items is split into two or more lists. 

Test

Procedure

  1. Check that the role of all real content is reflected by its tag.
  2. Check that each tag contains all the content that corresponds to the tag.
  3. Check that each tag contains only the content that corresponds to the tag.

Expected results

Checks #1 to #3 are true.
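
Semantic appropriateness generally requires human judgment, but some of the inappropriate patterns listed above leave traces that software can flag. The following is a minimal heuristic sketch, assuming the tag names of sibling elements are available as a list. The function name and the choice of suspect tags are assumptions of this sketch. Two adjacent Table or L tags may be legitimate, so each hit is a prompt for manual review, not a failure.

```python
from typing import List, Tuple

# Tags whose immediate repetition may indicate one unit split into several.
SUSPECT_TAGS = {"Table", "L"}


def adjacent_repeats(sibling_tags: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs of adjacent siblings that share a suspect tag."""
    return [
        (i, i + 1)
        for i in range(len(sibling_tags) - 1)
        if sibling_tags[i] == sibling_tags[i + 1] and sibling_tags[i] in SUSPECT_TAGS
    ]


siblings = ["H1", "P", "Table", "Table", "P", "L", "L", "L"]
to_review = adjacent_repeats(siblings)
```

Adjacent P tags are deliberately not flagged, because consecutive paragraphs are the normal case.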
