BETA release

G2_06 ActualText provides correct extractable characters in place of OCR errors

PDF106

Pass

Use case(s): Fundamental 2: Text

Last updated on January 3, 2025

Description

The objective of this technique is to show how to handle scanned text that has been processed with OCR. The invisible text that OCR places over the scan is the prerequisite for tagging the text content and making it machine-readable. Because the invisible text may contain recognition errors and then doesn’t match the visual appearance of the scan, ActualText is necessary to supply the correct characters. The ActualText is set at the tag level, on a separate Span tag. The image of the scanned text is marked as an artifact.
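The arrangement described above can be sketched as PDF syntax fragments. The following Python is a minimal illustration, not the content of the downloadable example file: the helper names (`pdf_text_string`, `page_content`, `span_struct_elem`), the image name `Im0`, the font name `F1`, and the page geometry are all assumptions. It emits a content stream in which the scanned image is wrapped as an artifact and the OCR text is drawn with text rendering mode 3 (invisible) inside a Span marked-content sequence, plus a Span structure element carrying the corrected ActualText. Entries a complete structure element also needs, such as the parent reference, are omitted.

```python
def _escape_literal(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def pdf_text_string(text: str) -> str:
    """Encode text as a PDF text string, e.g. for /ActualText."""
    if text.isascii():
        return f"({_escape_literal(text)})"
    # Non-ASCII: UTF-16BE with a byte order mark, written as a hex string.
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def page_content(ocr_text: str, image_name: str = "Im0", mcid: int = 0) -> str:
    """Content stream sketch: image as artifact, OCR text as invisible Span content."""
    if not ocr_text.isascii():
        # Assumption: a simple one-byte font; other text needs proper font encoding.
        raise ValueError("this sketch only handles ASCII OCR text")
    return "\n".join([
        "/Artifact BMC",                                  # scanned image is an artifact
        f"q 612 0 0 792 0 0 cm /{image_name} Do Q",       # paint the image full page
        "EMC",
        f"/Span <</MCID {mcid}>> BDC",                    # tagged marked content
        "BT /F1 12 Tf 3 Tr 72 720 Td",                    # 3 Tr = invisible text
        f"({_escape_literal(ocr_text)}) Tj",
        "ET",
        "EMC",
    ])


def span_struct_elem(actual_text: str, mcid: int = 0) -> str:
    """Span structure element fragment whose ActualText replaces the OCR characters."""
    return (
        f"<< /Type /StructElem /S /Span "
        f"/ActualText {pdf_text_string(actual_text)} /K {mcid} >>"
    )


if __name__ == "__main__":
    print(page_content("He11o w0rld"))       # OCR output with recognition errors
    print(span_struct_elem("Hello world"))   # corrected text exposed to extraction
```

In this sketch the erroneous OCR characters stay in the content stream, while text extraction that honors ActualText would yield the corrected string from the Span.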

 

Note: Span is always the recommended choice if ActualText is required, because ActualText removes the semantics of tags. That said, ActualText may also work when the tag is one of the following: Lbl (outside of List structures), P, BibEntry, BlockQuote, Caption, Code, Note, Quote. ActualText should never be assigned to any other tag.
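The rule in the note can be expressed as a small lookup. This is only a sketch of the note's wording. The function name `actualtext_suitability` and the three return labels are hypothetical and not part of any standard or tool.

```python
# Tags on which, per the note, ActualText "may also work".
# Lbl is handled separately because it depends on context.
MAY_WORK_TAGS = {"P", "BibEntry", "BlockQuote", "Caption", "Code", "Note", "Quote"}


def actualtext_suitability(tag: str, inside_list: bool = False) -> str:
    """Classify a structure tag as a carrier for ActualText.

    Returns "recommended", "may work", or "never".
    """
    if tag == "Span":
        return "recommended"
    if tag == "Lbl":
        # Lbl is only acceptable outside of List structures.
        return "never" if inside_list else "may work"
    if tag in MAY_WORK_TAGS:
        return "may work"
    return "never"


if __name__ == "__main__":
    for tag in ("Span", "P", "Lbl", "H1"):
        print(tag, "->", actualtext_suitability(tag))
```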

 


Download(s)

These minimal examples are designed to express a single Technique. Effective use requires software that supports Tagged PDF.

Test(s)

Expected Results

Checks #1 through #5 are all true.

Procedure

  1. Check that the related characters are present as extractable characters for content intended to be consumed as text.
  2. Check that Unicode can be derived for any text content.
  3. Check that the extractable characters match their visual appearance.
  4. Check that the Natural Language is set for any text content.
  5. If invisible text is used, check that the related images are artifacted.
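The five checks above can be sketched over a simplified, hypothetical model of a piece of text content. The `TextRun` class and its fields are assumptions made for illustration. A real checker would have to obtain these values from the PDF itself, and check 3 in particular relies on a human-supplied transcription of what is visible. Treating the absence of the U+FFFD replacement character as evidence that Unicode could be derived is likewise only a stand-in for check 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TextRun:
    """Hypothetical, simplified view of one piece of text content."""
    extracted: Optional[str]        # characters an extractor yields (ActualText if set)
    visual: str                     # what a human reads in the scanned image
    language: Optional[str]         # effective natural language, e.g. "en-US"
    invisible: bool = False         # text drawn with an invisible rendering mode
    image_is_artifact: bool = True  # related image is marked as an artifact


def run_checks(run: TextRun) -> Dict[int, bool]:
    """Return the outcome of checks 1 to 5 for a single text run."""
    extracted = run.extracted or ""
    return {
        1: bool(extracted),                                # extractable characters present
        2: bool(extracted) and "\ufffd" not in extracted,  # stand-in for derivable Unicode
        3: extracted == run.visual,                        # matches visual appearance
        4: bool(run.language),                             # natural language is set
        5: (not run.invisible) or run.image_is_artifact,   # image artifacted if invisible
    }


if __name__ == "__main__":
    corrected = TextRun("Hello world", "Hello world", "en-US", invisible=True)
    print(all(run_checks(corrected).values()))
```

A run whose extracted characters still contain OCR errors would fail check 3 in this model, which is the situation ActualText is meant to correct.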

Application to WCAG 2.x

This Technique addresses the following WCAG 2.x Success Criteria:

Matterhorn Protocol

The Matterhorn Protocol 1.1 provides an algorithm for conformance with PDF/UA-1. Matterhorn checkpoint(s) (human or machine) relevant to this use-case:

  • Human check: 08-001
  • Machine check: 10-001
  • Machine check: 11-001
  • Machine check: 11-006
  • Human check: 13-007

Accessibility Technique Support Finder

Accessibility Technique Support Finders allow you to quickly locate software and services that claim to support a given Technique. Simply search the internet for a given Technique’s finder together with the name of your product.

The technique finder for this Technique is: UA1_Tpdf-G2_06

NOTE: the “technique support finder” concept was introduced in January 2025; please allow time for adoption.
