BETA release
G2_06 ActualText provides correct extractable characters in place of OCR errors
PDF106
Use case(s): Fundamental 2: Text
Last updated on January 3, 2025
Description
This technique shows how to handle scanned text that has been processed with OCR. An invisible text layer is the prerequisite for tagging the text content and making it machine-readable. Where OCR errors cause the invisible text to differ from what the page visually shows, ActualText is necessary to supply the correct characters. ActualText is set at the tag level, on a separate Span tag. The image of the scanned text is marked as an artifact.
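As a rough illustration of setting ActualText at the tag level, the following Python sketch builds the text of a simplified Span structure element dictionary. The function names and the sample OCR error ("rnodern" for "modern") are hypothetical and do not come from this Technique's example files. The sketch assumes the common PDF convention of storing Unicode text strings as UTF-16BE with a leading byte order mark.

```python
import codecs


def pdf_text_string_hex(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE with a leading byte order mark."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def span_with_actual_text(corrected_text: str) -> str:
    """Return an illustrative Span structure element dictionary with ActualText.

    Simplified on purpose: a real structure element also needs entries such as
    /P (parent) and /K (reference to the tagged content).
    """
    return (
        "<< /Type /StructElem /S /Span /ActualText "
        + pdf_text_string_hex(corrected_text)
        + " >>"
    )


# Hypothetical case: OCR produced "rnodern" but the scanned page shows "modern".
print(span_with_actual_text("modern"))
```

The invisible OCR text stays in the content stream as tagged content of the Span, while the scanned image is marked as an artifact; neither of those steps is shown in this sketch.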
Note: Span is always the recommended tag when ActualText is required, because ActualText removes the semantics of the tag it is set on. That said, ActualText may also work on the following tags: Lbl (outside of List structures), P, BibEntry, BlockQuote, Caption, Code, Note, Quote. ActualText should never be assigned to any other tag.
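The rule in this note can be expressed as a small lookup. The sketch below is only an illustration: the function name, the `inside_list` parameter, and the three result labels are assumptions made for the example, not terms defined by this Technique.

```python
# Tags on which ActualText "may also work", as listed in the note above.
# Lbl is handled separately because it only qualifies outside List structures.
TOLERATED_TAGS = {"P", "BibEntry", "BlockQuote", "Caption", "Code", "Note", "Quote"}


def actual_text_suitability(tag: str, inside_list: bool = False) -> str:
    """Classify a tag as a carrier for ActualText.

    Returns "recommended", "may work", or "never" (labels are hypothetical).
    """
    if tag == "Span":
        return "recommended"
    if tag == "Lbl":
        # Lbl is tolerated only when it is not part of a List structure.
        return "never" if inside_list else "may work"
    if tag in TOLERATED_TAGS:
        return "may work"
    return "never"


for tag in ("Span", "P", "H1"):
    print(tag, "->", actual_text_suitability(tag))
```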
Download(s)
These minimal examples are designed to express a single Technique. Effective use requires software that supports Tagged PDF.
Test(s)
Expected Results
Checks #1 through #5 are all true.
Procedure
- Check that the related characters are present as extractable characters for content intended to be consumed as text.
- Check that Unicode can be derived for any text content.
- Check that the extractable characters match their visual appearance.
- Check that the Natural Language is set for any text content.
- If invisible text is used, check that the related images are artifacted.
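The five checks above can be sketched against a simplified in-memory model. Everything in this example is hypothetical: the `TextRun` class, its fields, and the use of U+FFFD to stand for a character whose Unicode value could not be derived are assumptions made for illustration. A real test requires software that inspects the Tagged PDF itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextRun:
    """Hypothetical model of one piece of text content on a scanned page."""
    extracted: str                  # characters a text extractor returns (ActualText if set)
    visible: str                    # what a human reads in the scanned image
    language: Optional[str]         # natural language, e.g. "en-US", or None if unset
    invisible: bool = False         # True if the text is an invisible OCR layer
    image_is_artifact: bool = True  # True if the related scan image is artifacted


def run_checks(run: TextRun) -> List[bool]:
    """Return the results of checks 1 to 5, in the order of the procedure."""
    return [
        # 1: extractable characters are present
        bool(run.extracted),
        # 2: Unicode can be derived (assumption: U+FFFD marks an unresolved character)
        "\ufffd" not in run.extracted,
        # 3: extractable characters match the visual appearance
        run.extracted == run.visible,
        # 4: the natural language is set
        bool(run.language),
        # 5: if invisible text is used, the related image is artifacted
        (not run.invisible) or run.image_is_artifact,
    ]


good = TextRun("modern", "modern", "en-US", invisible=True, image_is_artifact=True)
print(all(run_checks(good)))
```

A run passes only when every entry in the returned list is true, which mirrors the expected result stated for this test.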
Application to WCAG 2.x
This Technique addresses the following WCAG 2.x Success Criteria:
Matterhorn Protocol
The Matterhorn Protocol 1.1 provides an algorithm for conformance with PDF/UA-1. Matterhorn checkpoint(s) (human or machine) relevant to this use-case:
- 08-001
- 10-001
- 11-001
- 11-006
- 13-007
Accessibility Technique Support Finder
Accessibility Technique Support Finders allow you to quickly locate software and services that claim to support a given Technique. Simply search the internet for a given Technique's finder together with the name of your product.
The technique finder for this Technique is: UA1_Tpdf-G2_06
NOTE: the “technique support finder” concept was introduced in January 2025; please allow time for adoption.