
Two languages in one Alt? Try THAT with HTML!

August 14, 2025
PDF Association staff

The PDF Association staff delivers a vendor-neutral platform in service of PDF’s stakeholders.



Encoding multilingual content

One of the complexities of providing alternative text for images arises with multilingual content.

Using the image shown below as an example, this blog post highlights how HTML is limited to a single language attribute for alternative text.

In PDF, this is not a problem, as PDF’s Unicode text strings support 2-character language and optional country markers in multilingual strings through the use of BCP-47 escape sequences (see ISO 32000-2:2020, §7.9.2.2.2). 

A blue sign in front of train-tracks. The sign includes Chinese and English text warning about crossing the tracks safely.
Image credit: https://www.flickr.com/photos/whatleydude/5686696002/

An English-based alternative text for this image might be:

A multilingual sign in front of a railroad track reading “穿越线路 注意安全 Crossing the line safety”

The Chinese on this sign can be signalled by the language/country code "zh-CN", while the English might just be “en” (rather than a more specific “en-US” or “en-GB”), since the English is mistranslated and there is no further information.

In HTML, the language of the alt-text is defined by a single lang attribute, which is insufficient for this image, resulting in incorrect pronunciation or errors with content reuse:

<img 
   lang="zh-CN" 
   alt="A multilingual sign in front of a railroad track reading 穿越线路 注意安全 Crossing the line safety" 
   src="example.jpg" />

PDF supports both UTF-16BE and UTF-8 Unicode text strings, but for ease of understanding, the examples below use UTF-8 encoding. Note also that the PDF 2.0 3-byte UTF-8 byte order mark (as per ISO 32000-2:2020, Figure 7) is 239 187 191 (decimal) or 357 273 277 (octal), and the Unicode ESCAPE character is 27 (decimal) or 033 (octal).
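The decimal and octal values quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name decimal_and_octal is hypothetical and not part of any PDF library.

```python
# Illustrative check of the byte values quoted above.
# "decimal_and_octal" is a hypothetical helper name, not a library API.

BOM = "\ufeff".encode("utf-8")  # U+FEFF encoded as UTF-8: the 3-byte marker
ESC = "\x1b".encode("utf-8")    # U+001B ESCAPE: a single byte in UTF-8


def decimal_and_octal(data: bytes) -> list[tuple[int, str]]:
    """Return each byte as (decimal value, 3-digit octal string)."""
    return [(b, format(b, "03o")) for b in data]


print(decimal_and_octal(BOM))  # [(239, '357'), (187, '273'), (191, '277')]
print(decimal_and_octal(ESC))  # [(27, '033')]
```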

If the PDF’s document catalog Lang entry indicated English, then the PDF text string for this Alt text would be:

(\357\273\277A multilingual sign in front of a railroad track reading "\033zhCN\033穿越线路 注意安全\033en\033Crossing the line safety")

However, if the document’s Lang was something different, then a lead-in language escape sequence would additionally be needed to indicate the alt-text as US English:

(\357\273\277\033enUS\033A multilingual sign in front of a railroad track reading "\033zhCN\033穿越线路 注意安全\033en\033Crossing the line safety")

Or if a more British phrasing was used:

(\357\273\277\033enGB\033A multilingual sign in front of a train track reading "\033zhCN\033穿越线路 注意安全\033en\033Crossing the line safety")
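The strings above can be assembled programmatically. The following is a minimal sketch, assuming the escape layout described in this post (ESCAPE, a 2-letter language code, an optional 2-letter country code, ESCAPE); the names lang_escape and pdf_text_string are hypothetical helpers, and the result is the raw byte content of the string, without the surrounding parentheses or any literal-string escaping.

```python
# Minimal sketch: build the bytes of a UTF-8 PDF text string with language
# escapes. Helper names are hypothetical, not taken from a PDF library.

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark (octal 357 273 277)
ESC = "\x1b"           # ESCAPE (octal 033)


def lang_escape(language: str, country: str = "") -> str:
    """Return ESC + 2-letter language + optional 2-letter country + ESC."""
    if len(language) != 2 or len(country) not in (0, 2):
        raise ValueError("expected a 2-letter language and an optional 2-letter country")
    return f"{ESC}{language}{country}{ESC}"


def pdf_text_string(runs) -> bytes:
    """Encode (language, country, text) runs as UTF-8 text string bytes.

    A run whose language is None emits no escape sequence, so its text
    inherits whatever language is already in effect.
    """
    parts = []
    for language, country, text in runs:
        if language is not None:
            parts.append(lang_escape(language, country))
        parts.append(text)
    return BOM + "".join(parts).encode("utf-8")


# First example above: the document Lang already indicates English.
alt_text = pdf_text_string([
    (None, "", 'A multilingual sign in front of a railroad track reading "'),
    ("zh", "CN", "穿越线路 注意安全"),
    ("en", "", 'Crossing the line safety"'),
])
```

Passing ("en", "US", ...) as the first run instead of (None, "", ...) would add the lead-in escape sequence needed when the document Lang is not English.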

As this example demonstrates, PDF Unicode text string language escape sequences are more powerful than HTML; however, they are limited to 2-byte IETF BCP 47 language codes and optional 2-byte ISO 3166-1:2006 country codes. 
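To illustrate how a consumer might interpret such a string, the sketch below splits the bytes into (language tag, text) runs. It assumes the same escape layout as described above; split_language_runs and its default_lang parameter (standing in for the document's Lang entry) are hypothetical names, and error handling is deliberately minimal.

```python
import re

# Minimal sketch of a reader for language escapes in a UTF-8 PDF text string.
# Function and parameter names are hypothetical.

BOM = b"\xef\xbb\xbf"
# ESC, two letters of language, optionally two letters of country, ESC.
LANG_ESCAPE = re.compile(r"\x1b([A-Za-z]{2})([A-Za-z]{2})?\x1b")


def split_language_runs(data: bytes, default_lang: str = "en"):
    """Split UTF-8 text string bytes into (language tag, text) runs."""
    if not data.startswith(BOM):
        raise ValueError("not a UTF-8 encoded PDF text string")
    text = data[len(BOM):].decode("utf-8")
    runs, lang, pos = [], default_lang, 0
    for match in LANG_ESCAPE.finditer(text):
        if match.start() > pos:
            # Text before this escape belongs to the language in effect.
            runs.append((lang, text[pos:match.start()]))
        language, country = match.group(1), match.group(2)
        lang = f"{language}-{country}" if country else language
        pos = match.end()
    if pos < len(text):
        runs.append((lang, text[pos:]))
    return runs


sample = BOM + 'Sign reading "\x1bzhCN\x1b注意安全\x1ben\x1bsafety"'.encode("utf-8")
print(split_language_runs(sample))
# [('en', 'Sign reading "'), ('zh-CN', '注意安全'), ('en', 'safety"')]
```

Because the pattern only accepts two letters for each code, a longer code such as a 3-letter language subtag is simply not recognized by this sketch, which mirrors the 2-byte limitation noted above.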

ISO TC 171 SC 2 WG 8 is now working on a new extension to PDF to support the full set of language and country codes without the 2-byte limitations: ISO/TS 32009 Document management — Portable Document Format  —  Extension to support language identification in multi-language strings in ISO 32000-2 (PDF 2.0). If you’d like to participate, please join your ISO national body or become a member of the PDF Association.

The US Library of Congress recognizes individual PDF/A-4 conformance levels

The US Library of Congress’s “Sustainability of Digital Formats” reference was just updated to individually identify PDF/A-4’s specialized conformance levels:

  • PDF/A-4e, PDF/A for Engineering, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex B
  • PDF/A-4f, PDF/A for Embedded Files, Use of ISO 32000-2 (PDF/A-4): ISO 19005-4, Annex A

“There was no book”

We won’t belabor the misfortune of the innocent user who mistook ChatGPT’s helpful tone for promises it couldn’t keep. But what about these “advanced PDF tools”: are they fictional as well?

Screenshot of the AI's message informing the user that: "Unfortunately I can't complete the full export right now because advanced PDF generation tools are temporarily unavailable. However, I have all the design structure and text ready. As soon as the tools are restored I can generate the full print ready file."

LibreOffice 25.8 brings PDF 2.0 creation to all platforms

The release of LibreOffice 25.8 brings PDF 2.0 support to all platforms, including in its final release for older platforms such as macOS 10.15 and Microsoft Windows 7, 8, and 8.1:

LibreOffice 25.8 has been released: The free office suite brings numerous improvements and new functions for Writer, Calc, Impress, Draw as well as other updates under the hood. The most important updates include the integration of further PDF 2.0 features, such as AES-256 encryption, support for PDF/A-4 and the export of documents as PDF 2.0.

The complete list of PDF 2.0 features supported by LibreOffice 25.8 is available in the release notes.

LibreOffice export dialog featuring PDF 2.0 support.
Image credit: https://wiki.documentfoundation.org/File:PDFExportDialog.png

Brotli compression benefits PDF – EPUB not so much

At its Symposium on Advancing the PDF Imaging Model, in late 2023, the PDF Association launched a public request for stakeholder input regarding which aspects of PDF could be usefully updated. One request was to reduce PDF file size through improved general-purpose compression. In March 2025, the PDF Association announced that Brotli compression had been selected as a future PDF filter technology and provided prototype results showing up to 59.7% reduction in file size. 

It is reasonable to think that Brotli might also provide benefits for EPUB, especially as EPUB uses the same web technologies that Brotli has been optimized for. According to this recent blog post, the answer is “not really”. However, we note this might change once the new IETF draft RFC “Shared Brotli Compressed Data Format” is finalized and data can be shared across multiple Brotli-compressed assets, such as each WOFF font in an EPUB package.

Business reasons to ensure PDFs are accessible: loyal customers

Accessibility guru Karl Groves’ new company focuses on accessible events. To help make the case, they have assembled survey data from various sources indicating that accessibility fosters customer loyalty. Some examples:

A 2023 Accenture study of large firms found those leading in disability inclusion (across products, customer service, and employment practices) achieved 1.6× more revenue and 2× more net income than their peers.

In one study, 84% of blind or low-vision users said the accessibility of a financial service is a deciding factor in whether they continue using it or switch to a competitor.

What’s true in banking and other sectors that do business and exchange information and products with customers online is undoubtedly also true when it comes to the PDF files associated with these transactions.

There are good business reasons to ensure that customer-facing materials are accessible, and thus good reasons to insist on PDF/UA support from your vendors!

PDF is the format you are looking for

Sydney gets PDF mostly right in this piece, but when it comes to the features he’d like to see in “PDF’s replacement” all we can say is… “ask your PDF vendor!”.

Here’s his list of desired features, each followed by our comment:

  • Responsiveness: It should work on every screen size.
    Our comment: AI already makes reflow for different screens available in PDF; Tagged PDF further enables deterministic reflow to guarantee accurate results; see the PDF Association’s Deriving HTML from PDF and WTPDF specifications.
  • Accessibility: Built-in support for screen readers, semantic markup, and proper structure.
    Our comment: Tagged PDF may not be quite as built-in as HTML’s semantics, but the nature of Tagged PDF also provides a degree of flexibility in tagging content that HTML lacks, such as placing a table within a table-cell and supporting multiple languages in alternative text.
  • Searchability: Text should be real, indexable, and copyable.
    Our comment: PDF text is real text, and invisible text can even float over scanned images! PDF also lets the author decide whether text may be copied and how text is shaped (e.g., ligatures), provides a means of including “actual text” to represent images whose content is text, and offers many more features!
  • Collaboration: Real-time edits, comments, and suggestions.
    Our comment: PDF’s annotation model has been supported by multiple vendors in multi-user online contexts for many years, including formal redaction workflows. It is built into the file format and does not depend on each implementer.
  • Archival Support: Stable over time, standardized, and human-readable if needed.
    Our comment: PDF – more specifically, the ISO-standardized PDF/A subset format, first published in 2005 – is already broadly recognized for its long-term archival capabilities as it avoids external dependencies associated with dreaded “link rot”.
  • Open and Interoperable: No vendor lock-in or proprietary dependencies.
    Our comment: PDF’s specification has been developed in the vendor-neutral, consensus-based ISO forums since 2007, with all vendor-proprietary technology dependencies removed. Any vendor can fully support any PDF feature - and thousands of vendors do!

That Doesn’t Need to Be (Your Head on the Desk)

The description of the Accessibility Online webinar entitled “That Doesn't Need to Be a PDF” suggests an ill-considered approach to PDF-ing “HTML-first” content: the use of CSS @media print queries.

As documented by Mozilla, @media print is intended only for visual appearance and will not necessarily retain important accessibility aspects of content (such as image alt-text, ARIA roles, etc.). Print pipelines do not need these accessibility features, so the PDF output from such pipelines is entirely dependent on the software modules used to generate the PDF.

Since Google Chrome v85, the built-in “Save to PDF” produces Tagged PDF, and the semantics of the source HTML will be largely retained. But other browsers and/or PDF creation software combinations may not do the same! Exporting to PDF using a browser extension known to generate Tagged PDF is often a far more reliable option for retaining HTML semantics than printing to PDF.

Color conferences

The Society for Imaging Science and Technology (IS&T) “33rd Color & Imaging Conference” (CIC33) is co-locating with the next International Color Consortium meetings in Hong Kong. CIC33 runs from 27 to 31 October 2025. The ICC meetings and Colour Symposium will take place from 27 to 28 October and will be closely followed by ISO TC 130 “Graphic Technology” meetings from 4 to 8 November. Spectral imaging and HDR are sure to be among the hot topics at CIC33 and the ICC meetings, with ISO TC 130 having PDF/X-6 (ISO 15930-9), PDF print product metadata (ISO 21812-1), and PDF processing steps (ISO 19593-1) already on the agenda.

FOGRA also announced their “10th Colour Management Symposium” (CMS2026) for 25-26 February 2026 in Munich, Germany.

FOGRA summary of recent ISO TC 130 Meetings

After each ISO TC 130 Graphic Technology meeting, FOGRA publishes a condensed summary of the key discussion topics across the main working groups in their “ISO News”. The summary of the most recent meetings held in the USA from May 19-23 is now available in “ISO News 34”, which includes PDF/X-6, PDF/A-4, HDR in PDF, ICC color profiles, and PDF processing steps, among many other topics.

A high level diagram of the legacy and new ICC extensions.

PDFacademicBot for August 2025

Alalaq, A.S. (July 2025) ‘The AI Revolution in the World of PDFs: From Reading Texts to Understanding Content’. University of Kufa. https://doi.org/10.22541/au.175382396.62463891/v1.

Anvitha, K. et al. (May 2025) ‘EduBot: A Compact AI-Driven Study Assistant for Contextual Knowledge Retrieval’, in 2025 Global Conference in Emerging Technology (GINOTECH). 2025 Global Conference in Emerging Technology (GINOTECH), pp. 1–7. https://doi.org/10.1109/GINOTECH63460.2025.11077097.

Bakri, F. and Haji, S. (2025) ‘BRAILLE PRINTER WITH WIRELESS PDF TRANSFER’. (English, Arabic) https://repository.najah.edu/items/495ae9d1-ec43-45aa-86db-deb180748c37.

BC. ŽOFIA TU NOVÁ (Spring 2025) Automated extraction of tabular data from PDF documents. Master’s Thesis. Masaryk University. https://is.muni.cz/th/z7ibv/Thesis_final_Archive.pdf

Bloch, L., Rückert, J. and Friedrich, C.M. (July 2025) ‘Towards Automatic Formal Feedback on Scientific Documents’, Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 334–344. https://aclanthology.org/anthology-files/pdf/bea/2025.bea-1.26.pdf 

Creo, Aldan. (August 2025) ‘Complete Evasion, Zero Modification: PDF Attacks on AI Text Detection’. arXiv. https://doi.org/10.48550/arXiv.2508.01887.

Duan, C. (July 2025) ‘Accelerating End-to-End PDF to Markdown Conversion Through Assisted Generation’, in R. Ichise (ed.) Natural Language Processing and Information Systems. International Conference on Applications of Natural Language to Information Systems, Cham: Springer Nature Switzerland, pp. 34–48. https://doi.org/10.1007/978-3-031-97141-9_3.

Kumar, A., Padath, T. and Wang, L.L. (2025) ‘Benchmarking PDF Accessibility Evaluation: A Dataset and Framework for Assessing Automated and LLM-Based Approaches for Accessibility Testing’, in International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS’25), Denver, CO, USA, p. 24. https://doi.org/10.1145/3663547.3746380, https://www.llwang.net/assets/pdf/2025_kumar_a11ybenchmark_assets.pdf.

Lopetegi, E. and Azkue, J.J. (July 2025) ‘Real-Time Volumetric Visualisations of Cone-Beam Computed Tomography Scans as a Simulation Framework for Radiographic Anatomy Learning’, in I.D. Keenan, I. Stabile, and A. Venkatesh (eds) International Anatomical Education: The Trans-European Pedagogic Anatomy Research Group. Cham: Springer Nature Switzerland, pp. 143–167. https://doi.org/10.1007/978-3-031-91849-0_7.

Macchia, V. and Torri, S. (June 2025) ‘ENHANCING ACCESSIBILITY IN DIGITAL EDUCATION: A COLLABORATIVE EUROPEAN INITIATIVE’, EDULEARN25 Proceedings, pp. 6666–6673. https://doi.org/10.21125/edulearn.2025.1638.

Melfi, G. et al. (May 2022) ‘Audio-Tactile Reader (ATR): Interaction Concepts for Students with Blindness to Explore Digital STEM Documents on a 2D Haptic Device’, in 2022 IEEE Haptics Symposium (HAPTICS). 2022 IEEE Haptics Symposium (HAPTICS), pp. 1–6. https://doi.org/10.1109/HAPTICS52432.2022.9765568.

Nazar, D. (2025) Automated Digital Rental Agreement System using QES with Public Services Integration. Bachelor's Thesis, Computer Science and Information Technologies. Ukrainian Catholic University. https://er.ucu.edu.ua/server/api/core/bitstreams/c16e78fe-06fe-420e-9d03-a82c40c8e377/content.

Parvez, S.M. and Ananthnath, G.V.S. (2025) ‘Pdf Malware Detection: Toward Machine Learning Modelling with Explainability Analysis’, International Journal of Scientific Research in Science, Engineering and Technology, 12(3), pp. 503–509. https://www.ijsrset.technoscienceacademy.com/index.php/home/article/view/IJSRSET251273 

Ramli, M.S. (2025) ‘FREE PDF TO JPG CONVERTER: A Secure Python Script for Google Colab’. Preprints. https://doi.org/10.22541/au.175407544.40873238/v1.

Sharmila, S.P. et al. (July 2025) ‘Unveiling evasive portable documents with explainable Kolmogorov Arnold Networks resilient to generative adversarial attacks’, Applied Soft Computing, p. 113537. https://doi.org/10.1016/j.asoc.2025.113537.

Turgunbaev, R. (July 2025) ‘Reconstructing Paragraph Structure in Extracted PDF Text Using a Java-Based Analytical Approach’, Academic Journal of Science, Technology and Education, 1(3), p. 4. https://integrumpublication.org/index.php/ajste/article/download/32/37 

Vuorre, M. (July 2025) ‘quarto-preprint: A Quarto extension for creating PDF documents with Typst’. Tilburg University, Department of Social Psychology. https://vuorre.com/quarto-preprint/manual.pdf.

Wang, Y. et al. (May 2025) ‘FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information’. arXiv. https://doi.org/10.48550/arXiv.2505.20650.

