
How to read ISO publications about PDF
In this article, the PDF Association’s CTO, Peter Wyatt, explains how to understand the formal terminology in ISO’s standards for PDF technology.
All ISO publications are written according to a set of rules known as the ISO Directives. In particular, ISO/IEC Directives Part 2 “Principles and rules for the structure and drafting of ISO and IEC documents” explains the details of what many refer to as “ISO-ese” - the somewhat stilted international English, sentence structure, and document conventions that ISO enforces for ISO standards.
ISO standards are not tailored to their domain - the same set of rules applies to standards for quality processes (ISO 9000 family), environmental management standards (ISO 14000 family), physical standards (such as ISO 216 for paper size), and software technology standards such as the core PDF 2.0 standard (ISO 32000), PDF/A (ISO 19005), PDF/X (ISO 15930), PDF/UA (ISO 14289) and others. Readers unfamiliar with “ISO-ese” can easily overlook nuances, which in the case of technical standards for PDF technology can result in malformed files, problematic implementations, and interoperability issues.
It is therefore important that developers engaging with ISO standards - especially those for whom English is not a primary language - understand how to interpret “ISO-ese”. Many sentences in ISO-standardized PDF specifications are finely crafted in an attempt to accurately communicate a very specific meaning; such nuances may be difficult to recognize if the reader does not have a good understanding of the language of ISO standards.
Many specifications for PDF technology were originally created by other organizations that did not use ISO’s specific guidelines. As these specifications evolve under ISO, their terminology also evolves to comply with ISO’s latest set of Directives and style guidance, which are themselves evolving, so there is always some level of subjectivity when reading an ISO publication. Thus, a set of hard and fast rules for interpreting “ISO-ese” across standards created at different times is not really possible - instead, a general understanding needs to be applied.
This article does not use ISO-ese, as it is written for those who need to read and understand the technical details of ISO standards for PDF. Unsurprisingly (if somewhat ironically), ISO Directives Part 2 is itself written in ISO-ese and can be difficult to interpret, so links back to specific sections in ISO Directives Part 2 are provided below. Beyond the Directives, ISO also defines an easier-to-read but more subjective “ISO House Style” as a companion to the Directives.
A short list of key things to understand
ISO’s Directives define the nature and order of the sections in ISO standards. Of particular importance is the clause 1 Scope statement, which defines what is and is not in the publication. All ISO standards for PDF technology define rules for valid PDF files and rules for software processing those valid files, but do not include information on error handling, file recovery, user interface design, or robust parsing and programming practices. In a world where cyber-security practices are critical, the ISO standard for the PDF file format is not where you should look to find such information!
Shall vs. should vs. may vs. can
So-called normative statements document the rules of the standard - in the case of ISO 32000, the rules that define valid PDF files and interoperable PDF software. Normative content consists of mandatory requirements using the trigger word “shall” (or “shall not”), recommendations that use the trigger word “should” (or “should not”), and permissions that use the trigger word “may” (or “may not”).
“Shall” requirements must always be met, whereas a “should” recommendation might be used to allow for corner cases or other situations where a “shall” is not technically possible, or might simply be a general recommendation. There are no degrees of significance to recommendations; no concept of “should”, “really should” or “really really should do this”. Both “shall” and “should” are critically important words in ISO publications and are never used informally - if you ever see these words, pay very close attention!
Unlike in many web standards, ISO’s trigger words are not visually highlighted, for example by always appearing in uppercase. The choice of trigger words also differs slightly between ISO publications and standards that use the conventions of RFC 2119. In particular, ISO standards do not use the word “must”.
Informative statements are additional content that cannot alter the rules. This includes all notes, all examples, and any Annexes (appendices) explicitly noted as informative. All examples are always informative - this means that there are no “hidden” rules that only appear in examples. If all notes and examples were deleted from an ISO publication, nothing technically changes - notes and examples exist only to further illustrate or demonstrate something. Informative statements thus never use the words “shall” or “should”.
Permission statements express “consent or liberty (or opportunity) to do something”, such as “PDFs may have images”, since PDF certainly does not require every PDF to contain an image.
Do not confuse the rules of an ISO standard, as defined by its requirements (“shall”) and formal recommendations (“should”), with statements of fact or of possibility/capability (“can”, “cannot”). Possibility/capability expresses a potential outcome, such as “the annotation can be printed”, since this depends on the setting of annotation flags. No one, person or ISO standard, would ever say “1 plus 1 shall be 2”, but rather “1 plus 1 is 2”, as this is a simple fact.
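As a rough illustration of the trigger words described above, the following Python sketch labels a sentence by the verbal forms it contains. The function name, the lookup table, and the plain keyword matching are assumptions made for this example and are not part of any ISO publication. Real text still needs human judgment (for instance, the month “May” would be mislabeled as a permission).

```python
import re

# Trigger words as described in this article. The table layout and the
# labels are illustrative assumptions, not an ISO-defined data structure.
VERBAL_FORMS = [
    ("requirement", r"\bshall(?: not)?\b"),
    ("recommendation", r"\bshould(?: not)?\b"),
    ("permission", r"\bmay(?: not)?\b"),
    ("possibility/capability", r"\bcan(?:not)?\b"),
]


def classify_statement(sentence: str) -> list[str]:
    """Return the kinds of verbal form found in a sentence, in table order.

    A sentence with no trigger word is reported as a plain statement of fact.
    """
    text = sentence.lower()
    kinds = [kind for kind, pattern in VERBAL_FORMS if re.search(pattern, text)]
    return kinds or ["statement of fact"]


if __name__ == "__main__":
    for s in (
        "Multiple entries in the same dictionary shall not have the same key.",
        "The annotation can be printed.",
        "1 plus 1 is 2.",
    ):
        print(classify_statement(s), "-", s)
```

A keyword scan like this is only a reading aid: it can point a reviewer at sentences that carry rules, but it cannot decide what a rule means.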
References to other documents
Don’t assume that a single ISO publication contains all the information necessary for conformance with that publication - ISO standards are not intended as textbooks, tutorials, or implementation guides!
The Normative references (always clause 2 of an ISO standard) list all documents that “are referred to in the text in such a way that some or all of their content constitutes requirements of this document”. ISO standards thus “inherit” requirements from other documents by referencing other documents normatively (as in “shall be a JPEG…” versus “is a JPEG” which is non-normative). It is therefore important to obtain and understand those documents in order to arrive at a complete understanding of all applicable requirements.
When a normative reference is explicitly dated, then only that specific version of the document is intended as a reference - using a different version is incorrect and may result in invalid PDF files. ISO requires the use of dated references whenever a specific element of the referenced document is cited, such as a table or clause number (since an update to the document can cause clause renumbering). An “undated reference” may be used when referencing a specific edition of a document is not essential. Undated references are also used when “it is understood that the reference will include all amendments to and revisions of the referenced document”. “Amendments”, “corrigenda” and “dated revisions” are all ISO-specific terms that refer to various methods of updating an ISO publication; however, other standards development organizations (SDOs) may use different terminology.
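To make the dated/undated distinction concrete, here is a minimal Python sketch that splits an ISO-style designation into its document identifier and optional year. The regular expression and the function name are hypothetical and cover only simple forms such as “ISO 32000-2:2020”. They are not an official grammar for ISO designations.

```python
import re

# Hypothetical pattern: "ISO" or "ISO/IEC", a number, an optional part number,
# and an optional ":YYYY" suffix that makes the reference a dated one.
REFERENCE = re.compile(
    r"^(?P<designation>ISO(?:/IEC)? \d+(?:-\d+)?)(?::(?P<year>\d{4}))?$"
)


def parse_reference(ref: str) -> dict:
    """Split a reference into its designation and year, and flag whether it is dated."""
    match = REFERENCE.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    year = match.group("year")
    return {
        "designation": match.group("designation"),
        "year": int(year) if year else None,
        "dated": year is not None,
    }


if __name__ == "__main__":
    print(parse_reference("ISO 32000-2:2020"))  # dated: only this edition applies
    print(parse_reference("ISO 19005-1"))       # undated: check for later editions
```

A tool built along these lines could flag undated references in a checklist so that a reader remembers to look for amendments and revisions.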
In contrast, the Bibliography in an ISO publication lists documents that are purely informative in some way and do not establish any requirements. This is where you might locate additional explanatory information such as background theory, but the bibliography is unlikely to be fully comprehensive.
File format or software requirement?
For ISO’s standards for PDF, most rules govern the file format (i.e. the bytes in a PDF file), but some rules address the software that reads and/or writes PDF files. For example, a file, being a passive store of bytes, does not “ignore”, “process” or perform any action - when encountered, such verbs are indicative of a rule for software. However, such software rules often sit alongside file format rules (such as “the dictionary key X shall have the value Y”), which most software will likely need to check before processing.
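The verb-based hint described above can be sketched in a few lines of Python. Only the two verbs named in this article (“ignore” and “process”) are matched. The function name and the pattern are assumptions for illustration, and the heuristic will misfire on nouns such as “a process”.

```python
import re

# The two verbs this article names as indicating software behavior, with
# simple inflections. The list is deliberately incomplete.
SOFTWARE_VERBS = re.compile(
    r"\b(?:ignor(?:e|es|ed|ing)|process(?:es|ed|ing)?)\b", re.IGNORECASE
)


def hints_at_software_rule(sentence: str) -> bool:
    """Return True if the sentence uses a verb that a passive file cannot perform."""
    return SOFTWARE_VERBS.search(sentence) is not None


if __name__ == "__main__":
    print(hints_at_software_rule("A reader shall ignore this entry."))
    print(hints_at_software_rule("The dictionary key X shall have the value Y."))
```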
As ISO 32000-1 (PDF 1.7) was being developed, Adobe wrote a short document describing how “ISO-ese” would change the wording but not the intent of their PDF 1.7 reference specification. This document (available from the PDF Association website) provides some illustrative examples of where casually worded statements were transformed to meet the ISO Directives. Here are two examples:
In the Adobe PDF Reference the following sentence occurs:
The carriage return (CR) and line feed (LF) characters, also called newline characters, are treated as end-of-line (EOL) markers.
In the ISO document this was changed to:
The carriage return (CR) and line feed (LF) characters, also called newline characters, shall be treated as end-of-line (EOL) markers.
This change makes use of the more precise and well defined ISO writing style using the word "shall".
Another example is where "should" was used in the Adobe Reference to have the same meaning as "shall" using ISO definitions. In the Adobe PDF Reference the following sentence occurs:
Note: No two entries in the same dictionary should have the same key
In the language of the Adobe Reference this actually means that two entries in the same dictionary are not permitted to have the same key. In the ISO document this was changed to:
Multiple entries in the same dictionary shall not have the same key.
Although worded differently in the context of their respective document’s styles these two sentences have the same technical meaning.
In addition, in the Adobe Reference notes are not a means of denoting informative content as is the case using the ISO style, so the Note: was removed because this is actually a normative statement.
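Both example rules can be turned into small checks. The Python sketch below is illustrative only: the function names are invented for this article, and dictionary entries are modeled as simple (key, value) pairs rather than parsed PDF objects. Treating a CR immediately followed by an LF as a single EOL marker comes from ISO 32000 itself and goes beyond the sentence quoted above.

```python
import re

# CR, LF, or a CR LF pair. The pair is listed first so that it is
# consumed as one EOL marker rather than two.
EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list[bytes]:
    """Split raw bytes at every EOL marker (CR, LF, or CR LF)."""
    return EOL.split(data)


def duplicate_keys(entries: list[tuple[str, object]]) -> list[str]:
    """Return keys that occur more than once, in order of first repetition.

    `entries` stands in for the key/value pairs of one PDF dictionary.
    """
    seen: set[str] = set()
    dupes: list[str] = []
    for key, _value in entries:
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


if __name__ == "__main__":
    print(split_lines(b"a\rb\nc\r\nd"))
    print(duplicate_keys([("Type", "Page"), ("Parent", 1), ("Type", "Catalog")]))
```

Note that the first check concerns how software interprets bytes, while the second concerns the content of the file itself, which mirrors the distinction between software rules and file format rules.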
Key takeaways
Understanding “ISO-ese” can be complex for newcomers or those without English as a primary language, so here is a short list of things to remember when encountering ISO standards for PDF:
- Understand the Scope. ISO standards for PDF only define what is a valid PDF file and what software must do to process valid PDF files. They do not define details such as error handling, file recovery, good programming practices, etc.;
- Always pay very close attention to every use of “shall” and “should” - these are the basic rules of valid PDF and interoperable software;
- Understand why the author chose a permissive term (“may”, “may not”) or possibility/capability term (“can”, “cannot”) instead of “shall” - there is possibly some related subtlety;
- Never infer any new rules from notes or examples, as these cannot affect the normative requirements, and can be ignored without changing the document’s meaning. If you think that a note or example implies some rule that is not stated elsewhere, you’ve likely misunderstood it;
- Always refer to the correct normative references, since the rules defined in these documents apply equally and are included by inheritance (this also cascades into the normative references of normative references, etc). Do not expect to find all the rules in an ISO standard;
- For undated references, don’t forget to check for amendments, corrigenda or errata;
- Consider whether rules apply to the bytes in a PDF file, to PDF reading or writing software, or both. The use of certain verbs may indicate that the rules apply to software;
- Don’t expect to find any tutorial information, application guidance, or background explanations in ISO standards.
If in doubt about something in an ISO standard for PDF technology, please create an Issue in the pdf-issues GitHub repository and we will attempt to address it. As a product of human endeavor expressed with the ambiguities of natural language, no ISO standard is ever perfect but the PDF Association works hard to improve the common understanding of PDF.