Making PDF/A conversion easier

PDF/A – the ISO standard for long-term archiving – now has four sub-standards. This raises an obvious question: which one should businesses use for document conversion and archival? Dietrich von Seggern from callas software gives the answer.
About the author: Dietrich von Seggern received his degree as a printing engineer, and in 1991 started his professional career as head of desktop prepress production in a reproduction house. He became involved in …
Dietrich von Seggern

March 16, 2023


PDF/A has been the international ISO standard for long-term archiving since 2005. It guarantees reliable reproduction of documents for years to come, regardless of any technological, hardware or software innovations that may arise. It enables homogeneous archives of both born-digital and scanned documents.


How do you tell the PDF/A variants apart?

Today, there are four variants of PDF/A: PDF/A-1, -2, -3 and -4. Among these, PDF/A-3 stands out because, while the PDF file itself is still subject to restrictions (as with the other variants), it may also embed files in any other format. PDF/A-4 permits this too, but only in a special conformance level known as PDF/A-4f. PDF/A-3 and PDF/A-4f should only be used in very specific contexts, since an archive should avoid a diverse range of file formats. But more on this later. For now, we will concentrate on the other variants.

PDF/A-1 vs. PDF/A-2 and PDF/A-4

The PDF/A-1, PDF/A-2 and PDF/A-4 sub-standards have some features in common: in general, they limit which features can be used in a PDF file. For example, external content, JavaScript, encryption and video are not permitted. Fonts must be embedded, and colors must be defined independently of any device (using ICC profiles, for instance). But how do these variants differ, and which should you use when? The key factor in this decision is the base PDF standard on which each variant builds. PDF/A-1 is based on PDF 1.4 (2001), PDF/A-2 on PDF 1.7 (2006), and PDF/A-4 on PDF 2.0 (2017). Each of these base standards introduced new features when it was launched, and a PDF/A variant can only use features that its base standard defines. Layers and transparent objects, for example, cannot be used in PDF/A-1. When converting to PDF/A-1, such objects therefore have to be modified (flattened), and information is permanently lost.

Another consideration, which can have even more serious consequences, is that each new version of the base PDF standard also expanded the range of permitted internal values within a PDF file. A PDF 1.4 file had to be processable at a reasonable speed on the kind of hardware that was in use in 2001, and the specification therefore placed limits on internal structures. Hardware performance has increased enormously since then, which allows much wider ranges for such structures in newer PDF files. However, when converting a contemporary PDF file to PDF/A-1, you need to stay within the limits specified for PDF 1.4. In rare cases, that means making changes to the PDF’s low-level internal structure. Such low-level changes are sometimes only possible by replacing all content on a page with an image. This leads, of course, to a loss of information: for example, text stops being text and becomes only an image of text.
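To make the idea of internal limits more concrete, here is a minimal Python sketch of the kind of check a converter might run before deciding whether a page fits PDF/A-1. The three limit values, the `PageStats` summary and the function name are assumptions made for illustration; the authoritative limits are those in the PDF 1.4 reference and ISO 19005-1 and should be verified there.

```python
from dataclasses import dataclass

# Illustrative limits only: assumed values for a PDF 1.4-era profile.
# Verify them against the PDF 1.4 reference and ISO 19005-1 before relying on them.
ASSUMED_PDFA1_LIMITS = {
    "max_abs_real": 32767.0,  # largest magnitude of a real number
    "max_name_bytes": 127,    # longest name object, in bytes
    "max_q_nesting": 28,      # deepest graphics-state save/restore nesting
}


@dataclass
class PageStats:
    """Hypothetical summary a converter might gather for one page."""
    largest_abs_real: float
    longest_name_bytes: int
    deepest_q_nesting: int


def limit_violations(stats, limits=ASSUMED_PDFA1_LIMITS):
    """Return a list describing which assumed limits the page exceeds."""
    problems = []
    if stats.largest_abs_real > limits["max_abs_real"]:
        problems.append("real number out of range")
    if stats.longest_name_bytes > limits["max_name_bytes"]:
        problems.append("name too long")
    if stats.deepest_q_nesting > limits["max_q_nesting"]:
        problems.append("graphics state nesting too deep")
    return problems
```

A page with no violations can be converted without structural changes. A page with violations is a candidate for the low-level repairs described above, with rasterizing the page as the last resort.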

A good converter will only take such drastic measures in very, very rare cases. However, the question is: why does this have to happen at all? There is no chance that future hardware will drop back to a level of performance comparable with the year 2001.

So the easy answer to the question “Which archive format is the best?” is to make sure that the base PDF standard for the PDF/A variant is not older than the version of the archived PDF file. In most cases these days, this means PDF/A-2.
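Applied per file, this rule can be sketched in a few lines of Python. The sketch reads the version from the `%PDF-x.y` header at the start of the file and picks the oldest of the three variants whose base version is not older than it. The function names are hypothetical, and the header is only an approximation: a PDF can declare a newer version elsewhere in the file, so a real converter would rely on a full parser rather than this check.

```python
import re

# Base PDF version of each general-purpose PDF/A variant, as listed in this article.
BASE_VERSION = {
    "PDF/A-1": (1, 4),
    "PDF/A-2": (1, 7),
    "PDF/A-4": (2, 0),
}

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(first_bytes):
    """Extract (major, minor) from the %PDF-x.y header in the first bytes of a file."""
    match = HEADER_RE.search(first_bytes[:1024])
    if match is None:
        raise ValueError("no %PDF-x.y header found")
    return int(match.group(1)), int(match.group(2))


def oldest_sufficient_variant(version):
    """Pick the oldest variant whose base PDF version is not older than `version`."""
    for name, base in sorted(BASE_VERSION.items(), key=lambda item: item[1]):
        if base >= version:
            return name
    raise ValueError(f"no PDF/A variant covers PDF {version[0]}.{version[1]}")
```

For a file whose header reads `%PDF-1.6`, `oldest_sufficient_variant(header_version(data))` returns `"PDF/A-2"`. An archive that prefers a single default could skip the lookup and simply use PDF/A-2 for anything up to PDF 1.7.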

It is worth noting at this point that PDF/A-1 remains a valid sub-standard and has not been replaced by PDF/A-2, nor will PDF/A-4 ever replace PDF/A-2. PDF/A-1 files can remain in archives forever; they don’t need to be converted to a more recent standard. However, it is strongly advisable to archive new files in PDF/A-2.

It is not yet possible to give a strong recommendation for PDF/A-4, as its base PDF format, PDF 2.0, is still rare today.

Regardless of when PDF/A-4 becomes the dominant variant, though, we don’t consider it a good goal to have an archive made up of only one of PDF/A-1, PDF/A-2 or PDF/A-4. These variants build upon one another, so the better strategy is to adjust the variant as needed: choose the PDF/A variant that corresponds to the newest base PDF version among the files to be archived.

Use cases for PDF/A-3 and PDF/A-4f

Before we go, one final word on those special cases we mentioned before – PDF/A-3 and PDF/A-4f. From an archiving perspective, it is essential to limit the variety of file formats used; these variants therefore require a framework of additional rules. But there are compelling use cases for them, all of which share a specific, defined relationship between the actual archive file and the files embedded within it. Examples include embedded source files (say, a spreadsheet saved alongside its PDF copy) and digital invoices in which a machine-readable dataset is embedded in a human-readable PDF/A-3 file (for example ZUGFeRD invoices). Email archiving is another classic use case for PDF/A-3 and -4f. Here, the original email can be embedded in EML or MSG format, along with its attachments.
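One way to picture such a framework of additional rules is a policy check that rejects any embedded file without a declared relationship to its host document. The following Python sketch assumes a hypothetical `EmbeddedFile` record and a made-up set of relationship labels; an actual archive policy would define its own vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical relationship labels for this sketch; a real policy would define its own.
ALLOWED_RELATIONSHIPS = {"source", "data", "original-message", "attachment"}


@dataclass
class EmbeddedFile:
    filename: str
    relationship: Optional[str]  # how the file relates to the host PDF; None if undeclared


def undefined_embeddings(files: List[EmbeddedFile]) -> List[str]:
    """Return the names of embedded files whose relationship is missing or not in the policy."""
    return [f.filename for f in files if f.relationship not in ALLOWED_RELATIONSHIPS]
```

For an invoice carrying `EmbeddedFile("invoice.xml", "data")` and `EmbeddedFile("notes.bin", None)`, the function flags only `notes.bin`, which is exactly the kind of undefined payload an archive would want to keep out.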

Conclusion

In short, we can say that:

  • Existing PDF/A-1 files don’t need to be converted to a newer standard.
  • It is a good idea to convert new files to PDF/A-2.
  • PDF/A-4 is worth keeping an eye on.
  • PDF/A-3 and PDF/A-4f should only be used in contexts where the nature of the embedded files is defined.