Mixologist at work – live coding session
Michaël Demey, head of research and crack iText developer, leads you through the creation of PDF/A and PDF/UA files in this recap of his session from iText’s “Shake It, Make It” customer event.
After learning about the history of both the company and our solutions in the first session from the Shake It! Make It! iText customer event, I was excited to see the iText 7 PDF library in action. So, it was fascinating for me to watch the live coding session on creating PDF/A and PDF/UA documents hosted by Michaël Demey, iText’s Head of Research.
Michaël has been with iText for over 10 years, and under his lead the Research Department pushes both iText and PDF technology into the future. So, who better to demonstrate how easy iText makes it to create PDFs which meet the standards for archiving and universal accessibility?
As someone new to both the organization and the field, it was really enlightening to see the work that must happen behind the scenes to allow the creation of PDFs suitable for archiving, ensuring the information they contain can always be accessed, whether it be now or 50 years in the future. Michaël also went into detail about how you can create PDFs that are universally accessible for users with disabilities, and the importance of both the PDF/A and PDF/UA standards in modern document workflows.
For those not in the know—like me before this session—PDF/A and PDF/UA are two different standards built on top of the PDF specification. Each has its own requirements and limitations, and this can lead to confusion.
The A in PDF/A stands for “archiving”, and it’s designed for the long-term storage of documents. PDF/A files must contain all the resources necessary to correctly render the document, ensuring that a PDF can always be opened and read consistently. For example, by embedding font data you don’t have to worry about installing the correct font to display the document correctly.
PDF/UA on the other hand stands for “universal accessibility”. It’s designed to work alongside assistive technologies such as screen-readers and alternative navigation methods. The use cases don’t really overlap with PDF/A – but the file requirements sometimes do, and that’s where confusion often comes in.
Michaël started his session by creating a new Java project with the iText layout and PDF/A modules as dependencies. He mentioned that there are numerous ways to work with iText, but that he would use the high-level API, because it lets you work with familiar constructs like paragraphs, images, and tables rather than low-level PDF elements.
With just a few lines of code he created a PdfDocument backed by a PdfWriter and added a Paragraph containing some text.
For reference, you can find Michaël's code example from the session in the original blog post.
Running this created a basic PDF file, which when opened in a PDF reader (such as Adobe Acrobat) displayed the text he had added. However, the file did not have any standards information, which is one of the requirements for PDF/A conformance.
To make this document meet the PDF/A standard, the first step is to decide which part of the standard to target and which conformance level to use. The right choice depends on the use case and there is no single “best” one, but for this demonstration Michaël chose PDF/A-1b because it is the simplest level to attain. After specifying the level in the code, you also need to include an output intent, which tells PDF readers how to interpret the colors used in the document.
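iText writes the PDF/A identification metadata for you once you choose a conformance level, so you never type it by hand. Still, it helps to see what that “standards information” actually is. The plain-Java sketch below has no iText dependency, and its class and method names are hypothetical. It turns a level label such as “PDF/A-1b” into the part number and conformance letter that a PDF/A file records in its XMP metadata under the PDF/A identification schema. It deliberately covers only parts 1 to 3.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds the PDF/A identification entries stored in a file's XMP metadata. */
final class PdfAIdentification {
    // Accepts labels such as "PDF/A-1b" or "PDF/A-2u": a part number followed by a conformance letter.
    private static final Pattern LEVEL =
            Pattern.compile("PDF/A-([123])([ABU])", Pattern.CASE_INSENSITIVE);

    static String xmpFragment(String level) {
        Matcher m = LEVEL.matcher(level.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF/A level: " + level);
        }
        String part = m.group(1);
        String conformance = m.group(2).toUpperCase(Locale.ROOT);
        // PDF/A-1 only defines levels A and B; level U was introduced with PDF/A-2.
        if (part.equals("1") && conformance.equals("U")) {
            throw new IllegalArgumentException("PDF/A-1 has no level U");
        }
        return "<rdf:Description rdf:about=\"\"\n"
                + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
                + "  <pdfaid:part>" + part + "</pdfaid:part>\n"
                + "  <pdfaid:conformance>" + conformance + "</pdfaid:conformance>\n"
                + "</rdf:Description>";
    }

    public static void main(String[] args) {
        System.out.println(xmpFragment("PDF/A-1b"));
    }
}
```

In a real file this rdf:Description sits inside the rdf:RDF element of the XMP packet, next to the rest of the document metadata.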
When Michaël then ran his code to generate the PDF, iText threw an error. This was intentional: he wanted to demonstrate how iText performs checks when creating a PDF/A document, and here it told him he had “forgotten” to embed the font. The PDF/A-1b standard requires every font used in the document to be embedded, because there is no guarantee that someone viewing the file in the future will have the correct font available. After he specified the chosen font and its encoding strategy and assigned that font to the paragraph, iText was able to embed the font file and create the PDF successfully.
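The error Michaël triggered comes from iText checking, while it creates a PDF/A document, that every font carries an embedded font program. The toy sketch below shows the idea in plain Java, without iText. It assumes each font has already been reduced to the set of keys in its FontDescriptor dictionary (a hypothetical input format). It then flags fonts that have none of FontFile, FontFile2, or FontFile3, the keys under which PDF stores an embedded font program. The font names are sample data only, and real validators handle cases this sketch skips, such as Type 3 and composite fonts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified PDF/A-style check: which fonts have no embedded font program? */
final class FontEmbeddingCheck {
    // Keys under which a FontDescriptor dictionary stores an embedded font program.
    private static final Set<String> FONT_FILE_KEYS = Set.of("FontFile", "FontFile2", "FontFile3");

    /** Returns the names of fonts whose descriptor keys include none of the font-program keys. */
    static List<String> notEmbedded(Map<String, Set<String>> descriptorKeysByFont) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Set<String>> font : descriptorKeysByFont.entrySet()) {
            boolean embedded = !Collections.disjoint(font.getValue(), FONT_FILE_KEYS);
            if (!embedded) {
                offenders.add(font.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fonts = new LinkedHashMap<>();
        fonts.put("Helvetica", Set.of());                        // referenced by name only, nothing embedded
        fonts.put("FreeSans", Set.of("FontName", "FontFile2"));  // TrueType program embedded
        System.out.println(notEmbedded(fonts));                  // prints [Helvetica]
    }
}
```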
When he opened the new file in Acrobat, the software automatically reported that it claims compliance with the PDF/A-1b standard. Michaël confirmed this by clicking the standards icon in the toolbar on the left. He also suggested the veraPDF SDK, available on GitHub, as another way to check conformance.
For your convenience, you can find Michaël’s complete PDF/A code example in the original blog post.
After creating a PDF/A-compliant document, Michaël showed us how to change it to meet the PDF/UA standard. His first step was to remove the PDF/A-specific lines from the code. PDF/UA has no conformance levels to declare, although, like PDF/A, it still requires all fonts to be embedded. Instead, you set the appropriate properties on the PdfWriter by using WriterProperties. Michaël explained that WriterProperties lets you control aspects of the output such as the compression level and the PDF version. For this demonstration, though, he only needed it to make sure the file contained the correct XMP metadata for PDF/UA.
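In iText 7, WriterProperties provides an addUAXmpMetadata() method for this purpose. The plain-Java sketch below is only a rough illustration of what ends up in the file. It has no iText dependency and its names are hypothetical. It builds the PDF/UA-1 identification entry together with a dc:title entry, since PDF/UA-1 also expects the document title in the XMP metadata.

```java
/** Hypothetical sketch of the XMP entries that identify a PDF/UA-1 file and carry its title. */
final class PdfUaMetadata {
    static String xmpFragment(String title) {
        return "<rdf:Description rdf:about=\"\"\n"
                + "    xmlns:pdfuaid=\"http://www.aiim.org/pdfua/ns/id/\"\n"
                + "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n"
                + "  <pdfuaid:part>1</pdfuaid:part>\n"
                + "  <dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">"
                + escapeXml(title)
                + "</rdf:li></rdf:Alt></dc:title>\n"
                + "</rdf:Description>";
    }

    // Minimal XML escaping so a title such as "Q&A" keeps the packet well-formed.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, before other entities are introduced
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(xmpFragment("Hello PDF/UA"));
    }
}
```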
Once those changes were made, he generated a new PDF. This time Acrobat not only displayed the text, but also automatically enabled the screen reader to read it out loud. However, to confirm that the file was properly compliant, Michaël used Acrobat’s Preflight tool. Although the file was recognized as a PDF/UA file, Preflight warned that some elements required for full compliance were missing. Michaël explained that the first issue was that no language had been specified for the text. The language is needed so that the text-to-speech system uses the correct voice.
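In a PDF, the document language is stored in the /Lang entry of the document catalog as a language tag such as “en-US” or “nl-BE”. It can be worth checking that such a value is well formed before writing it. The plain-Java sketch below (class and method names hypothetical) uses the JDK’s Locale.Builder, which rejects ill-formed BCP 47 tags, and additionally requires a primary language subtag. It checks syntax only. It cannot tell whether the tag matches the language actually used in the text.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

/** Hypothetical syntax check for a document language tag, such as the value of a PDF's /Lang entry. */
final class LangTagCheck {
    static boolean isUsableLang(String tag) {
        if (tag == null || tag.isEmpty()) {
            return false; // an empty language tells a screen reader nothing
        }
        try {
            // Locale.Builder throws IllformedLocaleException for tags that are not well-formed BCP 47.
            Locale locale = new Locale.Builder().setLanguageTag(tag).build();
            // Require a primary language such as "en"; private-use-only tags have none.
            return !locale.getLanguage().isEmpty();
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-US", "nl-BE", "en_US", ""}) {
            System.out.println("\"" + tag + "\" -> " + isUsableLang(tag));
        }
    }
}
```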
Michaël then returned to his code and added a couple of lines to set the document’s language. He also showed how to give the document a title, and specified that the document should be tagged. Tagging means that the file stores not just raw text and data, but also information about how that content is structured. Screen readers and similar software rely on this information to determine the correct reading order of a document.
After generating the PDF again, Michaël checked the file in Preflight to confirm that the document was now fully compliant with the PDF/UA standard. He also opened the tag view in Acrobat to show the document’s structure, with the text logically contained inside a paragraph within the document.
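The tag view Michaël opened displays the document’s structure tree. In this case, that is a Document element containing a P (paragraph) element. In a tagged PDF, the logical reading order follows this tree rather than the order in which content happens to be drawn on the page. The plain-Java sketch below uses a hypothetical StructElem class (not iText’s API) to model a small tree. It derives the reading order with a depth-first, pre-order walk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified structure element: a role plus either text or child elements. */
final class StructElem {
    final String role;            // structure type, e.g. "Document", "H1", "P"
    final String text;            // content of a leaf element; null for grouping elements
    final List<StructElem> kids;

    StructElem(String role, String text, StructElem... kids) {
        this.role = role;
        this.text = text;
        this.kids = List.of(kids); // List.of accepts the varargs array and copies its elements
    }

    /** Depth-first, pre-order walk: an element's own text comes first, then its children left to right. */
    static List<String> readingOrder(StructElem root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(StructElem elem, List<String> out) {
        if (elem.text != null) {
            out.add(elem.role + ": " + elem.text);
        }
        for (StructElem kid : elem.kids) {
            collect(kid, out);
        }
    }

    public static void main(String[] args) {
        StructElem doc = new StructElem("Document", null,
                new StructElem("H1", "Mixologist at work"),
                new StructElem("P", "Hello, accessible world"));
        System.out.println(readingOrder(doc));
    }
}
```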
As before, you can find Michaël’s complete PDF/UA code example in the original blog post.
And there you go. Two skilfully mixed and standards-compliant PDF cocktails, prepared in less than 20 minutes by an expert mixologist! Even as a non-developer, I found Michaël’s coding session very easy to follow, with clear explanations of all the necessary steps to create both types of PDF document. It’s also important to note that everything you need to create these documents is built into the iText 7 Core library; you don’t require any extra software.
Of course, in only 20 minutes there’s only so much Michaël could cover. So, for more in-depth information and background on the PDF/A and PDF/UA standards, you can read our free eBooks on both topics which are linked below.
PDF/A: digital documents to withstand the sands of time
PDF/UA: the inclusive document format
This article was based on a talk given at iText’s 2022 Customer Event.
Original post: https://itextpdf.com/blog/itext-news/mixologist-work-live-coding-session
The iText Suite is a comprehensive PDF SDK which includes iText Core and optional add-ons to give you the flexibility to fit your needs. iText Core is an open source PDF library that you can build into your own applications and is a reimagining of the popular iText 5 engine…