PDF Association logo

Discover pdfa.org

Key resources

Get involved

How do you find the right PDF technology vendor?
Use the Solution Agent to ask the entire PDF communuity!
The PDF Association celebrates its members’ public statements
of support
for ISO-standardized PDF technology.

Member Area

Displaying text in different languages in a single PDF document

Earlier versions of the iText PDF library were already able to render Chinese, Japanese and Korean glyphs in PDF documents, but to correctly display right-to-left scripts like Hebrew and Arabic, we needed the information provided by OpenType fonts to help with handling the complexities of all the world’s writing systems. So, for iText 7 we went back to the drawing board to provide OpenType support for advanced font features in PDF documents. However, we decided to go a step further and created p … Read more
The iText Suite is a comprehensive PDF SDK which includes iText Core and optional add-ons to give you the flexibility to fit your needs. iText Core is an open source … Read more
Displaying text in different languages in a single PDF document

iText
October 16, 2019

Member News


Print Friendly, PDF & Email

Earlier versions of the iText PDF library were already able to render Chinese, Japanese and Korean glyphs in PDF documents, but to correctly display right-to-left scripts like Hebrew and Arabic, we needed the information provided by OpenType fonts to help with handling the complexities of all the world's writing systems. So, for iText 7 we went back to the drawing board to provide OpenType support for advanced font features in PDF documents.

However, we decided to go a step further and created pdfCalligraph, a commercially licensed add-on module for the iText 7 library which was specifically designed to support many more languages and writing systems, and like iText 7 it's available for both the Java and .NET (C#) platforms. For detailed information about the inherent difficulties of supporting multiple languages and writing systems in the PDF standard, and the powerful and unique solutions pdfCalligraph provides, we recommend reading the pdfCalligraph white paper.

In this article, we’ll demonstrate how you can use pdfCalligraph with iText to create a PDF containing text using different languages. But first, here’s a short explanation of how pdfCalligraph works.

How does pdfCalligraph integrate with iText 7?

The iText layout module will automatically look for pdfCalligraph in its dependencies if text if a language or writing system that requires it is encountered by the Renderer Framework. For example, when iText encounters text that contain Indic texts, or a script that's written from right to left, iText checks if pdfCalligraph is available and will then use its functionality to provide the correct glyph shapes to write to the PDF file. However, as the typography logic is complex and can be resource-heavy even for documents that don’t require this functionality, iText won't attempt any advanced shaping operations if the pdfCalligraph module has not been loaded as a binary dependency.

pdfCalligraph features

  • Automatic detection of writing systems
  • Wider language support
  • Right-to-left support
  • Ligatures
  • Kerning
  • Glyph substitution
  • Available for Java and .NET (C#)

Using pdfCalligraph

To use pdfCalligraph you simply load the correct binaries into your project, make sure your valid license file is loaded, and iText 7 will automatically use the pdfCalligraph code when it is required by a document. If you don't have a commercial license for pdfCalligraph, you can get a free trial of the iText 7 Suite which includes the iText 7 Core library, plus all the add-ons.

Usage example

For this example, we'll demonstrate using pdfCalligraph to correctly render text in different languages. First, let’s start with a simple English sentence, which we've translated into three different languages using Google Translate.

This is an example sentence.

Let's see how that looks in Arabic:

.هذه هي الجملة المستخدمة في المثال

Now let’s see how it looks in Hindi:

यह एक उदाहरण वाक्य है।

And finally in Tamil:

இது ஒரு எடுத்துக்காட்டு வாக்கியம்.

Those of you familiar with any of the languages in question may notice the translations are not perfect, but they will be sufficient for our purposes here.

Now we’ll save each piece of text as separate XML files, english.xmlarabic.xmlhindi.xml and tamil.xmlNote that because Arabic is written right-to-left, in order for pdfCalligraph to display the Arabic text starting from the right side of the PDF you’ll need to specify this in the XML. However, the languages will be detected and handled by pdfCalligraph automatically.

To render the text correctly in our PDF, we’ll be making use of the Google Noto fonts which you can download using the link. You may have noticed that sometimes when text is rendered by a computer, certain characters are displayed as little boxes. This indicates your device doesn’t have a font that's able display the text, so unrecognized characters are rendered as these boxes (or “tofu”).

The Noto fonts are Google’s answer to tofu and the name “noto” was chosen to convey the idea that Google’s goal is to see “no more tofu”. The Noto fonts are free to use and have multiple styles and weights available. The Noto font family is comprised of over 100 individual fonts which have been designed to cover all the scripts encoded in the Unicode standard with a harmonious look and feel.

We’ll be using the NotoNaskhArabic-Regular and NotoSansTamil-Regular fonts to render our Arabic, Hindi and Tamil texts as they are intended to appear.

In the following code, we take the text from all four source files and display them as separate paragraphs in a single PDF document. You can click the button in the top-right of the code window to switch between Java and .NET (C#) code:

Java

 

C#

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
final String[] sources = {"english.xml", "arabic.xml", "hindi.xml", "tamil.xml"};
        final PdfWriter writer = new PdfWriter(DEST);
        final PdfDocument pdfDocument = new PdfDocument(writer);
        final Document document = new Document(pdfDocument);
        final FontSet set = new FontSet();
        set.addFont("fonts/NotoNaskhArabic-Regular.ttf");
        set.addFont("fonts/NotoSansTamil-Regular.ttf");
        set.addFont("fonts/FreeSans.ttf");
        document.setFontProvider(new FontProvider(set));
        document.setProperty(Property.FONT, new String[]{"MyFontFamilyName"});
        for (final String source : sources) {
            final File xmlFile = new File(source);
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            final DocumentBuilder builder = factory.newDocumentBuilder();
            final org.w3c.dom.Document doc = builder.parse(xmlFile);
            final Node element = doc.getElementsByTagName("text").item(0);
            final Paragraph paragraph = new Paragraph();
            final Node textDirectionElement = element.getAttributes().getNamedItem("direction");
            boolean rtl = textDirectionElement != null && textDirectionElement.getTextContent()
                    .equalsIgnoreCase("rtl");
            if (rtl) {
                paragraph.setTextAlignment(TextAlignment.RIGHT);
            }
            paragraph.add(element.getTextContent());
            document.add(paragraph);
        }
        document.close();
        pdfDocument.close();
        writer.close();

This will create a PDF containing the following text in the four specified languages, as illustrated below:

An image displaying four different languages in a single PDF
Our example PDF showing the four different languages.

Results

You can see our example PDF here.

Resources

Example language files

Supported languages and scripts

This table shows the additional languages and scripts support enabled by pdfCalligraph, as well as the ones natively supported in iText 7:

Language Script Module
Arabic, Persian, Kurdish, Azerbaijani, Sindhi, Pashto, Lurish, Urdu, Mandinka, Punjabi and others ARABIC pdfCalligraph
Hebrew, Yiddish, Judaeo-Spanish, and Judeo-Arabic HEBREW pdfCalligraph
Bengali BENGALI pdfCalligraph
Hindi, Sanskrit, Pali, Awadhi, Bhojpuri, Braj Bhasha, Chhattisgarhi, Haryanvi, Magahi, Nagpuri, Rajasthani, Bhili, Dogri, Marathi, Nepali, Maithili, Kashmiri, Konkani, Sindhi, Bodo, Nepalbhasa, Mundari and Santali DEVANAGARI, NAGARI pdfCalligraph
Gujarati and Kutchi GUJARATI pdfCalligraph
Punjabi GURMUKHI pdfCalligraph
Kannada, Konkani and others KANNADA pdfCalligraph
Khmer (Cambodia) KHMER pdfCalligraph
Malayalam MALAYALAM pdfCalligraph
Odia ORIYA pdfCalligraph
Tamil TAMIL pdfCalligraph
Telugu (Dravidian language) TELUGU pdfCalligraph
Thai THAI pdfCalligraph
Chinese Core
Japanese Core
Korean Core
WESTERN Core
Russian, Ukrainian, Belarussian, Bulgarian and others CYRILLIC Core
Greek GREEK Core
Armenian ARMENIAN Core
Georgian GEORGIAN Core

Still have questions?

If you are interested in learning more or have additional questions, contact us


The iText Suite is a comprehensive PDF SDK which includes iText Core and optional add-ons to give you the flexibility to fit your needs. iText Core is an open source PDF library that you can build into your own applications and is a reimagining of the popular iText 5 engine…

Read more
WordPress Cookie Notice by Real Cookie Banner