PDF is trustworthy; authors may not be!

Verizon is tricked by a fraudulent PDF, a family tree of PDF specifications, veraPDF is tops for the Open Preservation Foundation in 2023, a new PDF malware dataset from IEEE, Apple expands support for PDF, more from ChatGPT… and of course, the PDFacademicBot for December.
About the author: The PDF Association staff delivers a vendor-neutral platform for PDF’s stakeholders, facilitating the development of open specifications and ISO standards for PDF technology. Staff members include: Alexandra Oettler (Editor), Betsy Fanning …

PDF is trustworthy; authors may not be!

In a world of increasingly sophisticated fakes, the importance of digital signature technology is growing, and never more so than when it comes to official documents. Case in point: US phone company Verizon recently fell for a fraudulent PDF file purportedly originating from a real police department and signed by a real judge. Verizon complied, providing the phone number and phone logs the faker requested.

Don’t let this happen to you.

A “family tree” of PDF specifications and related ISO standards

A complete "family tree" of PDF specifications and ISO standards.
Kindly provided by the German Printing and Media Industries Federation (BVDM).

The Ghent Workgroup advocates for PDF 2.0…

The next Ghent Workgroup (GWG) webinar is called “Why are you still using a 20-year-old standard? Stop that!”

We love that title.

At 4 pm CET on Tuesday, January 23, 2024, join Christian and David as they discuss PDF standards, best practices advocated by the Ghent Workgroup, and the future.

Open Preservation Foundation End of Year Highlights

veraPDF was created by the veraPDF Consortium, led by the Open Preservation Foundation (OPF) and the PDF Association. OPF has released its annual report and highlights for 2023, with veraPDF featuring strongly, including as the clear leader for 2023 downloads. OPF gives special mention to the veraPDF development pre-release leveraging the Arlington PDF Model, developed by the PDF Association as part of our DARPA “SafeDocs” engagement.

2023 downloads by tool: Fido 18977, JHOVE 14188, jpylyzer 5580, veraPDF 24559.

The PDF Association thanks OPF for its ongoing support of veraPDF. We look forward to more opportunities to share information and resources with OPF members in 2024!

IEEE DataPort “pdfrep” malware dataset

New PDF Association Institutional member IEEE has published PDFREP, a new malware dataset. The authors of the dataset describe it as follows:

“With the widespread use of the Portable Document Format (PDF), it’s increasingly becoming a target for malware, highlighting the need for effective detection solutions. In recent years, machine learning-based methods for PDF malware detection have grown in popularity. However, the effectiveness of ML models is closely related to the quality of the training datasets. In this research, we investigated two widely used PDF malware datasets: Contagio and CIC. We found biases and representativeness issues that could affect the reliability and applicability of models built on them. Our statistical analysis revealed marked difference between these datasets and PDF malware samples from VirusTotal, as well as benign PDFs from Govdocs, pointing to the necessity for more representative datasets in PDF malware research. To address this gap, we introduce a novel dataset: PdfRep. Our findings demonstrate that PdfRep outperforms both CIC and Contagio across various evaluation metrics. The main contribution of this paper is the introduction of PdfRep, a new PDF malware dataset that overcomes the limitations of representativeness in existing datasets. This enhancement substantially increases the accuracy of PDF malware detection models and holds promise for advancing the field of PDF malware detection research.”

More questions for ChatGPT 4.0…

We asked ChatGPT 4.0 “Is PDF's popularity increasing or decreasing?”

It answered:

As of my last update in April 2023, the popularity of the Portable Document Format (PDF) has remained strong and is generally considered to be stable or even increasing. The reasons for its sustained popularity include:

  1. Standardization: PDF is an open standard maintained by the International Organization for Standardization (ISO). This standardization ensures consistency and reliability, making it a trusted format for business, legal, and government documents.
  2. Versatility: PDFs can encapsulate text, fonts, images, and even interactive elements, making them suitable for a wide range of uses—from simple document sharing to complex interactive reports.
  3. Compatibility: PDF files are compatible with almost all operating systems and devices, and can be viewed and created using a multitude of software options, ranging from free viewers to professional editing tools.
  4. Reliability: A PDF file captures a document's exact appearance, independent of the software, hardware, or operating system used to view it. This reliability is crucial for official documents, contracts, and publications.
  5. Digital Workflows and E-Signatures: With the rise of digital workflows and remote work, PDFs have become increasingly important for electronic signatures and document management systems, accommodating the need for legally binding digital signatures and document tracking.
  6. Accessibility: Advancements in PDF technology have improved its accessibility features, making it easier to create documents that are accessible to people with disabilities, which is a growing requirement in many sectors.
  7. Evolving Standards: The PDF format continues to evolve, with new versions like PDF 2.0 introducing enhancements that keep it relevant to modern needs, such as better support for rich media, improved encryption, and data compression.

However, the exact trend can vary by industry and use case. For example, in web and mobile application development, other formats like HTML5 may be preferred for their interactivity and responsiveness. But in legal, academic, and many business contexts, the PDF's ability to maintain document fidelity ensures its ongoing relevance and popularity.

Not bad, ChatGPT, not bad at all…

Questions for the ISO, IEC, JTC1 Directives GPT

Jonghong Jeon, Standardization Director at the Korean Society of Artificial Intelligence in Medicine, has established a ChatGPT instance dedicated to the standardization rules for ISO, IEC, and JTC1 with the intention of helping guide users through these procedures, processes, and styles.

We decided to ask it a question with an eye towards the utilization of PDF for ISO standards.

PDF Association: Do ISO Directives or style guides prohibit the use of file attachments in published ISO documents?

ISO, IEC, JTC1 Directives GPT: 

The ISO Directives do not prohibit the use of file attachments in published ISO documents. In fact, the ISO/IEC Directives, Part 2 (2021), under clause 6.5, specifically address the inclusion of supplementary content in electronic formats that may differ from the main content of the document. This supplementary material can be provided as an attached file or via a hyperlink (URN or URL). However, it is stipulated that such supplementary content should only be included if it cannot reasonably be incorporated into the main body of the text. Moreover, each item of supplementary content must be explicitly referred to within the text to explain its context and use.

This provision allows for flexibility in including additional materials like data sets, code components, or test forms, which may be essential for the comprehensiveness and usability of the standard, especially in technical fields such as software and file format specifications.

Apple continues investing in PDF

The latest macOS release, Sonoma 14.2, includes enhancements for PDF forms, including “Enhanced Autofill”.

PDFacademicBot for December, 2023

Devi, R.N., Mohan, K.R. and Nagaram, N.B. (2023) ‘Novel cryptographic approach based on PDF and CDF’, Southeast Europe Journal of Soft Computing, 12(2), pp. 34–38. Available at: https://doi.org/10.21533/scjournal.v12i2.270.

Gao, G. et al. (2023) ‘Blockchain-based PDF File Copyright Protection and Tracing’. (preprint, Peer Networking and Applications). Available at: https://doi.org/10.21203/rs.3.rs-3568563/v1.

Habing, T. et al. (2023) ‘PDF/MAIL: Moving Theory Towards Practice’, 19th International Conference on Digital Preservation, p. 6. Available at: https://hdl.handle.net/2142/121124 (Accessed: 12 November 2023).

Hauser, N. and Pennekamp, J. (2023) ‘[Tool] Automatically Extracting Hardware Descriptions from PDF Technical Documentation’, Journal of Systems Research, 3(1). Available at: https://doi.org/10.5070/SR33162446.

Júnior, C.D.S. and Dorneles, C.F. (2023) ‘Enrichment of data in digital documents with metadata extraction’, International Journal of Metadata, Semantics and Ontologies, 16(2), pp. 187–193. Available at: https://doi.org/10.1504/IJMSO.2023.135335.

Liu, C. et al. (2023) ‘DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF’. arXiv. Available at: https://doi.org/10.48550/arXiv.2310.17910.

Lo, K. et al. (2023) ‘PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents’, in Y. Feng and E. Lefever (eds) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Singapore: Association for Computational Linguistics, pp. 495–507. Available at: https://aclanthology.org/2023.emnlp-demo.45.

Peeples, M. (2023) ‘PDFs and Accessibility: Portable Document Format or Pretty Deeply Flawed?’, Between the Stacks 07-2023, July, pp. 10–111. Available at: https://digitalcommons.memphis.edu/cgi/viewcontent.cgi?article=1023&context=betweenthestacks.

Niccolò Revel Garrone (2023) Information retrieval from PDF of companies - Calculation of the ‘intensity’ metric to assess emissions. Master’s Thesis, Data Science Engineering. Polytechnic of Turin. Available at: https://webthesis.biblio.polito.it/secure/28645/1/tesi.pdf.

Santhakumar, D., Sasikala, L. and Balajee, A. (2023) ‘Book GPT: An Innovative PDF Querying Tool’, International Journal of Pharmacy Research & Technology, 14(1), p. 6. Available at: https://doi.org/10.31838/ijprt/14.01.03.

Shah, A.K. et al. (2023) ‘ChemScraper: Graphics Extraction, Molecular Diagram Parsing, and Annotated Data Generation for PDF Images’. arXiv. Available at: https://doi.org/10.48550/arXiv.2311.12161.

Szyjewski, G. (December 2023) ‘Securing Digital Copies of the Documents to Ensure Documents’ Integrity’, European Research Studies Journal, XXVI(4), pp. 718–726. Available at: https://ideas.repec.org//a/ers/journl/vxxviy2023i4p718-726.html.

Zamyatina, S. (2022) Text mining of the company’s annual reports in PDF format. Thesis. Masters in Modelling, Data Analysis and Decision Support Systems. Universidade do Porto. Available at: https://repositorio-aberto.up.pt/bitstream/10216/153385/2/646701.pdf.

Takatsume, Y. and Sakamoto, Y. (2023) ‘Potential of Three-Dimensional Portable Document Format for Article Publication in Craniofacial Surgery’, Journal of Craniofacial Surgery, p. 10.1097/SCS.0000000000009891. Available at: https://doi.org/10.1097/SCS.0000000000009891.

Wang, S. (August 2023) ‘Survey of Malicious PDF Attacks’, Frontiers in Computing and Intelligent Systems, 5(2), pp. 104–107. Available at: https://drpress.org/ojs/index.php/fcis/article/view/13109.

Yamaguchi, A. and Morishita, T. (October 2023) ‘appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit’. arXiv. Available at: https://doi.org/10.48550/arXiv.2310.01206.
