
Time to get on board with ARIA in PDF!?!

PDF Week Online 2024 week 2 is coming up! Did you know that PDF 2.0 adds ARIA to PDF? How about a PDF bigger than the universe? Are you going to let an AI write your proposal? PDFacademicBot for February 2024.
About the author: The PDF Association staff delivers a vendor-neutral platform for PDF’s stakeholders, facilitating the development of open specifications and ISO standards for PDF technology. Staff members include: Alexandra Oettler (Editor), Betsy Fanning …

We are halfway through PDF Week Online 2024!

Last week covered the PDF Imaging Model discussion on how to integrate modern imaging technologies into PDF, including HDR and new image formats such as AVIF and JPEG-XL, while the PDF TWG addressed many outstanding issues in the pdf-issues repo. Here’s the PDF Week lineup.

The major focus of the second week is on various aspects of tagged PDF, along with the PDF Technology Marketing Working Group (for members’ marketing teams), while the EA-PDF LWG continues its project of specifying email archiving in PDF.

Time to get on board with ARIA in PDF!?!

Various members of PDF Association working groups, including Frank Mittelbach, Ulrike Fischer, Roman Toda, Neil Soiffer, and Ross Moore will present papers at next week’s 5th International Workshop on Digitization and E-Inclusion in Mathematics and Science:

  • Frank Mittelbach and Ulrike Fischer will present "Enhancing LaTeX to Automatically Produce Tagged and Accessible PDF".
  • Roman Toda, Jing Mu, Youqiang Wu and Neil Soiffer will present "PDF Document Object Model Support for Math".
  • Ross Moore’s paper (we’ve seen a preview) will discuss how he leverages the addition of ARIA support in PDF 2.0. We look forward to the day when many more implementers support PDF with ARIA-enhanced semantics! Check out his website to see what Ross has been thinking about.

A PDF bigger than the universe!

It seems that many people are entertained by the idea of a ridiculously large PDF page. Alex Chan is one of these; we know this because she’s written an amusing blog post about her explorations into writing PDF files by hand (don’t do it!) as well as making absurdly large PDF files.

In practice, Alex did make a few monster PDFs: files that abuse the /UserUnit value to push page dimensions to kilometers… and yes, light-years. There are some problems with her “universe.pdf” file (/Size in the trailer should be “5”, and there is an incorrect stream length, according to the Arlington PDF Model), but she’s not wrong that PDF makes it possible to make a 1:1 scale representation of, say, the Milky Way galaxy. Just don’t expect software to support the result!
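To see why /UserUnit has such an outsized effect, here is a minimal Python sketch of the arithmetic. It assumes the usual PDF convention that one default user space unit is 1/72 inch and that /UserUnit (1.0 when absent) multiplies that unit. The function name and the sample values are hypothetical and are not taken from Alex’s files.

```python
# Sketch: physical page size implied by /MediaBox and /UserUnit.
# Assumption: default user space is 1/72 inch per unit, and /UserUnit
# (default 1.0) multiplies that unit. All values below are illustrative.

INCHES_PER_DEFAULT_UNIT = 1 / 72
METRES_PER_INCH = 0.0254


def page_size_metres(media_box, user_unit=1.0):
    """Return (width, height) in metres for a page.

    media_box is (llx, lly, urx, ury) in user space units, as in /MediaBox.
    user_unit is the page's /UserUnit value (1.0 when the key is absent).
    """
    llx, lly, urx, ury = media_box
    metres_per_unit = user_unit * INCHES_PER_DEFAULT_UNIT * METRES_PER_INCH
    return abs(urx - llx) * metres_per_unit, abs(ury - lly) * metres_per_unit


if __name__ == "__main__":
    # Hypothetical pages: same MediaBox, different /UserUnit values.
    for user_unit in (1.0, 1000.0):
        width, height = page_size_metres((0, 0, 720, 720), user_unit)
        print(f"/UserUnit {user_unit:g}: {width:.3f} m x {height:.3f} m")
```

The MediaBox numbers stay the same in both cases; only the multiplier changes, which is how a page can grow by orders of magnitude without its coordinates changing. Whether a given viewer honors the result is a separate question.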

Gonna let that AI write your response to that RFP?

A proposal for “intelligent documents” suggests that a combination of AI tools and metadata will deliver a next generation of documents, “...allowing them to respond, adapt, and provide tailored information based on the user's needs.”

At the core of intelligent documents is the use of metadata-driven by LLMs. This metadata acts as a set of instructions or prompt engineering that defines the context and meaning of the information contained within the document. It can specify how to access external data via APIs, outline the logic and reasoning behind the document's structure, and set boundaries for how and where interactions with the document should occur.

We’ve been down similar paths before. Does anyone remember Wolfram’s proprietary “Computable Document Format” (CDF), launched back in 2011 (obviously a time with far less “AI-ness”)? Like many other formats, CDF is today officially regarded by Wolfram as a “legacy format”. More recently, AI startup bit.ai has promoted its “Living document” and “AI genius writing assistant”.

Our observation: call them what you will (“intelligent documents”, “smart documents”, “living documents” or simply documents connected to live data sources), they shift authoring from something that almost anyone can do today to something where prompt engineering, databases, APIs, and other technical skills (what this proposal calls “metadata”) become required. Although AI will undoubtedly influence the authoring process (and perhaps already does), communication via documents requires reliability and trust in the information communicated, and an ability to preserve and share the same dependable information with others.

PDFacademicBot for February 2024

Erkan, B. et al. (2023) ‘3D PDF: A Promising Learning Tool for Anatomy Education’, AMEE 2023, pp. 439-440. https://avesis.gazi.edu.tr/yayin/d58f88df-a9f2-49b7-a0a3-60d5eaf6e436/3d-pdf-a-promising-learning-tool-for-anatomy-education

“Take-home Message: 3D PDFs can be used as a feasible tool to learn complex anatomical structures instead of 2D atlases.”

Fernandes, P., Ó Ciardhuáin, S. and Antunes, M. (2024) ‘Uncovering Manipulated Files Using Mathematical Natural Laws’, in V. Vasconcelos, I. Domingues, and S. Paredes (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Cham: Springer Nature Switzerland (Lecture Notes in Computer Science), pp. 46–62. https://doi.org/10.1007/978-3-031-49018-7_4.

M. Sakhawat Hossain et al. (Dec 2023) ‘PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis’. IEEE. https://ieeexplore.ieee.org/ielx7/6287639/10380310/10412055.pdf?tp=&arnumber=10412055&isnumber=10380310.

Jiang, Z., Wang, H. and Han, S. (2024) ‘A robust PDF watermarking scheme with versatility and compatibility’, Multimedia Tools and Applications [Preprint]. https://doi.org/10.1007/s11042-024-18151-w.

Li, J. et al. (Dec. 2023) ‘Design and implementation of a drawing encryption tool based on image processing and PDF manipulation’, in Fourth International Conference on Signal Processing and Computer Science (SPCS 2023), SPIE, pp. 547–551. https://doi.org/10.1117/12.3012311.

Lin, D. (23 Jan 2024) ‘Revolutionizing Retrieval-Augmented Generation with Enhanced PDF Structure Recognition’. arXiv. https://doi.org/10.48550/arXiv.2401.12599.

Liu, R. et al. (2023) ‘Evaluating Representativeness in PDF Malware Datasets: A Comparative Study and a New Dataset’, in 2023 IEEE International Conference on Big Data (BigData), IEEE Computer Society, pp. 3017–3024. https://doi.org/10.1109/BigData59044.2023.10386516.

Rhiannon Simpson (November 2023) ‘Delete the PDF and Start Again?: Exploring the Potential for Innovative Dissemination Methods of Music Education Scholarship’, Action, Criticism, and Theory for Music Education, The Action, Criticism, and Theory and MAYDAY Group, pp. 159-83. https://doi.org/10.22176/act22.3.159.
