All top word-processors default to Tagged PDF

The PDF Association staff delivers a vendor-neutral platform in service of PDF’s stakeholders.


All top word-processors default to Tagged PDF
As of December 2024 (when Google joined the party), Microsoft Word, Google Docs, LibreOffice, Apple Pages, Collabora Online… all create Tagged PDF by default!
The just-released Collabora Online 25.04 goes further by additionally stating that “PDF 2.0 support also provides stronger encryption, improved digital signatures and better handling of transparency and annotations”.
Keep up-to-date with Graphic Technology
ISO TC 130 works on many topics beyond purely PDF-related technologies. After each ISO TC 130 meeting, FOGRA publishes a free summary of the key discussion topics from many of the ISO TC 130 working groups.
The Library of Congress is studying EA-PDF
Among its many other functions, the US Library of Congress conducts extensive research on digital file formats, both to inform its own activities and to provide a public service. Its Sustainability of Digital Formats service provides a wealth of information about digital content formats.
The Library recently announced its near-term objectives for research, including the PDF Association’s recently-published EA-PDF specification defining a use of PDF for archiving email.
First errata updates to Office Open XML Formats since 2016
ISO/IEC JTC 1 SC 34 has progressed a sizeable set of errata corrections to ISO/IEC 29500 Part 1 and Part 4, the first since 2016, to the Draft International Standard stage (public availability for review). These revisions address defect reports submitted by members, external organizations, and individuals, with “some defect reports [being] challenging and time consuming to resolve [and] editorial changes required were extensive and took significant time to make”. No new features have been added. Style changes were also needed to satisfy the ISO editorial requirements given in the latest Directives Part 2 and ISO House Style, because ISO/IEC 29500 was originally a Fast Track submission and was not required to conform to these requirements.
Interestingly, this nine-year gap between 2016 and 2025 matches the time between the fast-track publication of PDF 1.7 as ISO 32000-1:2008 and the initial ISO publication of PDF 2.0 as ISO 32000-2:2017; PDF 2.0, however, also added many new features.
PDFacademicBot for June 2025
Ahmed, M.W. and Afzal, M.T. (2020) ‘FLAG-PDFe: Features Oriented Metadata Extraction Framework for Scientific Publications’, IEEE Access, 8, pp. 99458–99469. https://doi.org/10.1109/ACCESS.2020.2997907.
Billah, M.M. (2025) A ROBUST FRAMEWORK FOR DATA EXTRACTION FROM UNSTRUCTURED SOURCES USING LARGE LANGUAGE MODELS. Masters Thesis. Mälardalen University, Sweden. School of Innovation, Design and Engineering. https://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-71838.
Bonimir Penchev and Latinka Todoranova (Nov. 2024) ‘Accessibility of Electronic Resources for Students with Disabilities’, Acta Educationis Generalis, 15(1), pp. 19–30. https://www.proquest.com/openview/a99752208da0c7e1fca359bef0dde529/1?pq-origsite=gscholar&cbl=6758134.
Cao, R. et al. (May 2025) ‘NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering’. arXiv preprint. Accepted to ACL 2025. https://doi.org/10.48550/arXiv.2505.19754.
Chambi, S.P.V. et al. (June 2025) ‘Constructing a Structured Corpus from Geoscience Literature: A Case Study using Western Australia Iron and Lithium Deposits’, International Journal of Geoinformatics, 21(5), pp. 37–61. https://doi.org/10.52939/ijg.v21i5.4159.
Corvette, M. and Bostian, J. (May 2025) ‘Beyond Barriers: How AI is Reshaping PDF Accessibility’, Teaching and Learning with AI Conference Presentations [Preprint]. https://stars.library.ucf.edu/teachwithai/2025/wednesday/31.
C, E. et al. (April 2025) ‘Biomedical Chat Assistant with Personalized Document Reader Using BioMistral and RAG’, in 2025 International Conference on Computing and Communication Technologies (ICCCT), pp. 1–6. https://doi.org/10.1109/ICCCT63501.2025.11020219.
Hasan, M. (May 2025) Benchmarking Extraction of Structured Data from Templatized Documents. Master of Science Thesis. University of California, Berkeley. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-77.pdf.
Hofmeier, M. et al. (June 2025) ‘Individual Technology Commitment and the Rating of Usability and Trustworthiness of Electronic Signature Systems’, in A. Moallem (ed.) HCI for Cybersecurity, Privacy and Trust. Cham: Springer Nature Switzerland, pp. 42–55. https://doi.org/10.1007/978-3-031-92840-6_3.
Hossain, G.M.S. et al. (2024) ‘PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis’, IEEE Access, 12, pp. 13833–13859. https://ieeexplore.ieee.org/document/10412055/ and https://www.ijirid.in/4-1-25Feb/4-1-18-Amruta%20Patil-Rahul%20Khade,%20Shivani%20Lande,%20Sanika%20Jadhav,%20Pranjali%20Garud.pdf
Iyengar, S.S. et al. (May 2025) ‘Hybrid Detection of Malicious Portable Document Format (PDFs): Safeguarding Against Embedded JavaScript Attacks’, in S.S. Iyengar et al. (eds) Artificial Intelligence in Practice: Theory and Application for Cyber Security and Forensics. Cham: Springer Nature Switzerland, pp. 257–290. https://doi.org/10.1007/978-3-031-89327-8_9.
Lopez-Duran, M. et al. (May 2025) ‘Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs’. 5th ICDAR International Workshop on Machine Learning. arXiv. https://doi.org/10.48550/arXiv.2505.14699.
Shilaskar, S. et al. (March 2025) ‘GenAI based Data Extraction with Query-Based Insights’, in 2025 International Conference on Emerging Smart Computing and Informatics (ESCI), pp. 1–6. https://doi.org/10.1109/ESCI63694.2025.10987906.
Shurithi, S. et al. (2025) ‘Revolutionize Your Workflow using PDF Simplifier AI for Quick, Smart, and Streamlined Document Processing’, International Journal of Advanced Research in Education and Technology, 12(3), pp. 291-1295. https://doi.org/10.15680/IJARETY.2025.1203033.
Sm Kamali et al. (April 2025) ‘Automated Text-to-Audio Conversion for Visually Impaired People Using Optical Character Recognition’, International Research Journal of Multidisciplinary Scope, 06(02), pp. 992–1008. https://doi.org/10.47857/irjms.2025.v06i02.03672.
Teixeira, F. et al. (May 2025) ‘Automating Data Extraction from PDF Sleep Reports Using Data Mining Techniques’, Studies in Health Technology and Informatics, 327, pp. 898–899. https://doi.org/10.3233/SHTI250498.
Tchantchou, Y.-U.S. (April 2025) ‘An N-gram-based Information Retrieval Approach for Surveys on Scientific Articles’, Informatica, 49(20), p. 16. https://doi.org/10.31449/inf.v49i20.5895.
Usoroh, R.U., Ghergulescu, I. and Moldovan, A.-N. (April 2025) ‘Malware Detection in PDF and PE Files Using Machine Learning and Feature Selection’, in 2025 13th International Symposium on Digital Forensics and Security (ISDFS), pp. 1–6. https://doi.org/10.1109/ISDFS65363.2025.11012049.
Viviurka do Carmo, P. et al. (May 2025) ‘Improving Natural Product Knowledge Extraction from Academic Literature with Enhanced PDF Text Extraction and Large Language Models’, in Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing. New York, NY, USA: Association for Computing Machinery (SAC ’25), pp. 980–987. https://doi.org/10.1145/3672608.3707858.