First set of candidate veraPDF corpus files delivered

Duff Johnson // May 18, 2015

As the veraPDF project gets under way, it is generating its first test files for PDF/A-1 to complement the Isartor test suite.

Dual Lab, the veraPDF consortium’s lead developer, has uploaded the first set of 49 candidate test files to the public veraPDF GitHub repository.

The test files can be found in the veraPDF corpus for PDF/A-1b (under development), along with the wiki page describing the set.

All test files follow the pattern of the Isartor Test Suite:

  • naming convention refers to the corresponding subsection in ISO 19005-
  • they are all atomic
  • they are self-documented via PDF bookmarks
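As an illustration of the naming convention, the sketch below splits a name such as 6-1-12-t07-fail-a (the file highlighted later in this post) into its parts. The pattern is an assumption inferred from that single name, and `parse_test_name` is a hypothetical helper, not part of veraPDF; the corpus wiki remains the authoritative description of the scheme.

```python
import re

# Assumed pattern, inferred only from the name "6-1-12-t07-fail-a":
# <subsection numbers joined by hyphens>-t<test number>-<pass|fail>-<variant letter>
NAME_RE = re.compile(
    r"^(?P<section>\d+(?:-\d+)*)-t(?P<test>\d+)-(?P<outcome>pass|fail)-(?P<variant>[a-z])$"
)


def parse_test_name(name):
    """Split a corpus-style file name into its parts (hypothetical helper)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized test file name: {name!r}")
    return {
        # "6-1-12" becomes "6.1.12", the subsection of the standard being tested
        "section": match.group("section").replace("-", "."),
        "test": int(match.group("test")),
        "outcome": match.group("outcome"),
        "variant": match.group("variant"),
    }


print(parse_test_name("6-1-12-t07-fail-a"))
# {'section': '6.1.12', 'test': 7, 'outcome': 'fail', 'variant': 'a'}
```

Because the files are atomic, the `outcome` field alone tells a test harness whether a validator is expected to accept or reject the file.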

However, unlike Isartor, these files also contain “pass” tests.

There is one remarkable file to note:

6-1-12-t07-fail-a: The maximum number of indirect objects (8,388,607) in a PDF file is exceeded (the file is about 40 MB zipped)

[Screenshot: File Being Repaired dialog]

The document cross-reference table contains more than the maximum allowed number of records, violating PDF/A-1 implementation limits.
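The limit quoted above, 8,388,607, equals 2^23 - 1. A minimal sketch of the check this test file is meant to trigger might look like the following. The function name is hypothetical, how the object count is obtained from a PDF is left out, and whether the free entry for object 0 counts toward the limit is not settled here.

```python
# Limit quoted in the article for indirect objects in a PDF/A-1 file.
MAX_INDIRECT_OBJECTS = 8_388_607

# Sanity check on the arithmetic: the limit is 2**23 - 1.
assert MAX_INDIRECT_OBJECTS == 2**23 - 1


def exceeds_object_limit(object_count):
    """Return True if the given number of indirect objects is over the limit.

    Hypothetical helper: the caller supplies the count, for example from a
    PDF parser's cross-reference data.
    """
    return object_count > MAX_INDIRECT_OBJECTS


print(exceeds_object_limit(8_388_607))  # False: exactly at the limit
print(exceeds_object_limit(8_388_608))  # True: one object too many
```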

Warning: Be careful trying to validate this file in Adobe Acrobat! It will probably open after 30 seconds of thrashing, but it will hang on preflight checks.


ABOUT THE AUTHORS

Duff Johnson

As CEO of the PDF Association and as an ISO Project Leader, Duff coordinates industry activities, represents industry stakeholders in a variety of settings and promotes the advancement and adoption of PDF technology worldwide.
