How are signers identified?

December 20, 2022

Identifying the signer’s key in a PDF signature: how does that work?

Independent PDF expert and FOSS developer, co-representing the Belgian standards body NBN in ISO/TC 171/SC 2. I’m the current project leader for ISO/TS 32004. I was employed as a Research Engineer at iText in Ghent between 2020 and 2022. Before that, I spent a few years doing mathematical research at …

Signer identification in CMS

People who've worked with me will undoubtedly know that I'm a zealous advocate for the Cryptographic Message Syntax (CMS), defined in RFC 5652. CMS is the spiritual successor to PKCS #7, and is sometimes still referred to as such. The SignedData type in CMS is particularly popular in all sorts of digital signing schemes, where it's used as a vehicle to encode a "raw" signature together with the relevant metadata necessary to understand it. PDF is part of the CMS club too: virtually everyone uses CMS SignedData objects to encode signatures in PDF documents nowadays¹.

But let's not get ahead of ourselves too much: we're still talking about CMS. The design philosophy of CMS (as interpreted by yours truly — don't put too much stock in that) is one of maximal extensibility: the CMS specification itself makes relatively few assumptions about what people put in. As far as SignedData is concerned, the specification only prescribes how to correctly encode SignedData objects, and sets out some basic validation rules. The idea is that other standards can then profile CMS to obtain a subset that's more appropriate for their use cases.

One of the things that CMS does not do (by default) is to pin the signature to a specific certificate. Assuming that it does is actually a fairly common misconception. From a mathematical point of view, the signature validation procedure doesn't really care about certificates: it only needs to know the signer's public key. There can be more than one certificate made out to the same public key, for any number of reasons. The certificate is only relevant to verify the binding between a key and its owner, which is a different problem entirely. However, it's important to note that some profiles of CMS do pin the signature to a specific certificate². That requires a separate mechanism, and the security concerns it addresses are beyond the scope of this post.

So, how does CMS identify signers? To answer that question, we need to take a look at some definitions from RFC 5652.

SignedData ::= SEQUENCE {
       version CMSVersion,
       digestAlgorithms DigestAlgorithmIdentifiers,
       encapContentInfo EncapsulatedContentInfo,
       certificates [0] IMPLICIT CertificateSet OPTIONAL,
       crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
       signerInfos SignerInfos }

SignerInfo ::= SEQUENCE {
       version CMSVersion,
       sid SignerIdentifier,
       digestAlgorithm DigestAlgorithmIdentifier,
       signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
       signatureAlgorithm SignatureAlgorithmIdentifier,
       signature SignatureValue,
       unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

SignerIdentifier ::= CHOICE {
       issuerAndSerialNumber IssuerAndSerialNumber,
       subjectKeyIdentifier [0] SubjectKeyIdentifier }

The relevant definition is the one for the SignerIdentifier type, which is a choice between IssuerAndSerialNumber and SubjectKeyIdentifier. In other words, the signer is identified either by an IssuerAndSerialNumber value or by a SubjectKeyIdentifier value. We'll talk about what each of these means in a minute, but the way they're used by the validator is more or less the same in either case: the validator searches its certificate store until it finds a certificate matching the identifier, and then tries to validate the signature against the public key found in that certificate³. The certificates field of SignedData would typically contain all signers' certificates (among other potentially relevant ones).
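The lookup described above can be sketched roughly as follows. This is a toy model, not a real validator: the Certificate class, the string-valued issuer and the verify callback are all assumptions made for illustration. In particular, comparing X.509 names properly is more involved than a string comparison, and actual signature verification is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Union


@dataclass(frozen=True)
class Certificate:
    # Simplified stand-in for an X.509 certificate (hypothetical model).
    issuer: str            # issuer Name, flattened to a string here
    serial: int            # certificate serial number
    ski: Optional[bytes]   # subject key identifier extension, if present
    public_key: bytes      # opaque encoded public key


@dataclass(frozen=True)
class IssuerAndSerialNumber:
    issuer: str
    serial: int


@dataclass(frozen=True)
class SubjectKeyIdentifier:
    value: bytes


SignerIdentifier = Union[IssuerAndSerialNumber, SubjectKeyIdentifier]


def matches(sid: SignerIdentifier, cert: Certificate) -> bool:
    """Return True if the certificate matches the signer identifier."""
    if isinstance(sid, IssuerAndSerialNumber):
        return cert.issuer == sid.issuer and cert.serial == sid.serial
    # SubjectKeyIdentifier: only certificates carrying an SKI can match.
    return cert.ski is not None and cert.ski == sid.value


def find_signer_cert(
    sid: SignerIdentifier,
    store: Iterable[Certificate],
    verify: Callable[[bytes], bool],
) -> Optional[Certificate]:
    """Return the first matching certificate whose key verifies the signature.

    `verify` is a hypothetical callback that checks the signature value
    against a candidate public key.
    """
    for cert in store:
        if matches(sid, cert) and verify(cert.public_key):
            return cert
    return None
```

Note that both branches feed into the same verification step: the identifier only narrows down the candidates, and the public key does the actual work.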

Note

It's important to observe that the sid field is not actually part of the portion of the payload that's cryptographically signed. This allows for some flexibility, but it can also be a problem: nothing in the signature itself prevents the sid from being changed after the fact to point to a different certificate for the same key. This is a large part of the raison d'être for attributes like ESS-signing-certificate.

The `IssuerAndSerialNumber` way

IssuerAndSerialNumber is defined like this:

IssuerAndSerialNumber ::= SEQUENCE {
       issuer Name,
       serialNumber CertificateSerialNumber }

CertificateSerialNumber ::= INTEGER

The meaning of IssuerAndSerialNumber as an identifier is very straightforward: it tells the validator who the issuer of the relevant certificate is, and also mentions the certificate's serial number. Since RFC 5280 requires a CA to assign a unique serial number to every certificate it issues, this pair should uniquely identify the certificate.

The `SubjectKeyIdentifier` way

If you take a look at RFC 5652, you'll notice that the definition of SubjectKeyIdentifier is simply this:

SubjectKeyIdentifier ::= OCTET STRING

This doesn't say all that much. In an X.509 context, this value is intended to be compared against the value of a certificate's subject key identifier (SKI) extension⁴. It's wrong to expect this value to be generated by any particular algorithm, but in practice such identifiers are generally derived from the public key by hashing. See RFC 5280, § 4.2.1.2 for examples.
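For reference, here's a minimal sketch of the two example methods given in RFC 5280, § 4.2.1.2. The function names are mine, and the input is assumed to be the raw content of the subjectPublicKey BIT STRING (without tag, length and unused-bits octet). Again, CAs are free to generate key identifiers in other ways, so a validator should compare the stored value and never recompute it.

```python
import hashlib


def ski_method_1(subject_public_key: bytes) -> bytes:
    """160-bit SHA-1 hash of the subjectPublicKey BIT STRING value."""
    return hashlib.sha1(subject_public_key).digest()


def ski_method_2(subject_public_key: bytes) -> bytes:
    """Four-bit type field 0100 followed by the least significant
    60 bits of the SHA-1 hash of the subjectPublicKey value."""
    digest = hashlib.sha1(subject_public_key).digest()
    tail = bytearray(digest[-8:])        # last 64 bits of the hash
    tail[0] = 0x40 | (tail[0] & 0x0F)    # overwrite the top 4 bits with 0100
    return bytes(tail)
```

SHA-1 is used here purely as a fingerprinting function for lookup purposes; the identifier itself carries no security guarantees.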

The advantages of this approach are twofold:

1. There's no expectation of global uniqueness for subject key identifiers, so it becomes possible to have a signature validate against more than one signer's certificate. Imagine a scenario where a signer doesn't know ahead of time what certificate will be used to verify their identity.
2. It (theoretically) enables the use of CMS-based signatures together with non-X.509 certificates, in particular in a context where there's no Name notion around.
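To make the first advantage a bit more tangible, here's a small sketch with made-up data: one key pair certified by two hypothetical CAs. The Cert tuple, the CA names and the serial numbers are invented for the example, and the SKI is assumed to be the SHA-1 hash of the key.

```python
import hashlib
from typing import NamedTuple


class Cert(NamedTuple):
    # Hypothetical, heavily simplified certificate record.
    issuer: str
    serial: int
    ski: bytes


# Placeholder key material; a real key would be an encoded public key.
public_key = b"example public key bytes"
ski = hashlib.sha1(public_key).digest()

# The same key, certified twice by different (made-up) issuers.
certs = [
    Cert("CN=Old CA", 1001, ski),
    Cert("CN=New CA", 42, ski),
]

# A sid of the SubjectKeyIdentifier kind matches both certificates...
by_ski = [c for c in certs if c.ski == ski]

# ...while an IssuerAndSerialNumber sid matches exactly one of them.
by_issuer_serial = [
    c for c in certs if (c.issuer, c.serial) == ("CN=Old CA", 1001)
]
```

With the key-identifier form, the validator gets to pick whichever of the two certificates it can actually build a trusted path for.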

Of course, the vast majority of use cases don’t need this kind of flexibility. In fact, many common workflows actually benefit from having the signature pinned to one uniquely determined X.509 certificate. While the additional expressivity afforded by SubjectKeyIdentifier is in no way incompatible with any of that, sticking with IssuerAndSerialNumber is still a good default choice.

What about PDF?

The reality

The CMS specification requires validators to implement support for both alternatives (see RFC 5652, § 5.3). This requirement has been part of CMS since 2002, and since both parts of ISO 32000 normatively cite CMS for signature generation, it would seem logical for PDF signature validators to support both alternatives.

However, that's not what we see in the wild: the vast majority of implementations in major PDF processors only support identifying the signer by issuer and serial number. If interoperability is a concern, you're therefore better off generating your signatures with an IssuerAndSerialNumber in the sid field.

Some historical speculation

PDF had support for digital signatures long before it became an ISO standard, and in those times, PKCS #7 (the predecessor to CMS) was more widely known. In PKCS #7, the approach based on IssuerAndSerialNumber was the only available choice.

The current CMS definition still shows some hints of this history: in the definition below, subjectKeyIdentifier carries a context-specific tag [0], while issuerAndSerialNumber has no tag of its own and simply keeps the universal SEQUENCE tag of its type.

SignerIdentifier ::= CHOICE {
       issuerAndSerialNumber IssuerAndSerialNumber,
       subjectKeyIdentifier [0] SubjectKeyIdentifier }

This tagging choice ensures compatibility with PKCS #7 in both directions, as long as the signer makes sure to identify themselves using the IssuerAndSerialNumber option.
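In terms of bytes on the wire, this means a parser can tell the two alternatives apart by looking at the first octet of the sid. The sketch below assumes DER encoding and relies on the fact that the CMS ASN.1 module uses implicit tagging, so that the [0] tag replaces the OCTET STRING tag; the function name is made up for the example.

```python
def sid_choice(sid_der: bytes) -> str:
    """Classify a DER-encoded SignerIdentifier by its leading tag octet."""
    if not sid_der:
        raise ValueError("empty SignerIdentifier encoding")
    tag = sid_der[0]
    if tag == 0x30:
        # Universal, constructed SEQUENCE: same shape as in PKCS #7.
        return "issuerAndSerialNumber"
    if tag == 0x80:
        # Context-specific, primitive [0]: implicitly tagged OCTET STRING.
        return "subjectKeyIdentifier"
    raise ValueError(f"unexpected tag 0x{tag:02x} in SignerIdentifier")


# SEQUENCE { empty Name, INTEGER 1 } as a toy IssuerAndSerialNumber
print(sid_choice(bytes.fromhex("30053000020101")))
# [0] with a two-byte key identifier
print(sid_choice(bytes.fromhex("8002abcd")))
```

A PKCS #7 parser that only knows about IssuerAndSerialNumber will see the familiar 0x30 octet in the first case, and will choke on the second.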

TL;DR

If you care about interoperability with other PDF processors as a signer, stick to IssuerAndSerialNumber in your PDF signatures. If you're implementing a validator, support both IssuerAndSerialNumber and SubjectKeyIdentifier.

Bibliography

RFC 5652 Internet Engineering Task Force (IETF), RFC 5652: Cryptographic Message Syntax (CMS), 2009.

RFC 5126 Internet Engineering Task Force (IETF), RFC 5126: CMS Advanced Electronic Signatures (CAdES), 2008.

ETSI TS 101 733 European Telecommunications Standards Institute (ETSI), ETSI TS 101 733: Electronic Signatures and Infrastructures (ESI); CMS Advanced Electronic Signatures (CAdES), V2.2.1, 2013.

RFC 5280 Internet Engineering Task Force (IETF), RFC 5280: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile, 2008.

PKCS #7 Internet Engineering Task Force (IETF), RFC 2315: PKCS #7: Cryptographic Message Syntax, Version 1.5, 1998.

Footnotes

1: This wasn't always the case, but all non-CMS signature encodings were deprecated in ISO 32000-2.

2: CAdES (see RFC 5126; ETSI TS 101 733) requires either the ESS-signing-certificate or the ESS-signing-certificate-v2 attribute to be part of the signature's signed attributes, thus binding it to one particular signer's certificate.

3: Obviously, there's more to it than that, but that's a story for another time.

4: In the PKIX profile defined in RFC 5280, the SKI extension is optional for end-entity (i.e. non-CA) certificates. Hence, this way of identifying signers isn't universally applicable.

A version of this article was originally published at https://mvalvekens.be/blog/2021/pdf-as-she-is-wrote-1.html
