How are signers identified?
Identifying the signer’s key in a PDF signature: how does that work?
Signer identification in CMS
People who've worked with me will undoubtedly know that I'm a zealous advocate for the Cryptographic Message Syntax (CMS), defined in RFC 5652. CMS is the spiritual successor to PKCS #7, and is sometimes still referred to as such. The SignedData type in CMS is particularly popular in all sorts of digital signing schemes, where it's used as a vehicle to encode a "raw" signature together with the metadata necessary to understand it. PDF is part of the CMS club too: virtually everyone uses CMS SignedData objects to encode signatures in PDF documents nowadays 1.
But let's not get ahead of ourselves: we're still talking about CMS. The design philosophy of CMS (as interpreted by yours truly, so don't put too much stock in that) is one of maximal extensibility: the CMS specification itself makes relatively few assumptions about what people put in. As far as SignedData is concerned, the specification only prescribes how to correctly encode SignedData objects and sets out some basic validation rules. The idea is that other standards can then profile CMS to obtain a subset that's more appropriate for their use cases.
Contrary to a fairly common misconception, CMS does not (by default) pin the signature to a specific certificate. From a mathematical point of view, the signature validation procedure doesn't really care about certificates; it only needs to know the signer's public key. There can be more than one certificate made out to the same public key, for any number of reasons. The certificate is only relevant to verify the binding between a key and its owner, which is a different problem entirely. That said, some profiles of CMS do pin the signature to a specific certificate 2. That requires a separate mechanism, and the security concerns it addresses are beyond the scope of this post.
So, how does CMS identify signers? To answer that question, we need to take a look at some definitions from RFC 5652.
SignedData ::= SEQUENCE {
  version CMSVersion,
  digestAlgorithms DigestAlgorithmIdentifiers,
  encapContentInfo EncapsulatedContentInfo,
  certificates [0] IMPLICIT CertificateSet OPTIONAL,
  crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
  signerInfos SignerInfos }

SignerInfo ::= SEQUENCE {
  version CMSVersion,
  sid SignerIdentifier,
  digestAlgorithm DigestAlgorithmIdentifier,
  signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
  signatureAlgorithm SignatureAlgorithmIdentifier,
  signature SignatureValue,
  unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

SignerIdentifier ::= CHOICE {
  issuerAndSerialNumber IssuerAndSerialNumber,
  subjectKeyIdentifier [0] SubjectKeyIdentifier }
The relevant definition is the one for the SignerIdentifier type. This is a union type (an ASN.1 CHOICE) between IssuerAndSerialNumber and SubjectKeyIdentifier. Essentially, this says that we can identify the signer either by means of an IssuerAndSerialNumber value or by a SubjectKeyIdentifier value. We'll talk about what each of these means in a minute, but the validator uses them in more or less the same way in either case: it searches its certificate store until it finds a certificate matching the identifier, and then tries to validate the signature against the public key found in that certificate 3. The certificates field of the SignedData object would typically contain all of the signer's certificates (among other potentially relevant ones).
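To make the lookup step concrete, here is a minimal Python sketch of a validator searching a certificate store for a given SignerIdentifier. The certificate model (Certificate, with plain-string issuer names and a raw ski field) and the helper names matches and find_signer_certs are hypothetical simplifications; a real implementation would parse DER-encoded certificates and compare Names according to the rules in RFC 5280, not by string equality.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Union


# Hypothetical, heavily simplified model: real certificates are DER-encoded,
# and issuer names are X.500 Names, not plain strings.
@dataclass(frozen=True)
class Certificate:
    issuer: str
    serial_number: int
    public_key: bytes
    ski: Optional[bytes] = None  # SKI extension value, if present


@dataclass(frozen=True)
class IssuerAndSerialNumber:
    issuer: str
    serial_number: int


@dataclass(frozen=True)
class SubjectKeyIdentifier:
    value: bytes


SignerIdentifier = Union[IssuerAndSerialNumber, SubjectKeyIdentifier]


def matches(cert: Certificate, sid: SignerIdentifier) -> bool:
    """Does this certificate match the signer identifier?"""
    if isinstance(sid, IssuerAndSerialNumber):
        return (cert.issuer, cert.serial_number) == (sid.issuer, sid.serial_number)
    # The SKI alternative can only match certificates carrying the extension.
    return cert.ski is not None and cert.ski == sid.value


def find_signer_certs(
    store: Iterable[Certificate], sid: SignerIdentifier
) -> List[Certificate]:
    """Return all candidate certificates; the caller would then check the
    signature against each candidate's public key."""
    return [cert for cert in store if matches(cert, sid)]


store = [
    Certificate("CN=Example CA", 1001, b"key-A", ski=b"\x01\x02"),
    Certificate("CN=Example CA", 1002, b"key-B", ski=b"\x03\x04"),
]
sid = IssuerAndSerialNumber("CN=Example CA", 1002)
print([c.serial_number for c in find_signer_certs(store, sid)])  # [1002]
```

Returning a list rather than a single certificate is a deliberate choice in this sketch: a SubjectKeyIdentifier may legitimately match more than one certificate.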
Note
It's important to observe that the sid field is not part of the portion of the payload that's cryptographically signed. This allows for some flexibility, but it also means the sid can be rewritten after signing (for instance, to point to a different certificate for the same key) without breaking the signature. This is a large part of the raison d'être for attributes like ESS-signing-certificate.
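The following toy model illustrates why an unsigned sid is a problem and how a signing-certificate attribute helps. It assumes HMAC as a stand-in for a real public-key signature (CMS does not sign with HMAC; it's only there to keep the sketch self-contained) and uses a made-up deterministic encoding in place of DER. The attribute names are simplified labels, and cert_is_bound is a hypothetical, much-reduced version of what ESS-signing-certificate-v2 achieves by placing a certificate hash in the signed attributes. Because the toy uses a single key, it glosses over the fact that a rewritten sid only goes unnoticed if it points at a certificate for the same key.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Dict

# ASSUMPTION: HMAC stands in for a real public-key signature, for illustration only.
TOY_KEY = b"hypothetical-signing-key"


def encode_attrs(attrs: Dict[str, bytes]) -> bytes:
    """Deterministic toy encoding of signed attributes (CMS uses DER)."""
    return b"".join(
        name.encode() + b"=" + value.hex().encode() + b";"
        for name, value in sorted(attrs.items())
    )


@dataclass(frozen=True)
class ToySignerInfo:
    sid: str                         # NOT covered by the signature
    signed_attrs: Dict[str, bytes]   # covered by the signature
    signature: bytes


def toy_sign(sid: str, attrs: Dict[str, bytes]) -> ToySignerInfo:
    sig = hmac.new(TOY_KEY, encode_attrs(attrs), hashlib.sha256).digest()
    return ToySignerInfo(sid, attrs, sig)


def toy_verify(si: ToySignerInfo) -> bool:
    # Only the signed attributes enter the computation; si.sid is ignored.
    expected = hmac.new(TOY_KEY, encode_attrs(si.signed_attrs), hashlib.sha256).digest()
    return hmac.compare_digest(expected, si.signature)


def cert_is_bound(si: ToySignerInfo, cert_der: bytes) -> bool:
    # Simplified take on ESS-signing-certificate-v2: a signed attribute holds
    # a hash of the signer's certificate, so the binding is tamper-evident.
    return si.signed_attrs.get("signing-certificate-v2") == hashlib.sha256(cert_der).digest()


cert_a = b"toy certificate A"  # placeholders for DER-encoded certificates
cert_b = b"toy certificate B"
attrs = {
    "message-digest": hashlib.sha256(b"document bytes").digest(),
    "signing-certificate-v2": hashlib.sha256(cert_a).digest(),
}
si = toy_sign("issuer=CN=Example CA, serial=1", attrs)

tampered = replace(si, sid="issuer=CN=Example CA, serial=2")
print(toy_verify(tampered))             # True: rewriting sid breaks nothing
print(cert_is_bound(tampered, cert_a))  # True
print(cert_is_bound(tampered, cert_b))  # False
```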
The IssuerAndSerialNumber way
IssuerAndSerialNumber is defined like this:
IssuerAndSerialNumber ::= SEQUENCE {
  issuer Name,
  serialNumber CertificateSerialNumber }

CertificateSerialNumber ::= INTEGER
The meaning of IssuerAndSerialNumber as an identifier is very straightforward: it tells the validator who issued the relevant certificate, along with the certificate's serial number. Since a CA must never issue more than one certificate with a given serial number (RFC 5280, § 4.1.2.2), this pair should uniquely identify the certificate.
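Since the (issuer, serial number) pair is meant to be unique, a validator can treat it as a lookup key. The sketch below assumes issuer names are already normalised strings (RFC 5280 defines the actual name comparison rules), and the function name index_by_issuer_serial is hypothetical. It refuses to build an index when a CA has reused a serial number, since the identifier would then be ambiguous.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (issuer name, serial number)


def index_by_issuer_serial(certs: List[Tuple[str, int, str]]) -> Dict[Key, str]:
    """Index (issuer, serial, label) entries by (issuer, serial).

    Raises ValueError if any (issuer, serial) pair occurs more than once,
    which would mean a CA violated serial-number uniqueness.
    """
    seen: Dict[Key, List[str]] = defaultdict(list)
    for issuer, serial, label in certs:
        seen[(issuer, serial)].append(label)
    dupes = {key: labels for key, labels in seen.items() if len(labels) > 1}
    if dupes:
        raise ValueError(f"CA reused a serial number: {dupes}")
    return {key: labels[0] for key, labels in seen.items()}


index = index_by_issuer_serial([
    ("CN=Example CA", 1, "alice"),
    ("CN=Example CA", 2, "bob"),
    ("CN=Other CA", 1, "carol"),  # same serial, different issuer: fine
])
print(index[("CN=Other CA", 1)])  # carol
```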
The SubjectKeyIdentifier way
If you take a look at RFC 5652, you'll notice that the definition of SubjectKeyIdentifier is simply this:
SubjectKeyIdentifier ::= OCTET STRING
This doesn't say all that much. In an X.509 context, this value is intended to be compared against the value of a certificate's subject key identifier (SKI) extension 4. You shouldn't expect these values to be generated by any particular algorithm, but they're usually derived from the public key by hashing it. See RFC 5280, § 4.2.1.2 for examples.
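For illustration, here are the two example SKI derivation methods from RFC 5280, § 4.2.1.2, sketched in Python. The input is assumed to be the raw value of the certificate's subjectPublicKey BIT STRING (excluding the tag, length and unused-bits octet); extracting it from a certificate is left out, and the function names are hypothetical. These are only examples: a validator should treat the SKI as an opaque value and compare it, not recompute it.

```python
import hashlib


def ski_method_1(subject_public_key: bytes) -> bytes:
    """RFC 5280 method (1): the 160-bit SHA-1 hash of the subjectPublicKey
    BIT STRING value (excluding tag, length and unused-bits octet)."""
    return hashlib.sha1(subject_public_key).digest()


def ski_method_2(subject_public_key: bytes) -> bytes:
    """RFC 5280 method (2): a four-bit type field 0100 followed by the
    least significant 60 bits of the same SHA-1 hash (8 bytes in total)."""
    low_64 = bytearray(hashlib.sha1(subject_public_key).digest()[-8:])
    low_64[0] = 0x40 | (low_64[0] & 0x0F)  # overwrite the top nibble with 0100
    return bytes(low_64)


key_bits = bytes.fromhex("04" + "11" * 64)  # placeholder bytes, not a real key
print(len(ski_method_1(key_bits)), len(ski_method_2(key_bits)))  # 20 8
```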
The advantages of this approach are twofold:
- There's no expectation of global uniqueness for subject key identifiers, so a signature can validate against more than one of the signer's certificates. This is useful when the signer doesn't know ahead of time which certificate will be used to verify their identity.
- It (theoretically) enables the use of CMS-based signatures with non-X.509 certificates, in particular in contexts where the X.500 Name type (which IssuerAndSerialNumber relies on) has no equivalent.
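The first advantage can be illustrated with a small sketch: if the same key pair is certified twice, say by two different CAs that both derive the SKI from the key in the same way, a SubjectKeyIdentifier-based sid matches both certificates, while an IssuerAndSerialNumber pins down exactly one. The Cert model and the helpers by_ski and by_issuer_serial are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cert:  # hypothetical, simplified certificate
    issuer: str
    serial: int
    ski: Optional[bytes]  # None models a certificate without the SKI extension


def by_ski(store: List[Cert], ski: bytes) -> List[Cert]:
    return [c for c in store if c.ski == ski]


def by_issuer_serial(store: List[Cert], issuer: str, serial: int) -> List[Cert]:
    return [c for c in store if (c.issuer, c.serial) == (issuer, serial)]


# The same key pair certified twice, by two different CAs.
same_key_ski = bytes.fromhex("a1b2c3d4")
store = [
    Cert("CN=Corporate CA", 17, same_key_ski),
    Cert("CN=Qualified CA", 42, same_key_ski),
    Cert("CN=Corporate CA", 18, None),  # no SKI extension: never matches by SKI
]
print(len(by_ski(store, same_key_ski)))                     # 2
print(len(by_issuer_serial(store, "CN=Qualified CA", 42)))  # 1
```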
Of course, the vast majority of use cases don't need this kind of flexibility. In fact, many common workflows actually benefit from having the signature pinned to one uniquely determined X.509 certificate. While the additional expressivity afforded by SubjectKeyIdentifier is in no way incompatible with any of that, sticking with IssuerAndSerialNumber is still a good default choice.
What about PDF?
The reality
The CMS specification requires validators to implement support for both alternatives (see RFC 5652, § 5.3). This requirement has been part of CMS since 2002, and since both parts of ISO 32000 normatively cite CMS for signature generation, it would seem logical for PDF signature validators to support both alternatives.
However, that's not what we see in the wild: the vast majority of implementations in major PDF processors only support identifying the signer by issuer and serial number. If interoperability is a concern, you're therefore better off generating your signatures with an IssuerAndSerialNumber in the sid field.
Some historical speculation
PDF had support for digital signatures long before it became an ISO standard, and in those times, PKCS #7 (the predecessor to CMS) was more widely known. In PKCS #7, the approach based on IssuerAndSerialNumber was the only available choice.
The current CMS definition still shows hints of this history: in the definition below, the subjectKeyIdentifier alternative carries the context-specific tag [0], while the issuerAndSerialNumber alternative is untagged and keeps its universal SEQUENCE tag, just as in PKCS #7.
SignerIdentifier ::= CHOICE {
  issuerAndSerialNumber IssuerAndSerialNumber,
  subjectKeyIdentifier [0] SubjectKeyIdentifier }
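This tagging choice ensures compatibility with PKCS #7 in both directions, as long as the signer identifies themselves using the IssuerAndSerialNumber option: that alternative is encoded exactly as it is in PKCS #7, so a legacy parser never encounters the unfamiliar [0] tag.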
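The tagging difference is easy to see at the DER level. The sketch below (hypothetical helper names classify_sid and encode_ski_sid) inspects the first octet of a DER-encoded SignerIdentifier: 0x30 is the universal SEQUENCE tag used by IssuerAndSerialNumber, while 0x80 is the context-specific, primitive [0] tag that the subjectKeyIdentifier alternative receives, given that the CMS ASN.1 module uses implicit tagging. It also returns the SignerInfo version that RFC 5652, § 5.3 requires for each alternative (1 and 3, respectively); version 1 is also the value PKCS #7 uses. Only short-form lengths are handled.

```python
from typing import Tuple


def classify_sid(sid_der: bytes) -> Tuple[str, int]:
    """Return (alternative name, required SignerInfo version) for a
    DER-encoded SignerIdentifier, judging by its first (tag) octet."""
    if not sid_der:
        raise ValueError("empty encoding")
    tag = sid_der[0]
    if tag == 0x30:
        # Universal, constructed SEQUENCE: IssuerAndSerialNumber, as in PKCS #7.
        return "issuerAndSerialNumber", 1
    if tag == 0x80:
        # Context-specific, primitive, tag number 0: [0] IMPLICIT OCTET STRING.
        return "subjectKeyIdentifier", 3
    raise ValueError(f"unexpected tag octet 0x{tag:02x}")


def encode_ski_sid(ski: bytes) -> bytes:
    """DER-encode the subjectKeyIdentifier alternative (short-form length only)."""
    if len(ski) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x80, len(ski)]) + ski


# SEQUENCE { issuer: empty Name (30 00), serialNumber: INTEGER 1 (02 01 01) }
ias_der = bytes.fromhex("30053000020101")
print(classify_sid(ias_der))                    # ('issuerAndSerialNumber', 1)
print(classify_sid(encode_ski_sid(bytes(20))))  # ('subjectKeyIdentifier', 3)
```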
TL;DR
If you care about interoperability with other PDF processors as a signer, stick to IssuerAndSerialNumber in your PDF signatures. If you're implementing a validator, support both IssuerAndSerialNumber and SubjectKeyIdentifier.
Bibliography
RFC 5652 Internet Engineering Task Force (IETF), RFC 5652: Cryptographic Message Syntax (CMS), 2009.
RFC 5126 Internet Engineering Task Force (IETF), RFC 5126: CMS Advanced Electronic Signatures (CAdES), 2008.
ETSI TS 101 733 European Telecommunications Standards Institute (ETSI), ETSI TS 101 733: Electronic Signatures and Infrastructures (ESI); CMS Advanced Electronic Signatures (CAdES), V2.2.1, 2013.
RFC 5280 Internet Engineering Task Force (IETF), RFC 5280: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile, 2008.
PKCS #7 Internet Engineering Task Force (IETF), RFC 2315: PKCS #7: Cryptographic Message Syntax, Version 1.5, 1998.
Footnotes
1: This wasn't always the case, but all non-CMS signature encodings were deprecated in ISO 32000-2.
2: CAdES (see RFC 5126; ETSI TS 101 733) requires either the ESS-signing-certificate or the ESS-signing-certificate-v2 attribute to be part of the signature's signed attributes, thus binding it to one particular signer's certificate.
3: Obviously, there's more to it than that, but that's a story for another time.
4: In the PKIX profile defined in RFC 5280, the SKI extension is optional for end-entity (i.e. non-CA) certificates. Hence, this way of identifying signers isn't universally applicable.
A version of this article was originally published at https://mvalvekens.be/blog/2021/pdf-as-she-is-wrote-1.html