Software signature tutorial, from sources to Docker images

Developers or not, we all use software we didn’t develop ourselves. In a professional context, this requires some measure of trust: running untrusted software is a huge risk, as it can do harm, from spreading viruses to abusing business processes, e.g. wiring money to an unintended bank account.

Given that software distribution is nowadays mostly immaterial, as software in all forms is downloaded over the Internet, we need a way to replace the good old holographic seal we once had on CD-ROMs with an equivalent digital proof of trustworthiness.

Without this proof, we must assume anyone could have tampered with the software by adding, removing or updating both code and configuration: this is known as a man-in-the-middle attack (aka MITM).

This is true for source code as well as for code libraries or final executables. One may use a specific dependency in a NodeJS application that, unbeknownst to anyone, will hijack the running computer’s CPU to mine cryptocurrencies, or downright steal bitcoins from a wallet.

Digital signatures are a solution to such issues. A signature is a seal of sorts on the software: it verifies the author’s identity and ensures that the code has not been changed or corrupted since the author signed it, providing both authentication and integrity.

The generation of this digital seal is a process that only the author can reproduce, and is based on the very same principles that govern TLS or SSH.

We are going to recap some cryptography principles, dive into the digital signature process and see how to ensure trust in both code and binaries, from sources to the final Docker container. We will use GNU Privacy Guard (GnuPG, aka GPG), and assume you are already familiar with how it works and have a key pair at hand.

Digital software signature

Principles of cryptography

Data encryption using keys offers two different approaches. Before diving right into the signing process itself, here is a lightweight refresher on cryptography if needed.

Symmetric cryptography summary

One can use the same key to both encrypt and decrypt the data. For example, a letter can be encrypted by replacing it with the letter located X positions further in the alphabet. This particular method is famously known as the Caesar cipher. To decrypt it, the only necessary information is X: just move the position back by X. X is the key, and it’s used during both encryption and decryption. This common key principle is behind what is known as symmetric cryptography, and it’s what most people have in mind when they think about encryption.
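
As a minimal sketch of the Caesar cipher described above, the following Python snippet uses a single function for both directions. The function name, the sample message and the key value 3 are illustrative assumptions.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift each letter of `text` by `shift` positions, wrapping around the alphabet."""
    result = []
    for char in text.upper():
        if char in ALPHABET:
            index = (ALPHABET.index(char) + shift) % len(ALPHABET)
            result.append(ALPHABET[index])
        else:
            # Spaces, digits and punctuation are left untouched
            result.append(char)
    return "".join(result)

key = 3                                  # the shared secret X
ciphertext = caesar("HELLO BOB", key)    # encrypt: shift forward by X
plaintext = caesar(ciphertext, -key)     # decrypt: shift back by X
print(ciphertext)                        # prints KHOOR ERE
print(plaintext)                         # prints HELLO BOB
```

The same value of `key` is needed on both sides, which is exactly what makes the cipher symmetric.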

The biggest issue with symmetric cryptography arises during key sharing. For example, Alice encrypts a message, and sends it in encrypted form to Bob. Alice needs to send the key to Bob as well. Because of MITM attacks, she should also encrypt the key, using another key. But how should she send this other key? This is a bottomless issue.

Asymmetric cryptography summary

To cope with that, an alternative approach is to decouple the encryption from the decryption by using different keys. This is known as asymmetric cryptography, or public key cryptography.

Though the mathematical concepts and proofs behind it are complex, the principle in itself is quite straightforward. From a bird’s-eye view:

Alice creates a private key
She then proceeds to generate a public key out of the private key. Because of the tight relationship between the public key and the private key it has been generated from, only the private key is able to decrypt messages encrypted with the public key.
Alice makes the public key available to everyone
Bob uses the public key to encrypt the message
The encrypted message is delivered to Alice, potentially through unsafe channels. If the message is intercepted by third-parties, it cannot be decrypted because the attacker has no access to the private key
Alice uses the private key to decrypt the message
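
The steps above can be sketched with a toy version of RSA in Python. This is an illustration only, under loud assumptions: the two primes are well-known Mersenne primes rather than secret random ones, there is no padding, and the names `encrypt` and `decrypt` are made up for the example. Real key pairs should always come from a tool such as GPG.

```python
# Toy RSA key pair: NOT secure, for illustration only.
p = 2**61 - 1                  # a known Mersenne prime (real keys use secret random primes)
q = 2**89 - 1                  # another known Mersenne prime
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 65537                      # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)            # Alice publishes this
private_key = (d, n)           # Alice keeps this secret

def encrypt(message: bytes, key) -> int:
    """Bob encrypts with Alice's public key."""
    exponent, modulus = key
    number = int.from_bytes(message, "big")
    if number >= modulus:
        raise ValueError("message too long for this toy key")
    return pow(number, exponent, modulus)

def decrypt(ciphertext: int, key) -> bytes:
    """Alice decrypts with her private key."""
    exponent, modulus = key
    number = pow(ciphertext, exponent, modulus)
    return number.to_bytes((number.bit_length() + 7) // 8, "big")

ciphertext = encrypt(b"Hi Alice", public_key)
print(decrypt(ciphertext, private_key))
```

Anyone can call `encrypt` with the public key, but only the holder of `d` can reverse the operation.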

That’s all! Now, let’s go back to digital signatures.

Principles of digital signatures and software signing

The same principle of asymmetric cryptography can be used for software signing. With encryption, only the desired recipient can read the data. Conversely, signing means that although everybody can read the data, its source can safely be proven to be the designated emitter only.

However, the public key and the private key have opposite roles, and the process is different: the private key encrypts and the public key decrypts.

To get started with software signature, a basic setup should be executed once, similarly to what you would do for most other cryptographic setups:

The trust authority (or, more simply, the software author) creates a private key, using one of the many tools available. This private key needs to be stored in a safe place, since anybody who has access to the private key can impersonate the trust authority
It then generates a public key out of this private key
The public key is made available to the general public e.g. in a public key registry, so that everybody interested can access it

Now, for every piece of data that has to be signed, the trust authority creates a signature by encrypting a one-way hash of the software with its private key.

The encrypted hash is then bundled along with the code, and since the public key is publicly available, everyone can now verify the software-signature pair, ensuring the data and its source are genuine.
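
The hash-then-sign process can be sketched in Python with the same kind of toy RSA key pair. Everything here is an assumption made for illustration: the primes are publicly known, the SHA-256 digest is simply reduced modulo `n` instead of being padded, and the `sign` and `verify` names are hypothetical. GPG implements the real thing.

```python
import hashlib

# Toy RSA key pair: NOT secure, for illustration only.
p, q = 2**61 - 1, 2**89 - 1           # known Mersenne primes, not secret
n = p * q
e = 65537                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent (Python 3.8+)

def digest(data: bytes) -> int:
    """One-way hash of the data, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_exponent: int) -> int:
    """The author transforms the hash with the private key."""
    return pow(digest(data), private_exponent, n)

def verify(data: bytes, signature: int, public_exponent: int) -> bool:
    """Anyone recovers the hash with the public key and compares it."""
    return pow(signature, public_exponent, n) == digest(data)

software = b"print('Hello world!')"
signature = sign(software, d)

print(verify(software, signature, e))               # genuine software
print(verify(software + b" # evil", signature, e))  # tampered software
```

Changing a single byte of the software changes its hash, so the signature no longer matches and verification fails.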

Digital signature of source code

Before signing the software we provide as a whole, it’s necessary to ensure the source code itself is trusted.

In structured work environments with multiple developers working on a single project, there are many opportunities for a MITM attack, from gaining access to a machine with write access to a Git repository to malicious impersonation of a colleague.

With Git (or any other version control system), it’s very simple to impersonate someone else. With the following commands, anyone could e.g. commit under the author’s name:

$ git config user.name "Nicolas Frankel"
$ git config user.email "nicolas.frankel@exoscale.com"

It is child’s play to spoof commits pretending to be someone else.

There are several ways to prevent malicious code from finding its way into code repositories: screening developers, restricting commit rights to a subset of them, code reviews, etc.

But in every case, there’s still a slight chance such malicious code might end up in the final software anyway. The first step to prevent that kind of impersonation is to sign the source code.

Choosing between signing tags or single commits

Git allows signing both tags and commits, giving committers a way to certify the code is theirs and pinpointing the ownership of the source code.

The most common approach is to sign tags only: the signature covers the tagged commit, whose hash in turn depends on all previous commits, meaning a previous commit cannot be changed without invalidating the signature.

Of course, it is important to remember that this offers no guarantee that a rogue commit has not slipped in unnoticed before the tag was signed. Code review once more plays an important control role, and so does favoring SSH as the transport mechanism used to push commits to the remote repository.

Signing all commits, on the other hand, implicitly means signing tags as well, as those are only pointers to a specific commit. Human error plays a big role in this approach, as a developer might simply forget to sign a commit. This usually leads to automating the commit signature, which introduces even more threats: if the working machine is compromised, an automated signature process or an active and loaded agent makes it easy to introduce malicious code in a signed commit.

Signing tags

Signing tags with Git requires just a few easy steps.

Configure Git to use your private GPG key, identified by its key ID (here, the full fingerprint):

$ git config --global user.signingkey 2AA100494BD5A842A1E516F7F606480170C56BED

Then, to actually sign the tag:
```
$ git tag -s v0.1 -m 'My signed tag'
```
TTY not defined

In case the following error happens:
```
error: gpg failed to sign the data
error: unable to sign the tag
```
You need to set the GPG_TTY environment variable to the current terminal:
```
$ export GPG_TTY=$(tty)
```

Finally, to make sure the tag has been signed:

$ git show v0.1

Tagger: Nicolas Frankel <nicolas.frankel@exoscale.com>
Date:   Tue Nov 13 13:16:02 2018 +0100

My signed tag
-----BEGIN PGP SIGNATURE-----

iQEzBAABCAAdFiEEKqEASUvVqEKh5Rb39gZIAXDFa+0FAlvqwIIACgkQ9gZIAXDF
a+2HVAgAnc2VznE95UftPbZtRBj24NUFI55y78lWVoO6fy5+jb8VmHyVP8o0N0QM
me9DjiucLE8jc5kigtJ2a+rarKXcV4AR8/5UDxAzHJV2jqhWgKLW7xt4t1giyeaN
a1ac+GbuZlEXSBOaoijMjiUceuu//6nOxGo/aoKhLCQfxTWoDla333Z5z6bGOrie
WNLcj08KC6hM2JNQ7EUndVuKYqczVGsGYlaJQyEBIfXnX5GfiIge3uCvSUCz9pyy
kaYxLRUlGLRtP06VMU9pfS4lgIm8RzSL6R0udhKJ8ciwezpPZ87U/9cAWgYxO1hw
bleiQHPvQ3eN9qmmMrE/N/L2WnpyUg==
=z6TR
-----END PGP SIGNATURE-----

commit 266b11fc00ded7711a436574d29a09e31e86dc58 (HEAD -> master, tag: v0.1)
Author: Nicolas Frankel <nicolas.frankel@exoscale.com>
Date:   Tue Nov 13 13:09:06 2018 +0100

    Initial commit

diff --git a/foo b/foo
new file mode 100644
index 0000000..e69de29

Verifying tag signatures

Everyone should be able to check if the tag signature is valid and who signed it. Assuming you have the signer’s public key in your GPG keyring:

$ git tag -v v0.1

object 266b11fc00ded7711a436574d29a09e31e86dc58
type commit
tag v0.1
tagger Nicolas Frankel <nicolas.frankel@exoscale.com> 1542111362 +0100

My signed tag
gpg: Signature made Tue Nov 13 13:16:02 2018 CET
gpg:                using RSA key 2AA100494BD5A842A1E516F7F606480170C56BED
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: next trustdb check due at 2020-11-12
gpg: Good signature from "Nicolas Frankel <nicolas.frankel@exoscale.com>" [ultimate]

This approach is especially effective when integrated in the build step of a Continuous Integration pipeline, in order to automatically ensure only sources signed by trusted team members are accepted.

Unfortunately, at the time of writing, neither Jenkins nor GitLab CI provides a feature to check the GPG signature, so the check needs to be custom-scripted for the Continuous Integration platform you choose.
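
As a starting point for such a custom check, here is a hedged Python sketch. It relies on git verify-tag --raw, which prints GnuPG’s machine-readable status lines to standard error, and on the VALIDSIG status line, whose first argument is the fingerprint of the signing key. The function names and the notion of a set of trusted fingerprints are assumptions made for this example, not features of Git or of any CI platform.

```python
import subprocess

def valid_fingerprints(status_output: str) -> list:
    """Extract signing-key fingerprints from GnuPG VALIDSIG status lines."""
    fingerprints = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            # First argument of VALIDSIG: fingerprint of the key that signed.
            # If a subkey signed, this is the subkey's fingerprint.
            fingerprints.append(fields[2])
    return fingerprints

def tag_is_trusted(tag: str, trusted_fingerprints: set) -> bool:
    """Return True if `tag` carries a valid signature from a trusted key."""
    result = subprocess.run(
        ["git", "verify-tag", "--raw", tag],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Unsigned tag, bad signature, or public key missing from the keyring
        return False
    return any(fp in trusted_fingerprints
               for fp in valid_fingerprints(result.stderr))
```

A build step could call `tag_is_trusted("v0.1", {"2AA100494BD5A842A1E516F7F606480170C56BED"})` and fail the pipeline when it returns `False`. This assumes the public keys of trusted team members have been imported into the keyring of the build agent beforehand.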

Automated dependencies checks

Sources are probably not the only building blocks of one’s software: except in very specific environments, developers make use of libraries and/or frameworks.

Dependencies you wish to use in your software need the same grade of trustworthiness, and need to be checked carefully.

Java JARs are used as the example in this section, but the approach is similar for other technology stacks.

To check the signing of dependencies, Maven provides the org.simplify4u.plugins:pgpverify-maven-plugin plugin. It can be used either:

At discrete points in time, by using the following command-line instruction:
```
mvn org.simplify4u.plugins:pgpverify-maven-plugin:check
```
Or automatically be executed in the build process, by configuring the build accordingly.

For example, the following is an excerpt from a Maven POM that checks the signatures of dependencies:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- ... -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.simplify4u.plugins</groupId>
        <artifactId>pgpverify-maven-plugin</artifactId>
        <version>1.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

At this point, any Maven lifecycle invocation (e.g. mvn compile) will trigger the check goal. From a SecDevOps standpoint, this is to be preferred to the command-line approach. It yields the following:

[INFO] Scanning for projects...
[INFO] 
[INFO] ----------------< com.exoscale.signing:sign-everything >----------------
[INFO] Building sign-everything 0.0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- pgpverify-maven-plugin:1.2.0:check (default) @ sign-everything ---
[INFO] org.springframework:spring-jcl:pom:5.1.2.RELEASE PGP Signature OK
       KeyId: 0x9A2C7A98E457C53D UserIds: [Spring Buildmaster <buildmaster@springframework.org>]
[INFO] org.springframework.boot:spring-boot-test-autoconfigure:jar:2.1.0.RELEASE PGP Signature OK
       KeyId: 0x9A2C7A98E457C53D UserIds: [Spring Buildmaster <buildmaster@springframework.org>]
[INFO] net.minidev:json-smart:jar:2.3 PGP Signature OK
       KeyId: 0xF6BC09712C8DF6EC UserIds: [Uriel Chemouni (dev) <uchemouni@gmail.com>]
[INFO] org.springframework:spring-test:pom:5.1.2.RELEASE PGP Signature OK
       KeyId: 0x9A2C7A98E457C53D UserIds: [Spring Buildmaster <buildmaster@springframework.org>]
[INFO] Receive key: https://hkps.pool.sks-keyservers.net/pks/lookup?op=get&options=mr&search=0x4C5EED3C53B75933
  to /Users/nico/.m2/repository/pgpkeys-cache/4C/5E/4C5EED3C53B75933.asc
[INFO] org.skyscreamer:jsonassert:jar:1.5.0 PGP Signature OK
       KeyId: 0x4C5EED3C53B75933 UserIds: [Carter Page (Signing key for Yoga) <carter@skyscreamer.org>]
...

Digital signature of final artifacts

Making sure source code comes from a trusted party is a great first step in enforcing a trust chain, but it is clearly more valuable to the software provider than to the clients, as the latter tend to use a packaged form of the software rather than the source code. The trust needs to be conveyed to the output artifact by signing it as well.

We made sure both sources and dependencies come from trusted parties; since artifacts are the result of compiling those sources and dependencies, they should transitively be trusted too. Nevertheless, they should be signed as well: software distribution is the most user-facing part of the entire process, and usually the one where end users need to ensure the authenticity and identity of the publisher they trust. Most end users of your software will rely on this final signature when deciding to install and use it.

Using GPG, it’s trivial to sign a file. The following applies to any artifact, regardless of the technology stack e.g. Python eggs, Ruby gems, NPM packages, Java JARs, etc. As an example, let’s use it to sign a common Java binary, a JAR:

$ gpg -ab target/sign-everything-0.0.1-SNAPSHOT.jar

This creates an additional sign-everything-0.0.1-SNAPSHOT.jar.asc file, which contains the detached signature in ASCII-armored text form and can then be verified using GPG.

Likewise, it’s simple to verify the signature validity and signing authority:

$ gpg --verify target/sign-everything-0.0.1-SNAPSHOT.jar.asc \
               target/sign-everything-0.0.1-SNAPSHOT.jar

gpg: Signature made Tue Nov 13 16:49:29 2018 CET
gpg:                using RSA key 2AA100494BD5A842A1E516F7F606480170C56BED
gpg: Good signature from "Nicolas Frankel <nicolas.frankel@exoscale.com>" [ultimate]

Automated artifact signature: an example with Maven

The signing of the final artifact itself can be automated as part of the build process. The following is a Maven POM snippet that will automatically sign every artifact using our GPG default key:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- ... -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-gpg-plugin</artifactId>
        <version>1.6</version>
        <executions>
          <execution>
            <id>sign-artifacts</id>
            <phase>package</phase>
            <goals>
              <goal>sign</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Now, when actually building the package with mvn package, it will ask for the private key passphrase. While this seems to indicate there can be no automated build pipeline, there are a couple of solutions:

Write the passphrase in the POM

Just don’t! Remember the passphrase is a secret, treat it as such: it shouldn’t be disclosed to everyone, even within the limits of the organization.

Pass the passphrase on the command-line
```
$ mvn package -Dgpg.passphrase=dontDoThatEither
```
Passing the passphrase on the command-line is a bit of an improvement, but not by much: it risks appearing in the shell history, and chances are it will also appear in the Continuous Integration build configuration.

Store the passphrase in the user’s Maven settings file

This should be the preferred solution, as Maven allows encrypting passwords (with mvn --encrypt-master-password and mvn --encrypt-password) to improve security.

In a basic implementation form, the file is ~/.m2/settings.xml, and the passphrase should be located in a server section explicitly named gpg.passphrase.

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                        https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <servers>
    <server>
      <id>gpg.passphrase</id>
      <passphrase>thisShouldBeEncrypted</passphrase>
    </server>
  </servers>
</settings>

Digital signature of Docker images

Nowadays, it’s far from uncommon to distribute software through Docker images. Instead of providing a technology-specific artifact and writing a lot of documentation on how to run it, it makes sense to bundle both the artifact and the platform it runs on in a container. For example, after building the above JAR, it will be distributed in the image along with the Java Virtual Machine, and their respective configurations.

Some companies use this approach for their entire portfolio, e.g. Elastic provides Docker images for: Elasticsearch (Java-based), Filebeat (Go-based), Logstash (Ruby-based), etc.

If the Docker image is what is distributed in the end, it makes sense to sign it, in addition to (or in replacement of) the signing of the artifact. To achieve that, Docker provides Docker Content Trust (DCT):

Docker Content Trust (DCT) allows operations with a remote Docker registry to enforce client-side signing and verification of image tags. DCT provides the ability to use digital signatures for data sent to and received from remote Docker registries. These signatures allow client-side verification of the integrity and publisher of specific image tags.

Once DCT is enabled, image publishers can sign their images. Image consumers can ensure that the images they use are signed.

To sum up, enabling DCT prevents:

Building an image, unless the parent image itself is signed (or is scratch)
Running a container of an unsigned image

Enabling Docker Content Trust

There are two ways to enable DCT:

By far the easiest way is to set the DOCKER_CONTENT_TRUST environment variable:
```
export DOCKER_CONTENT_TRUST=1
```
The second way is to use the Docker daemon configuration file, whose location depends on the operating system. On Linux distributions, it’s /etc/docker/daemon.json. On macOS, it’s available in the GUI.

Here’s a sample configuration snippet:

{
    ...
    "content-trust": {
        "trust-pinning": {
            "root-keys": {                                       <1>
                "exoscale.com/exoscale/*": ["key_1"],            <2>
                "exoscale.com/exoscale/repo": ["key_2", "key_3"] <3>
            },
            "official-images": true
        },
        "mode": "enforced",                                      <4>
        "allow-expired-trust-cache": true                        <5>
    }
}

<1> Set the root key(s)
<2> A root key can be set on all repositories of a registry
<3> Or on a specific repository, with a fallback mechanism if a repository is not explicitly listed
<4> Modes include: disabled for no signing/verification, enforced to disallow unsafe operations (see above), and permissive to log them only
<5> Whether to allow image validation with offline metadata. Useful if the machine is offline for some reason

Using Docker Content Trust

The most important thing to remember about DCT is that only tags are signed (and verified).

Hence, the following command doesn’t trigger DCT because the push doesn’t set a tag explicitly. Check the last line of the output.

$ docker push nfrankel/signeverything

The push refers to repository [docker.io/nfrankel/signeverything]
3142de4b3a79: Pushed 
d6feba3b416d: Pushed 
9bca1faaa73e: Mounted from library/openjdk 
0c3170905795: Mounted from library/maven 
df64d3292fd6: Mounted from docker/dtr 
latest: digest: sha256:f5d0a195b629d650e55a644f0f7e6065f51cde7abb215d057361b07a21114259 size: 1364
No tag specified, skipping trust metadata push

Note that, as stated above, only tags can be signed. This means a specific image can have multiple tags pointing to it, some of them signed, and some not.

If DCT is enabled and a tag is set for the push, then the signing process kicks in:

$ docker push nfrankel/signeverything:latest

The push refers to repository [docker.io/nfrankel/signeverything]
3142de4b3a79: Layer already exists 
d6feba3b416d: Layer already exists 
9bca1faaa73e: Layer already exists 
0c3170905795: Layer already exists 
df64d3292fd6: Layer already exists 
latest: digest: sha256:f5d0a195b629d650e55a644f0f7e6065f51cde7abb215d057361b07a21114259 size: 1364
Signing and pushing trust metadata
You are about to create a new root signing key passphrase. This passphrase
will be used to protect the most sensitive key in your signing system. Please
choose a long, complex passphrase and be careful to keep the password and the
key file itself secure and backed up. It is highly recommended that you use a
password manager to generate the passphrase and keep it safe. There will be no
way to recover this key. You can find the key in your config directory.
Enter passphrase for new root key with ID ee7b795: 
Repeat passphrase for new root key with ID ee7b795: 
Enter passphrase for new repository key with ID 1d15dea: 
Repeat passphrase for new repository key with ID 1d15dea: 
Finished initializing "docker.io/nfrankel/signeverything"
Successfully signed docker.io/nfrankel/signeverything:latest

The first time Docker uses DCT within a push, it will follow a procedure to create the keys - the root key, and the repo key, and immediately use them to sign the tagged image.

Verifying Docker image signatures

Once DCT has been enabled, the Docker daemon will automatically prevent pulling unsigned images:

$ docker pull nfrankel/simplelog:2

docker: Error: remote trust data does not exist for docker.io/nfrankel/simplelog: notary.docker.io does not have trust data for docker.io/nfrankel/simplelog

The same goes for running a container, even if the image is already present locally:

$ docker run nfrankel/simplelog:2

Error: remote trust data does not exist for docker.io/nfrankel/simplelog: notary.docker.io does not have trust data for docker.io/nfrankel/simplelog

However, if the tag was signed when the image was pushed, everything runs smoothly:

$ docker run nfrankel/signeverything:latest

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.0.RELEASE)

2018-12-10 09:39:51.758  INFO 1 --- [           main] c.e.s.e.SignEverythingApplicationKt      : Starting SignEverythingApplicationKt v0.0.1-SNAPSHOT on bfa0596d18db with PID 1 (/app/sign-everything-0.0.1-SNAPSHOT.jar started by root in /app)
2018-12-10 09:39:51.762  INFO 1 --- [           main] c.e.s.e.SignEverythingApplicationKt      : No active profile set, falling back to default profiles: default
2018-12-10 09:39:52.678  INFO 1 --- [           main] c.e.s.e.SignEverythingApplicationKt      : Started SignEverythingApplicationKt in 1.444 seconds (JVM running for 2.47)
Hello world!

Conclusion

While it’s impossible to completely remove the risk of malicious code in software, it’s possible to reduce it drastically by signing software. In order to achieve this goal, the signing process should be part of the delivery pipeline: the source code should be signed, then the binary, and if Docker is used as the delivery channel, images should be signed as well.

Delivering signed software and consuming signed software both require additional effort and constant discipline. Still, this is one of the main ways to lower security risks inside any organization, and across organizations.