Random number generation is critical to the smooth operation of modern information systems. However, it can run into pitfalls in virtual machines. This article covers the basics of random number generation and shows you how to circumvent the problems that may arise.

A complete description of random number generation

Random Number Generators (RNG) and randomness

Cryptography and its many derivative products — encrypted protocols (e.g. HTTPS), SSL certificates, SSH key pairs, etc. — are highly dependent on random numbers, obtained from so-called Random Number Generators (RNG).

The most important characteristic of a random number generator is its randomness, namely its ability to deliver random numbers that are impossible to predict.

Entropy, the source of all randomness

Unfortunately, as simple as its definition is, (true) randomness is notoriously difficult to achieve. By nature, computers and their CPUs are highly deterministic machines, which is quite the opposite of what one is looking for to obtain randomness. Random number generators thus must look for entropy (randomness, disorder) elsewhere.

In physical machines, entropy can be gathered by observing the countless components that make up a computer and interact with the real (messy) world: peripherals (e.g. interrupts), CPU (e.g. context switches), memory (e.g. page faults), etc. Think of it as standing in the middle of a forest, looking at, listening to, and feeling everything that surrounds you.

The picture is entirely different in virtual machines, which are highly constrained environments, often dedicated to a given purpose, where almost nothing unanticipated happens (especially early in their lifetime, during the boot process). Think of it as standing in the middle of a well-insulated, perfectly white room: no entropy to be found anywhere. This is the so-called state of entropy starvation.

Linux /dev/random versus /dev/urandom

The Linux kernel provides two devices which can be used (read) to obtain random numbers:

  • /dev/random: very high quality of randomness, extracted directly from the entropy pool (itself fed mostly by peripheral interrupt timings and block device seek times); it will block if you request more bits than are available in the entropy pool (as reported by cat /proc/sys/kernel/random/entropy_avail)

  • /dev/urandom: lesser, but still high, quality of randomness, generated by an intermediate Cryptographically (Secure) Random Number Generator (CRNG); it will block until it is properly seeded from the entropy pool (at boot time) but not afterwards (albeit with decreasing randomness quality if you read too much data, too fast, before it gets re-seeded)
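The entropy estimate mentioned above can also be read from a program rather than with cat. The following is a minimal sketch in Python, assuming a Linux system that exposes /proc/sys/kernel/random/entropy_avail; the function names are illustrative and not part of any library.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the text above; only present on Linux.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy_avail(text: str) -> int:
    """Parse the file content, which is a decimal number of bits."""
    return int(text.strip())


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> Optional[int]:
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return parse_entropy_avail(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    bits = read_entropy_avail()
    if bits is None:
        print("entropy_avail is not readable on this system")
    else:
        print(f"entropy available: {bits} bits")
```

Polling this value during boot is one way to observe how quickly (or slowly) a given machine fills its entropy pool.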

The following rules and facts should thus be kept in mind:

  1. Given its entropy-greedy and blocking nature, never use /dev/random! (unless you’re a cryptographer and know for sure you need it)

  2. Always use /dev/urandom for common cryptography purposes (such as generating SSL certificates, SSH key pairs, Diffie-Hellman parameters, etc.)

  3. Be aware that /dev/urandom will block at boot time, until enough entropy has been gathered to properly seed it (which can take several minutes in an entropy-starving virtual machine; look for the random: crng done or random: crng init done message in the kernel logs: dmesg or /var/log/kern.log)
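In application code, these rules usually translate into asking the operating system for random bytes instead of opening a device by hand. The sketch below assumes Python 3.6 or later: secrets.token_bytes draws from the operating system's CSPRNG, and os.getrandom with the GRND_NONBLOCK flag (Linux only) is used to probe whether that generator has been seeded without waiting for it. The function names crng_is_seeded and new_secret_key are illustrative.

```python
import os
import secrets
from typing import Optional


def crng_is_seeded() -> Optional[bool]:
    """Probe the kernel CRNG without blocking.

    Returns True or False on Linux, and None where os.getrandom
    is unavailable or the call is not permitted.
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        # The kernel would have made us wait: not seeded yet.
        return False
    except OSError:
        return None


def new_secret_key(num_bytes: int = 32) -> bytes:
    """Return key material taken from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


if __name__ == "__main__":
    print("CRNG seeded:", crng_is_seeded())
    print("256-bit key:", new_secret_key().hex())
```

A service that must start early in the boot process could log the result of such a probe, which makes entropy-related startup delays easier to diagnose.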

Many programs rely on /dev/(u)random to achieve their purpose: systemd, SSH, every service that uses SSL/TLS (HTTPS, SMTPS, IMAPS, POP3S), Kerberos, etc. It is thus important to make sure enough entropy is available for the kernel to feed its entropy pool, even early in the boot process.

Entropy in virtual machines

Virtual machines commonly have two ways to combat their entropy-starving nature:

  • VirtIO RNG: a QEMU/KVM-emulated hardware RNG

  • RDRAND: hardware RNG featured by Intel (and AMD) CPUs

VirtIO RNG

Using the QEMU/KVM VirtIO RNG device, you can make the physical, entropy-rich hypervisor emulate a hardware RNG and pass it to the VM, where it will appear as /dev/hwrng.

Recent versions of the Linux kernel will automatically detect and use this hardware device, as reported by cat /sys/devices/virtual/misc/hw_random/rng_available => virtio_rng. If nothing shows up, you will need to install and run the rngd daemon to bridge the gap.

Also, the kernel needs to be configured (and compiled) with the corresponding driver: make sure CONFIG_HW_RANDOM_VIRTIO is set, usually as a module (or the /dev/hwrng device won’t appear).
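The sysfs check described above is easy to script, for example in a provisioning or monitoring tool. Here is a minimal sketch in Python; it assumes the sysfs directory quoted above and treats any entry whose name starts with virtio_rng as the VirtIO device (the exact name, possibly with a suffix, depends on the kernel). The helper names are illustrative.

```python
from pathlib import Path
from typing import List

# Directory quoted in the text above; only present on Linux.
HW_RANDOM_DIR = Path("/sys/devices/virtual/misc/hw_random")


def parse_rng_list(text: str) -> List[str]:
    """Split the whitespace-separated content of rng_available."""
    return text.split()


def available_hw_rngs(base: Path = HW_RANDOM_DIR) -> List[str]:
    """Return the hardware RNGs known to the kernel, or [] if unreadable."""
    try:
        return parse_rng_list((base / "rng_available").read_text())
    except OSError:
        return []


def has_virtio_rng(names: List[str]) -> bool:
    """True if one of the listed devices looks like a VirtIO RNG."""
    return any(name.startswith("virtio_rng") for name in names)


if __name__ == "__main__":
    rngs = available_hw_rngs()
    print("hardware RNGs:", rngs or "none found")
    print("VirtIO RNG present:", has_virtio_rng(rngs))
```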

VirtIO RNG is exposed in all Exoscale Compute products.

RDRAND

One can also rely on the RDRAND CPU feature, a Digital Random Number Generator (DRNG) introduced by Intel in the Ivy Bridge microarchitecture and later adopted by AMD, and pass it to the VM vCPU; its presence is reported by grep rdrand /proc/cpuinfo.

Again, recent versions of the Linux kernel will automatically detect and use this (v)CPU feature, provided it has been configured (and compiled) with CONFIG_RANDOM_TRUST_CPU set to y.

A telltale sign of RDRAND presence, trust and use is the random: crng done (trusting CPU's manufacturer) message in the kernel logs (dmesg or /var/log/kern.log).
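Both checks from this section (the rdrand CPU flag and the crng message in the kernel logs) can be automated. The sketch below, in Python, works on text that you supply, so it can be fed the content of /proc/cpuinfo or the output of dmesg; the function names are illustrative, and matching on the substring "random: crng" is an assumption based on the messages quoted in this article.

```python
from typing import List


def cpu_has_rdrand(cpuinfo_text: str) -> bool:
    """True if any 'flags' line of /proc/cpuinfo lists rdrand."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "rdrand" in value.split():
            return True
    return False


def crng_log_lines(kernel_log: str) -> List[str]:
    """Return the kernel log lines that report on CRNG seeding."""
    return [line for line in kernel_log.splitlines() if "random: crng" in line]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("RDRAND available:", cpu_has_rdrand(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

Splitting the flags line on whitespace, rather than searching for a substring, avoids false positives from flags whose names merely contain the letters rdrand.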

Thanks to hypervisors that are regularly updated to leverage the latest in CPU technologies, RDRAND is also available in all Exoscale Compute products.

Other sources of entropy

As we’ve just seen, good randomness depends on entropy, which itself isn’t easy to obtain. The Linux kernel does a good job of gathering entropy from readily available sources, whether they are internal (e.g. interrupts) or external (e.g. RDRAND). Yet, in order to speed up the gathering of entropy or improve its quality, one may choose to rely on additional sources, depending on their availability or suitability.

The Havege daemon

haveged is a daemon, derived from the HAVEGE algorithm, designed to help the kernel gather entropy from more sources than the kernel itself uses. It is common to install it on physical hosts to gather entropy faster from their entropy-rich environment. However, it is not recommended in virtual machines, since the very reasons that make them prone to entropy starvation will hinder, if not defeat, HAVEGE (or at least degrade the randomness of the entropy it gathers).

Trusted Platform Module (TPM)

Modern laptops and server motherboards are often equipped with a Trusted Platform Module (TPM) which features its own hardware-backed random number generator.

Again, a recent enough kernel will automatically pick it up, as reported by cat /sys/devices/virtual/misc/hw_random/rng_available => tpm-rng.

Conclusion

To prevent long delays when your virtual machine is starting up, or when performing cryptographic operations (e.g. generating an SSL certificate), make sure it can access a proper source of entropy, among those exported by the host hypervisor.

At Exoscale, both VirtIO RNG and RDRAND are passed to VMs, allowing customers to freely choose which entropy source(s) to rely on and which Linux distribution to use, no matter how its stock kernel is configured.

Afterwords

(Lack of) randomness proof

While one can prove the lack of randomness of a random number generator by observing numbers that one is able to predict, it is impossible to prove randomness itself. Complicated mathematics can give you a fair amount of confidence that a given RNG is up to its task, but you can never say for sure. Think of it as trying to locate a black hole by looking at the light it (itself) emits; you just can’t (the very definition of a black hole being that it emits no light).
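To make this asymmetry concrete, here is a small sketch in Python of a frequency (monobit) test, one of the simplest statistical checks applied to random bit streams: it compares the number of one bits with the number of zero bits. It is an illustration only, not a validation tool, and the function name is illustrative.

```python
import math
import os


def monobit_p_value(data: bytes) -> float:
    """Frequency test: how plausible is the balance of ones and zeros?

    A very small value (e.g. below 0.01) is strong evidence that the
    data is NOT random. A large value proves nothing at all.
    """
    if not data:
        raise ValueError("need at least one byte")
    n = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    # Distance between ones and zeros, scaled by the expected spread.
    statistic = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(statistic / math.sqrt(2))


if __name__ == "__main__":
    samples = {
        "all zeros": b"\x00" * 1000,
        "repeating 0x55": b"\x55" * 1000,  # bits 01010101, fully predictable
        "os.urandom": os.urandom(1000),
    }
    for name, sample in samples.items():
        print(f"{name:>15}: p = {monobit_p_value(sample):.4f}")
```

The all-zeros sample fails the test, as expected. The repeating 0x55 pattern, however, is perfectly balanced and therefore passes with the best possible score, even though every bit of it can be predicted: passing a test never proves randomness.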

RDRAND polemic

Given the impossibility of proving its randomness, the introduction of the RDRAND CPU-backed random number generator — potentially back-doored at the request of some government agency — has led to a petition to remove it from Linux kernel support, which prompted a rather crisp response from kernel maintainer Linus Torvalds: “[…] we use rdrand as one of many inputs into the random pool, and we use it as a way to improve that random pool. So even if rdrand were to be back-doored by the NSA, our use of rdrand actually improves the quality of the random numbers you get from /dev/random. […]”

C(SP)RNG

Although computers are highly deterministic by nature, it is still possible to programmatically generate numbers that are unpredictable enough to be considered random, using carefully crafted algorithms backed by serious maths: the so-called Cryptographically-Secure Pseudo-Random Number Generators (CSPRNG). In order to deliver good randomness, those will need to be seeded with random data from a true random source (or as close as it comes to it, such as /dev/random). The ARC4(random) algorithm has long been a reference implementation of such CSPRNG, nowadays supplanted by ChaCha20 (used in /dev/urandom).
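The relationship between seed and output can be illustrated with a toy generator. The sketch below, in Python, is neither ChaCha20 nor what the kernel does: it assumes a simple counter-mode construction over HMAC-SHA256, chosen only because it fits in a few lines of standard library code. The class name ToyCounterGenerator is hypothetical, and the code is meant for understanding, not for production use.

```python
import hashlib
import hmac
import os


class ToyCounterGenerator:
    """Deterministic byte generator: HMAC-SHA256(seed, counter) blocks."""

    def __init__(self, seed: bytes) -> None:
        if len(seed) < 16:
            raise ValueError("seed should carry at least 128 bits")
        self._seed = seed
        self._counter = 0

    def read(self, num_bytes: int) -> bytes:
        """Return num_bytes of output; leftover block bytes are discarded."""
        out = bytearray()
        while len(out) < num_bytes:
            message = self._counter.to_bytes(16, "big")
            out.extend(hmac.new(self._seed, message, hashlib.sha256).digest())
            self._counter += 1
        return bytes(out[:num_bytes])


if __name__ == "__main__":
    # Seeded from the operating system, as the text above recommends.
    generator = ToyCounterGenerator(os.urandom(32))
    print(generator.read(16).hex())
```

Two generators given the same seed produce exactly the same stream, which is why everything hinges on the seed being unpredictable: the algorithm stretches randomness, it does not create it.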