Random number generation is critical to the smooth operation of modern information systems. However, it can run into pitfalls in virtual machines. This article covers the basics of random number generation and shows how to work around the problems that may arise.
Random Number Generators (RNG) and randomness
Cryptography and its many derivative products — encrypted protocols (e.g. HTTPS), SSL certificates, SSH key pairs, etc. — are highly dependent on random numbers, obtained from so-called Random Number Generators (RNG).
The most important characteristic of a random number generator is its randomness, namely its ability to deliver random numbers that are impossible to predict.
Entropy, the source of all randomness
Unfortunately, as simple as its definition is, (true) randomness is notoriously difficult to achieve. By nature, computers and their CPUs are highly deterministic machines, which is quite the opposite of what one is looking for to obtain randomness. Random number generators thus must look for entropy (randomness, disorder) elsewhere.
In physical machines, entropy can be gathered by observing the countless components that make up a computer and interact with the real (messy) world: peripherals (e.g. interrupts), CPU (e.g. context switches), memory (e.g. page faults), etc. Think of it as standing in the middle of a forest, looking at, listening to, and feeling everything that surrounds you.
The picture is entirely different in virtual machines, which are highly constrained environments, often dedicated to a single purpose, where almost nothing unanticipated happens (especially early in their lifetime, during the boot process). Think of it as standing in the middle of a well-insulated, perfectly white room: no entropy to be found anywhere. This is the so-called state of entropy starvation.
Linux /dev/random versus /dev/urandom
The Linux kernel provides two devices which can be used (read) to obtain random numbers:
/dev/random: very high quality of randomness, extracted directly from the entropy pool (itself fed mostly by peripheral interrupt timings and block device seek times); it will block if you request more bits than available in the entropy pool (as reported by
/dev/urandom: lesser (but still high) quality of randomness, generated by an intermediate Cryptographically (Secure) Random Number Generator (CRNG); it will block until it is properly seeded from the entropy pool (at boot time) but not afterwards (albeit with decreasing randomness quality if you read too much data, too fast, before it gets re-seeded)
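One place where the Linux kernel exposes its estimate of the entropy pool is the procfs file /proc/sys/kernel/random/entropy_avail. The following is a minimal sketch for reading it; the helper name read_entropy_avail is made up for this illustration, and the meaning of the reported figure differs between kernel versions, so treat it as an indication rather than a measurement.

```python
from pathlib import Path

# Standard procfs location on Linux; absent on other systems.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    bits = read_entropy_avail()
    if bits is None:
        print("entropy estimate not available on this system")
    else:
        print(f"entropy pool estimate: {bits} bits")
```

Returning None instead of raising keeps the helper usable on machines where the file does not exist.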
The following rules and facts should thus be kept in mind:
Given its entropy-greedy and blocking nature, never use /dev/random! (unless you’re a cryptographer and know for sure you need it)
Use /dev/urandom for common cryptography purposes (such as generating SSL certificates, SSH key pairs, Diffie-Hellman parameters, etc.)
Be aware that /dev/urandom will block at boot time, until enough entropy has been gathered to properly seed it (which can take several minutes in an entropy-starving virtual machine; look for the random: crng done or random: crng init done message in the kernel logs)
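Looking for those messages can be scripted. The sketch below scans kernel log lines for either of the two messages quoted above; the function names are hypothetical, reading the log through the dmesg command is an assumption about the system (it may require elevated privileges), and the exact wording of the messages may vary between kernel versions.

```python
import subprocess

# Messages quoted in the text above; wording may differ on other kernels.
CRNG_MARKERS = ("random: crng init done", "random: crng done")


def find_crng_ready(log_lines):
    """Return the first log line announcing a seeded CRNG, or None."""
    for line in log_lines:
        if any(marker in line for marker in CRNG_MARKERS):
            return line.strip()
    return None


def read_kernel_log():
    """Best-effort read of the kernel ring buffer via dmesg (may need root)."""
    try:
        result = subprocess.run(
            ["dmesg"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    hit = find_crng_ready(read_kernel_log())
    print(hit or "no CRNG initialization message found (or log not readable)")
```

Keeping the matching logic separate from the log retrieval makes it easy to feed the function lines from another source, such as a saved log file.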
Many programs rely on /dev/(u)random to achieve their purpose: systemd, SSH, every service that uses SSL/TLS (HTTPS, SMTPS, IMAPS, POP3S), Kerberos, etc.
It is thus important to make sure enough entropy is available for the kernel to feed its entropy pool, even early in the boot process.
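To illustrate how an application typically consumes the kernel's randomness, here is a minimal Python sketch. It draws key material through the standard secrets module, which relies on the operating system's random source, and uses os.getrandom() with the GRND_NONBLOCK flag (Linux only) to check whether the CRNG has been seeded without blocking. The function names new_key and crng_is_seeded are made up for this example.

```python
import os
import secrets


def new_key(num_bytes=32):
    """Return num_bytes of random data from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


def crng_is_seeded():
    """Report whether the kernel CRNG is ready, without blocking.

    Returns None where the check cannot be performed (for instance on
    platforms without os.getrandom()).
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        # The kernel would have blocked: the CRNG is not seeded yet.
        return False
    except OSError:
        return None


if __name__ == "__main__":
    print("CRNG seeded:", crng_is_seeded())
    print("fresh 256-bit key:", new_key().hex())
```

A service that starts early in the boot process could use a check like crng_is_seeded() to log a warning instead of silently stalling.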
Entropy in virtual machines
Virtual machines commonly have two ways to combat their entropy-starving nature:
VirtIO RNG: a QEMU/KVM-emulated hardware RNG
RDRAND: hardware RNG featured by Intel (and AMD) CPUs
Using the QEMU/KVM VirtIO RNG device, you can make the
physical, entropy-rich, hypervisor emulate a hardware RNG and pass it to the
VM, where it will appear as a
Recent versions of the Linux kernel will automatically detect and use this
hardware device, as reported by
cat /sys/devices/virtual/misc/hw_random/rng_available =>
If nothing shows up, you will need to install and run the
rngd daemon to
bridge the gap.
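The same check can be done programmatically. The sketch below reads rng_available from the sysfs directory mentioned above, together with the neighbouring rng_current file that names the device currently in use; the helper name read_hw_rng_state is hypothetical, and the device names you will see depend on the drivers loaded on your system.

```python
from pathlib import Path

# sysfs directory quoted in the text above.
HW_RANDOM_DIR = Path("/sys/devices/virtual/misc/hw_random")


def read_hw_rng_state(base=HW_RANDOM_DIR):
    """Return (available, current) hardware RNG names as seen by the kernel."""

    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return ""

    available = read("rng_available").split()
    current = read("rng_current") or None
    return available, current


if __name__ == "__main__":
    available, current = read_hw_rng_state()
    print("available hardware RNGs:", available or "none found")
    print("currently selected:", current)
```

An empty list of available devices suggests that no hardware RNG driver has registered with the kernel.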
Also, the kernel needs to be configured (and compiled) with the corresponding
driver: make sure
CONFIG_HW_RANDOM_VIRTIO is set, usually as a
/dev/hwrng device won’t appear).
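Whether a given option is enabled can be checked against the kernel's build configuration. The following sketch assumes that your distribution ships the configuration as /boot/config-<release>, which is a common convention but not a guarantee; the function names are hypothetical.

```python
import platform
from pathlib import Path


def kernel_option(config_text, option):
    """Return the option's value ('y', 'm', ...), or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def read_kernel_config():
    """Best-effort read of /boot/config-<release> (a distribution convention)."""
    path = Path("/boot") / f"config-{platform.release()}"
    try:
        return path.read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    value = kernel_option(read_kernel_config(), "CONFIG_HW_RANDOM_VIRTIO")
    print("CONFIG_HW_RANDOM_VIRTIO =", value)
```

A value of y means the driver is built into the kernel, m means it is built as a loadable module, and None means the option is not enabled (or the configuration file could not be read).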
VirtIO RNG is exposed in all Exoscale Compute products.
One can also rely on the RDRAND CPU feature, a Digital Random Number Generator (DRNG) introduced by Intel in the Ivy Bridge microarchitecture and later adopted by AMD, and pass it to the VM vCPU, as reported by grep rdrand /proc/cpuinfo.
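For scripted checks, the grep above can be replaced by a stricter test that only matches the flag as a whole word on a flags line. This is a minimal sketch; cpu_has_flag is a name made up for this example, and it assumes the x86 layout of /proc/cpuinfo, where CPU features are listed on lines starting with "flags".

```python
def cpu_has_flag(cpuinfo_text, flag="rdrand"):
    """Return True if any 'flags' line in /proc/cpuinfo text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("rdrand exposed to this (v)CPU:", cpu_has_flag(text))
```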
Again, recent versions of the Linux kernel will automatically detect and use
this (v)CPU feature, provided it has been configured (and compiled) with
A telltale sign of RDRAND presence, trust and use is the
random: crng done
(trusting CPU's manufacturer) message in the kernel logs (
Thanks to hypervisors that are regularly updated to leverage the latest in CPU technologies, RDRAND is also available in all Exoscale Compute products.
Other sources of entropy
As we’ve just seen, good randomness depends on entropy, which itself isn’t easy to obtain. The Linux kernel does a good job of gathering entropy from readily available sources, whether internal (e.g. interrupts) or external (e.g. RDRAND). Yet, to speed up the gathering of entropy or improve its quality, one may choose to rely on additional sources, depending on their availability and suitability.
The Havege daemon
haveged is a daemon — derived from the HAVEGE algorithm —
designed to help (the kernel) gather entropy from more sources (than the kernel
itself does). It is common to install it on physical hosts to gather entropy
faster from their entropy-rich environment. However, it is not recommended to
use it in virtual machines, since the very reasons that make them prone to
entropy starvation will hinder, if not defeat, HAVEGE (or the quality —
randomness — of the entropy it will gather).
Trusted Platform Module (TPM)
Modern laptops and server motherboards are often equipped with a Trusted Platform Module (TPM) which features its own hardware-backed random number generator.
Again, a recent enough kernel will automatically pick it up, as reported by
cat /sys/devices/virtual/misc/hw_random/rng_available =>
To prevent long delays when your virtual machine is starting up, or when performing cryptographic operations (e.g. generating an SSL certificate), make sure it can access a proper source of entropy among those exposed by the host hypervisor.
At Exoscale, both VirtIO RNG and RDRAND are passed to VMs, allowing each customer to freely choose which entropy source(s) to rely on and which Linux distribution to use, no matter how its stock kernel is configured.
(Lack of) randomness proof
While one can prove the lack of randomness of a random number generator by successfully predicting the numbers it produces, it is impossible to prove randomness itself. Complicated mathematics can give you a fair degree of confidence that a given RNG is up to its task, but you can never be sure. Think of it as trying to locate a black hole by looking at the light it emits; you just can’t (a black hole, by definition, emits no light).
Given the impossibility of proving its randomness, the introduction of the RDRAND CPU-backed random number generator — potentially back-doored at the request of some government agency — has led to a petition to remove it from Linux kernel support, which prompted a rather crisp response from kernel maintainer Linus Torvalds: “[…] we use rdrand as one of many inputs into the random pool, and we use it as a way to improve that random pool. So even if rdrand were to be back-doored by the NSA, our use of rdrand actually improves the quality of the random numbers you get from /dev/random. […]”
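The reasoning in that response can be illustrated with a toy model. The sketch below is not the kernel's actual mixing function; it simply folds each contribution into a pool with SHA-256, under the assumption that the pool already holds state an attacker cannot see. Even when the attacker knows the hardware RNG's contribution exactly, the result still depends on that unseen state. All names and input strings are made up for the example.

```python
import hashlib


def mix(pool, contribution):
    """Fold a contribution into the pool by hashing both together."""
    return hashlib.sha256(pool + contribution).digest()


# Two hypothetical pools, already seeded from sources an attacker cannot see.
pool_a = mix(b"", b"interrupt timings observed on machine A")
pool_b = mix(b"", b"interrupt timings observed on machine B")

# Worst case for the hardware RNG: the attacker knows its output exactly.
known_to_attacker = bytes(32)

# The mixed pools still differ, since each depends on its unseen prior state.
assert mix(pool_a, known_to_attacker) != mix(pool_b, known_to_attacker)
```

In this simplified model, a compromised input adds nothing for the attacker to exploit, while an honest input adds unpredictability.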
Although computers are highly deterministic by nature, it is still possible to
programmatically generate numbers that are unpredictable enough to be considered
random, using carefully crafted algorithms backed by serious maths: the so-called
Cryptographically-Secure Pseudo-Random Number Generators (CSPRNG). In order to
deliver good randomness, those will need to be seeded with random data from
a true random source (or as close as it comes to it, such as
ARC4(random) algorithm has long been a reference implementation of
such CSPRNG, nowadays supplanted by ChaCha20 (used in