Stephan Mueller
2016-Jul-29 17:03 UTC
getrandom waits for a long time when /dev/random is insufficiently read from
On Friday, 29 July 2016, 10:14:07 CEST, Alex Xu wrote:

Hi Alex,

> On Fri, 29 Jul 2016 15:12:30 +0200
> Stephan Mueller <smueller at chronox.de> wrote as excerpted:
> > On Friday, 29 July 2016, 09:03:45 CEST, Alex Xu wrote:
> > > In my opinion, assuming I am not doing something terribly wrong,
> > > this constitutes a bug in the kernel's handling of getrandom calls
> > > at boot, possibly only when the primary source of entropy is
> > > virtio.
> >
> > Nope, I do not think that this is true:
> >
> > - /dev/random returns one byte for one byte of entropy received, but
> >   it has a lower limit of 64 bits
> >
> > - getrandom behaves like /dev/urandom (i.e. nonblocking) except
> >   during boot where it waits until the RNG has collected 128 bits
> >   before operating like a DRNG that is seeded once in a while when
> >   entropy comes in.
> >
> > Ciao
> > Stephan
>
> I don't follow. Assuming you are correct and this is the issue, then
> reading 128 bits (16 bytes) from /dev/random should "exhaust the
> supply" and then both reads from /dev/random and calling getrandom
> should block.

You assume that getrandom works like /dev/random. This is not the case. It is a fully deterministic RNG like /dev/urandom (which is reseeded during operation as entropy becomes available).

getrandom *only* differs from /dev/*u*random in that it waits initially until the system has collected 128 bits of entropy.

But you point to a real issue: when /dev/random is read before getrandom is called (and insufficient entropy is present), the getrandom call will be woken up once the input_pool has received 128 bits. But those 128 bits are fed from the input_pool to the blocking_pool on behalf of the caller at the /dev/random device. This implies that the reader for getrandom will NOT be able to obtain data from the input_pool and the nonblocking_pool because the transfer operation will not succeed. This in turn implies that the nonblocking_pool remains unseeded and yet getrandom returns data to the caller.

> That, however, is not the behavior I observed, which is that reading
> any amount from /dev/random will never block (since it is fed
> from /dev/urandom on the host side) whereas calling getrandom will
> always block unless /dev/random is read from first.

That is a different issue that I did not read from your initial explanation.

I need to look into it a bit deeper.

> Moreover, as long as virtio-rng is available (and fed
> from /dev/urandom), /proc/sys/kernel/random/entropy_avail is always 961
> immediately after booting, which is more than enough to satisfy a
> one-byte read. After reading 1 byte, the estimate decreases to 896 or
> 897, but after reading 29 more bytes it increases to 1106.
>
> Again, these observations are consistent with the conjecture that the
> issue arises since virtio-rng is a "pull" source of entropy whereas
> most other methods (e.g. interrupt timing) are "push" sources. I
> suspect that a similar issue occurs if RDRAND is the only source of
> entropy.
>
> I also tried running rngd in the guest which resolved the issue but
> seems entirely stupid to me, even moreso since
> http://rhelblog.redhat.com/2015/03/09/red-hat-enterprise-linux-virtual-machines-access-to-random-numbers-made-easy/
> says that "The use of rngd is now not required and the guest kernel
> itself fetches entropy from the host when the available entropy falls
> below a specific threshold.".

Right, the kernel now has an in-kernel link that makes rngd superfluous in this case.

Ciao
Stephan
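For readers who want to check the two quantities discussed in this message (whether getrandom would still block, and what entropy_avail reports), here is a minimal sketch in Python. It is an illustration only, not something posted in the thread. The helper names pool_is_seeded and entropy_avail are made up for this example, and the injectable getrandom argument exists purely so the logic can be tried without a freshly booted guest. os.getrandom needs Linux and Python 3.6 or later.

```python
import os

# Flag value from <linux/random.h>; used as a fallback where the os module
# does not export the constant (for example on non-Linux platforms).
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)


def pool_is_seeded(getrandom=None):
    """Return True if getrandom() would return without blocking.

    `getrandom` is a hypothetical injection point for testing; by default
    the real os.getrandom is used.
    """
    if getrandom is None:
        getrandom = os.getrandom  # Linux only, Python 3.6+
    try:
        getrandom(1, GRND_NONBLOCK)
    except BlockingIOError:
        # EAGAIN: the pool behind getrandom has not been seeded yet.
        return False
    return True


def entropy_avail(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read())
```

Run early in boot, pool_is_seeded() returning False while entropy_avail() already reports several hundred bits would match the situation Alex describes.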
Alex Xu
2016-Jul-29 17:31 UTC
getrandom waits for a long time when /dev/random is insufficiently read from
On Fri, 29 Jul 2016 19:03:51 +0200
Stephan Mueller <smueller at chronox.de> wrote as excerpted:

> On Friday, 29 July 2016, 10:14:07 CEST, Alex Xu wrote:
> > I don't follow. Assuming you are correct and this is the issue, then
> > reading 128 bits (16 bytes) from /dev/random should "exhaust the
> > supply" and then both reads from /dev/random and calling getrandom
> > should block.
>
> You assume that getrandom works like /dev/random. This is not the
> case. It is a full deterministic RNG like /dev/urandom (which is
> seeded during its operation as entropy is available).

My understanding was that all three methods of obtaining entropy from userspace receive data from the CSPRNG in the kernel, and that the only difference is that /dev/random and getrandom may block depending on the kernel's estimate of the currently available entropy.

> getrandom *only* differs from /dev/*u*random in that it waits
> initially such that the system collected 128 bits of entropy.

I agree, this is the documented behavior of getrandom.

> But you point to a real issue: when /dev/random is pulled before
> getrandom (and yet insufficient entropy is present), then the
> getrandom call will be woken up when the input_pool received 128
> bits. But those 128 bits are fed from the input_pool to the
> blocking_pool based on the caller at the /dev/random device. This
> implies that the reader for getrandom will NOT be able to obtain data
> from the input_pool and the nonblocking_pool because the transfer
> operation will not succeed. This implies that the nonblocking_pool
> remains unseeded and yet getrandom returns data to the caller.

I don't understand what this means. For my use case, hwrng is fed from the host's urandom, so none of /dev/random, /dev/hwrng, /dev/urandom, or getrandom with any flags in the guest should ever block, except possibly for very large requests (megabytes at least).

> > That, however, is not the behavior I observed, which is that reading
> > any amount from /dev/random will never block (since it is fed
> > from /dev/urandom on the host side) whereas calling getrandom will
> > always block unless /dev/random is read from first.
>
> That is a different issue that I did not read from your initial
> explanation.
>
> I need to look into it a bit deeper.

I have been trying to explain the same problem the entire time. Let me be clear about what the problem is as I see it:

When qemu is started with -object rng-random,filename=/dev/urandom, and immediately (i.e. with no initrd and as the first thing in init):

1. the guest runs dd if=/dev/random, there is no blocking and tons of data goes to the screen. The data appears to be random.

2. the guest runs getrandom with any requested amount (tested 1 byte and 16 bytes) and no flags, it blocks for 90-110 seconds while the "non-blocking pool is initialized". The returned data appears to be random.

3. the guest runs getrandom with GRND_RANDOM with any requested amount, it returns the desired amount or possibly less, but in my experience at least 10 bytes. The returned data appears to be random.

I believe that the difference between cases 1 and 2 is a bug, since based on my previous statement, in this scenario, getrandom should never block.
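The three cases listed above can be timed with a small harness along the following lines. This is a sketch written for illustration, not the program Alex actually ran. The names timed, CASES, run_case and report are made up here, and the code assumes Linux with Python 3.6 or later for os.getrandom.

```python
import os
import time

# Flag value from <linux/random.h>, used as a fallback if os lacks it.
GRND_RANDOM = getattr(os, "GRND_RANDOM", 0x0002)


def timed(label, fn):
    """Call fn() and return (label, number of bytes returned, seconds taken)."""
    start = time.monotonic()
    data = fn()
    return label, len(data), time.monotonic() - start


def read_dev_random(n):
    """Case 1: a plain read from /dev/random (may return fewer than n bytes)."""
    with open("/dev/random", "rb", buffering=0) as f:
        return f.read(n)


# One entry per case from the list above; each takes the requested byte count.
CASES = {
    "dev_random": read_dev_random,
    "getrandom": lambda n: os.getrandom(n),
    "getrandom_grnd_random": lambda n: os.getrandom(n, GRND_RANDOM),
}


def run_case(name, n=16):
    """Time a single case; run only one case per boot (see note below)."""
    return timed(name, lambda: CASES[name](n))


def report(results):
    """Format (label, bytes, seconds) tuples as one line per case."""
    return "\n".join(
        f"{label}: {got} bytes in {secs:.1f} s" for label, got, secs in results
    )
```

Because reading /dev/random first changes what getrandom does afterwards, each case would need its own fresh boot, for example print(report([run_case("getrandom")])) as the first thing in init.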
Theodore Ts'o
2016-Jul-30 22:09 UTC
getrandom waits for a long time when /dev/random is insufficiently read from
On Fri, Jul 29, 2016 at 01:31:14PM -0400, Alex Xu wrote:
>
> My understanding was that all three methods of obtaining entropy from
> userspace all receive data from the CSPRNG in the kernel, and that the
> only difference is that /dev/random and getrandom may block depending
> on the kernel's estimate of the currently available entropy.

This is incorrect.

/dev/random is a legacy interface which dates back to a time when people didn't have as much trust in the cryptographic primitives, when there were concerns that the NSA might have put a back door into SHA-1, for example. (As it turns out, we were wrong. NSA put the back door into Dual EC DRBG.) So it uses a strategy of an extremely conservative entropy estimator, and will only allow N bytes to be drawn from the /dev/random pool once the entropy estimator believes that it has gathered at least N bytes of entropy from environmental noise.

/dev/urandom uses a different output pool from /dev/random (the random and urandom pools both draw from a common input pool). Originally the /dev/urandom pool drew from the input pool as needed, but it wouldn't block if there was insufficient entropy. Over time, it has gained limits on how quickly it can draw from the input pool, and it behaves more and more like a CSPRNG. In fact, in the most recent set of patches which Linus has accepted for v4.8-rc1, the urandom pool has been replaced by an actual CSPRNG using the ChaCha20 stream cipher.

The getrandom(2) system call uses the same output pool (4.7 and earlier) or CSPRNG (starting with v4.8-rc1) as /dev/urandom. The big difference is that it blocks until we know for sure that the output pool or CSPRNG has been seeded with 128 bits of entropy. We don't do this with /dev/urandom for backwards compatibility reasons. (For example, if we did make /dev/urandom block until it was seeded, it would break systemd, because systemd and programs run by systemd draw from /dev/urandom before it has been initialized, and if /dev/urandom were to block, the boot would hang, and with the system quiescent, we wouldn't get much environmental noise, and the system would hang for hours.)

> When qemu is started with -object rng-random,filename=/dev/urandom, and
> immediately (i.e. with no initrd and as the first thing in init):
>
> 1. the guest runs dd if=/dev/random, there is no blocking and tons of
> data goes to the screen. the data appears to be random.
>
> 2. the guest runs getrandom with any requested amount (tested 1 byte
> and 16 bytes) and no flags, it blocks for 90-110 seconds while the
> "non-blocking pool is initialized". the returned data appears to be
> random.
>
> 3. the guest runs getrandom with GRND_RANDOM with any requested amount,
> it returns the desired amount or possibly less, but in my experience at
> least 10 bytes. the returned data appears to be random.
>
> I believe that the difference between cases 1 and 2 is a bug, since
> based on my previous statement, in this scenario, getrandom should
> never block.

This is correct, and it has been fixed in the patches in v4.8-rc1. The patch which fixes this has been marked for backporting to stable kernels:

commit 3371f3da08cff4b75c1f2dce742d460539d6566d
Author: Theodore Ts'o <tytso at mit.edu>
Date:   Sun Jun 12 18:11:51 2016 -0400

    random: initialize the non-blocking pool via add_hwgenerator_randomness()

    If we have a hardware RNG and are using the in-kernel rngd, we should
    use this to initialize the non-blocking pool so that getrandom(2)
    doesn't block unnecessarily.

    Cc: stable at kernel.org
    Signed-off-by: Theodore Ts'o <tytso at mit.edu>

Basically, the urandom pool (now CSPRNG) wasn't getting initialized from the hardware random number generator. Most people didn't notice because very few people actually *use* hardware random number generators (although it's much more common in VMs, which is how you're using it), and use of getrandom(2) is still relatively rare, given that glibc hasn't yet seen fit to support it.

Cheers,

- Ted
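To make the effect of that commit easier to picture, here is a deliberately simplified toy model in Python. It is not kernel code. The class ToyRandom and its fields are invented for this illustration, entropy is tracked as plain bit counters, and the periodic transfers from the input pool to the output pools are left out. The one thing it tries to capture is the routing described in the commit message: hardware RNG input is used to initialize the non-blocking pool first.

```python
SEED_BITS = 128  # getrandom(2) waits until this much entropy has been credited


class ToyRandom:
    """Toy entropy accounting; an illustration only, not the kernel's logic."""

    def __init__(self, hwrng_seeds_nonblocking):
        # False models the behaviour before the fix, True the behaviour after.
        self.hwrng_seeds_nonblocking = hwrng_seeds_nonblocking
        self.input_bits = 0        # entropy credited to the input pool
        self.nonblocking_bits = 0  # entropy credited to the urandom pool

    @property
    def nonblocking_initialized(self):
        return self.nonblocking_bits >= SEED_BITS

    def add_hwgenerator_randomness(self, bits):
        """Credit entropy delivered by a hardware RNG such as virtio-rng."""
        if self.hwrng_seeds_nonblocking and not self.nonblocking_initialized:
            self.nonblocking_bits += bits  # seed the urandom pool first
        else:
            self.input_bits += bits

    def read_dev_random(self, nbytes):
        """Return how many bytes /dev/random would hand out right now."""
        granted = min(nbytes, self.input_bits // 8)
        self.input_bits -= granted * 8
        return granted

    def getrandom_would_block(self):
        """getrandom(2) with no flags blocks until the urandom pool is seeded."""
        return not self.nonblocking_initialized
```

With ToyRandom(False), crediting 1024 bits lets read_dev_random(16) succeed while getrandom_would_block() stays True, which is the mismatch between cases 1 and 2. With ToyRandom(True), the same 1024 bits mark the non-blocking pool as initialized, so getrandom no longer blocks.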