Hi. I experimented a bit with collecting entropy from the time it takes for device_attach() to run (in CPU cycles). It seems that those times have enough variation that we can use it for entropy harvesting. It happens even before root is mounted, so pretty early. On the machine I'm testing it, which has minimal kernel plus NIC driver I see 75 device_attach() calls. I'm being very careful and advertising to yarrow that each call has only 4 bits of entropy (most of the time there is much more). This gives 300 bits of entropy on this machine before we even start init. For real hardware like sound card it takes between 34647162 and 35548675 cycles to run device_attach(), so the difference here is 901513. If all the times are more or less equally probable in this range we have more than 19 bits of entropy from this one call, but I reduced if to four bits only, because there are devices that are much faster to attach. We could make the code more complex by assuming 0.01% of the time varies, which should still be safe and will allow to collect more entropy from those long calls. The patch is here: http://people.freebsd.org/~pjd/patches/harvest_device_attach.patch Comments? -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-security/attachments/20120918/64884bfb/attachment.pgp
Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:> I experimented a bit with collecting entropy from the time it takes for > device_attach() to run (in CPU cycles). It seems that those times have > enough variation that we can use it for entropy harvesting. It happens > even before root is mounted, so pretty early.Excellent idea :)> On the machine I'm testing it, which has minimal kernel plus NIC driver > I see 75 device_attach() calls. I'm being very careful and advertising > to yarrow that each call has only 4 bits of entropy (most of the time > there is much more). This gives 300 bits of entropy on this machine > before we even start init.Virtual machines (and even some physical hardware) can have as few as 40 devices. I have a VirtualBox instance running 9.1-RC1 that has only 36 devices (based on `sysctl dev | cut -d. -f2-3 | sort -u | wc -l`), and a soekris net5501 that only has 43. This does not count network interfaces, though.> For real hardware like sound card it takes between 34647162 and 35548675 > cycles to run device_attach(), [...]You can't rely on the existence of a TSC. I would suggest using the fractional part of binuptime instead. I would also suggest modifying yarrow to block reseeding as long as possible, ideally right up until the first time something asks for a random number, since reseeding throws away all accumulated entropy. I'd suggest delaying reseeding until right before we start the scheduler, but if I understand correctly, geom_geli may need randomness before that? DES -- Dag-Erling Sm?rgrav - des@des.no
On Tue, 18 Sep 2012 23:14:22 +0200 Pawel Jakub Dawidek wrote:> Hi. >> The patch is here: > > http://people.freebsd.org/~pjd/patches/harvest_device_attach.patch > > Comments? >+ attachtime = get_cyclecount() - attachtime; the above line is redundant since random_harvest() already contains a call to get_cyclecount(). On Wed, 19 Sep 2012 17:28:46 +0200 Dag-Erling Sm?rgrav wrote:> You can't rely on the existence of a TSC. I would suggest using the > fractional part of binuptime instead.get_cyclecount() is supposed to be platform independent and should fall-back to nanotime(9) if TSC or equivalent is absent.
On Tuesday, 18 September 2012 at 22:14, Pawel Jakub Dawidek wrote:> I experimented a bit with collecting entropy from the time it takes for > device_attach() to run (in CPU cycles). It seems that those times have > enough variation that we can use it for entropy harvesting. It happens > even before root is mounted, so pretty early. >That sounds really great.> If all the times are more or less equally probable in this range [?]They're very unlikely to be equally probable. It would make sense to do some characterization of these times and their statistics: a highly non-uniform distribution would mean that we don't actually get many bits per attach.> [?] we have more > than 19 bits of entropy from this one call, but I reduced if to four > bits only, because there are devices that are much faster to attach. >Another reason for doing the above characterization is that, if a particular device_attach() really does provide 12 bits of uncertainty, it's a shame to drop eight of them on the floor.> We could make the code more complex by assuming 0.01% of the time > varies, which should still be safe and will allow to collect more > entropy from those long calls. >I'm a bit leery of assuming that things "should still be safe" for the above reasons. Again, some hard numbers would really help here. Maybe we should even convince a student to do a project. :) Jon -- Jonathan Anderson Research Associate Computer Laboratory University of Cambridge jonathan.anderson@cl.cam.ac.uk +44 1223 763 747
On Tue, Sep 18, 2012 at 11:14:22PM +0200, Pawel Jakub Dawidek wrote:> I experimented a bit with collecting entropy from the time it takes for > device_attach() to run (in CPU cycles). It seems that those times have > enough variation that we can use it for entropy harvesting. It happens > even before root is mounted, so pretty early.I like it. Microsoft harvests from something like 900 events/things. The more good things like this we find improves our security.> The patch is here: > http://people.freebsd.org/~pjd/patches/harvest_device_attach.patch > Comments?Embelishments: Index: sys/dev/random/randomdev_soft.c ==================================================================--- sys/dev/random/randomdev_soft.c (revision 240694) +++ sys/dev/random/randomdev_soft.c (working copy) @@ -158,6 +185,11 @@ random_yarrow_init(void) "Harvest serial net entropy"); SYSCTL_ADD_PROC(&random_clist, SYSCTL_CHILDREN(random_sys_harvest_o), + OID_AUTO, "devprobe", CTLTYPE_INT | CTLFLAG_RW, + &harvest.devprobe, 1, random_check_boolean, "I", + "Harvest Device Probe entropy"); + SYSCTL_ADD_PROC(&random_clist, + SYSCTL_CHILDREN(random_sys_harvest_o), OID_AUTO, "interrupt", CTLTYPE_INT | CTLFLAG_RW, &harvest.interrupt, 1, random_check_boolean, "I", "Harvest IRQ entropy"); @@ -303,7 +341,7 @@ random_harvest_internal(u_int64_t someco KASSERT(origin == RANDOM_START || origin == RANDOM_WRITE || origin == RANDOM_KEYBOARD || origin == RANDOM_MOUSE || origin == RANDOM_NET || origin == RANDOM_INTERRUPT || - origin == RANDOM_PURE, + origin == RANDOM_PURE || origin == RANDOM_DEVICE, ("random_harvest_internal: origin %d invalid\n", origin)); /* Lockless read to avoid lock operations if fifo is full. */ Index: sys/dev/random/harvest.c ==================================================================--- sys/dev/random/harvest.c (revision 240694) +++ sys/dev/random/harvest.c (working copy) @@ -48,7 +48,13 @@ __FBSDID("$FreeBSD$"); static int read_random_phony(void *, int); /* Structure holding the desired entropy sources */ -struct harvest_select harvest = { 1, 1, 1, 0 }; +struct harvest_select harvest = { + 1, /*ethernet*/ + 1, /*pt2pt*/ + 1, /*intr*/ + 0, /*swi*/ + 1, /*devprobe*/ +}; static int warned = 0; /* hold the address of the routine which is actually called if Index: sys/sys/random.h ==================================================================--- sys/sys/random.h (revision 240495) +++ sys/sys/random.h (working copy) @@ -45,6 +45,7 @@ enum esource { RANDOM_NET, RANDOM_INTERRUPT, RANDOM_PURE, + RANDOM_DEVICE, ENTROPYSOURCE }; void random_harvest(void *, u_int, u_int, u_int, enum esource); @@ -57,6 +58,7 @@ struct harvest_select { int point_to_point; int interrupt; int swi; + int device; }; extern struct harvest_select harvest; Index: sys/kern/subr_bus.c ==================================================================--- sys/kern/subr_bus.c (revision 240495) +++ sys/kern/subr_bus.c (working copy) @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$"); #include <sys/condvar.h> #include <sys/queue.h> #include <machine/bus.h> +#include <sys/random.h> #include <sys/rman.h> #include <sys/selinfo.h> #include <sys/signalvar.h> @@ -53,6 +54,7 @@ __FBSDID("$FreeBSD$"); #include <sys/bus.h> #include <sys/interrupt.h> +#include <machine/cpu.h> #include <machine/stdarg.h> #include <vm/uma.h> @@ -2760,8 +2762,10 @@ device_probe_and_attach(device_t dev) int device_attach(device_t dev) { + uint64_t attachtime; int error; + attachtime = get_cyclecount(); device_sysctl_init(dev); if (!device_is_quiet(dev)) device_print_child(dev->parent, dev); @@ -2784,6 +2788,10 @@ device_attach(device_t dev) dev->state = DS_ATTACHED; dev->flags &= ~DF_DONENOMATCH; devadded(dev); + if (harvest.devprobe) + random_harvest(&attachtime, sizeof(attachtime), 4, 0, + RANDOM_DEVICE); + return (0); } Index: etc/defaults/rc.conf ==================================================================--- etc/defaults/rc.conf (revision 239610) +++ etc/defaults/rc.conf (working copy) @@ -642,6 +642,7 @@ entropy_file="/entropy" # Set to NO to d entropy_dir="/var/db/entropy" # Set to NO to disable caching entropy via cron. entropy_save_sz="2048" # Size of the entropy cache files. entropy_save_num="8" # Number of entropy cache files to save. +harvest_devprobe="YES" # Entropy device harvests device probe randomness harvest_interrupt="YES" # Entropy device harvests interrupt randomness harvest_ethernet="YES" # Entropy device harvests ethernet randomness harvest_p_to_p="YES" # Entropy device harvests point-to-point randomness Index: etc/rc.d/initrandom ==================================================================--- etc/rc.d/initrandom (revision 239610) +++ etc/rc.d/initrandom (working copy) @@ -41,6 +63,12 @@ initrandom_start() if [ \! -z "${soft_random_generator}" ] ; then if [ -w /dev/random ]; then + if checkyesno harvest_devprobe; then + ${SYSCTL} kern.random.sys.harvest.devprobe=1 >/dev/null + echo -n ' interrupts' + else + ${SYSCTL} kern.random.sys.harvest.devprobe=0 >/dev/null + fi if checkyesno harvest_interrupt; then ${SYSCTL} kern.random.sys.harvest.interrupt=1 >/dev/null echo -n ' interrupts'
On Wed, Sep 19, 2012 at 07:28:36PM +0100, RW wrote:> On Tue, 18 Sep 2012 23:14:22 +0200 > Pawel Jakub Dawidek wrote: > > > Hi. > > > > > The patch is here: > > > > http://people.freebsd.org/~pjd/patches/harvest_device_attach.patch > > > > Comments? > > > > + attachtime = get_cyclecount() - attachtime; > > the above line is redundant since random_harvest() already contains a > call to get_cyclecount().Agreed, although in more recent patch I need total time, so I can calculate how many bits it has, so I can estimate how much entropy there is. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-security/attachments/20120922/a87315e7/attachment.pgp
On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:> W dniu 2012-09-22 21:53, Pawel Jakub Dawidek pisze: > > Mariusz, can you confirm my findings? > > Pawel, > > Your conclusions can be easily confirmed by shape analysis of the EDF. > Usually maximum quantile difference (called D-statistic) gives you a > kind of overview, function shape gives you a strong feeling, p-value > gives you a formal proof. > D-statistic values (your data): > > 6bit: 0.33% > 7bit: 0.29% > 8bit: 0.27% > 9bit: 0.21% > 10bit: 6.34% > 11bit: 19.07% > 12bit: 54.80% > > What I would say: increasing the number of bits from 6 to 9 does not > affect distribution "uniformity", reaching the tenth bit results in > sudden increase in the difference measure - the more bits, the more > difference is observed. Distribution shape analysis for the 10th bit > shows non-linear function. Lack of "randomness" in the quntile > difference curve - chart shows completely lack of noise (pure > functional relation). These are very strong indicators that starting > from 10th bit distribution was changed and is no longer uniform. > > To formally confirm above conclusion for i.e. 5% significance level, > which means that confidence level is 95%, I need some extra data > regarding sample sizes. Please pass to me number of collected > observations in each 6-12 bit experiment.Total number of observations was 162833. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-security/attachments/20120923/d3fc7f89/attachment.pgp
W dniu 2012-09-23 17:17, Pawel Jakub Dawidek pisze:> On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote: >> W dniu 2012-09-22 21:53, Pawel Jakub Dawidek pisze: >>> Mariusz, can you confirm my findings? >> >> Pawel, >> >> Your conclusions can be easily confirmed by shape analysis of the EDF. >> Usually maximum quantile difference (called D-statistic) gives you a >> kind of overview, function shape gives you a strong feeling, p-value >> gives you a formal proof. >> D-statistic values (your data): >> >> 6bit: 0.33% >> 7bit: 0.29% >> 8bit: 0.27% >> 9bit: 0.21% >> 10bit: 6.34% >> 11bit: 19.07% >> 12bit: 54.80% >> >> What I would say: increasing the number of bits from 6 to 9 does not >> affect distribution "uniformity", reaching the tenth bit results in >> sudden increase in the difference measure - the more bits, the more >> difference is observed. Distribution shape analysis for the 10th bit >> shows non-linear function. Lack of "randomness" in the quntile >> difference curve - chart shows completely lack of noise (pure >> functional relation). These are very strong indicators that starting >> from 10th bit distribution was changed and is no longer uniform. >> >> To formally confirm above conclusion for i.e. 5% significance level, >> which means that confidence level is 95%, I need some extra data >> regarding sample sizes. Please pass to me number of collected >> observations in each 6-12 bit experiment. > > Total number of observations was 162833. >Ok, finally I have some formal results. To be completely honest I need to point out that, in fact, we have a discrete data (for example integers 0, 1, ..., 63, but not continues numbers spread across 0 and 63). That is way I am going to use two sample Kolmogorov-Smirnov test. Methodology is simple: - Pawel?s data will be called empirical one - Theoretical data will be generated as a sequence of unique integer numbers from 0 to 2**n -1, where n is the number of bits. Assumption - each number appears in theoretical data only once representing ideal uniform distribution. Calculations will be done in the R-cran package Loading empirical data form files: > e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt") > e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt") > e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt") > e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt") > e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt") > e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt") > e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt") Generating ideal theoretical data: > t6 = c(0:(2**6-1)) > t7 = c(0:(2**7-1)) > t8 = c(0:(2**8-1)) > t9 = c(0:(2**9-1)) > t10 = c(0:(2**10-1)) > t11 = c(0:(2**11-1)) > t12 = c(0:(2**12-1)) Performing KS tests: > ks.test(e6, t6) D = 0.0032, p-value = 1 > ks.test(e7, t7) D = 0.0029, p-value = 1 > ks.test(e8, t8) D = 0.0027, p-value = 1 > ks.test(e9, t9) D = 0.0022, p-value = 1 > ks.test(e10, t10) D = 0.0634, p-value = 0.0005562 > ks.test(e11, t11) D = 0.1907, p-value < 2.2e-16 > ks.test(e12, t12) D = 0.5479, p-value < 2.2e-16 As you can see D-statistics are almost the same as calculated by Pawel (considering roundings). P-values are very interesting due to very high number of observations generated by Pawel. Between 6 bits and 9 bits estimated p-values are equal to 1, so it means that it is impossible (at any significance level) to reject null hypothesis stating that compared distributions are equal. Final conclusion: it has to be random, and for sure it is random! Additionally starting form 10 bits we can observe dramatic decrease of p-value (from 100% to c.a. 0,06% and much less for the 11-12 bits). So low p-value means that it is impossible not to reject null hypothesis stating that compared distributions are equal. Final conclusion: it cannot be random, and for sure it is not random. I did the same comparison for the previous real device attach data (2081 obs.). R code and the results are below: > e16 = read.table("E:\\pawel\\device_attach_16bit.log") > t16 = c(0:(2**16-1)) > ks.test(e16, t16) D = 0.0178, p-value = 0.5422 Again, D-statistic an p-value are almost the same as previously calculated "manually". P-value is very high (it is not as high as in the 6-12 bits tests, but consider much lower number of observations: 2081 vs 162833), giving almost sureness that you have captured real 16-bits entropy! Regards, Mariusz
W dniu 2012-09-24 23:56, Mariusz Gromada pisze:> Ok, finally I have some formal results. To be completely honest I need > to point out that, in fact, we have a discrete data (for example > integers 0, 1, ..., 63, but not continues numbers spread across 0 and > 63). That is way I am going to use two sample Kolmogorov-Smirnov test.Another clarification is needed. KS test in general (and in theory) should be used for continuous distributions. But in our case we can easily say that we observe our distribution in integers only (rounding), and the whole rest is easily estimated. Regards, Mariusz