Eric Blake
2023-May-16 19:35 UTC
[Libguestfs] [PATCH nbdkit 1/5] Add new public nbdkit_parse_probability function
On Tue, May 16, 2023 at 07:20:16PM +0200, Laszlo Ersek wrote:
> On 5/16/23 14:12, Richard W.M. Jones wrote:
> > In nbdkit-error-filter we need to parse parameters as probabilities.
> > This is useful enough to add to nbdkit, since we will use it in
> > another filter in future.
> > ---

I'll start with the good, before focusing on the questionable.

You make a compelling argument about wanting to reuse already-existing
code for parsing a percentage.  It is indeed useful to have a single
utility function, to be reused by multiple plugins and filters, so that
we have consistent grammars for what we are willing to accept (rather
than multiple ad hoc parsers being subtly different), and where we can
fix any bugs or add new supported parses in just one location.  So that
is a strong argument in favor of this patch, if we can justify why
having a floating point error representation in the first place is
worthwhile.  The latter, of course, is harder to argue; just because
this patch is good at eliminating redundancy does not necessarily mean
it is a good idea to use floating point.

> > +=head2 Parsing probabilities
> > +
> > +Use the C<nbdkit_parse_probability> utility function to parse
> > +probabilities.  Common formats understood include: C<"0.1">, C<"10%">
> > +or C<"1:10">, which all mean a probability of 1 in 10.
> > +
> > + int nbdkit_parse_probability (const char *what, const char *str,
> > +                               double *ret);
> > +
> > +The C<what> parameter is printed in error messages to provide context.
> > +The C<str> parameter is the probability string.
> > +
> > +On success the function returns C<0> and sets C<*ret>.  B<Note> that
> > +the probability returned may be outside the range S<[ 0.0..1.0 ]>, for
> > +example if C<str == "200%">.  If you want to clamp the result you must
> > +check that yourself.

Would it be helpful if the utility function included bounds that the
result must lie between?  As in:

  if (nbdkit_parse_probability ("my function", str, 0.0, 1.0, &value) == -1)
    return -1;

where code that wants to allow larger than 100% can do so by passing
different bounds?

> > +NBDKIT_DLL_PUBLIC int
> > +nbdkit_parse_probability (const char *what, const char *str,
> > +                          double *retp)
> > +{
> > +  double d, d2;
> > +  char c;
> > +  int n;
> > +
> > +  if (sscanf (str, "%lg%[:/]%lg%n", &d, &c, &d2, &n) == 3 &&
> > +      strcmp (&str[n], "") == 0) { /* N:M or N/M */

This accepts LOTS of strings.  Maybe that is intentional, but still, it
accepts more than you documented.  For example, it accepts
"-inf:nan(0x0)", but only on some platforms (since expanded nan(...)
sequences are implementation defined), even though I can't possibly see
how that makes for a valid percentage.
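A quick throwaway test shows how permissive it is (untested sketch, not
part of the patch; note that I give the %[ conversion a two-byte buffer
and an explicit field width, since %[ stores a terminating NUL):

  #include <stdio.h>
  #include <string.h>

  /* Feed a candidate string through the same format the patch uses. */
  static void
  try (const char *str)
  {
    double d, d2;
    char sep[2];
    int n;

    if (sscanf (str, "%lg%1[:/]%lg%n", &d, sep, &d2, &n) == 3 &&
        strcmp (&str[n], "") == 0)
      printf ("accepted \"%s\" as %g %s %g\n", str, d, sep, d2);
    else
      printf ("rejected \"%s\"\n", str);
  }

  int
  main (void)
  {
    try ("1:10");           /* the documented form */
    try ("-inf:nan(0x0)");  /* accepted on glibc, hardly a probability */
    try ("0x8p-4/1e300");   /* hex floats and huge exponents sneak in too */
    return 0;
  }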
> > +    if (d == 0 && d2 == 0)     /* 0/0 is OK */
> > +      ;

So your intent is to allow bypassing 0/0 to instead behave identically
to 0%.

> > +    else if (d2 == 0)          /* N/0 is bad */
> > +      goto bad_parse;
> > +    else
> > +      d /= d2;
>
> I don't want to be a spoilsport, but this division makes me extra
> nervous [1], on top of parsing floating point *at all* [2].
>
> [2] Eric has just posted a 19-part series, v2, for QEMU, for fixing the
> numerous issues in QEMU's floating point parser. Personal opinion: QEMU
> doesn't even have a *valid reason* for parsing floating point! All QEMU
> deals with are *integer quantities* such as disk images consisting of
> whole numbers of bytes! But again, that's just a personal opinion.

No offense taken.  And while you are correct that my v2 series for qemu
clocked in at 19 patches, it doesn't necessarily translate to 19 bugs
fixed (although it was definitely more than one), nor were they all in
floating point (one of the bugs was in qemu_strtoui() which is integer
only).  In short, my biggest time sink there was fixing technical debt
on needing a LOT more unit tests to make sure I wasn't inadvertently
breaking some corner case (and with strtod() or scanf("%g"), there are
a LOT of those).

Still, being able to write 1.5G rather than 1610612736 explains why
qemu goes to the bother of attempting to allow certain floating point
strings.

So, inspired by your concern, I just took another look at
nbdkit_parse_size(), which currently has rather terse public
documentation:

| Use the C<nbdkit_parse_size> utility function to parse human-readable
| size strings such as "100M" into the size in bytes.
|
|  int64_t nbdkit_parse_size (const char *str);
|
| C<str> can be a string in a number of common formats.  The function
| returns the size in bytes.  If there was an error, it returns C<-1>.

but you are better off reading server/public.c to see what we actually
parse, along with server/test-public.c:test_nbdkit_parse_size() for
what we promise not to break (currently, no strings containing '.' are
in our explicitly supported set).  The former has:

  /* XXX Should we also parse things like '1.5M'? */
  /* XXX Should we allow hex? If so, hex cannot use scaling suffixes,
   * because some of them are valid hex digits */

which are some of the very corner cases I was just dealing with in
qemu's qemu_strtosz(); and the latter has its own shortfalls (for
example, classifying "8E" as a bogus string, instead of putting it in
the section of strings leading to overflow - the real reason we reject
8E is that it doesn't fit as a positive value in off_t).  Alas, since
qemu has a stricter license, I can't just copy my work there over to
nbdkit without also exercising my right as author to dual-license it
more permissively.  But the way our docs are currently written, we do
seem to reserve the right to make strings that currently fail to parse
start parsing validly in the future (such as 1.5M).
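Concretely, my reading of today's behavior (illustrative sketch from
inside a plugin, where nbdkit-plugin.h is available; the comments
reflect server/public.c as it stands, not a documented promise):

  #include <inttypes.h>
  #include <stdio.h>
  #include <nbdkit-plugin.h>

  static void
  demo (void)
  {
    /* "100M" is in the documented, tested set: 100 * 2^20. */
    printf ("%" PRId64 "\n", nbdkit_parse_size ("100M")); /* 104857600 */
    /* Strings with '.' are not in the supported set today, so this
     * returns -1 - though the docs reserve the right to accept it
     * later. */
    printf ("%" PRId64 "\n", nbdkit_parse_size ("1.5M")); /* -1 */
    /* Rejected as bogus today, though the real reason is overflow:
     * 8 * 2^60 == 2^63 does not fit as a positive value in off_t. */
    printf ("%" PRId64 "\n", nbdkit_parse_size ("8E"));   /* -1 */
  }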
> [1] I maintain that floating point is only a part of the C language so
> that C could be made "competitive" with Fortran. Therefore I honestly
> believe that *no* floating point should be done in C without fully
> engaging in numerical analysis. I'm not up to that -- I've had numerical
> analysis in college, and I remember just enough of it to steer the heck
> clear of it. In C99, there's a whole chapter dedicated to the floating
> point environment (7.6 Floating-point environment <fenv.h>), and there's
> also the normative Annex F ("IEC 60559 floating-point arithmetic").
>
> In this particular case, we have a floating point division. The divisor
> "d2" can be an extremely small *positive* number (DBL_MIN, or another
> value with a very small positive mantissa and a huge absolute value
> negative exponent, like FLT_RADIX raised to DBL_MIN_EXP), and the
> dividend "d" may be a huge positive number (DBL_MAX, or another value
> with a large positive mantissa and a huge positive exponent like
> FLT_RADIX raised to DBL_MAX_EXP).
>
> The quotient d/d2 is then almost certainly not representable in the type
> -- the resultant exponent would have to be about (DBL_MAX_EXP -
> DBL_MIN_EXP), which in practice is about 2 * DBL_MAX_EXP -- and there's
> just not another exponent bit available in the type for that.
>
> And I can't even say whether that would result in a silent NaN, or a
> silent positive infinity, or a floating point exception. I don't like
> code that I can't reason about (even if it is ultimately my fault, for
> not knowing the language well enough).

C is a lot looser here than other languages.  For example, Java took
pains to define that division by 0 or infinity produces
platform-independent reliable results (at the expense of someone
writing a Java virtual machine having to do slower implementations
which special-case anything that the hardware does not natively do in
the same manner).  At any rate, I agree with you that floating point
division, when you have not validated that the unbounded-precision
result won't overflow/underflow under rounding scenarios, is not
trivial to think about.

> Relatedly... in patch#5, we have:
>
>  + block_size = next_power_of_2 ((uint64_t) (100. / evil_probability));
>
> where "evil_probability" comes from nbdkit_parse_probability(), and is
> checked separately to be in [0, 1] (both boundaries inclusive, per the
> code).
>
> So, evil_probability==0 will produce a division by zero here, but even
> if we assume that's just a typo and we actually mean to exclude 0, the
> user can still provide a value near DBL_MIN on the command line, like
> 1E-37. 100 is 1E+2, so the quotient is 1E+39. Representing 10^39 in
> binary requires ceil(39 * log2(10)) bits -- that's 130 bits; uint64_t is
> too small for that. And then the following applies from C99:
>
>   6.3.1.4 Real floating and integer
>
>   When a finite value of real floating type is converted to an integer
>   type other than _Bool, the fractional part is discarded (i.e., the
>   value is truncated toward zero). If the value of the integral part
>   cannot be represented by the integer type, the behavior is
>   undefined. [Footnote 50]
>
>   Footnote 50:
>
>   The remaindering operation performed when a value of integer type is
>   converted to unsigned type need not be performed when a value of
>   real floating type is converted to unsigned type. Thus, the range of
>   portable real floating values is (-1, Utype_MAX+1).

Oh, so my assumption (in my reply to patch 5, before replying to this
email) that (uint64_t)INFINITY produces UINT64_MAX is invalid (again,
maybe something I'm remembering from Java taking pains to explicitly
specify that).  It may happen to do so, but other values are also
possible, and the compiler can also use this explicit promise of UB to
optimize things differently than we would like.  Yeah, this really is a
bear-trap.

> (Note the round parens in the footnote: it means the boundaries are
> *excluded*. So, a value like -0.5 will be truncated to zero, and a value
> like UINT64_MAX + 0.5 will be truncated to UINT64_MAX, but (-1) itself
> and (UINT64_MAX + 1) itself are not permitted already.)

I challenge you to find a system where 'double' can hold the value
UINT64_MAX + 0.5 with sufficient precision (IEEE double only provides
53 bits of precision).  In fact, some of the back-story of my work on
qemu's floating point parser was an earlier bug report, raised by
nbdkit's unit testing, when we discovered that qemu was rounding large
values (because of its trip through 'double') rather than matching an
integral parse (found when qemu was unable to produce large sparse
images as large as nbdkit because of the lost precision).
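A tiny demonstration of the missing bits (untested sketch; assumes an
IEEE-754 'double'):

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* UINT64_MAX needs 64 significant bits; double has 53, so the
     * conversion rounds up to the nearest representable value, 2^64. */
    double d = (double) UINT64_MAX;
    printf ("%d\n", d == 0x1p64);   /* prints 1 */
    /* Converting d back to uint64_t would be undefined behavior per
     * C99 6.3.1.4: the integral part (2^64) is out of range. */
    return 0;
  }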
Semi-related: just last week I also fixed a bug in glib where it was
assuming that the round trip (uint64_t)(long double)(uint64_t)value
would not lose precision, which happens to be the case on x86_64 but is
not true on other hardware; but here we're using sscanf with 'double',
not 'long double'.

> I recommend the following: take two unsigned integers, numerator "a" and
> denominator "b", from the command line. We know how to parse those safely.
>
> Maintain those values separately.
>
> In every calculation where we need to multiply x by a/b, calculate
> (x*a)/b, and use MUL_OVERFLOW() for the multiplication. Where we need to
> divide x by (a/b), use (x*b)/a, again with MUL_OVERFLOW(). (And of
> course the divisor must never be 0.)
>
> In case we can determine the possible range for "x" *in advance*, for
> example because we know we're going to divide x=100 by (a/b), we can
> *also* determine the safe range for "a" and "b" as well -- for x=100,
> x*b must fit in a uint64_t, so "b" must not exceed UINT64_MAX/100, and
> that can be verified safely after parsing.
>
> Floating point is in fact *supposed* to take away all this messing, and
> "compress" the ratio into the mantissa, and set the exponent (the scale)
> accordingly. But the simplicity is deceiving, due to cancellation, error
> accumulation, etc. (Cue the articles "what every programmer should know
> about floating point".)
>
> I don't think I can *safely* use floating point, I can only say when
> something looks unsafe. Maintaining two positive integers (numerator and
> denominator) is more work, but I can at least *reason* about those.
>
> If Eric is OK with this patch set, I won't try to block it, it just
> makes me feel very uncomfortable. It's more than just scanf() lacking
> proper internal error handling; the real complications come when we
> perform operations on the floating point values.

Our pre-existing error filter is already doing percentage calculations
via floating point, so on the grounds that this patch is consolidating
common code patterns to avoid reimplementation, we're good.  But as you
say, representing things as a ratio between two integers may be more
maintainable than floating point where accuracy matters, or this may be
a case where we are explicit that use of the floating point fractions
as a percentage is intentionally liable to rounding issues.  Still,
avoiding undefined behavior when the division can overflow (whether
that be the overflow to floating point infinity or even the overflow
beyond UINT64_MAX) does make it worth questioning if floating point is
our best representation, or at a minimum if we want to be stricter in
what we parse before passing a substring on to strtod() so that we have
more control over the values (part of my 19 patches to qemu was
explicitly truncating any string at 'e' or 'E' so that strtod() could
not do an exponent parse that would inadvertently hit overflow to
INFINITY).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2023-May-16 20:22 UTC
[Libguestfs] [PATCH nbdkit 1/5] Add new public nbdkit_parse_probability function
On Tue, May 16, 2023 at 02:35:38PM -0500, Eric Blake wrote:
> > > +=head2 Parsing probabilities
> > > +
> > > +Use the C<nbdkit_parse_probability> utility function to parse
> > > +probabilities.  Common formats understood include: C<"0.1">, C<"10%">
> > > +or C<"1:10">, which all mean a probability of 1 in 10.
> > > +
> > > + int nbdkit_parse_probability (const char *what, const char *str,
> > > +                               double *ret);
> > > +
> > > +The C<what> parameter is printed in error messages to provide context.
> > > +The C<str> parameter is the probability string.
> > > +
> > > +On success the function returns C<0> and sets C<*ret>.  B<Note> that
> > > +the probability returned may be outside the range S<[ 0.0..1.0 ]>, for
> > > +example if C<str == "200%">.  If you want to clamp the result you must
> > > +check that yourself.
>
> Would it be helpful if the utility function included bounds that the
> result must lie between?  As in:
>
>   if (nbdkit_parse_probability ("my function", str, 0.0, 1.0, &value) == -1)
>     return -1;
>
> where code that wants to allow larger than 100% can do so by passing
> different bounds?

I had an earlier version which attempted this, but the problem is that
it's hard to describe when the upper or lower limit is unbounded.  Or
at least you'd need to add flags, which makes the binding in other
languages more difficult.
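(For illustration only - this helper is hypothetical, not something in
the patch: one flag-free shape would be to let callers pass infinities
from <math.h> to mean "unbounded", although that still leaves the
language-binding question open:)

  #include <math.h>
  #include <nbdkit-plugin.h>

  /* Hypothetical wrapper, sketched on top of the function under
   * review; +/-HUGE_VAL express "no bound" without extra flags. */
  static int
  parse_probability_bounded (const char *what, const char *str,
                             double lo, double hi, double *retp)
  {
    double v;

    if (nbdkit_parse_probability (what, str, &v) == -1)
      return -1;
    if (!(v >= lo && v <= hi)) {   /* also rejects NaN */
      nbdkit_error ("%s: %s out of range [%g, %g]", what, str, lo, hi);
      return -1;
    }
    *retp = v;
    return 0;
  }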
[...]
> > I recommend the following: take two unsigned integers, numerator "a" and
> > denominator "b", from the command line. We know how to parse those safely.
> >
> > Maintain those values separately.
> >
> > In every calculation where we need to multiply x by a/b, calculate
> > (x*a)/b, and use MUL_OVERFLOW() for the multiplication. Where we need to
> > divide x by (a/b), use (x*b)/a, again with MUL_OVERFLOW(). (And of
> > course the divisor must never be 0.)
> >
> > In case we can determine the possible range for "x" *in advance*, for
> > example because we know we're going to divide x=100 by (a/b), we can
> > *also* determine the safe range for "a" and "b" as well -- for x=100,
> > x*b must fit in a uint64_t, so "b" must not exceed UINT64_MAX/100, and
> > that can be verified safely after parsing.
> >
> > Floating point is in fact *supposed* to take away all this messing, and
> > "compress" the ratio into the mantissa, and set the exponent (the scale)
> > accordingly. But the simplicity is deceiving, due to cancellation, error
> > accumulation, etc. (Cue the articles "what every programmer should know
> > about floating point".)
> >
> > I don't think I can *safely* use floating point, I can only say when
> > something looks unsafe. Maintaining two positive integers (numerator and
> > denominator) is more work, but I can at least *reason* about those.
> >
> > If Eric is OK with this patch set, I won't try to block it, it just
> > makes me feel very uncomfortable. It's more than just scanf() lacking
> > proper internal error handling; the real complications come when we
> > perform operations on the floating point values.
>
> Our pre-existing error filter is already doing percentage calculations
> via floating point, so on the grounds that this patch is consolidating
> common code patterns to avoid reimplementation, we're good.  But as you
> say, representing things as a ratio between two integers may be more
> maintainable than floating point where accuracy matters, or this may be
> a case where we are explicit that use of the floating point fractions
> as a percentage is intentionally liable to rounding issues.  Still,
> avoiding undefined behavior when the division can overflow (whether
> that be the overflow to floating point infinity or even the overflow
> beyond UINT64_MAX) does make it worth questioning if floating point is
> our best representation, or at a minimum if we want to be stricter in
> what we parse before passing a substring on to strtod() so that we have
> more control over the values (part of my 19 patches to qemu was
> explicitly truncating any string at 'e' or 'E' so that strtod() could
> not do an exponent parse that would inadvertently hit overflow to
> INFINITY).

I'm not really sure we reached a conclusion so far, but I did want to
say that I changed the evil filter implementation so that it now treats
any 0 <= P < 1e-12 as effectively 0.  (P == evil-probability; P < 0 and
P > 1 are still rejected at config time as before.)  With P == 1e-12:

  nbdkit: debug: evil: mode: stuck-bits, P: 1e-12
  nbdkit: debug: evil: block_size: 17592186044416 (2**44)
  nbdkit: debug: evil: expected bits per block: 140.737

(The large block size isn't an issue here as nothing is allocated.
However it would be a problem if the block_size calculation overflowed,
and I believe that limiting P to larger values avoids this.)
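The change is roughly this shape (a sketch only, not the actual filter
code; the filter's real block_size formula may differ from the patch#5
line quoted earlier in the thread):

  #include <stdint.h>

  extern uint64_t next_power_of_2 (uint64_t v);  /* as used in patch 5 */

  #define MIN_P 1e-12   /* smallest P we do not clamp to zero */

  /* P has already been checked to lie in [0, 1] at config time. */
  static uint64_t
  pick_block_size (double evil_probability)
  {
    if (evil_probability < MIN_P)
      return 0;                 /* caller treats 0 as "never corrupt" */
    /* With P >= 1e-12 we have 100/P <= 1e14, far below both DBL_MAX
     * and UINT64_MAX, so neither the division nor the conversion to
     * integer can hit the overflow cases discussed above. */
    return next_power_of_2 ((uint64_t) (100. / evil_probability));
  }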
Also I'm not sure that handling probabilities as a numerator and
denominator actually helps here?  Both still need to be stored as
floats, so it just pushes the same problem to the plugin.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins https://gitlab.com/nbdkit/nbdkit
Laszlo Ersek
2023-May-17 14:08 UTC
[Libguestfs] [PATCH nbdkit 1/5] Add new public nbdkit_parse_probability function
On 5/16/23 21:35, Eric Blake wrote:
> On Tue, May 16, 2023 at 07:20:16PM +0200, Laszlo Ersek wrote:
>> On 5/16/23 14:12, Richard W.M. Jones wrote:
>>> +    else if (d2 == 0)          /* N/0 is bad */
>>> +      goto bad_parse;
>>> +    else
>>> +      d /= d2;
>>
>> I don't want to be a spoilsport, but this division makes me extra
>> nervous [1], on top of parsing floating point *at all* [2].
>>
>> [2] Eric has just posted a 19-part series, v2, for QEMU, for fixing the
>> numerous issues in QEMU's floating point parser. Personal opinion: QEMU
>> doesn't even have a *valid reason* for parsing floating point! All QEMU
>> deals with are *integer quantities* such as disk images consisting of
>> whole numbers of bytes! But again, that's just a personal opinion.
>
> No offense taken.

To clarify, I didn't believe it could in any way offend you,
technically speaking; I thought my opinion could perhaps be considered
offensive by the author of the original QEMU feature, i.e. by the
developer who had introduced floating point parsing originally to QEMU
(I don't know who that developer was).  You're "only" fixing bugs in an
existing feature (and improving its test coverage).  I'm challenging
the existence of the entire feature.  I do agree that *if* the feature
exists, it should be reasonably bug-free.

> And while you are correct that my v2 series for
> qemu clocked in at 19 patches, it doesn't necessarily translate to 19
> bugs fixed (although it was definitely more than one), nor were they
> all in floating point (one of the bugs was in qemu_strtoui() which is
> integer only). In short, my biggest time sink there was fixing
> technical debt on needing a LOT more unit tests to make sure I wasn't
> inadvertently breaking some corner case (and with strtod() or
> scanf("%g"), there are a LOT of those).
>
> Still, being able to write 1.5G rather than 1610612736 explains why
> qemu goes to the bother of attempting to allow certain floating point
> strings.

This is actually the core mistake, in my opinion.  Preferring "one and
a half gigabytes" over "one billion, six hundred ten million, six
hundred twelve thousand, seven hundred thirty-six bytes" is
understandable.  What is *not* understandable, knowing the pitfalls of
floating point (!), is why the former form was equated with "parsing
floating point".

Note: I wrote "one and a half gigabytes", not "one and five tenths
gigabytes".  Huge difference!  In everyday speech, we use super simple
fractions, with both numerator and denominator being very small natural
numbers.  And especially with multipliers such as K, M, G, T, which are
powers of two (2^10, 2^20, 2^30, 2^40, respectively), I would argue
that no denominators other than 2, 4 and maybe (!) 8 matter.  (Who has
ever said, "give me two thirds of a gigabyte"?  Or "five sevenths"?)

I totally don't see a user entering, as a disk size, "1.234567G".  I
don't even see them entering "1.875G" ("one and eight hundred
seventy-five thousandths gigabytes").  At best I can see them say "two
gigs minus an eighth gig", or "one plus seven eighths gig", but even
for that, I need to squint.

We don't actually *think* about sizes in numbers that have many digits
after the decimal point.  What we do is, we think in simple fractions,
and have just learned how those simple fractions *look* when written as
decimal numbers.  So adding a general floating point parser was
overkill; it must have looked attractive only because the complications
of floating point were under-estimated.

So QEMU should have added the following formats instead:

- a[s]
- b/c[s]
- a+b/c[s]

where "a" is your usual integer, [s] is an (optional) suffix like K or
G, c may be 2, 4 or 8 (nothing else), and b may be [0 .. c-1].  If the
suffix is omitted, it is taken as "byte", and then the second and third
forms are not permitted (no fractions of bytes).
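To make the grammar concrete, here is an untested sketch (the function
name and the minimal error handling are mine, purely for illustration):

  #include <errno.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Parse a[s], b/c[s] or a+b/c[s], where c is 2, 4 or 8, b < c, and
   * [s] is a binary suffix K/M/G/T.  Fractions without a suffix are
   * rejected (no fractions of a byte). */
  static bool
  parse_fractional_size (const char *str, uint64_t *ret)
  {
    char *end;
    const char *p;
    uint64_t a, b = 0, c = 1;
    bool have_fraction = false;
    int shift = 0;

    if (*str == '-')              /* strtoull would silently negate */
      return false;
    errno = 0;
    a = strtoull (str, &end, 10);
    if (end == str || errno == ERANGE)
      return false;

    if (*end == '/') {            /* b/c[s]: the number we read was b */
      b = a;
      a = 0;
      have_fraction = true;
    }
    else if (*end == '+') {       /* a+b/c[s]: read b, expect '/' next */
      p = end + 1;
      b = strtoull (p, &end, 10);
      if (end == p || errno == ERANGE || *end != '/')
        return false;
      have_fraction = true;
    }

    if (have_fraction) {          /* *end is '/' here; read c */
      p = end + 1;
      c = strtoull (p, &end, 10);
      if (end == p || (c != 2 && c != 4 && c != 8) || b >= c)
        return false;
    }

    switch (*end) {               /* optional binary suffix */
    case 'K': shift = 10; end++; break;
    case 'M': shift = 20; end++; break;
    case 'G': shift = 30; end++; break;
    case 'T': shift = 40; end++; break;
    }
    if (*end != '\0')
      return false;
    if (have_fraction && shift == 0)
      return false;               /* no fractions of bytes */
    if (a > (UINT64_MAX >> shift) - (have_fraction ? 1 : 0))
      return false;               /* coarse but safe overflow check */

    /* b/c of 2^shift is exact, since 2, 4 and 8 all divide 2^10. */
    *ret = (a << shift) + b * ((UINT64_C (1) << shift) / c);
    return true;
  }

With this, "1+1/2G" parses to exactly 1610612736 -- the same number
Eric spelled out above -- with no floating point involved.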
Of course, hindsight is 20/20.  I had learned to stay away from
floating point very early on; topics in university like bounding errors
and the non-associativity of floating point addition made me look for
articles on the web, and those had permanently convinced me that
floating point is *unusable* unless you are able and willing to do
complete numerical analysis in your C source code.  In practice, C
programmers don't even remember the integer promotions and the usual
arithmetic conversions. :/

[...]

>> (Note the round parens in the footnote: it means the boundaries are
>> *excluded*. So, a value like -0.5 will be truncated to zero, and a value
>> like UINT64_MAX + 0.5 will be truncated to UINT64_MAX, but (-1) itself
>> and (UINT64_MAX + 1) itself are not permitted already.)
>
> I challenge you to find a system where 'double' can hold the value
> UINT64_MAX + 0.5 with sufficient precision (IEEE double only provides
> 53 bits of precision).

This was just an example (independent of actual fp precision) to
illustrate the meaning of the round parens in the footnote.  "long
double" would work equally well for the example.

> In fact, some of the back-story of my work on
> qemu's floating point parser was an earlier bug report, raised by
> nbdkit's unit testing, when we discovered that qemu was rounding large
> values (because of its trip through 'double') rather than matching an
> integral parse (found when qemu was unable to produce large sparse
> images as large as nbdkit because of the lost precision).
> Semi-related: just last week I also fixed a bug in glib where it was
> assuming that the round trip (uint64_t)(long double)(uint64_t)value
> would not lose precision, which happens to be the case on x86_64 but
> is not true on other hardware;

I remember :)

> but here we're using sscanf with 'double', not 'long double'.

Again: yes.  This was only a tangent.  I meant to quote the fp->integer
conversion rule, and then I thought the related footnote deserved
additional (but independent) reasoning.  By that point I didn't feel
tied to "double" any longer, really.

[...]

>> If Eric is OK with this patch set, I won't try to block it, it just
>> makes me feel very uncomfortable. It's more than just scanf() lacking
>> proper internal error handling; the real complications come when we
>> perform operations on the floating point values.
>
> Our pre-existing error filter is already doing percentage calculations
> via floating point, so on the grounds that this patch is consolidating
> common code patterns to avoid reimplementation, we're good.

Agreed.

> But as you
> say, representing things as a ratio between two integers may be more
> maintainable than floating point where accuracy matters, or this may be
> a case where we are explicit that use of the floating point fractions
> as a percentage is intentionally liable to rounding issues.

I think that, for using floating point, we should:

- error-check any conversion from string to binary (object)
  representation,

- bound the parsed values as early as possible, with an eye towards
  later use,

- in each subsequent operation, show that the result will be in range,
  and (hopefully) will not see catastrophic loss of precision.
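For the first two points, I mean something of this shape (an untested
sketch; the function name and bounds parameters are mine):

  #include <errno.h>
  #include <stdbool.h>
  #include <stdlib.h>

  /* Convert with full error checking, then bound the value right away,
   * before anyone computes with it.  Error messages omitted. */
  static bool
  strict_parse_double (const char *str, double lo, double hi, double *ret)
  {
    char *end;
    double v;

    errno = 0;
    v = strtod (str, &end);
    if (end == str || *end != '\0')  /* no digits, or trailing junk */
      return false;
    if (errno == ERANGE)             /* overflow or underflow */
      return false;
    if (!(v >= lo && v <= hi))       /* bound early; also rejects NaN */
      return false;
    *ret = v;
    return true;
  }

(So "1e400" is rejected via ERANGE, and "inf" via the bounds check,
before either value can propagate into later arithmetic.)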
I don't insist on this; it's just what I could explicitly ACK.

> Still, avoiding undefined behavior when the division can overflow
> (whether that be the overflow to floating point infinity or even the
> overflow beyond UINT64_MAX) does make it worth questioning if floating
> point is our best representation, or at a minimum if we want to be
> stricter in what we parse before passing a substring on to strtod() so
> that we have more control over the values (part of my 19 patches to
> qemu was explicitly truncating any string at 'e' or 'E' so that
> strtod() could not do an exponent parse that would inadvertently hit
> overflow to INFINITY).

*shudder*. :)

I didn't read your patches in detail BTW.  I went through the subjects
in order, and looked at some details.  I could tell it required an
immense amount of thinking and work.

Dealing with (unsigned!) integers is less work, IMO.

Thanks!
Laszlo