Alexander Kolbasov
2007-Oct-31 22:48 UTC
[dtrace-discuss] Aggregations and negative values
I was trying to get an average value of a bunch of numbers which could be either positive or negative to see whether the average is close to zero. It turns out that aggregations (including max, min and avg) treat values as unsigned and, thus. do not work as expected for negative values. Do you think it is a valid RFE to have version of these aggregations which work on signed data? - akolb
On Wed, Oct 31, 2007 at 03:48:53PM -0700, Alexander Kolbasov wrote:> I was trying to get an average value of a bunch of numbers which could be > either positive or negative to see whether the average is close to zero. It > turns out that aggregations (including max, min and avg) treat values as > unsigned and, thus. do not work as expected for negative values. Do you think > it is a valid RFE to have version of these aggregations which work on signed > data?Yes, definitely valid -- especially since quantize()/lquantize() already correctly deal with negative values. Apologies for that; not quite sure why I didn''t address max()/min()/avg() when I was getting quantize() and lquantize() to correctly deal with negative data... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Sun Microsystems FishWorks. http://blogs.sun.com/bmc
Alexander Kolbasov
2007-Oct-31 23:27 UTC
[dtrace-discuss] Aggregations and negative values
> > On Wed, Oct 31, 2007 at 03:48:53PM -0700, Alexander Kolbasov wrote: > > I was trying to get an average value of a bunch of numbers which could be > > either positive or negative to see whether the average is close to zero. It > > turns out that aggregations (including max, min and avg) treat values as > > unsigned and, thus. do not work as expected for negative values. Do you think > > it is a valid RFE to have version of these aggregations which work on signed > > data? > > Yes, definitely valid -- especially since quantize()/lquantize() already > correctly deal with negative values. Apologies for that; not quite sure > why I didn''t address max()/min()/avg() when I was getting quantize() and > lquantize() to correctly deal with negative data...So do you think that "the right thing" to do is to fix max()/min()/avg() to treat arguments as signed or to keep these intact and have another set with different names (smax()?) to preserve compatibility with whatever scripts assume the old semantics? - akolb
> > > I was trying to get an average value of a bunch of numbers which could be > > > either positive or negative to see whether the average is close to zero. It > > > turns out that aggregations (including max, min and avg) treat values as > > > unsigned and, thus. do not work as expected for negative values. Do you think > > > it is a valid RFE to have version of these aggregations which work on signed > > > data? > > > > Yes, definitely valid -- especially since quantize()/lquantize() already > > correctly deal with negative values. Apologies for that; not quite sure > > why I didn''t address max()/min()/avg() when I was getting quantize() and > > lquantize() to correctly deal with negative data... > > So do you think that "the right thing" to do is to fix max()/min()/avg() to > treat arguments as signed or to keep these intact and have another set > with different names (smax()?) to preserve compatibility with whatever scripts > assume the old semantics?While I reserve the right to be wrong about this, I would say fix max()/ min()/avg(). That''s what I did to quantize()/lquantize(), and no one noticed -- unsigned values that large are rather rare in the real world and (as Chad points out on another thread), they''re prone to overflow silently today anyway. It''s hard to imagine that fixing this will break anyone... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Sun Microsystems FishWorks. http://blogs.sun.com/bmc
Alexander Kolbasov
2007-Nov-01 00:08 UTC
[dtrace-discuss] Aggregations and negative values
Filed 6624541 dtrace aggregations should assume signed arguments> > > > > I was trying to get an average value of a bunch of numbers which could be > > > > either positive or negative to see whether the average is close to zero. It > > > > turns out that aggregations (including max, min and avg) treat values as > > > > unsigned and, thus. do not work as expected for negative values. Do you think > > > > it is a valid RFE to have version of these aggregations which work on signed > > > > data? > > > > > > Yes, definitely valid -- especially since quantize()/lquantize() already > > > correctly deal with negative values. Apologies for that; not quite sure > > > why I didn''t address max()/min()/avg() when I was getting quantize() and > > > lquantize() to correctly deal with negative data... > > > > So do you think that "the right thing" to do is to fix max()/min()/avg() to > > treat arguments as signed or to keep these intact and have another set > > with different names (smax()?) to preserve compatibility with whatever scripts > > assume the old semantics? > > While I reserve the right to be wrong about this, I would say fix max()/ > min()/avg(). That''s what I did to quantize()/lquantize(), and no one > noticed -- unsigned values that large are rather rare in the real world > and (as Chad points out on another thread), they''re prone to overflow > silently today anyway. It''s hard to imagine that fixing this will break > anyone... > > - Bryan
Alexander Kolbasov writes:> Filed > > 6624541 dtrace aggregations should assume signed argumentsPerhaps a silly question ... but why not just follow the signedness of the argument provided? If the user provides an unsigned data type to the aggregation, then do unsigned arithmetic on it. If he provides a signed type, do signed arithmetic. (Just like the Fortran intrinsic functions. ;-}) -- James Carlson, Solaris Networking <james.d.carlson at sun.com> Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Alexander Kolbasov
2007-Nov-01 18:11 UTC
[dtrace-discuss] Aggregations and negative values
> Alexander Kolbasov writes: > > Filed > > > > 6624541 dtrace aggregations should assume signed arguments > > Perhaps a silly question ... but why not just follow the signedness of > the argument provided? > > If the user provides an unsigned data type to the aggregation, then do > unsigned arithmetic on it. If he provides a signed type, do signed > arithmetic.This would be nice, but I don''t know whether DTrace framework allows passing argument types to aggregation functions.> -- > James Carlson, Solaris Networking <james.d.carlson at sun.com> > Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 > MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 >
Alexander Kolbasov writes:> > Alexander Kolbasov writes: > > > Filed > > > > > > 6624541 dtrace aggregations should assume signed arguments > > > > Perhaps a silly question ... but why not just follow the signedness of > > the argument provided? > > > > If the user provides an unsigned data type to the aggregation, then do > > unsigned arithmetic on it. If he provides a signed type, do signed > > arithmetic. > > This would be nice, but I don''t know whether DTrace framework allows passing > argument types to aggregation functions.I''m not sure I understand. If I say this: @foo[execname] = sum(bar); Doesn''t ''bar'' have some type, at least by default? I had thought that dtrace tracked types dynamically. It seems to do so when I cast a value and assign it to "self->blah". It also seems to keep track of types if I try to pass in something that can''t work. Why doesn''t the expression have a type? -- James Carlson, Solaris Networking <james.d.carlson at sun.com> Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Alexander Kolbasov
2007-Nov-01 18:41 UTC
[dtrace-discuss] Aggregations and negative values
> Alexander Kolbasov writes: > > > Alexander Kolbasov writes: > > > > Filed > > > > > > > > 6624541 dtrace aggregations should assume signed arguments > > > > > > Perhaps a silly question ... but why not just follow the signedness of > > > the argument provided? > > > > > > If the user provides an unsigned data type to the aggregation, then do > > > unsigned arithmetic on it. If he provides a signed type, do signed > > > arithmetic. > > > > This would be nice, but I don''t know whether DTrace framework allows passing > > argument types to aggregation functions. > > I''m not sure I understand. > > If I say this: > > @foo[execname] = sum(bar); > > Doesn''t ''bar'' have some type, at least by default? I had thought that > dtrace tracked types dynamically. It seems to do so when I cast a > value and assign it to "self->blah". It also seems to keep track of > types if I try to pass in something that can''t work. > > Why doesn''t the expression have a type?IMO values are uint64_t by default. You can say @foo[execname] = sum((int)bar) But, aggregation functions in common/dtrace/dtrace.c look like this: /*ARGSUSED*/ static void dtrace_aggregate_avg(uint64_t *data, uint64_t nval, uint64_t arg) { data[0]++; data[1] += nval; } so even if the argument type information is available somewhere it is not passed to agregation implementation. I don''t know how involved is passing the type information into various aggregation routines. - akolb
> > > > I''m not sure I understand. > > > > If I say this: > > > > @foo[execname] = sum(bar); > > > > Doesn''t ''bar'' have some type, at least by default? I had thought that > > dtrace tracked types dynamically. It seems to do so when I cast a > > value and assign it to "self->blah". It also seems to keep track of > > types if I try to pass in something that can''t work. > > > > Why doesn''t the expression have a type? > > IMO values are uint64_t by default. You can say > > @foo[execname] = sum((int)bar) > > But, aggregation functions in common/dtrace/dtrace.c look like this: > > /*ARGSUSED*/ > static void > dtrace_aggregate_avg(uint64_t *data, uint64_t nval, uint64_t arg) > { > data[0]++; > data[1] += nval; > } > > so even if the argument type information is available somewhere it is not > passed to agregation implementation. I don''t know how involved is passing the > type information into various aggregation routines. > > - akolbEvery aggregating function has to have a defined, fixed type, that determines how its implementation in kernel and userland, a portion of which is quoted above, performs the underlying arithmetic. Since our original release, the type signatures of the aggregating functions to which you are referring have their signature as: sum(uint64_t arg), and so on. Every input expression has a type, and just as with a normal function call, that type is either compatible with the type signature of the aggregating function, or not (a syntax error). If it is compatible, the Normal Arithmetic Conversions are applied as part of passing that parameter: i.e. you can cast (int)bar, but since the signature is uint64_t, the normal conversions are applied to transform that int to uint64_t as per ISO C. I can''t see how we can change the type signatures of the aggregating functions without breaking existing program behavior -- if you want signed versions of these things, then we would need to introduce separate signed equivalents. -Mike -- Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
Alexander Kolbasov
2007-Nov-01 19:15 UTC
[dtrace-discuss] Aggregations and negative values
Mike,> Every aggregating function has to have a defined, fixed type, that determines > how its implementation in kernel and userland, a portion of which is quoted > above, performs the underlying arithmetic. Since our original release, the > type signatures of the aggregating functions to which you are referring > have their signature as: sum(uint64_t arg), and so on.As Bryan mentioned earlier, quantize()/lquantize() also treat their arguments as uint64_t but in the kernel implementation convert it to int64_t. Can the same thing be done for max()/min()/avg()? - akolb
Michael Shapiro writes:> Every aggregating function has to have a defined, fixed type, that determines > how its implementation in kernel and userland, a portion of which is quoted > above, performs the underlying arithmetic. Since our original release, the > type signatures of the aggregating functions to which you are referring > have their signature as: sum(uint64_t arg), and so on.Ah, that''s too bad. The Fortran intrinsics I was referring to are essentially polymorphic functions (yes, 20-odd years before O-O) that look like this: uint64_t sum(uint64_t); int64_t sum(int64_t); ... where the compiler picks the right function (there are actually two or more) based on the context. It''d be pretty helpful here, and a shame if it can''t be done. -- James Carlson, Solaris Networking <james.d.carlson at sun.com> Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
> Mike, > > > Every aggregating function has to have a defined, fixed type, that determines > > how its implementation in kernel and userland, a portion of which is quoted > > above, performs the underlying arithmetic. Since our original release, the > > type signatures of the aggregating functions to which you are referring > > have their signature as: sum(uint64_t arg), and so on. > > As Bryan mentioned earlier, quantize()/lquantize() also treat their arguments > as uint64_t but in the kernel implementation convert it to int64_t. Can the > same thing be done for max()/min()/avg()? > > - akolbNot without breaking program compatibility, because the type signatures we assigned to the aggregating functions also determine the default base that is used when printf()''ing an aggregation, and therefore would break the output of programs that rely or desire the unsigned behavior. -Mike -- Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
> Michael Shapiro writes: > > Every aggregating function has to have a defined, fixed type, that determines > > how its implementation in kernel and userland, a portion of which is quoted > > above, performs the underlying arithmetic. Since our original release, the > > type signatures of the aggregating functions to which you are referring > > have their signature as: sum(uint64_t arg), and so on. > > Ah, that''s too bad. > > The Fortran intrinsics I was referring to are essentially polymorphic > functions (yes, 20-odd years before O-O) that look like this: > > uint64_t sum(uint64_t); > int64_t sum(int64_t); > > ... where the compiler picks the right function (there are actually > two or more) based on the context. It''d be pretty helpful here, and a > shame if it can''t be done.That can certainly be done, it just isn''t the way it works now. That too can break program compatibility if someone was assuming at present that signed data is converted to unsigned by virtue of conversion in the absence of an explicit cast, but we have a versioning mechanism in DTrace that can be used to handle this type of upgrade. Sasha: either close the bug you filed, or change it to an RFE requesting that we either introduce a signed alternative, implemented either as another choice of aggregating function name or the above polymorphism. -Mike -- Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
> > > Every aggregating function has to have a defined, fixed type, that determines > > > how its implementation in kernel and userland, a portion of which is quoted > > > above, performs the underlying arithmetic. Since our original release, the > > > type signatures of the aggregating functions to which you are referring > > > have their signature as: sum(uint64_t arg), and so on. > > > > As Bryan mentioned earlier, quantize()/lquantize() also treat their arguments > > as uint64_t but in the kernel implementation convert it to int64_t. Can the > > same thing be done for max()/min()/avg()? > > > > - akolb > > Not without breaking program compatibility, because the type signatures > we assigned to the aggregating functions also determine the default > base that is used when printf()''ing an aggregation, and therefore would > break the output of programs that rely or desire the unsigned behavior.Mike and I just talked about this -- he didn''t realize that we already have precedent for doing this in the fix (and patch) for 6320439. And because we print aggregation results out as signed 64-bit values (that is "dtrace -n BEGIN''{ @ = max(-10); exit(0); }''" does what you think it should), the only D program that this could break would be one relying on aggregating values above 2^63 -- and as Chad pointed out, these can silently overflow and break anyway. So I think we can go ahead and fix that bug as filed. And yes, passing the type information down to the kernel would be a cleaner way of doing it -- but that''s also a hell of a lot more work, and it only benefits those that are aggregating values greater than 2^63... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Sun Microsystems FishWorks. http://blogs.sun.com/bmc
Bryan Cantrill wrote On 11/01/07 13:33,:>>>>Every aggregating function has to have a defined, fixed type, that determines >>>>how its implementation in kernel and userland, a portion of which is quoted >>>>above, performs the underlying arithmetic. Since our original release, the >>>>type signatures of the aggregating functions to which you are referring >>>>have their signature as: sum(uint64_t arg), and so on. >>> >>>As Bryan mentioned earlier, quantize()/lquantize() also treat their arguments >>>as uint64_t but in the kernel implementation convert it to int64_t. Can the >>>same thing be done for max()/min()/avg()? >>> >>>- akolb >> >>Not without breaking program compatibility, because the type signatures >>we assigned to the aggregating functions also determine the default >>base that is used when printf()''ing an aggregation, and therefore would >>break the output of programs that rely or desire the unsigned behavior. > > > Mike and I just talked about this -- he didn''t realize that we already > have precedent for doing this in the fix (and patch) for 6320439. > And because we print aggregation results out as signed 64-bit values (that > is "dtrace -n BEGIN''{ @ = max(-10); exit(0); }''" does what you think it > should), the only D program that this could break would be one relying > on aggregating values above 2^63 -- and as Chad pointed out, these > can silently overflow and break anyway. > > So I think we can go ahead and fix that bug as filed. And yes, passing > the type information down to the kernel would be a cleaner way of doing > it -- but that''s also a hell of a lot more work, and it only benefits > those that are aggregating values greater than 2^63...+1 <SOAPBOX> not having unsigned is one of the things they did right in Java. </SOAPBOX> -Pete.> > - Bryan > > -------------------------------------------------------------------------- > Bryan Cantrill, Sun Microsystems FishWorks. http://blogs.sun.com/bmc > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org