thr3ads.net - dtrace discuss - [dtrace-discuss] Aggregations and negative values [Oct 2007]

If this information is useful, please help other people find it:
Share via:

Alexander Kolbasov

2007-Oct-31 22:48 UTC

[dtrace-discuss] Aggregations and negative values

I was trying to get an average value of a bunch of numbers which could be 
either positive or negative to see whether the average is close to zero. It 
turns out that aggregations (including max, min and avg) treat values as 
unsigned and, thus. do not work as expected for negative values. Do you think 
it is a valid RFE to have version of these aggregations which work on signed 
data?

- akolb

Bryan Cantrill

2007-Oct-31 22:57 UTC

head link

[dtrace-discuss] Aggregations and negative values

On Wed, Oct 31, 2007 at 03:48:53PM -0700, Alexander Kolbasov
wrote:> I was trying to get an average value of a bunch of numbers which could be 
> either positive or negative to see whether the average is close to zero. It
> turns out that aggregations (including max, min and avg) treat values as 
> unsigned and, thus. do not work as expected for negative values. Do you
think
> it is a valid RFE to have version of these aggregations which work on
signed
> data?
Yes, definitely valid -- especially since quantize()/lquantize() already
correctly deal with negative values.  Apologies for that; not quite sure
why I didn''t address max()/min()/avg() when I was getting quantize()
and
lquantize() to correctly deal with negative data...

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems FishWorks.       http://blogs.sun.com/bmc

Alexander Kolbasov

2007-Oct-31 23:27 UTC

head link

[dtrace-discuss] Aggregations and negative values

> 
> On Wed, Oct 31, 2007 at 03:48:53PM -0700, Alexander Kolbasov wrote:
> > I was trying to get an average value of a bunch of numbers which could
be
> > either positive or negative to see whether the average is close to
zero. It
> > turns out that aggregations (including max, min and avg) treat values
as
> > unsigned and, thus. do not work as expected for negative values. Do
you think
> > it is a valid RFE to have version of these aggregations which work on
signed
> > data?
> 
> Yes, definitely valid -- especially since quantize()/lquantize() already
> correctly deal with negative values.  Apologies for that; not quite sure
> why I didn''t address max()/min()/avg() when I was getting
quantize() and
> lquantize() to correctly deal with negative data...
So do you think that "the right thing" to do is to fix
max()/min()/avg() to
treat arguments as signed or to keep these intact and have another set
with different names (smax()?) to preserve compatibility with whatever scripts 
assume the old semantics?

- akolb

Bryan Cantrill

2007-Oct-31 23:44 UTC

head link

[dtrace-discuss] Aggregations and negative values

> > > I was trying to get an average value of a bunch of numbers which
could be
> > > either positive or negative to see whether the average is close
to zero. It
> > > turns out that aggregations (including max, min and avg) treat
values as
> > > unsigned and, thus. do not work as expected for negative values.
Do you think
> > > it is a valid RFE to have version of these aggregations which
work on signed
> > > data?
> > 
> > Yes, definitely valid -- especially since quantize()/lquantize()
already
> > correctly deal with negative values.  Apologies for that; not quite
sure
> > why I didn''t address max()/min()/avg() when I was getting
quantize() and
> > lquantize() to correctly deal with negative data...
> 
> So do you think that "the right thing" to do is to fix
max()/min()/avg() to
> treat arguments as signed or to keep these intact and have another set
> with different names (smax()?) to preserve compatibility with whatever
scripts
> assume the old semantics?
While I reserve the right to be wrong about this, I would say fix max()/
min()/avg().  That''s what I did to quantize()/lquantize(), and no one
noticed -- unsigned values that large are rather rare in the real world
and (as Chad points out on another thread), they''re prone to overflow
silently today anyway.  It''s hard to imagine that fixing this will
break
anyone...

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems FishWorks.       http://blogs.sun.com/bmc

Alexander Kolbasov

2007-Nov-01 00:08 UTC

head link

[dtrace-discuss] Aggregations and negative values

Filed 

6624541 dtrace aggregations should assume signed arguments
> 
> > > > I was trying to get an average value of a bunch of numbers
which could be
> > > > either positive or negative to see whether the average is
close to zero. It
> > > > turns out that aggregations (including max, min and avg)
treat values as
> > > > unsigned and, thus. do not work as expected for negative
values. Do you think
> > > > it is a valid RFE to have version of these aggregations
which work on signed
> > > > data?
> > > 
> > > Yes, definitely valid -- especially since quantize()/lquantize()
already
> > > correctly deal with negative values.  Apologies for that; not
quite sure
> > > why I didn''t address max()/min()/avg() when I was
getting quantize() and
> > > lquantize() to correctly deal with negative data...
> > 
> > So do you think that "the right thing" to do is to fix
max()/min()/avg() to
> > treat arguments as signed or to keep these intact and have another set
> > with different names (smax()?) to preserve compatibility with whatever
scripts
> > assume the old semantics?
> 
> While I reserve the right to be wrong about this, I would say fix max()/
> min()/avg().  That''s what I did to quantize()/lquantize(), and no
one
> noticed -- unsigned values that large are rather rare in the real world
> and (as Chad points out on another thread), they''re prone to
overflow
> silently today anyway.  It''s hard to imagine that fixing this will
break
> anyone...
> 
> 	- Bryan

James Carlson

2007-Nov-01 13:00 UTC

head link

[dtrace-discuss] Aggregations and negative values

Alexander Kolbasov writes:> Filed 
> 
> 6624541 dtrace aggregations should assume signed arguments
Perhaps a silly question ... but why not just follow the signedness of
the argument provided?

If the user provides an unsigned data type to the aggregation, then do
unsigned arithmetic on it.  If he provides a signed type, do signed
arithmetic.

(Just like the Fortran intrinsic functions.  ;-})

-- 
James Carlson, Solaris Networking              <james.d.carlson at
sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Alexander Kolbasov

2007-Nov-01 18:11 UTC

head link

[dtrace-discuss] Aggregations and negative values

> Alexander Kolbasov writes:
> > Filed 
> > 
> > 6624541 dtrace aggregations should assume signed arguments
> 
> Perhaps a silly question ... but why not just follow the signedness of
> the argument provided?
> 
> If the user provides an unsigned data type to the aggregation, then do
> unsigned arithmetic on it.  If he provides a signed type, do signed
> arithmetic.
This would be nice, but I don''t know whether DTrace framework allows
passing
argument types to aggregation functions.

> -- 
> James Carlson, Solaris Networking              <james.d.carlson at
sun.com>
> Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
> MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
>

James Carlson

2007-Nov-01 18:25 UTC

head link

[dtrace-discuss] Aggregations and negative values

Alexander Kolbasov writes:> > Alexander Kolbasov writes:
> > > Filed 
> > > 
> > > 6624541 dtrace aggregations should assume signed arguments
> > 
> > Perhaps a silly question ... but why not just follow the signedness of
> > the argument provided?
> > 
> > If the user provides an unsigned data type to the aggregation, then do
> > unsigned arithmetic on it.  If he provides a signed type, do signed
> > arithmetic.
> 
> This would be nice, but I don''t know whether DTrace framework
allows passing
> argument types to aggregation functions.
I''m not sure I understand.

If I say this:

	@foo[execname] = sum(bar);

Doesn''t ''bar'' have some type, at least by default?  I
had thought that
dtrace tracked types dynamically.  It seems to do so when I cast a
value and assign it to "self->blah".  It also seems to keep track
of
types if I try to pass in something that can''t work.

Why doesn''t the expression have a type?

-- 
James Carlson, Solaris Networking              <james.d.carlson at
sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Alexander Kolbasov

2007-Nov-01 18:41 UTC

head link

[dtrace-discuss] Aggregations and negative values

> Alexander Kolbasov writes:
> > > Alexander Kolbasov writes:
> > > > Filed 
> > > > 
> > > > 6624541 dtrace aggregations should assume signed arguments
> > > 
> > > Perhaps a silly question ... but why not just follow the
signedness of
> > > the argument provided?
> > > 
> > > If the user provides an unsigned data type to the aggregation,
then do
> > > unsigned arithmetic on it.  If he provides a signed type, do
signed
> > > arithmetic.
> > 
> > This would be nice, but I don''t know whether DTrace framework
allows passing
> > argument types to aggregation functions.
> 
> I''m not sure I understand.
> 
> If I say this:
> 
> 	@foo[execname] = sum(bar);
> 
> Doesn''t ''bar'' have some type, at least by
default?  I had thought that
> dtrace tracked types dynamically.  It seems to do so when I cast a
> value and assign it to "self->blah".  It also seems to keep
track of
> types if I try to pass in something that can''t work.
> 
> Why doesn''t the expression have a type?
IMO values are uint64_t by default. You can say 

@foo[execname] = sum((int)bar)

But, aggregation functions in common/dtrace/dtrace.c look like this:

/*ARGSUSED*/
static void
dtrace_aggregate_avg(uint64_t *data, uint64_t nval, uint64_t arg)
{
        data[0]++;
        data[1] += nval;
}

so even if the argument type information is available somewhere it is not 
passed to agregation implementation. I don''t know how involved is
passing the
type information into various aggregation routines.

- akolb

Michael Shapiro

2007-Nov-01 18:51 UTC

head link

[dtrace-discuss] Aggregations and negative values

> > 
> > I''m not sure I understand.
> > 
> > If I say this:
> > 
> > 	@foo[execname] = sum(bar);
> > 
> > Doesn''t ''bar'' have some type, at least by
default?  I had thought that
> > dtrace tracked types dynamically.  It seems to do so when I cast a
> > value and assign it to "self->blah".  It also seems to
keep track of
> > types if I try to pass in something that can''t work.
> > 
> > Why doesn''t the expression have a type?
> 
> IMO values are uint64_t by default. You can say 
> 
> @foo[execname] = sum((int)bar)
> 
> But, aggregation functions in common/dtrace/dtrace.c look like this:
> 
> /*ARGSUSED*/
> static void
> dtrace_aggregate_avg(uint64_t *data, uint64_t nval, uint64_t arg)
> {
>         data[0]++;
>         data[1] += nval;
> }
> 
> so even if the argument type information is available somewhere it is not 
> passed to agregation implementation. I don''t know how involved is
passing the
> type information into various aggregation routines.
> 
> - akolb
Every aggregating function has to have a defined, fixed type, that determines
how its implementation in kernel and userland, a portion of which is quoted
above, performs the underlying arithmetic.  Since our original release, the
type signatures of the aggregating functions to which you are referring
have their signature as: sum(uint64_t arg), and so on.

Every input expression has a type, and just as with a normal function call,
that type is either compatible with the type signature of the aggregating
function, or not (a syntax error).  If it is compatible, the Normal
Arithmetic Conversions are applied as part of passing that parameter:
i.e. you can cast (int)bar, but since the signature is uint64_t, the normal
conversions are applied to transform that int to uint64_t as per ISO C.

I can''t see how we can change the type signatures of the aggregating
functions
without breaking existing program behavior -- if you want signed versions of
these things, then we would need to introduce separate signed equivalents.

-Mike

-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/

Alexander Kolbasov

2007-Nov-01 19:15 UTC

head link

[dtrace-discuss] Aggregations and negative values

Mike,
> Every aggregating function has to have a defined, fixed type, that
determines
> how its implementation in kernel and userland, a portion of which is quoted
> above, performs the underlying arithmetic.  Since our original release, the
> type signatures of the aggregating functions to which you are referring
> have their signature as: sum(uint64_t arg), and so on.
As Bryan mentioned earlier, quantize()/lquantize() also treat their arguments 
as uint64_t but in the kernel implementation convert it to int64_t. Can the 
same thing be done for max()/min()/avg()?

- akolb

James Carlson

2007-Nov-01 19:27 UTC

head link

[dtrace-discuss] Aggregations and negative values

Michael Shapiro writes:> Every aggregating function has to have a defined, fixed type, that
determines
> how its implementation in kernel and userland, a portion of which is quoted
> above, performs the underlying arithmetic.  Since our original release, the
> type signatures of the aggregating functions to which you are referring
> have their signature as: sum(uint64_t arg), and so on.
Ah, that''s too bad.

The Fortran intrinsics I was referring to are essentially polymorphic
functions (yes, 20-odd years before O-O) that look like this:

	uint64_t sum(uint64_t);
	int64_t sum(int64_t);

... where the compiler picks the right function (there are actually
two or more) based on the context.  It''d be pretty helpful here, and a
shame if it can''t be done.

-- 
James Carlson, Solaris Networking              <james.d.carlson at
sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Michael Shapiro

2007-Nov-01 19:41 UTC

head link

[dtrace-discuss] Aggregations and negative values

> Mike,
> 
> > Every aggregating function has to have a defined, fixed type, that
determines
> > how its implementation in kernel and userland, a portion of which is
quoted
> > above, performs the underlying arithmetic.  Since our original
release, the
> > type signatures of the aggregating functions to which you are
referring
> > have their signature as: sum(uint64_t arg), and so on.
> 
> As Bryan mentioned earlier, quantize()/lquantize() also treat their
arguments
> as uint64_t but in the kernel implementation convert it to int64_t. Can the
> same thing be done for max()/min()/avg()?
> 
> - akolb
Not without breaking program compatibility, because the type signatures
we assigned to the aggregating functions also determine the default
base that is used when printf()''ing an aggregation, and therefore would
break the output of programs that rely or desire the unsigned behavior.

-Mike

-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/

Michael Shapiro

2007-Nov-01 19:44 UTC

head link

[dtrace-discuss] Aggregations and negative values

> Michael Shapiro writes:
> > Every aggregating function has to have a defined, fixed type, that
determines
> > how its implementation in kernel and userland, a portion of which is
quoted
> > above, performs the underlying arithmetic.  Since our original
release, the
> > type signatures of the aggregating functions to which you are
referring
> > have their signature as: sum(uint64_t arg), and so on.
> 
> Ah, that''s too bad.
> 
> The Fortran intrinsics I was referring to are essentially polymorphic
> functions (yes, 20-odd years before O-O) that look like this:
> 
> 	uint64_t sum(uint64_t);
> 	int64_t sum(int64_t);
> 
> ... where the compiler picks the right function (there are actually
> two or more) based on the context.  It''d be pretty helpful here,
and a
> shame if it can''t be done.
That can certainly be done, it just isn''t the way it works now.  That
too
can break program compatibility if someone was assuming at present that
signed data is converted to unsigned by virtue of conversion in the
absence of an explicit cast, but we have a versioning mechanism in DTrace
that can be used to handle this type of upgrade.

Sasha: either close the bug you filed, or change it to an RFE requesting
that we either introduce a signed alternative, implemented either as another
choice of aggregating function name or the above polymorphism.

-Mike

-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/

Bryan Cantrill

2007-Nov-01 20:33 UTC

head link

[dtrace-discuss] Aggregations and negative values

> > > Every aggregating function has to have a defined, fixed type,
that determines
> > > how its implementation in kernel and userland, a portion of which
is quoted
> > > above, performs the underlying arithmetic.  Since our original
release, the
> > > type signatures of the aggregating functions to which you are
referring
> > > have their signature as: sum(uint64_t arg), and so on.
> > 
> > As Bryan mentioned earlier, quantize()/lquantize() also treat their
arguments
> > as uint64_t but in the kernel implementation convert it to int64_t.
Can the
> > same thing be done for max()/min()/avg()?
> > 
> > - akolb
> 
> Not without breaking program compatibility, because the type signatures
> we assigned to the aggregating functions also determine the default
> base that is used when printf()''ing an aggregation, and therefore
would
> break the output of programs that rely or desire the unsigned behavior.
Mike and I just talked about this -- he didn''t realize that we already
have precedent for doing this in the fix (and patch) for 6320439. 
And because we print aggregation results out as signed 64-bit values (that
is "dtrace -n BEGIN''{ @ = max(-10); exit(0); }''"
does what you think it
should), the only D program that this could break would be one relying
on aggregating values above 2^63 -- and as Chad pointed out, these
can silently overflow and break anyway.

So I think we can go ahead and fix that bug as filed.  And yes, passing
the type information down to the kernel would be a cleaner way of doing
it -- but that''s also a hell of a lot more work, and it only benefits
those that are aggregating values greater than 2^63...

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems FishWorks.       http://blogs.sun.com/bmc

Peter Lawrence

2007-Nov-01 21:05 UTC

head link

[dtrace-discuss] Aggregations and negative values

Bryan Cantrill wrote On 11/01/07 13:33,:>>>>Every aggregating function has to have a defined, fixed type,
that determines
>>>>how its implementation in kernel and userland, a portion of
which is quoted
>>>>above, performs the underlying arithmetic.  Since our original
release, the
>>>>type signatures of the aggregating functions to which you are
referring
>>>>have their signature as: sum(uint64_t arg), and so on.
>>>
>>>As Bryan mentioned earlier, quantize()/lquantize() also treat their
arguments
>>>as uint64_t but in the kernel implementation convert it to int64_t.
Can the
>>>same thing be done for max()/min()/avg()?
>>>
>>>- akolb
>>
>>Not without breaking program compatibility, because the type signatures
>>we assigned to the aggregating functions also determine the default
>>base that is used when printf()''ing an aggregation, and
therefore would
>>break the output of programs that rely or desire the unsigned behavior.
> 
> 
> Mike and I just talked about this -- he didn''t realize that we
already
> have precedent for doing this in the fix (and patch) for 6320439. 
> And because we print aggregation results out as signed 64-bit values (that
> is "dtrace -n BEGIN''{ @ = max(-10); exit(0);
}''" does what you think it
> should), the only D program that this could break would be one relying
> on aggregating values above 2^63 -- and as Chad pointed out, these
> can silently overflow and break anyway.
> 
> So I think we can go ahead and fix that bug as filed.  And yes, passing
> the type information down to the kernel would be a cleaner way of doing
> it -- but that''s also a hell of a lot more work, and it only
benefits
> those that are aggregating values greater than 2^63...
+1

<SOAPBOX>
not having unsigned is one of the things they did right in Java.
</SOAPBOX>

-Pete.


> 
> 	- Bryan
> 
> --------------------------------------------------------------------------
> Bryan Cantrill, Sun Microsystems FishWorks.       http://blogs.sun.com/bmc
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

dtrace discuss - Oct 2007 - Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values

[dtrace-discuss] Aggregations and negative values