thr3ads.net - R devel - [Rd] [External] Re: zapsmall(x) for scalar x [Dec 2023]

If this information is useful, please help other people find it:
Share via:

Serguei Sokol

2023-Dec-18 09:29 UTC

[Rd] [External] Re: zapsmall(x) for scalar x

Le 17/12/2023 ? 18:26, Barry Rowlingson a ?crit?:> I think what's been missed is that zapsmall works relative to the
absolute
> largest value in the vector. Hence if there's only one
> item in the vector, it is the largest, so its not zapped. The
function's
> raison d'etre isn't to replace absolutely small values,
> but small values relative to the largest. Hence a vector of similar tiny
> values doesn't get zapped.
>
> Maybe the line in the docs:
>
> " (compared with the maximal absolute value)"
>
> needs to read:
>
> " (compared with the maximal absolute value in the vector)"I agree that this change in the doc would clarify the situation but 
would not resolve proposed corner cases.
I think that an additional argument 'mx' (absolute max value of 
reference) would do. Consider:

zapsmall2 <-
function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
{
 ??? if (length(digits) == 0L)
 ??????? stop("invalid 'digits'")
 ??? if (all(ina <- is.na(x)))
 ??????? return(x)
 ??? round(x, digits = if (mx > 0) max(0L, digits - 
as.numeric(log10(mx))) else digits)
}

then zapsmall2() without explicit 'mx' behaves identically to actual 
zapsmall() and for a scalar or a vector of identical value, user can 
manually fix the scale of what should be considered as small:

 > zapsmall2(y)
[1] 2.220446e-16
 > zapsmall2(y, mx=1)
[1] 0
 > zapsmall2(c(y, y), mx=1)
[1] 0 0
 > zapsmall2(c(y, NA))
[1] 2.220446e-16?????????? NA
 > zapsmall2(c(y, NA), mx=1)
[1]? 0 NA

Obviously, the name 'zapsmall2' was chosen just for this explanation. 
The original name 'zapsmall' could be reused as a full backward 
compatibility is preserved.

Best,
Serguei.
>
> Barry
>
>
>
>
>
> On Sun, Dec 17, 2023 at 2:17?PM Duncan Murdoch <murdoch.duncan at
gmail.com>
> wrote:
>
>> This email originated outside the University. Check before clicking
links
>> or attachments.
>>
>> I'm really confused.  Steve's example wasn't a scalar x, it
was a
>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I
don't
>> see why summary() would if it was using your proposal.
>>
>> Duncan Murdoch
>>
>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
>>> Isn?t that the correct outcome?  The user can change the number of
>> digits if they want to see small values?
>>>
>>> --
>>> Change your thoughts and you change the world.
>>> --Dr. Norman Vincent Peale
>>>
>>>> On Dec 17, 2023, at 12:11?AM, Steve Martin <stevemartin041
at gmail.com>
>> wrote:
>>>> ?Zapping a vector of small numbers to zero would cause problems
when
>>>> printing the results of summary(). For example, if
>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0)
then
>>>> print(summary(2.220446e-16), digits = 7) would print
>>>>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>          0          0            0           0           0     
0
>>>>
>>>> The same problem can also appear when printing the results of
>>>> summary.glm() with show.residuals = TRUE if there's little
dispersion
>>>> in the residuals.
>>>>
>>>> Steve
>>>>
>>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg at
warnes.net> wrote:
>>>>>
>>>>> I was quite suprised to discover that applying `zapsmall`
to a scalar
>> value has no apparent effect.  For example:
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(y,)
>>>>> [1] 2.2204e-16
>>>>>
>>>>> I was expecting zapsmall(x)` to act like
>>>>>
>>>>>> round(y, digits=getOption('digits'))
>>>>> [1] 0
>>>>>
>>>>> Looking at the current source code, indicates that
`zapsmall` is
>> expecting a vector:
>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0) max(0L, digits -
>> as.numeric(log10(mx))) else digits)
>>>>> }
>>>>>
>>>>> If `x` is a non-zero scalar, zapsmall will never perform
rounding.
>>>>>
>>>>> The man page simply states:
>>>>> zapsmall determines a digits argument dr for calling
round(x, digits >> dr) such that values close to zero (compared with the
maximal absolute
>> value) are ?zapped?, i.e., replaced by 0.
>>>>> and doesn?t provide any details about how ?close to zero?
is defined.
>>>>>
>>>>> Perhaps handling the special when `x` is a scalar (or only
contains a
>> single non-NA value)  would make sense:
>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0 &&
(length(x)-sum(ina))>1 ) max(0L,
>> digits - as.numeric(log10(mx))) else digits)
>>>>> }
>>>>>
>>>>> Yielding:
>>>>>
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(y)
>>>>> [1] 0
>>>>>
>>>>> Another edge case would be when all of the non-na values
are the same:
>>>>>
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(c(y,y))
>>>>> [1] 2.220446e-16 2.220446e-16
>>>>>
>>>>> Thoughts?
>>>>>
>>>>>
>>>>> Gregory R. Warnes, Ph.D.
>>>>> greg at warnes.net
>>>>> Eternity is a long time, take a friend!
>>>>>
>>>>>
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Serguei Sokol
Ingenieur de recherche INRAE

Cellule Math?matiques
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 61 55 98 49
email: sokol at insa-toulouse.fr
https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/

Martin Maechler

2023-Dec-18 10:24 UTC

head link

[Rd] [External] Re: zapsmall(x) for scalar x

>>>>> Serguei Sokol via R-devel 
>>>>>     on Mon, 18 Dec 2023 10:29:02 +0100 writes:
    > Le 17/12/2023 ? 18:26, Barry Rowlingson a ?crit?:
    >> I think what's been missed is that zapsmall works relative to
the absolute
    >> largest value in the vector. Hence if there's only one
    >> item in the vector, it is the largest, so its not zapped. The
function's
    >> raison d'etre isn't to replace absolutely small values,
    >> but small values relative to the largest. Hence a vector of similar
tiny
    >> values doesn't get zapped.
    >> 
    >> Maybe the line in the docs:
    >> 
    >> " (compared with the maximal absolute value)"
    >> 
    >> needs to read:
    >> 
    >> " (compared with the maximal absolute value in the
vector)"

    > I agree that this change in the doc would clarify the situation but 
    > would not resolve proposed corner cases.

    > I think that an additional argument 'mx' (absolute max value of
    > reference) would do. Consider:

    > zapsmall2 <-
    > function (x, digits = getOption("digits"), mx=max(abs(x),
na.rm=TRUE))
    > {
    > ??? if (length(digits) == 0L)
    > ??????? stop("invalid 'digits'")
    > ??? if (all(ina <- is.na(x)))
    > ??????? return(x)
    > ??? round(x, digits = if (mx > 0) max(0L, digits - 
    > as.numeric(log10(mx))) else digits)
    > }

    > then zapsmall2() without explicit 'mx' behaves identically to
actual
    > zapsmall() and for a scalar or a vector of identical value, user can 
    > manually fix the scale of what should be considered as small:

    >> zapsmall2(y)
    > [1] 2.220446e-16
    >> zapsmall2(y, mx=1)
    > [1] 0
    >> zapsmall2(c(y, y), mx=1)
    > [1] 0 0
    >> zapsmall2(c(y, NA))
    > [1] 2.220446e-16?????????? NA
    >> zapsmall2(c(y, NA), mx=1)
    > [1]? 0 NA

    > Obviously, the name 'zapsmall2' was chosen just for this
explanation.
    > The original name 'zapsmall' could be reused as a full backward
    > compatibility is preserved.

    > Best,
    > Serguei.

Thank you, Serguei, Duncan, Barry et al.

Generally :
  Yes, zapsmall was meant and is used for zapping *relatively*
  small numbers.  In the other cases,  directly  round()ing is
  what you should use.

Specifically to Serguei's proposal of allowing the "max" value
to be user specified (in which case it is not really a true
max() anymore):

I've spent quite a a few hours on this problem in May 2022, to
make it even more flexible, e.g. allowing to use a 99%
percentile instead of the max(), or allowing to exclude +Inf
from the "mx"; but -- compared to your zapsmall2() --
to allow reproducible automatic choice :


zapsmall <- function(x, digits = getOption("digits"),
                     mFUN = function(x, ina) max(abs(x[!ina])),
		     min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
else digits)
}

with optional 'min.d' as I had (vaguely remember to have) found
at the time that the '0' is also not always "the only correct"
choice.

Somehow I never got to propose/discuss the above,
but it seems a good time to do so now.

Martin



    >> barry
    >> 
    >> 
    >> On Sun, Dec 17, 2023 at 2:17?PM Duncan Murdoch <murdoch.duncan
at gmail.com>
    >> wrote:
    >> 
    >>> This email originated outside the University. Check before
clicking links
    >>> or attachments.
    >>> 
    >>> I'm really confused.  Steve's example wasn't a
scalar x, it was a
    >>> vector.  Your zapsmall() proposal wouldn't zap it to zero,
and I don't
    >>> see why summary() would if it was using your proposal.
    >>> 
    >>> Duncan Murdoch
    >>> 
    >>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
    >>>> Isn?t that the correct outcome?  The user can change the
number of
    >>> digits if they want to see small values?
    >>>> 
    >>>> --
    >>>> Change your thoughts and you change the world.
    >>>> --Dr. Norman Vincent Peale
    >>>> 
    >>>>> On Dec 17, 2023, at 12:11?AM, Steve Martin
<stevemartin041 at gmail.com>
    >>> wrote:
    >>>>> ?Zapping a vector of small numbers to zero would cause
problems when
    >>>>> printing the results of summary(). For example, if
    >>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0,
..., 0) then
    >>>>> print(summary(2.220446e-16), digits = 7) would print
    >>>>> Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    >>>>> 0          0            0           0           0      
0
    >>>>> 
    >>>>> The same problem can also appear when printing the
results of
    >>>>> summary.glm() with show.residuals = TRUE if there's
little dispersion
    >>>>> in the residuals.
    >>>>> 
    >>>>> Steve
    >>>>> >>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg at
warnes.net> wrote:
    >>>>>> >>>>> I was quite suprised to discover that applying `zapsmall`
to a scalar    >>> value has no apparent effect.  For example:
    >>>>>>> y <- 2.220446e-16
    >>>>>>> zapsmall(y,)>>>>> [1] 2.2204e-16
    >>>>>> >>>>> I was expecting zapsmall(x)` to act like    >>>>>> 
    >>>>>>> round(y,
digits=getOption('digits'))>>>>> [1] 0
    >>>>>> >>>>> Looking at the current source code, indicates that
`zapsmall` is
    >>> expecting a vector:>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0) max(0L, digits -
    >>> as.numeric(log10(mx))) else digits)>>>>> }
    >>>>>> >>>>> If `x` is a non-zero scalar, zapsmall will never perform
rounding.
    >>>>>> >>>>> The man page simply states:
>>>>> zapsmall determines a digits argument dr for calling
round(x, digits     >>> dr) such that values close to zero (compared
with the maximal absolute    >>> value) are ?zapped?, i.e., replaced by
0.>>>>> and doesn?t provide any details about how ?close to zero?
is defined.
    >>>>>> >>>>> Perhaps handling the special when `x` is a scalar (or only
contains a
    >>> single non-NA value)  would make sense:>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0 &&
(length(x)-sum(ina))>1 ) max(0L,    >>> digits - as.numeric(log10(mx))) else
digits)>>>>> }
    >>>>>> >>>>> Yielding:    >>>>>> 
    >>>>>>> y <- 2.220446e-16
    >>>>>>> zapsmall(y)>>>>> [1] 0
    >>>>>> >>>>> Another edge case would be when all of the non-na values
are the same:    >>>>>> 
    >>>>>>> y <- 2.220446e-16
    >>>>>>> zapsmall(c(y,y))>>>>> [1] 2.220446e-16 2.220446e-16
    >>>>>> >>>>> Thoughts?    >>>>>> 
    >>>>>> >>>>> Gregory R. Warnes, Ph.D.
>>>>> greg at warnes.net
>>>>> Eternity is a long time, take a friend!    >>>>>> 
    >>>>>> 
    >>>>>> >>>>>          [[alternative HTML version deleted]]
    >>>>>> >>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel    >>>> [[alternative HTML version deleted]]
    >>>> 
    >>>> ______________________________________________
    >>>> R-devel at r-project.org mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>> ______________________________________________
    >>> R-devel at r-project.org mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>> 
    >> [[alternative HTML version deleted]]
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel


    > -- 
    > Serguei Sokol
    > Ingenieur de recherche INRAE

    > Cellule Math?matiques
    > TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
    > 135 Avenue de Rangueil
    > 31077 Toulouse Cedex 04

    > tel: +33 5 61 55 98 49
    > email: sokol at insa-toulouse.fr
    >
https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

Reasonably Related Threads

Search for more maybe matching threads

R devel - Dec 2023 - [External] Re: zapsmall(x) for scalar x

[Rd] [External] Re: zapsmall(x) for scalar x

[Rd] [External] Re: zapsmall(x) for scalar x

Reasonably Related Threads