thr3ads.net - R devel - [Rd] [External] Re: zapsmall(x) for scalar x [Dec 2023]


Steve Martin

2023-Dec-18 12:56 UTC

[Rd] [External] Re: zapsmall(x) for scalar x

Does mFUN() really need to be a function of x and the NA values of x? I
can't think of a case where it would be used on anything but the non-NA
values of x.

I think it would be easier to specify a different mFUN() (and document this
new argument) if the function has one argument and is applied to the non-NA
values of x.

zapsmall <- function(x,
    digits = getOption("digits"),
    mFUN = function(x) max(abs(x)),
    min.d = 0L
) {
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x[!ina])
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

Steve
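
As a rough sketch of how the one-argument interface above might be used: the name `zapsmall1` and the constant-scale `mFUN` below are illustrative assumptions, not part of base R. They show a user fixing the reference scale instead of taking the maximum of the data.

```r
# Hypothetical copy of the one-argument proposal, renamed so that it does
# not mask base::zapsmall().
zapsmall1 <- function(x,
    digits = getOption("digits"),
    mFUN = function(x) max(abs(x)),
    min.d = 0L
) {
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x[!ina])   # mFUN only ever sees the non-NA values
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

y <- 2.220446e-16

# Default mFUN: the scale is y itself, so y is not small relative to it.
zapsmall1(y, digits = 7)

# Illustrative mFUN that fixes the scale at 1: y is now rounded to 7
# decimal places and becomes 0; the NA is passed through.
zapsmall1(c(y, NA), digits = 7, mFUN = function(x) 1)
```

A constant `mFUN` plays the same role here as the `mx = 1` argument in the zapsmall2() variant quoted below.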

On Mon, Dec 18, 2023, 05:47 Serguei Sokol via R-devel <r-devel at r-project.org> wrote:
> On 18/12/2023 at 11:24, Martin Maechler wrote:
> >>>>>> Serguei Sokol via R-devel
> >>>>>>      on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> >      > On 17/12/2023 at 18:26, Barry Rowlingson wrote:
> >      >> I think what's been missed is that zapsmall works relative to the
> >      >> absolute largest value in the vector. Hence if there's only one
> >      >> item in the vector, it is the largest, so it's not zapped. The
> >      >> function's raison d'etre isn't to replace absolutely small values,
> >      >> but small values relative to the largest. Hence a vector of similar
> >      >> tiny values doesn't get zapped.
> >      >>
> >      >> Maybe the line in the docs:
> >      >>
> >      >> " (compared with the maximal absolute value)"
> >      >>
> >      >> needs to read:
> >      >>
> >      >> " (compared with the maximal absolute value in the vector)"
> >
> >      > I agree that this change in the doc would clarify the situation but
> >      > would not resolve proposed corner cases.
> >
> >      > I think that an additional argument 'mx' (absolute max value of
> >      > reference) would do. Consider:
> >
> >      > zapsmall2 <-
> >      > function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
> >      > {
> >      >     if (length(digits) == 0L)
> >      >         stop("invalid 'digits'")
> >      >     if (all(ina <- is.na(x)))
> >      >         return(x)
> >      >     round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits)
> >      > }
> >
> >      > then zapsmall2() without explicit 'mx' behaves identically to the
> >      > current zapsmall(), and for a scalar or a vector of identical values,
> >      > the user can manually fix the scale of what should be considered
> >      > small:
> >
> >      >> zapsmall2(y)
> >      > [1] 2.220446e-16
> >      >> zapsmall2(y, mx=1)
> >      > [1] 0
> >      >> zapsmall2(c(y, y), mx=1)
> >      > [1] 0 0
> >      >> zapsmall2(c(y, NA))
> >      > [1] 2.220446e-16           NA
> >      >> zapsmall2(c(y, NA), mx=1)
> >      > [1]  0 NA
> >
> >      > Obviously, the name 'zapsmall2' was chosen just for this explanation.
> >      > The original name 'zapsmall' could be reused, as full backward
> >      > compatibility is preserved.
> >
> >      > Best,
> >      > Serguei.
> >
> > Thank you, Serguei, Duncan, Barry et al.
> >
> > Generally :
> >    Yes, zapsmall was meant and is used for zapping *relatively*
> >    small numbers.  In the other cases,  directly  round()ing is
> >    what you should use.
> >
> > Specifically to Serguei's proposal of allowing the "max" value
> > to be user specified (in which case it is not really a true
> > max() anymore):
> >
> > I've spent quite a few hours on this problem in May 2022, to
> > make it even more flexible, e.g. allowing to use a 99%
> > percentile instead of the max(), or allowing to exclude +Inf
> > from the "mx"; but -- compared to your zapsmall2() --
> > to allow reproducible automatic choice :
> >
> >
> > zapsmall <- function(x, digits = getOption("digits"),
> >                      mFUN = function(x, ina) max(abs(x[!ina])),
> >                      min.d = 0L)
> > {
> >      if (length(digits) == 0L)
> >          stop("invalid 'digits'")
> >      if (all(ina <- is.na(x)))
> >          return(x)
> >      mx <- mFUN(x, ina)
> >      round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx))) else digits)
> > }
> >
> > with optional 'min.d' as I had (vaguely remember to have) found
> > at the time that the '0' is also not always "the only correct" choice.
> Do you have a case or two where min.d could be useful?
>
> Serguei.
>
> >
> > Somehow I never got to propose/discuss the above,
> > but it seems a good time to do so now.
> >
> > Martin
> >
> >
> >
> >      >> barry
> >      >>
> >      >>
> >      >> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> >      >>
> >      >>>
> >      >>> I'm really confused.  Steve's example wasn't a scalar x, it was a
> >      >>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I
> >      >>> don't see why summary() would if it was using your proposal.
> >      >>>
> >      >>> Duncan Murdoch
> >      >>>
> >      >>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
> >      >>>> Isn't that the correct outcome?  The user can change the number of
> >      >>>> digits if they want to see small values?
> >      >>>>
> >      >>>> --
> >      >>>> Change your thoughts and you change the world.
> >      >>>> --Dr. Norman Vincent Peale
> >      >>>>
> >      >>>>> On Dec 17, 2023, at 12:11 AM, Steve Martin <stevemartin041 at gmail.com> wrote:
> >      >>>>> Zapping a vector of small numbers to zero would cause problems when
> >      >>>>> printing the results of summary(). For example, if
> >      >>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
> >      >>>>> print(summary(2.220446e-16), digits = 7) would print
> >      >>>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >      >>>>>       0       0       0       0       0       0
> >      >>>>>
> >      >>>>> The same problem can also appear when printing the results of
> >      >>>>> summary.glm() with show.residuals = TRUE if there's little
> >      >>>>> dispersion in the residuals.
> >      >>>>>
> >      >>>>> Steve
> >      >>>>>
> >      >>>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg at warnes.net> wrote:
> >      >>>>>>
> >      >>>>>> I was quite surprised to discover that applying `zapsmall` to a
> >      >>>>>> scalar value has no apparent effect.  For example:
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(y,)
> >      >>>>>> [1] 2.2204e-16
> >      >>>>>>
> >      >>>>>> I was expecting `zapsmall(x)` to act like
> >      >>>>>>
> >      >>>>>>> round(y, digits=getOption('digits'))
> >      >>>>>> [1] 0
> >      >>>>>>
> >      >>>>>> Looking at the current source code indicates that `zapsmall` is
> >      >>>>>> expecting a vector:
> >      >>>>>> zapsmall <-
> >      >>>>>> function (x, digits = getOption("digits"))
> >      >>>>>> {
> >      >>>>>>       if (length(digits) == 0L)
> >      >>>>>>           stop("invalid 'digits'")
> >      >>>>>>       if (all(ina <- is.na(x)))
> >      >>>>>>           return(x)
> >      >>>>>>       mx <- max(abs(x[!ina]))
> >      >>>>>>       round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits)
> >      >>>>>> }
> >      >>>>>>
> >      >>>>>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
> >      >>>>>>
> >      >>>>>> The man page simply states:
> >      >>>>>> zapsmall determines a digits argument dr for calling round(x,
> >      >>>>>> digits = dr) such that values close to zero (compared with the
> >      >>>>>> maximal absolute value) are "zapped", i.e., replaced by 0.
> >      >>>>>> and doesn't provide any details about how "close to zero" is defined.
> >      >>>>>>
> >      >>>>>> Perhaps handling the special case when `x` is a scalar (or only
> >      >>>>>> contains a single non-NA value) would make sense:
> >      >>>>>> zapsmall <-
> >      >>>>>> function (x, digits = getOption("digits"))
> >      >>>>>> {
> >      >>>>>>       if (length(digits) == 0L)
> >      >>>>>>           stop("invalid 'digits'")
> >      >>>>>>       if (all(ina <- is.na(x)))
> >      >>>>>>           return(x)
> >      >>>>>>       mx <- max(abs(x[!ina]))
> >      >>>>>>       round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 )
> >      >>>>>>                             max(0L, digits - as.numeric(log10(mx)))
> >      >>>>>>                         else digits)
> >      >>>>>> }
> >      >>>>>>
> >>>>>> Yielding:
> >      >>>>>>
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(y)
> >>>>>> [1] 0
> >      >>>>>>
> >      >>>>>> Another edge case would be when all of the non-NA values are the same:
> >      >>>>>>
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(c(y,y))
> >>>>>> [1] 2.220446e-16 2.220446e-16
> >      >>>>>>
> >>>>>> Thoughts?
> >      >>>>>>
> >      >>>>>>
> >>>>>> Gregory R. Warnes, Ph.D.
> >>>>>> greg at warnes.net
> >>>>>> Eternity is a long time, take a friend!
> >      >>>>>>
> >      >>>>>>
> >      >>>>>>
> >>>>>>           [[alternative HTML version deleted]]
> >      >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel at r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >      > --
> >      > Serguei Sokol
> >      > Ingenieur de recherche INRAE
> >
> >      > Cellule Mathématiques
> >      > TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
> >      > 135 Avenue de Rangueil
> >      > 31077 Toulouse Cedex 04
> >
> >      > tel: +33 5 61 55 98 49
> >      > email: sokol at insa-toulouse.fr
> >      > https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/
> >

Martin Maechler

2023-Dec-19 16:25 UTC


[Rd] [External] Re: zapsmall(x) for scalar x

>>>>> Steve Martin 
>>>>>     on Mon, 18 Dec 2023 07:56:46 -0500 writes:
    > Does mFUN() really need to be a function of x and the NA values of x? I
    > can't think of a case where it would be used on anything but the non-NA
    > values of x.

    > I think it would be easier to specify a different mFUN() (and document this
    > new argument) if the function has one argument and is applied to the non-NA
    > values of x.

    > zapsmall <- function(x,
    >     digits = getOption("digits"),
    >     mFUN = function(x) max(abs(x)),
    >     min.d = 0L) {
    >     if (length(digits) == 0L)
    >         stop("invalid 'digits'")
    >     if (all(ina <- is.na(x)))
    >         return(x)
    >     mx <- mFUN(x[!ina])
    >     round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
    >     else digits)
    > }

    > Steve

Thank you, Steve,
you are right that it would look simpler to do it that way.

On the other hand, in your case, mFUN() no longer sees the
original  n observations, and would not know whether there were NAs
and, if so, how many there were in the original data.

The examples I have on my version of zapsmall's help page (see below)
use a robust mFUN, the upper whisker end of a box plot (the 5th element
of boxplot.stats()$stats):

   mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]

and if you inspect boxplot.stats() you may know that indeed it
also wants to use the full data 'x' to compute its statistics and
then deal with NAs directly.  Your simplified mFUN interface
would not be fully consistent with boxplot(), and I think could
not be made so,  hence my more flexible 2-argument "design" for 
mFUN().

.... and BTW, these examples also exemplify the use of  `min.d`
about which  Serguei Sokol asked for an example or two.

Here I repeat my definition of zapsmall, and then my current set
of examples:

zapsmall <- function(x, digits = getOption("digits"),
                     mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}
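
One way to see the trade-off: any one-argument function of the non-NA values can be wrapped into the two-argument form, but not the other way round. The following is a minimal sketch only; the names `zapsmall_m` (a renamed copy of the definition above) and `on_non_na` are hypothetical helpers introduced for illustration.

```r
# Renamed copy of the two-argument definition above, so that
# base::zapsmall() is not masked.
zapsmall_m <- function(x, digits = getOption("digits"),
                       mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

# Hypothetical adapter: turn a one-argument function of the non-NA values
# into the two-argument (x, ina) form.
on_non_na <- function(f) function(x, ina) f(x[!ina])

x <- c(1e-20, 1, NA, 1000)

# Same result as the default mFUN, written via the adapter.
zapsmall_m(x, digits = 7, mFUN = on_non_na(function(v) max(abs(v))))

# A two-argument mFUN can also look at the NA pattern itself; this one
# (purely illustrative) just reports how many NAs it saw.
zapsmall_m(x, digits = 7, mFUN = function(x, ina) {
    message("NAs seen by mFUN: ", sum(ina))
    max(abs(x[!ina]))
})
```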


##--- \examples{
x2 <- pi * 100^(-2:2)/10
   print(  x2, digits = 4)
zapsmall(  x2) # automatic digits
zapsmall(  x2, digits = 4)
zapsmall(c(x2, Inf)) # round()s to integer ..
zapsmall(c(x2, Inf), min.d=-Inf) # everything  is small wrt  Inf

(z <- exp(1i*0:4*pi/2))
zapsmall(z)

zapShow <- function(x, ...) rbind(orig = x, zapped = zapsmall(x, ...))
zapShow(x2)

## using a *robust* mFUN
mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]
## with robust mFUN(), 'Inf' is no longer distorting the picture:
zapShow(c(x2, Inf), mFUN = mF_rob)
zapShow(c(x2, Inf), mFUN = mF_rob, min.d = -5) # the same
zapShow(c(x2, 999), mFUN = mF_rob) # same *rounding* as w/ Inf
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  3) # the same
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  8) # small diff
##--- }
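
To make the role of `min.d` in these examples more concrete, the following sketch isolates the `digits` value that would be handed to round(). The helper name `zap_digits` is hypothetical and simply mirrors the last expression of the definition above; it is not part of R.

```r
# Hypothetical helper: return the 'digits' that the definition above would
# pass on to round(), without doing the rounding itself.
# Assumes x contains at least one non-NA value.
zap_digits <- function(x, digits = 7,
                       mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
{
    ina <- is.na(x)
    mx <- mFUN(x, ina)
    if (mx > 0) max(min.d, digits - as.numeric(log10(mx))) else digits
}

x2 <- pi * 100^(-2:2)/10

zap_digits(x2)                        # digits minus log10 of the largest |x|
zap_digits(c(x2, Inf))                # log10(Inf) is Inf, so min.d = 0 takes over
zap_digits(c(x2, Inf), min.d = -Inf)  # no lower bound: the result is -Inf
```

With the default `min.d = 0`, a single `Inf` therefore forces rounding to integers, which is the case the `min.d = -Inf` and robust-mFUN examples above are contrasted with.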



