I think what's been missed is that zapsmall works relative to the absolute largest value in the vector. Hence if there's only one item in the vector, it is the largest, so it's not zapped. The function's raison d'etre isn't to replace absolutely small values, but small values relative to the largest. Hence a vector of similar tiny values doesn't get zapped.

Maybe the line in the docs:

" (compared with the maximal absolute value)"

needs to read:

" (compared with the maximal absolute value in the vector)"

Barry

On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> I'm really confused. Steve's example wasn't a scalar x, it was a
> vector. Your zapsmall() proposal wouldn't zap it to zero, and I don't
> see why summary() would if it was using your proposal.
>
> Duncan Murdoch
>
> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
> > Isn't that the correct outcome? The user can change the number of
> > digits if they want to see small values?
> >
> > --
> > Change your thoughts and you change the world.
> > --Dr. Norman Vincent Peale
> >
> >> On Dec 17, 2023, at 12:11 AM, Steve Martin <stevemartin041 at gmail.com> wrote:
> >>
> >> Zapping a vector of small numbers to zero would cause problems when
> >> printing the results of summary(). For example, if
> >> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
> >> print(summary(2.220446e-16), digits = 7) would print
> >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >>       0       0       0       0       0       0
> >>
> >> The same problem can also appear when printing the results of
> >> summary.glm() with show.residuals = TRUE if there's little dispersion
> >> in the residuals.
> >>
> >> Steve
> >>
> >>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg at warnes.net> wrote:
> >>>
> >>> I was quite surprised to discover that applying `zapsmall` to a scalar
> >>> value has no apparent effect. For example:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(y,)
> >>> [1] 2.2204e-16
> >>>
> >>> I was expecting `zapsmall(x)` to act like
> >>>
> >>> > round(y, digits=getOption('digits'))
> >>> [1] 0
> >>>
> >>> Looking at the current source code indicates that `zapsmall` is
> >>> expecting a vector:
> >>>
> >>> zapsmall <-
> >>> function (x, digits = getOption("digits"))
> >>> {
> >>>     if (length(digits) == 0L)
> >>>         stop("invalid 'digits'")
> >>>     if (all(ina <- is.na(x)))
> >>>         return(x)
> >>>     mx <- max(abs(x[!ina]))
> >>>     round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits)
> >>> }
> >>>
> >>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
> >>>
> >>> The man page simply states:
> >>> zapsmall determines a digits argument dr for calling round(x, digits = dr)
> >>> such that values close to zero (compared with the maximal absolute
> >>> value) are 'zapped', i.e., replaced by 0.
> >>>
> >>> and doesn't provide any details about how 'close to zero' is defined.
> >>>
> >>> Perhaps handling the special case when `x` is a scalar (or only contains a
> >>> single non-NA value) would make sense:
> >>>
> >>> zapsmall <-
> >>> function (x, digits = getOption("digits"))
> >>> {
> >>>     if (length(digits) == 0L)
> >>>         stop("invalid 'digits'")
> >>>     if (all(ina <- is.na(x)))
> >>>         return(x)
> >>>     mx <- max(abs(x[!ina]))
> >>>     round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 ) max(0L, digits - as.numeric(log10(mx))) else digits)
> >>> }
> >>>
> >>> Yielding:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(y)
> >>> [1] 0
> >>>
> >>> Another edge case would be when all of the non-NA values are the same:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(c(y,y))
> >>> [1] 2.220446e-16 2.220446e-16
> >>>
> >>> Thoughts?
> >>>
> >>>
> >>> Gregory R. Warnes, Ph.D.
> >>> greg at warnes.net
> >>> Eternity is a long time, take a friend!
> >>>
> >>> ______________________________________________
> >>> R-devel at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
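
The relative behaviour Barry describes can be checked with a minimal sketch. It uses only base R and passes `digits = 7` explicitly so that it does not depend on the session's options; the variable names (`tiny`, `mixed`, `alone`, `pair`) are illustrative and not from the thread.

```r
# Value taken from the thread; the name `tiny` is illustrative.
tiny <- 2.220446e-16

# Next to a value of magnitude 1, tiny is relatively small and is rounded to 0.
mixed <- zapsmall(c(1, tiny), digits = 7)

# On its own, tiny is the largest absolute value in the vector, so it is kept.
alone <- zapsmall(tiny, digits = 7)

# Two identical tiny values: the maximum is still tiny, so nothing is zapped.
pair <- zapsmall(c(tiny, tiny), digits = 7)

print(mixed)
print(alone)
print(pair)
```

Only the first call should produce a zero, because it is the only one in which some other element sets a larger scale.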
Sorry for being unclear. I was commenting on the edge case that Gregory brought up when calling zapsmall() with a vector of small values. I thought Gregory was asking for thoughts on that as well, but maybe I misunderstood. IMO it would be weird for zapsmall() to make a small scalar zero but not a vector of the identical values. The example with summary() was meant to show that zapping a vector of small values to 0 could change the current printing behavior for certain objects. Duncan is right that zapping only a scalar to zero wouldn't do anything.

> Isn't that the correct outcome? The user can change the number of digits if they want to see small values?

I'm not sure a user would be able to change the digits without updating other functions. If xx[finite] <- zapsmall(x[finite]) in print.summaryDefault() makes a vector of 0s (e.g., zapsmall(x) works like round(x, digits = getOption("digits")) and getOption("digits") is 7) then calling print(summary(2.220446e-16), digits = 16) would still print a vector of 0s. The digits argument to print() wouldn't do anything.

In any case, I just wanted to point out that changes to zapsmall() in the corner case Gregory brought up could affect the way certain objects are printed, both changing the current behavior and perhaps requiring changes to some other functions.

Steve

On Sun, 17 Dec 2023 at 12:26, Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
>
> I think what's been missed is that zapsmall works relative to the absolute largest value in the vector. Hence if there's only one
> item in the vector, it is the largest, so it's not zapped. The function's raison d'etre isn't to replace absolutely small values,
> but small values relative to the largest. Hence a vector of similar tiny values doesn't get zapped.
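
Steve's point that a later `digits` argument cannot bring back values already rounded to zero can be illustrated with a small sketch. It does not call print.summaryDefault(); instead it assumes a hypothetical zapsmall() that behaves like round(x, digits = 7), and the names `x` and `zapped` are illustrative.

```r
# Six identical tiny values, as in the summary() example from the thread.
x <- rep(2.220446e-16, 6)

# Stand-in for a hypothetical zapsmall() that rounds absolutely small values
# (an assumption for illustration; this is not what base zapsmall() does here).
zapped <- round(x, digits = 7)

# Asking for more digits afterwards cannot recover the information.
shown_zapped <- format(zapped, digits = 16)
shown_raw    <- format(x, digits = 16)

print(shown_zapped)
print(shown_raw)
```

Once the rounding has happened inside the printing method, the only way to see the small values again would be to change that method, which is the knock-on effect Steve is warning about.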
On 17/12/2023 at 18:26, Barry Rowlingson wrote:
> I think what's been missed is that zapsmall works relative to the absolute
> largest value in the vector. Hence if there's only one
> item in the vector, it is the largest, so it's not zapped. The function's
> raison d'etre isn't to replace absolutely small values,
> but small values relative to the largest. Hence a vector of similar tiny
> values doesn't get zapped.
>
> Maybe the line in the docs:
>
> " (compared with the maximal absolute value)"
>
> needs to read:
>
> " (compared with the maximal absolute value in the vector)"

I agree that this change in the doc would clarify the situation but would not resolve the proposed corner cases. I think that an additional argument 'mx' (the reference absolute maximum) would do. Consider:

zapsmall2 <-
function (x, digits = getOption("digits"), mx = max(abs(x), na.rm = TRUE))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits)
}

Then zapsmall2() without an explicit 'mx' behaves identically to the current zapsmall(), and for a scalar or a vector of identical values the user can manually fix the scale of what should be considered small:

> zapsmall2(y)
[1] 2.220446e-16
> zapsmall2(y, mx=1)
[1] 0
> zapsmall2(c(y, y), mx=1)
[1] 0 0
> zapsmall2(c(y, NA))
[1] 2.220446e-16           NA
> zapsmall2(c(y, NA), mx=1)
[1]  0 NA

Obviously, the name 'zapsmall2' was chosen just for this explanation. The original name 'zapsmall' could be reused, as full backward compatibility is preserved.

Best,
Serguei.

--
Serguei Sokol
Ingenieur de recherche INRAE

Cellule Mathématiques
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 61 55 98 49
email: sokol at insa-toulouse.fr
https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/
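
To see why the choice of reference maximum matters, the digits expression used in the functions quoted in this thread can be evaluated on its own. This is only a sketch: `zap_digits` is a hypothetical helper that copies that one expression, with `digits` fixed at 7 for reproducibility.

```r
# Hypothetical helper: the digits expression from the thread, isolated.
zap_digits <- function(mx, digits = 7) {
    if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits
}

y <- 2.220446e-16

# Reference maximum is y itself: log10(y) is about -15.65, so round() is
# asked for roughly 22.65 decimal places and y is left essentially as is.
d_self <- zap_digits(y)

# Reference maximum is 1: log10(1) is 0, so round() is asked for 7 decimal
# places and y becomes 0.
d_unit <- zap_digits(1)

print(d_self)
print(d_unit)
print(round(y, digits = d_unit))
```

The smaller the reference maximum, the more decimal places are kept, which is why a lone tiny value is never zapped unless a larger scale is supplied.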