Thank you Andrew.

Is there any reason not to make: include.lowest = TRUE the default?

Regarding the NA:

The user still has to suspect that some values were not included and run that test.

Leonard

On 9/18/2021 12:53 AM, Andrew Simmons wrote:
> Regarding your first point, argument 'include.lowest' already handles
> this specific case, see ?.bincode
>
> Your second point, maybe it could be helpful, but since both
> 'cut.default' and '.bincode' return NA if a value isn't within a bin,
> you could make something like this on your own.
> Might be worth pitching to R-bugs on the wishlist.
>
> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
> <r-help at r-project.org> wrote:
>
>     Hello List members,
>
>     the following improvements would be useful for function cut (and
>     .bincode):
>
>     1.) Argument: Include extremes
>     extremes = TRUE
>     if(right == FALSE) {
>         # include also right for last interval;
>     } else {
>         # include also left for first interval;
>     }
>
>     2.) Argument: warn = TRUE
>
>     Warn if any values are not included in the intervals.
>
>     Motivation:
>     - reduce risk of errors when using function cut();
>
>     Sincerely,
>
>     Leonard
>
>     ______________________________________________
>     R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
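For readers following the first point, here is a small sketch of how 'include.lowest' interacts with 'right' in .bincode. The vectors x and breaks are made-up illustration values, not data from the thread.

```r
# Illustration values only; both end points of the break range are included in x.
x <- c(0, 5, 10)
breaks <- c(0, 5, 10)

# right = TRUE (the default): bins are (0,5] and (5,10], so 0 is outside.
r1 <- .bincode(x, breaks, right = TRUE, include.lowest = FALSE)

# include.lowest = TRUE closes the first bin on the left: [0,5], (5,10].
r2 <- .bincode(x, breaks, right = TRUE, include.lowest = TRUE)

# right = FALSE: bins are [0,5) and [5,10), so 10 is outside.
r3 <- .bincode(x, breaks, right = FALSE, include.lowest = FALSE)

# With right = FALSE, include.lowest = TRUE closes the last bin on the right.
r4 <- .bincode(x, breaks, right = FALSE, include.lowest = TRUE)

print(rbind(r1, r2, r3, r4))
```

So the one argument plays both roles described under "extremes": it includes the lowest break when right = TRUE and the highest break when right = FALSE.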
While it is not explicitly mentioned anywhere in the documentation for .bincode, I suspect 'include.lowest = FALSE' is the default to keep the definitions of the bins consistent. For example:

    x <- 0:20
    breaks1 <- seq.int(0, 16, 4)
    breaks2 <- seq.int(0, 20, 4)
    cbind(
        .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
        .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
    )

With 'include.lowest = TRUE' and different end points, you can get inconsistent behaviour: the value 16 falls in bin 4 with breaks1 but in bin 5 with breaks2. While this probably wouldn't be an issue with 'real' data, it seems like something you'd want to avoid by default. The definitions of the bins are

    [0, 4) [4, 8) [8, 12) [12, 16]

and

    [0, 4) [4, 8) [8, 12) [12, 16) [16, 20]

so you can see where the inconsistent behaviour comes from.

You might be able to get R-core to add argument 'warn', but probably not to change the default of 'include.lowest'.

I hope this helps

On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada at syonic.eu> wrote:
> Thank you Andrew.
>
> Is there any reason not to make: include.lowest = TRUE the default?
>
> Regarding the NA:
>
> The user still has to suspect that some values were not included and run
> that test.
>
> Leonard
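Until something like 'warn' exists, the behaviour can be approximated with a small wrapper. The sketch below is only one way to do it: cut_warn and its 'warn' argument are hypothetical names, not part of base R, and the input values are made up.

```r
# Hypothetical helper, not part of base R: calls cut() and warns when
# non-missing inputs fall outside every interval.
cut_warn <- function(x, breaks, ..., warn = TRUE) {
  result <- cut(x, breaks, ...)
  # Count only the NAs created by binning, not NAs already present in x.
  dropped <- sum(is.na(result) & !is.na(x))
  if (warn && dropped > 0L) {
    warning(sprintf("%d value(s) fall outside the intervals and were set to NA",
                    dropped), call. = FALSE)
  }
  result
}

# Made-up data: with the default right = TRUE, 0 and 12 lie outside (0,5], (5,10].
x <- c(0, 3, 7, 12, NA)
res <- cut_warn(x, breaks = c(0, 5, 10))
```

The wrapper passes everything else through to cut(), so 'right', 'include.lowest' and 'labels' keep their usual meaning.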
Your objection that "the user has to suspect that some values were not included" applies equally to your proposed warn option. There are a lot of ways to introduce NAs; in real projects, every analyst should already be on the lookout for this problem.

On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help <r-help at r-project.org> wrote:
>Thank you Andrew.
>
>Is there any reason not to make: include.lowest = TRUE the default?
>
>Regarding the NA:
>
>The user still has to suspect that some values were not included and run
>that test.
>
>Leonard

-- 
Sent from my phone. Please excuse my brevity.
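As a rough illustration of the kind of routine check meant here, the sketch below tabulates the bins with their NA count and locates the values that became NA only through binning. The data are made up for the example.

```r
# Made-up data: one value is missing to begin with, two lie outside the breaks.
x <- c(0, 3, 7, 12, NA)
bins <- cut(x, breaks = c(0, 5, 10))

# Tabulate the bins, adding an NA entry when any NA is present.
counts <- table(bins, useNA = "ifany")
print(counts)

# Positions where the input was not missing but the bin is NA.
introduced <- which(is.na(bins) & !is.na(x))
print(x[introduced])
```

Comparing is.na() on the input and on the result separates NAs that were already in the data from those produced by cut().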