Matthew Dowle
2012-Oct-30 11:03 UTC
[Rd] There is pmin and pmax each taking na.rm, how about psum?
Hi,

Please consider the following:

    x = c(1,3,NA,5)
    y = c(2,NA,4,1)

    min(x,y,na.rm=TRUE)    # ok
    [1] 1
    max(x,y,na.rm=TRUE)    # ok
    [1] 5
    sum(x,y,na.rm=TRUE)    # ok
    [1] 16
    pmin(x,y,na.rm=TRUE)   # ok
    [1] 1 3 4 1
    pmax(x,y,na.rm=TRUE)   # ok
    [1] 2 3 4 5
    psum(x,y,na.rm=TRUE)
    [1] 3 3 4 6                             # expected result
    Error: could not find function "psum"   # actual result

I realise that + is already like psum, but what about NA?

    x+y
    [1] 3 NA NA 6    # can't supply `na.rm=TRUE` to `+`

Is there a case to add psum? Or have I missed something?

This question survived when I asked on Stack Overflow:
http://stackoverflow.com/questions/13123638/there-is-pmin-and-pmax-each-taking-na-rm-why-no-psum

And a search of the archives found that Gabor has suggested it too, as an aside:
http://r.789695.n4.nabble.com/How-to-do-it-without-for-loops-tp794745p794750.html

If someone from R core is willing to sponsor the idea, I am willing to write, test and submit the code for psum, implemented in a very similar fashion to pmin and pmax. Or perhaps it exists already in a package somewhere (I searched but didn't find it).

Matthew
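For reference, the semantics asked for above can be sketched in plain R. This is a minimal sketch under stated assumptions, not the C-level implementation being offered: `psum` is a hypothetical name (no such function exists in base R), and with `na.rm = TRUE` each NA is treated as 0, so a position where every argument is NA sums to 0, as `sum()` does.

```r
# Hypothetical psum(): element-wise ("parallel") sum across vectors,
# modelled loosely on pmin()/pmax(). Not part of base R.
psum <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (length(args) == 0L) return(numeric(0))
  if (na.rm) {
    # Treat NA as the additive identity 0 (this may coerce
    # integer or logical input to double).
    args <- lapply(args, function(v) { v[is.na(v)] <- 0; v })
  }
  # Fold `+` over the arguments; `+` recycles shorter vectors.
  Reduce(`+`, args)
}

x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)
psum(x, y, na.rm = TRUE)  # 3 3 4 6
psum(x, y)                # 3 NA NA 6, same as x + y
```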
ONKELINX, Thierry
2012-Oct-30 12:13 UTC
Why don't you make a matrix and use colSums or rowSums?

    x = c(1,3,NA,5)
    y = c(2,NA,4,1)
    colSums(rbind(x, y), na.rm = TRUE)

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
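Thierry's suggestion can be wrapped as a one-line helper. A minimal sketch; `psum_cs` is a hypothetical name. Note that `rbind()` first copies all inputs into a new matrix, and that a column which is entirely NA sums to 0 under `na.rm = TRUE`.

```r
# Hypothetical helper: element-wise sum via a temporary matrix.
# rbind() stacks the vectors as rows (recycling shorter ones), then
# colSums() sums each column, i.e. each position across the inputs.
psum_cs <- function(..., na.rm = FALSE) {
  colSums(rbind(...), na.rm = na.rm)
}

x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)
psum_cs(x, y, na.rm = TRUE)                 # 3 3 4 6
psum_cs(c(NA, 1), c(NA, 2), na.rm = TRUE)   # 0 3: an all-NA column sums to 0
```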
Hadley Wickham
2012-Oct-30 13:47 UTC
> Is there a case to add psum? Or have I missed something.

If psum, then why not pdiff (-), pprod (*) and precip (/)? And similarly, what about equivalent functions for ^, %%, %/%, &, and |?

Hadley

-- 
RStudio / Rice University
http://had.co.nz/
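One way to see what this question is asking: a p-version of an operator with `na.rm` can be built generically, provided some value leaves the result unchanged when substituted for NA. A minimal sketch; `make_pfun`, `pprod`, `pall` and `pany` are hypothetical names, not base R. The substitution works for `+` (0), `*` (1), `&` (TRUE) and `|` (FALSE), but not for `-`, `/`, `^`, `%%` or `%/%`, where no single value is neutral on both sides (for example `0 - b` is `-b`, not `b`).

```r
# Hypothetical factory: turns a binary operator `op` into a variadic,
# element-wise function with an na.rm argument. With na.rm = TRUE every
# NA is replaced by `neutral`, which is only correct when `neutral` is a
# two-sided identity element of `op`.
make_pfun <- function(op, neutral) {
  force(op)
  force(neutral)
  function(..., na.rm = FALSE) {
    args <- list(...)
    if (na.rm) {
      args <- lapply(args, function(v) { v[is.na(v)] <- neutral; v })
    }
    Reduce(op, args)
  }
}

pprod <- make_pfun(`*`, 1)
pall  <- make_pfun(`&`, TRUE)
pany  <- make_pfun(`|`, FALSE)

pprod(c(2, NA, 3), c(4, 5, NA), na.rm = TRUE)               # 8 5 3
pall(c(TRUE, NA, FALSE), c(TRUE, TRUE, NA), na.rm = TRUE)   # TRUE TRUE FALSE
pany(c(FALSE, NA, NA), c(TRUE, FALSE, NA), na.rm = TRUE)    # TRUE FALSE FALSE
```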
Justin Talbot
2012-Oct-31 15:38 UTC
> Because that's inconsistent with pmin and pmax when two NAs are summed.
>
> x = c(1,3,NA,NA,5)
> y = c(2,NA,4,NA,1)
> colSums(rbind(x, y), na.rm = TRUE)
> [1] 3 3 4 0 6    # actual
> [1] 3 3 4 NA 6   # desired

But your desired result would be inconsistent with sum:

    sum(NA,NA,na.rm=TRUE)
    [1] 0

From a language definition perspective I think having psum return 0 here is the right choice. R consistently distinguishes between operators that have a sensible identity (+: 0, *: 1, &: TRUE, |: FALSE), which return the identity if removing NAs results in no items, and those that kind of don't (pmin, pmax), which return NA. Let's not break that. (I would argue that pmin and pmax should return their actual identities too, Inf and -Inf respectively, but I can understand the current behavior.)

My 2 cents on psum: R has a natural set of associative & commutative operators: +, *, &, |, pmin, pmax. These correspond directly to the reduction functions: sum, prod, all, any, min, max.

The current problem is that pmin and pmax are more powerful than +, *, &, and |. The right fix is to extend the rest of the associative & commutative operators to have the same power as pmin and pmax. Thus, + should have the signature `+`(..., na.rm=FALSE), which would allow you to do things like:

    `+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)

If you don't like typing `+`, you could always alias psum to `+`.

Additionally, R currently has two simple reduction functions that don't have corresponding operators: range and length. Having a prange operator and a plength operator would nicely round out the language.

Justin
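The identity behaviour described above can be checked directly in base R. A small sketch of existing behaviour only (no new functions); `suppressWarnings()` is used because `min()` and `max()` warn when no non-missing values remain.

```r
# Reductions return their identity element once all NAs are removed.
sum(NA, NA, na.rm = TRUE)    # 0
prod(NA, NA, na.rm = TRUE)   # 1
all(NA, na.rm = TRUE)        # TRUE
any(NA, na.rm = TRUE)        # FALSE

# min() and max() also fall back to their identities, with a warning.
suppressWarnings(min(NA, na.rm = TRUE))   # Inf
suppressWarnings(max(NA, na.rm = TRUE))   # -Inf

# The parallel versions instead give NA where every value is NA.
pmin(c(1, NA), c(NA, NA), na.rm = TRUE)   # 1 NA
pmax(c(1, NA), c(NA, NA), na.rm = TRUE)   # 1 NA
```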
Matthew Dowle
2012-Nov-01 15:48 UTC
Justin Talbot <jtalbot <at> stanford.edu> writes:

> > Because that's inconsistent with pmin and pmax when two NAs are summed.
> >
> > x = c(1,3,NA,NA,5)
> > y = c(2,NA,4,NA,1)
> > colSums(rbind(x, y), na.rm = TRUE)
> > [1] 3 3 4 0 6 # actual
> > [1] 3 3 4 NA 6 # desired
>
> But your desired result would be inconsistent with sum:
> sum(NA,NA,na.rm=TRUE)
> [1] 0
>
> From a language definition perspective I think having psum return 0
> here is right choice.

OK, you've sold me. psum(NA,NA,na.rm=TRUE) returning 0 sounds good, and pprod(NA,NA,na.rm=TRUE) would then return 1, consistent with prod.

The case for psum is then more about convenience and speed compared with colSums(rbind(x,y), na.rm=TRUE), since rbind will copy x and y into a new matrix. The case for pprod is similar, plus colProds doesn't exist.

> Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
> allow you to do things like:
>
> `+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)
>
> If you don't like typing `+`, you could always alias psum to `+`.

But there would be a cost, wouldn't there? `+` is a dyadic .Primitive. Changing it to take `...` and `na.rm` could slow it down (if I understand correctly), and any change to the existing language is risky. For example, `+`(1,2,3) is currently an error. Changing that to do something might have implications for some of the 4,000 packages (some might rely on it being an error), with a possible speed cost too. In contrast, adding two functions that didn't exist before, psum and pprod, seems a safer and simpler proposition.

Matthew
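The point about `+` can be checked in any base R session: `+` is a primitive that accepts one or two arguments, so a third argument raises an error rather than being summed. The snippet below only inspects existing behaviour; the variable name `msg` is illustrative and nothing here is a proposed API.

```r
# `+` is a primitive function implemented in C.
is.primitive(`+`)   # TRUE

# Unary and binary calls work as usual.
`+`(5)              # 5
`+`(1, 2)           # 3

# A third argument is rejected; capture the error message instead of stopping.
msg <- tryCatch(`+`(1, 2, 3), error = function(e) conditionMessage(e))
print(msg)
```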