I have found a problem in R version 1.1.1 when using apply with the
median function. The problem can be illustrated with the following
data matrix:

    X1 X2 X3
     1  2  3
     4  5  6
     7  8 NA

Enter this data matrix as X and then try

    apply(X,2,median,na.rm=T)

The problem here is that the median function returns a named scalar if
the number of observations is odd, but returns an unnamed scalar if the
number of observations is even. This confuses the apply function in
this case at:

    ans.names <- names(ans[[1]])
    if (!ans.list)
        ans.list <- any(unlist(lapply(ans, length)) != l.ans)
    if (!ans.list && length(ans.names)) {
        all.same <- sapply(ans, function(x) all(names(x) == ans.names))
        #here is the offending line
        if (!all(all.same))
            ans.names <- NULL
    }

This problem does not occur with S-Plus. My quick solution was to use
the quantile function instead of the median function:

    apply(X,2,quantile,probs=.5,na.rm=T)

One way of fixing the problem then is to redefine median as

    median <- function(x,na.rm=F,names=T)
        quantile(x,probs=.5,na.rm=na.rm,names=names)

I don't know if this is a long-term solution though, since there may be
other functions with inconsistent naming policies that can confuse
apply as it is currently written.

Larry Ammann
Professor of Mathematical Sciences
University of Texas at Dallas
http://www.utdallas.edu/~ammann

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
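The naming inconsistency described in this report can be sketched in a few lines. Note that `named_median` below is a hypothetical stand-in written for this illustration; it is not the R 1.1.1 source of `median`, and it only mimics the reported behaviour (a name is kept for an odd number of observations and lost for an even number).

```r
# Hypothetical stand-in for a median that keeps names only for odd n.
# This is NOT the R 1.1.1 implementation; it merely mimics the report.
named_median <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  s <- sort(x)                    # a full sort keeps element names
  n <- length(s)
  half <- (n + 1) %/% 2
  if (n %% 2 == 1) {
    s[half]                       # odd n: a single element, name retained
  } else {
    sum(s[half + 0:1]) / 2        # even n: the arithmetic drops the names
  }
}

odd  <- named_median(c(a = 1, b = 4, c = 7))
even <- named_median(c(a = 3, b = 6, c = NA), na.rm = TRUE)

names(odd)    # a name is present
names(even)   # NULL
```

Any function that returns names for some inputs and not for others would presumably trip the same comparison in apply, which is the concern raised at the end of the report.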
Larry Ammann <ammann@metronet.com> writes:

> I have found a problem in R version 1.1.1 when using apply with the
> median function.
> The problem can be illustrated with the following data matrix:
>
> X1 X2 X3
> 1 2 3
> 4 5 6
> 7 8 NA
>
> Enter this data matrix as X and then try
> apply(X,2,median,na.rm=T)
> The problem here is that the median function returns a named scalar if
> the number of observations is odd, but returns an unnamed scalar if the
> number of observations is even. This confuses the apply function in
> this case at:

for ( i in 1:1000 ) cat("Data frames are not matrices!\n")

> X<-matrix(c(1:8,NA),3,3)
> X
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6   NA
> apply(X,2,median,na.rm=T)
[1] 2.0 5.0 7.5

Works fine. However,

> Z<-data.frame(X)
> apply(Z,2,median,na.rm=T)
Error in names(x) == ans.names : comparison (1) is possible only for vector types

Question is whether this is a bug. Apply() never promised to work on
non-arrays. When applied columnwise, it would anyway be more obvious to
use

> sapply(Z,median,na.rm=T)
 X1  X2  X3
2.0 5.0 7.5

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
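The distinction drawn in this reply can be checked directly. The short sketch below reuses the same `X` and `Z` as the message; `col_medians` is just an illustrative variable name. It shows that a data frame is a list of columns rather than a matrix, which is why `sapply` is the natural columnwise tool.

```r
X <- matrix(c(1:8, NA), 3, 3)
Z <- data.frame(X)

is.matrix(X)   # TRUE
is.matrix(Z)   # FALSE
is.list(Z)     # TRUE: a data frame is a list of columns

# sapply walks over the columns and carries the column names along
col_medians <- sapply(Z, median, na.rm = TRUE)
col_medians
```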
>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:(Sep 23):

    PD> Larry Ammann <ammann@metronet.com> writes:
    >> I have found a problem in R version 1.1.1 when using apply with the
    >> median function.
    >> The problem can be illustrated with the following data matrix:
    >>
    >> X1 X2 X3
    >> 1 2 3
    >> 4 5 6
    >> 7 8 NA
    >>
    >> Enter this data matrix as X and then try
    >> apply(X,2,median,na.rm=T)
    >> The problem here is that the median function returns a named scalar if
    >> the number of observations is odd, but returns an unnamed scalar if the
    >> number of observations is even. This confuses the apply function in
    >> this case at:

    PD> for ( i in 1:1000 ) cat("Data frames are not matrices!\n")
                                 =============================

    >> X<-matrix(c(1:8,NA),3,3)
    >> X
    PD>      [,1] [,2] [,3]
    PD> [1,]    1    4    7
    PD> [2,]    2    5    8
    PD> [3,]    3    6   NA
    >> apply(X,2,median,na.rm=T)
    PD> [1] 2.0 5.0 7.5

    PD> Works fine.

    PD> However,

    >> Z<-data.frame(X)
    >> apply(Z,2,median,na.rm=T)
    PD> Error in names(x) == ans.names : comparison (1) is possible only for vector types

    PD> Question is whether this is a bug. Apply() never promised to work on
    PD> non-arrays.

I think we should either make it promise this and deliver on it,
or make it produce a warning when used with data.frames.
The above problem is too common a stumbling block for non-experts.

    PD> When applied columnwise, it would anyway be more obvious to use

    >> sapply(Z,median,na.rm=T)
    PD>  X1  X2  X3
    PD> 2.0 5.0 7.5

This could be what apply(<dataframe>, 2, * ) should do,
whereas we could have

    apply(dfr, 1, *)  ===  apply(data.matrix(dfr), 1, *)

when dfr is a data.frame.

Opinions ?
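The proposal above could be sketched roughly as follows. `apply_df` is a hypothetical wrapper invented for this illustration; it is not part of R and is not how `apply` is actually implemented. It only shows the suggested dispatch: columnwise calls on a data frame go through `sapply`, and anything else is converted with `data.matrix` first.

```r
# Hypothetical wrapper illustrating the proposal; not part of R itself.
apply_df <- function(X, MARGIN, FUN, ...) {
  if (is.data.frame(X)) {
    if (length(MARGIN) == 1 && MARGIN == 2) {
      # columnwise on a data frame: treat it as a list of columns
      return(sapply(X, FUN, ...))
    }
    # otherwise coerce to a numeric matrix before handing over to apply
    X <- data.matrix(X)
  }
  apply(X, MARGIN, FUN, ...)
}

Z <- data.frame(matrix(c(1:8, NA), 3, 3))
by_col <- apply_df(Z, 2, median, na.rm = TRUE)   # medians of X1, X2, X3
by_row <- apply_df(Z, 1, sum, na.rm = TRUE)      # row sums, NA ignored
```

One caveat with this sketch is that `data.matrix` converts non-numeric columns to numbers (factors become their integer codes), so rowwise results on mixed data frames would need care.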
Martin Maechler
2000-Sep-27 08:37 UTC
[Rd] bug in apply with median -- fixed in R-devel (NULL == ..)
>>>>> "LarAm" == Larry Ammann <ammann@metronet.com> writes:

    LarAm> I have found a problem in R version 1.1.1 when using apply with the
    LarAm> median function.
    LarAm> The problem can be illustrated with the following data matrix:

    LarAm> X1 X2 X3
    LarAm> 1 2 3
    LarAm> 4 5 6
    LarAm> 7 8 NA

    LarAm> Enter this data matrix as X and then try
    LarAm> apply(X,2,median,na.rm=T)

Reproducible by

    X <- matrix(c(1:8,NA),3,3, dimnames = list(1:3,paste("V",1:3,sep="")))
    apply(X,2,median,na.rm=T)

Thank you!

    LarAm> The problem here is that the median function returns a named
    LarAm> scalar if the number of observations is odd, but returns an
    LarAm> unnamed scalar if the number of observations is even. This
    LarAm> confuses the apply function in this case at:

    LarAm> ans.names <- names(ans[[1]])
    LarAm> if (!ans.list)
    LarAm>     ans.list <- any(unlist(lapply(ans, length)) != l.ans)
    LarAm> if (!ans.list && length(ans.names)) {
    LarAm>     all.same <- sapply(ans, function(x) all(names(x) == ans.names))
    LarAm>     #here is the offending line
    LarAm>     if (!all(all.same))
    LarAm>         ans.names <- NULL
    LarAm> }

Yes, your analysis is correct.

The reason this problem is now fixed (in R-devel, aka "1.2 unstable"),
is explained by the following entry in the BUG FIXES part of the
./NEWS file :

    o   NULL == ... now gives logical(0) instead of an error.  This
        fixes a bug with e.g. apply(X,2,median, na.rm = TRUE) and
        all(NULL == NULL) is now TRUE.

    LarAm> This problem does not occur with S-Plus. My quick solution was to use
    LarAm> the quantile function
    LarAm> instead of the median function:
    LarAm> apply(X,2,quantile,probs=.5,na.rm=T)

    LarAm> One way of fixing the problem then is to redefine median as
    LarAm> median <- function(x,na.rm=F,names=T)
    LarAm>     quantile(x,probs=.5,na.rm=na.rm,names=names)

    LarAm> I don't know if this is a long-term solution though, since there may be
    LarAm> other functions with inconsistent
    LarAm> naming policies that can confuse apply as it is currently written.

yes.
and the above fix should fix all these as well!

    LarAm> Larry Ammann
    LarAm> Professor of Mathematical Sciences
    LarAm> University of Texas at Dallas

Thanks again!

Martin Maechler <maechler@stat.math.ethz.ch>    http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich     SWITZERLAND
phone: x-41-1-632-3408          fax: ...-1228                   <><
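The effect of the NEWS entry quoted above can be seen in a short sketch, assuming an R version that includes the fix. The variable names `unnamed` and `cmp` are illustrative only; `ans.names` mirrors the name used in the apply code quoted in the message.

```r
ans.names <- "2"      # names taken from the first (named) result
unnamed   <- 7.5      # a result without names, so names(unnamed) is NULL

# This is the comparison from the "offending line": NULL == "2"
cmp <- names(unnamed) == ans.names
cmp                   # logical(0) rather than an error
all(cmp)              # TRUE, since all() of an empty logical vector is TRUE
all(NULL == NULL)     # TRUE
```

Because the comparison now yields an empty logical vector instead of failing, the `sapply` call inside apply can run to completion even when some results carry names and others do not.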