Hello,

I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement. Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I only pass the numeric columns to apply, it is correct for d3. If I use a for loop, it is correct.

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(NA,NA,6))
>
> d
  d1 d2 d3
1  a  1 NA
2  b  2 NA
3  c  3  6
>
> # results are incorrect
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE  TRUE
>
> # results are correct
> apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d2    d3
 TRUE FALSE
>
> # results are correct
> for(i in names(d)){
+   print(all(d[!is.na(d[,i]),i] <= 3))
+ }
[1] FALSE
[1] TRUE
[1] FALSE

Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(4,5,6))
>
> d
  d1 d2 d3
1  a  1  4
2  b  2  5
3  c  3  6
>
> # results are correct
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE

Can someone help me understand what's happening?
Hi,

I guess this shows what happens behind the scenes:

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(NA,NA,6))
> apply(d, 2, FUN=function(x)x)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
> "a"<=3
[1] FALSE
> "2"<=3
[1] TRUE
> "6"<=3
[1] FALSE

Note that there is an additional leading space in the character value " 6". That is why your comparison gives the wrong answer: the value being compared is the string " 6", not "6" and not the number 6. I do not understand why the space is there, but this might be a bug in R.

Best,
Jiefei

On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help <r-help at r-project.org> wrote:
> I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement.
> [...]
> Can someone help me understand what's happening?
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
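A small sketch of the coercion discussed in this thread, in case it helps. As far as I can tell from ?as.matrix, the data frame method returns a character matrix when any column is non-numeric and runs format() on the numeric columns, and format() pads values to a common width. That would account for the leading space in " 6". The names m and num_ok are made up for this example.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() works on as.matrix(d); with a character column present,
# the whole matrix is character.
m <- as.matrix(d)
print(typeof(m))        # "character"
print(m[, "d3"])

# format() pads to a common width (here the width of "NA"),
# which matches the " 6" shown in the matrix.
print(format(d$d3))

# Converting back to numeric before comparing avoids string comparison.
num_ok <- all(as.numeric(m[, "d3"]) <= 3, na.rm = TRUE)
print(num_ok)
```

How the string " 6" compares with "3" depends on the collation rules of the current locale, so the outcome of the original apply() call should not be relied on either way.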
Hello,

The issue is that 'apply' coerces its argument to a matrix. A matrix can hold only one type, so because d contains a character column, all of your columns become character, and the comparison with 3 is then a string comparison rather than a numeric one. I would suggest iterating over the columns of the data frame instead, something like:

sapply(d, function(x) all(x[!is.na(x)] <= 3))

or

vapply(d, function(x) all(x[!is.na(x)] <= 3), NA)

Also, here is a different method that might look cleaner:

sapply(d, function(x) all(x <= 3, na.rm = TRUE))
vapply(d, function(x) all(x <= 3, na.rm = TRUE), NA)

It's up to you which you choose. I hope this helps!

On Fri, Oct 8, 2021 at 1:50 PM Derickson, Ryan, VHA NCOD via R-help <r-help at r-project.org> wrote:
> I'm seeing unexpected behavior when using apply() compared to a for loop
> when a character vector is part of the data subjected to the apply
> statement.
> [...]
> Can someone help me understand what's happening?
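One possible variation on the column-wise approach, offered only as a sketch: the check "<= 3" is not very meaningful for the character column d1, so a helper could return NA for non-numeric columns instead of comparing strings. The helper name all_at_most and its limit argument are hypothetical, not something from base R.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# Hypothetical helper: TRUE if every non-missing value is <= limit,
# NA for columns that are not numeric.
all_at_most <- function(x, limit = 3) {
  if (!is.numeric(x)) return(NA)
  all(x <= limit, na.rm = TRUE)
}

# vapply() iterates over the columns without converting d to a matrix,
# so each column keeps its own type.
res <- vapply(d, all_at_most, logical(1))
print(res)
```

Using logical(1) as the template makes vapply() stop with an error if the helper ever returns something other than a single logical value.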