Ok, it turns out that this is documented, even though it looks surprising.

First of all, apply() will try to convert any object with a dim attribute to a matrix (my intuition agrees with you that there should be no conversion), so the first step of apply() is

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"

Since the data frame `d` mixes character and non-character columns, the non-character columns are converted to character using the function `format`. The problem is that the NA values are also passed to `format`, which pads every element of a column to a common width:

> format(c(NA, 6))
[1] "NA" " 6"

That's where the space comes from; it is purely for making the result look pretty. The "NA" strings are turned back into real NA values later, but the padding is not stripped. I would say this is not a good design, and it might be worth excluding the NA values from the call to `format`. For now, I would suggest using `lapply` to do what you want:

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE

$d2
[1] TRUE

$d3
[1] FALSE

Everything should work as you expect.

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 at gmail.com> wrote:
>
> Hi,
>
> I guess this shows what happens behind the scenes:
>
> > d<-data.frame(d1 = letters[1:3],
> + d2 = c(1,2,3),
> + d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6";
> that's why your comparison fails. I do not understand why, but this
> might be a bug in R.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help at r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement.
> > Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I only pass the numeric columns to apply, it is correct for d3. If I use a for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > + print(all(d[!is.na(d[,i]),i] <= 3))
> > + }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> >
> > Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> >
> > Can someone help me understand what's happening?
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
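A minimal R sketch of the coercion Jiefei describes, rebuilt from the data frame in the original question (the names `padded`, `m` and `vals` are only for illustration). Inside FUN each column is a character vector, so `x <= 3` coerces 3 to "3" and compares strings in the locale's collating order; in the C locale a space sorts before every digit, which would explain the TRUE reported for d3. Because that one comparison is locale-dependent, its line carries no stated result.

```r
# Data frame from the original question.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# format() pads to a common width. NA becomes the two-character
# string "NA" here, so the one-digit 6 gets a leading space.
padded <- format(c(NA, 6))
print(padded)  # "NA" " 6"

# apply() first calls as.matrix(). Because d1 is character, the
# numeric columns go through format(); the NA positions are then
# reset to real NA values, but the padding on " 6" remains.
m <- as.matrix(d)
print(m[, "d3"])  # NA, NA, " 6"

# This is what FUN sees for d3: a string compared with "3".
# The result depends on the locale's collation of " " versus "3".
print(m[3, "d3"] <= 3)

# Converting back to numeric (trimws() drops the padding explicitly)
# gives the intended numeric test.
vals <- m[!is.na(m[, "d3"]), "d3"]
print(all(as.numeric(trimws(vals)) <= 3))  # FALSE
```

Converting inside FUN only papers over the coercion; iterating over the columns themselves, as Jiefei suggests with lapply, avoids it altogether.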
Derickson, Ryan, VHA NCOD
2021-Oct-08 18:24 UTC
[R] [EXTERNAL] Re: unexpected behavior in apply
This is interesting and does seem suboptimal, especially because if I start with a matrix from the beginning, it behaves as expected.

> d<-data.frame(d1 = letters[1:3],
+ d2 = c("1","2","3"),
+ d3 = c(NA,NA,"6"))
>
> str(d)
'data.frame':   3 obs. of  3 variables:
 $ d1: chr  "a" "b" "c"
 $ d2: chr  "1" "2" "3"
 $ d3: chr  NA NA "6"
>
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE

-----Original Message-----
From: Jiefei Wang <szwjf08 at gmail.com>
Sent: Friday, October 8, 2021 2:22 PM
To: Derickson, Ryan, VHA NCOD <Ryan.Derickson at va.gov>
Cc: r-help at r-project.org
Subject: [EXTERNAL] Re: [R] unexpected behavior in apply
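Ryan's all-character example works, but only because no padding is introduced; the test inside FUN is still a string comparison. The sketch below (the names `d_chr`, `m_chr` and `res` are illustrative) shows that "6" stays unpadded, and that a two-digit value such as "10" would pass the `<= 3` check, because strings are compared character by character and "1" sorts before "3".

```r
# Ryan's follow-up example: every column is already character, so
# as.matrix() has no numeric column to pass through format().
d_chr <- data.frame(d1 = letters[1:3],
                    d2 = c("1", "2", "3"),
                    d3 = c(NA, NA, "6"))
m_chr <- as.matrix(d_chr)
print(m_chr[, "d3"])  # NA, NA, "6": no padding this time

# The comparison inside FUN is still between strings (3 becomes "3").
# "1" sorts before "3", so a value of "10" would wrongly pass.
print("10" <= 3)  # TRUE
print(10 <= 3)    # FALSE

# Converting inside FUN makes the comparison numeric again.
res <- apply(m_chr[, c("d2", "d3")], 2,
             function(x) all(as.numeric(x[!is.na(x)]) <= 3))
print(res)  # d2 TRUE, d3 FALSE
```

So the all-character version gives the right answer for this data, but it would silently misclassify any value with more digits than the threshold.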
Hi

It is not surprising at all. From the apply documentation:

Arguments
X    an array, including a matrix.

A data.frame is not a matrix or an array (even if it rather resembles one). So if you put a cake into the oven, you cannot expect to get fried potatoes out of it.

For data frames, sapply or lapply is preferable, as they are designed for lists, and a data frame is (again from the documentation):

A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame".

> sapply(d,function(x) all(x[!is.na(x)]<=3))
   d1    d2    d3
FALSE  TRUE FALSE

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Jiefei Wang
> Sent: Friday, October 8, 2021 8:22 PM
> To: Derickson, Ryan, VHA NCOD <Ryan.Derickson at va.gov>
> Cc: r-help at r-project.org
> Subject: Re: [R] unexpected behavior in apply
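Petr's point expressed as code: because a data frame is a list of columns, sapply() or lapply() passes each column to FUN with its own type, and no as.matrix() step happens. The sketch also uses vapply() with `logical(1)`, a substitution not proposed in the thread: it is base R and stops with an error if FUN ever returns something other than a single logical. `check` is a hypothetical helper name, and `Filter(is.numeric, d)` is one assumed way to skip non-numeric columns.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# Hypothetical helper: are all non-missing values <= 3?
check <- function(x) all(x[!is.na(x)] <= 3)

# Each column keeps its type, so d2 and d3 are compared as numbers.
# d1 is still character, so its test remains a string comparison.
print(sapply(d, check))

# vapply() also enforces one logical result per column.
print(vapply(d, check, logical(1)))

# Restricting to numeric columns avoids string comparisons entirely.
num_only <- vapply(Filter(is.numeric, d), check, logical(1))
print(num_only)  # d2 TRUE, d3 FALSE
```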