OK, it turns out that this behavior is documented, even though it looks
surprising. First of all, the apply function coerces any object that has
a dim attribute (a data frame included) to a matrix with as.matrix (my
intuition agrees with you that there should be no conversion), so the
first step of the apply function is
> as.matrix.data.frame(d)
d1 d2 d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
Since the data frame `d` mixes character and non-character columns, the
non-character columns are converted to character with the function
`format`. The problem is that the NA values are formatted along with
the numbers:
> format(c(NA, 6))
[1] "NA" " 6"
That's where the space comes from: `format` pads every element to a
common width, purely to make the result look pretty. The string "NA" is
turned back into a real NA afterwards, but the padding on " 6" is not
stripped. I would say this is not a good design, and it might be worth
excluding the NA values from the call to `format`. For now, I suggest
using `lapply`, which works on each column directly without converting
the data frame to a matrix, to do what you want.
> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE
$d2
[1] TRUE
$d3
[1] FALSE
Everything should work as you expect.
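
If it helps, here is a minimal sketch of the whole chain in one place: the
padding added by `format`, the string comparison it leads to, and a
per-column check that never goes through a matrix. The names `padded`,
`num_cols` and `res` are only for illustration, and restricting the check
to numeric columns is my assumption about what you want, not something
apply does for you.

```r
# Rebuild the data frame from the original post.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# format() pads every element to a common width, so 6 becomes " 6"
# to line up with the two-character string "NA".
padded <- format(c(NA, 6))
print(padded)

# Comparing a string with a number coerces the number to "3", so the
# comparison is done on strings. A leading space normally sorts before
# any digit, which is why the padded value passes the check.
print(padded[2] <= 3)          # " 6" compared with "3"
print(trimws(padded[2]) <= 3)  # "6" compared with "3"

# Assumption: only numeric columns should be tested. Working on the
# columns directly keeps d2 and d3 numeric, so no padding is involved.
num_cols <- vapply(d, is.numeric, logical(1))
res <- vapply(d[num_cols],
              function(x) all(x <= 3, na.rm = TRUE),
              logical(1))
print(res)
```

Compared with `lapply`, `vapply` returns a named logical vector instead of
a list and fails loudly if a column does not produce a single TRUE/FALSE.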
Best,
Jiefei
On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 at gmail.com> wrote:
>
> Hi,
>
> I guess this can tell you what happens behind the scene
>
>
> > d<-data.frame(d1 = letters[1:3],
> + d2 = c(1,2,3),
> + d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
> d1 d2 d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6",
> that's why your comparison fails. I do not understand why but this
> might be a bug in R
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help at r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for
> > loop when a character vector is part of the data subjected to the
> > apply statement. Below, I check whether all non-missing values are
> > <= 3. If I include a character column, apply incorrectly returns TRUE
> > for d3. If I only pass the numeric columns to apply, it is correct
> > for d3. If I use a for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(NA,NA,6))
> > >
> > > d
> > d1 d2 d3
> > 1 a 1 NA
> > 2 b 2 NA
> > 3 c 3 6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > d1 d2 d3
> > FALSE TRUE TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > d2 d3
> > TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > + print(all(d[!is.na(d[,i]),i] <= 3))
> > + }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> >
> >
> > Finally, if I remove the NA values from d3 and include the character
> > column in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(4,5,6))
> > >
> > > d
> > d1 d2 d3
> > 1 a 1 4
> > 2 b 2 5
> > 3 c 3 6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > d1 d2 d3
> > FALSE TRUE FALSE
> >
> >
> > Can someone help me understand what's happening?
> >