thr3ads.net - R devel - [Rd] na.omit inconsistent with is.na on list [Aug 2021]

Gabriel Becker

2021-Aug-12 20:18 UTC

[Rd] na.omit inconsistent with is.na on list

Hi Toby,

This definitely appears intentional, the first expression of
stats:::na.omit.default is

   if (!is.atomic(object))
        return(object)


So it is explicitly just returning the object in non-atomic cases, which
includes lists. I was not involved in this decision (obviously) but my
guess is that it is due to the fact that what constitutes an observation
"being complete" is unclear in the list case. What should

na.omit(list(5, NA, c(NA, 5)))

return? Just the first element, or the first and the last? It seems, at
least to me, unclear. A small change to the documentation, adding "atomic
(in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
or similar where the supported types of objects are described, seems
justified, though, imho, as the current documentation is either ambiguous
or technically incorrect, depending on what we take "vector" to mean.
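
To make the early return concrete, here is a minimal sketch of the behaviour described above. It assumes the default method still begins with the is.atomic check quoted in this message, and the variable names are only illustrative.

```r
# A list is not atomic, so the default method hands it back untouched.
L <- list(5, NA, c(NA, 5))
is.atomic(L)                # FALSE
identical(na.omit(L), L)    # TRUE: nothing was removed

# An atomic vector, by contrast, has its NA entries dropped.
x <- c(5, NA, 7)
res <- na.omit(x)
as.vector(res)              # 5 7
attr(res, "na.action")      # records that position 2 was omitted
```

The "na.action" attribute on the atomic result is how na.omit reports which cases it dropped; the list result carries no such attribute because nothing was examined.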

Best,
~G

On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 at gmail.com>
wrote:
> Also, the na.omit method for data.frame with list column seems to be
> inconsistent with is.na,
>
> > L <- list(NULL, NA, 0)
> > str(f <- data.frame(I(L)))
> 'data.frame': 3 obs. of  1 variable:
>  $ L:List of 3
>   ..$ : NULL
>   ..$ : logi NA
>   ..$ : num 0
>   ..- attr(*, "class")= chr "AsIs"
> > is.na(f)
>          L
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f)
>    L
> 1
> 2 NA
> 3  0
>
> On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 at gmail.com> wrote:
>
> > na.omit is documented as "na.omit returns the object with incomplete
> > cases removed." and "At present these will handle vectors," so I
> > expected that when it is used on a list, it should return the same
> > thing as if we subset via is.na; however I observed the following,
> >
> > > L <- list(NULL, NA, 0)
> > > str(L[!is.na(L)])
> > List of 2
> >  $ : NULL
> >  $ : num 0
> > > str(na.omit(L))
> > List of 3
> >  $ : NULL
> >  $ : logi NA
> >  $ : num 0
> >
> > Should na.omit be fixed so that it returns a result that is consistent
> > with is.na? I assume that is.na is the canonical definition of what
> > should be considered a missing value in R.
> >
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Toby Hocking

2021-Aug-12 23:30 UTC

[Rd] na.omit inconsistent with is.na on list

Hi Gabe, thanks for the feedback.

On Thu, Aug 12, 2021 at 1:19 PM Gabriel Becker <gabembecker at gmail.com>
wrote:
> Hi Toby,
>
> This definitely appears intentional, the first expression of
> stats:::na.omit.default is
>
>    if (!is.atomic(object))
>         return(object)

Based on this code it does seem that the documentation could be clarified
to say atomic vectors.

> So it is explicitly just returning the object in non-atomic cases, which
> includes lists. I was not involved in this decision (obviously) but my
> guess is that it is due to the fact that what constitutes an observation
> "being complete" is unclear in the list case. What should
>
> na.omit(list(5, NA, c(NA, 5)))
>
> return? Just the first element, or the first and the last? It seems, at
> least to me, unclear.

I agree in principle/theory that it is unclear, but in practice is.na has
an unambiguous answer (if a list element is a scalar NA then it is
considered missing, otherwise not).
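
As a rough illustration of that rule, the subsetting from my first message can be wrapped in a small helper. The name na_omit_list is hypothetical and not part of base R; this is only a sketch of the behaviour I expected, not a proposed patch.

```r
# Hypothetical helper (not in base R): keep only the list elements
# that is.na() does not flag as missing.
na_omit_list <- function(x) {
  stopifnot(is.list(x))
  x[!is.na(x)]
}

# NULL and 0 are kept; the scalar NA is dropped.
str(na_omit_list(list(NULL, NA, 0)))

# c(NA, 5) has length 2, so is.na() on the list does not flag it.
str(na_omit_list(list(5, NA, c(NA, 5))))
```

Under this rule the answer for Gabe's example is the first and the last element.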

> A small change to the documentation, adding "atomic (in the sense of
> is.atomic returning \code{TRUE})" in front of "vectors" or similar where
> the supported types of objects are described, seems justified, though,
> imho, as the current documentation is either ambiguous or technically
> incorrect, depending on what we take "vector" to mean.
>
> Best,
> ~G
>

Iñaki Ucar

2021-Aug-13 07:26 UTC

[Rd] na.omit inconsistent with is.na on list

On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker at gmail.com> wrote:
>
> Hi Toby,
>
> This definitely appears intentional, the first expression of
> stats:::na.omit.default is
>
>    if (!is.atomic(object))
>         return(object)

I don't follow your point. This only means that the *default* method
is not intended for non-atomic cases, but it doesn't mean there shouldn't
be a method for lists.
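
For illustration only, such a method could look roughly like the sketch below. na.omit.list is hypothetical: no such method exists in the stats package as far as this thread establishes, and recording the dropped positions in an "na.action" attribute of class "omit" is my assumption, modelled on what the default method does for atomic vectors.

```r
# Hypothetical S3 method, NOT part of the stats package.
na.omit.list <- function(object, ...) {
  omit <- which(is.na(object))
  if (length(omit) == 0L) return(object)
  kept <- object[-omit]
  # Assumption: mirror the default method and record what was dropped.
  attr(omit, "class") <- "omit"
  attr(kept, "na.action") <- omit
  kept
}

L <- list(NULL, NA, 0)
res <- na.omit(L)   # dispatches on the implicit class "list"
length(res)         # 2
```

Because dispatch on a plain list reaches a "list" method before falling back to the default, the is.atomic check in na.omit.default would never be hit for lists once such a method exists.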

> So it is explicitly just returning the object in non-atomic cases, which
> includes lists. I was not involved in this decision (obviously) but my
> guess is that it is due to the fact that what constitutes an observation
> "being complete" is unclear in the list case. What should
>
> na.omit(list(5, NA, c(NA, 5)))
>
> return? Just the first element, or the first and the last? It seems, at
> least to me, unclear.

> is.na(list(5, NA, c(NA, 5)))
[1] FALSE  TRUE FALSE

Following Toby's argument, it's clear to me: the first and the last.

Iñaki

> A small change to the documentation, adding "atomic (in the sense of
> is.atomic returning \code{TRUE})" in front of "vectors" or similar where
> the supported types of objects are described, seems justified, though,
> imho, as the current documentation is either ambiguous or technically
> incorrect, depending on what we take "vector" to mean.
>
> Best,
> ~G
>


-- 
Iñaki Úcar
