thr3ads.net - R devel - [Rd] na.omit inconsistent with is.na on list [Aug 2021]


Iñaki Ucar

2021-Aug-13 07:26 UTC

[Rd] na.omit inconsistent with is.na on list

On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker at gmail.com> wrote:
>
> Hi Toby,
>
> This definitely appears intentional, the first  expression of
> stats:::na.omit.default is
>
>    if (!is.atomic(object))
>
>         return(object)
I don't follow your point. This only means that the *default* method
is not intended for non-atomic cases, but it doesn't mean that a method
for lists shouldn't exist.
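
For illustration, a list-aware variant could mirror what the default method does for atomic vectors. The following is only a sketch: `na_omit_list` is a hypothetical helper name (nothing like it ships with the stats package), and it assumes that is.na defines which list elements count as missing.

```r
# Hypothetical helper, not part of base R or stats: drop the list elements
# that is.na() flags, and record their positions in an "na.action"
# attribute of class "omit", as na.omit does for atomic vectors.
na_omit_list <- function(object) {
  stopifnot(is.list(object))
  omit <- which(is.na(object))
  if (length(omit) == 0L) return(object)  # nothing to drop
  kept <- object[-omit]
  attr(omit, "class") <- "omit"
  attr(kept, "na.action") <- omit
  kept
}

res <- na_omit_list(list(5, NA, c(NA, 5)))
length(res)  # 2: the first and the last elements are kept
```

Under this convention c(NA, 5) survives, because is.na only flags list elements that are a single NA.
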
> So it is explicitly just returning the object in non-atomic cases, which
> includes lists. I was not involved in this decision (obviously) but my
> guess is that it is due to the fact that what constitutes an observation
> "being complete" is unclear in the list case. What should
>
> na.omit(list(5, NA, c(NA, 5)))
>
> return? Just the first element, or the first and the last? It seems, at
> least to me, unclear. A small change to the documentation to add "atomic

> is.na(list(5, NA, c(NA, 5)))
[1] FALSE  TRUE FALSE

Following Toby's argument, it's clear to me: the first and the last.

Iñaki

> (in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
> or similar  where what types of objects are supported seems justified,
> though, imho, as the current documentation is either ambiguous or
> technically incorrect, depending on what we take "vector" to mean.
>
> Best,
> ~G
>
> On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 at gmail.com> wrote:
>
> > Also, the na.omit method for data.frame with list column seems to be
> > inconsistent with is.na,
> >
> > > L <- list(NULL, NA, 0)
> > > str(f <- data.frame(I(L)))
> > 'data.frame': 3 obs. of  1 variable:
> >  $ L:List of 3
> >   ..$ : NULL
> >   ..$ : logi NA
> >   ..$ : num 0
> >   ..- attr(*, "class")= chr "AsIs"
> > > is.na(f)
> >          L
> > [1,] FALSE
> > [2,]  TRUE
> > [3,] FALSE
> > > na.omit(f)
> >    L
> > 1
> > 2 NA
> > 3  0
> >
> > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 at gmail.com> wrote:
> >
> > > na.omit is documented as "na.omit returns the object with incomplete
> > > cases removed." and "At present these will handle vectors," so I
> > > expected that when it is used on a list, it should return the same
> > > thing as if we subset via is.na; however I observed the following,
> > >
> > > > L <- list(NULL, NA, 0)
> > > > str(L[!is.na(L)])
> > > List of 2
> > >  $ : NULL
> > >  $ : num 0
> > > > str(na.omit(L))
> > > List of 3
> > >  $ : NULL
> > >  $ : logi NA
> > >  $ : num 0
> > >
> > > Should na.omit be fixed so that it returns a result that is consistent
> > > with is.na? I assume that is.na is the canonical definition of what
> > > should be considered a missing value in R.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >


-- 
Iñaki Úcar

Hugh Parsonage

2021-Aug-13 08:09 UTC

[Rd] na.omit inconsistent with is.na on list

The data.frame method deliberately skips non-atomic columns before
invoking is.na(x) so I think it is fair to assume this behaviour is
intentional and assumed.
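
This can be checked directly, reusing the L from earlier in the thread. The manual subset at the end is only one possible workaround, and it assumes that is.na is the definition of missing you want for the list column.

```r
L <- list(NULL, NA, 0)
f <- data.frame(I(L))

# The list column is not atomic, so na.omit() leaves every row in place.
is.atomic(f$L)    # FALSE
nrow(na.omit(f))  # 3, as reported in the thread

# is.na() does flag the second element of the column.
is.na(f$L)

# Possible workaround: subset the rows by hand using is.na().
f2 <- f[!is.na(f$L), , drop = FALSE]
nrow(f2)
```
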

Not so clear to me that there is a sensible answer for list columns.
(List columns seem to collide with the expectation that in each
variable every observation will be of the same type)

Consider your list L as

L <- list(NULL, NA, c(NA, NA))

Seems like every observation could have a claim to be 'missing' here.
Concretely, if a data.frame had a list column representing the lat-lon
of an observation, we might only be able to represent missing values
like c(NA, NA).
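
To make the ambiguity concrete, here is a small sketch. The predicate `is_missing_obs` is a hypothetical name and encodes just one possible convention (an observation is missing when it is empty or consists only of NA); other conventions are equally defensible.

```r
L <- list(NULL, NA, c(NA, NA))

# Built-in view: only a length-one NA element counts as missing.
is.na(L)

# Hypothetical convention: empty or all-NA observations are missing.
is_missing_obs <- function(x) length(x) == 0L || all(is.na(x))
vapply(L, is_missing_obs, logical(1))

# Lat-lon style list column where c(NA, NA) marks a missing position.
coords <- list(c(51.5, -0.1), c(NA, NA), c(48.9, 2.4))
is.na(coords)  # no element is flagged by the built-in
coords[!vapply(coords, is_missing_obs, logical(1))]
```

Under the built-in definition only the second element of L is missing, while the hypothetical predicate flags all three, which is the disagreement described above.
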
