thr3ads.net - R devel - [Rd] na.omit inconsistent with is.na on list [Aug 2021]

Toby Hocking

2021-Aug-14 20:48 UTC

[Rd] na.omit inconsistent with is.na on list

Some relevant information from ?is.na: the behavior for lists is
documented,

     For is.na, elementwise the result is false unless that element
     is a length-one atomic vector and the single element of that
     vector is regarded as NA or NaN (note that any is.na method
     for the class of the element is ignored).

Also there are other functions anyNA and is.na<- which are consistent with
is.na. That is, anyNA only returns TRUE if the list has an element which is
a scalar NA. And is.na<- sets list elements to logical NA to indicate
missingness.
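
A minimal sketch of the three behaviours described above. The lists L and M
are made up for illustration and are not taken from the thread; anyNA is
called with its default recursive = FALSE.

```r
L <- list(NULL, NA, c(NA, 5), NaN, "a")

# is.na() on a list is TRUE only for length-one atomic elements that are
# NA or NaN; NULL and the length-two vector c(NA, 5) are not flagged.
is.na(L)
# [1] FALSE  TRUE FALSE  TRUE FALSE

# anyNA() with recursive = FALSE follows the same rule as is.na().
anyNA(L)                        # TRUE: the list holds a scalar NA
anyNA(list(NULL, c(NA, 5)))     # FALSE: no length-one NA element

# is.na<- marks list elements as missing by replacing them with logical NA.
M <- list(1, "b", 3:4)
is.na(M) <- 2
str(M[[2]])
#  logi NA
```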

On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage <hugh.parsonage at gmail.com> wrote:
> The data.frame method deliberately skips non-atomic columns before
> invoking is.na(x) so I think it is fair to assume this behaviour is
> intentional and assumed.
>
> Not so clear to me that there is a sensible answer for list columns.
> (List columns seem to collide with the expectation that in each
> variable every observation will be of the same type)
>
> Consider your list L as
>
> L <- list(NULL, NA, c(NA, NA))
>
> Seems like every observation could have a claim to be 'missing' here.
> Concretely, if a data.frame had a list column representing the lat-lon
> of an observation, we might only be able to represent missing values
> like c(NA, NA).
>
> On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar <iucar at fedoraproject.org> wrote:
> >
> > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker at gmail.com> wrote:
> > >
> > > Hi Toby,
> > >
> > > This definitely appears intentional, the first expression of
> > > stats:::na.omit.default is
> > >
> > >    if (!is.atomic(object))
> > >        return(object)
> >
> > I don't follow your point. This only means that the *default* method
> > is not intended for non-atomic cases, but it doesn't mean there
> > shouldn't be a method for lists.
> >
> > > So it is explicitly just returning the object in non-atomic cases, which
> > > includes lists. I was not involved in this decision (obviously) but my
> > > guess is that it is due to the fact that what constitutes an observation
> > > "being complete" is unclear in the list case. What should
> > >
> > > na.omit(list(5, NA, c(NA, 5)))
> > >
> > > return? Just the first element, or the first and the last? It seems, at
> > > least to me, unclear. A small change to the documentation to add "atomic
> >
> > > is.na(list(5, NA, c(NA, 5)))
> > [1] FALSE  TRUE FALSE
> >
> > Following Toby's argument, it's clear to me: the first and the last.
> >
> > Iñaki
> >
> > > (in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
> > > or similar where what types of objects are supported seems justified,
> > > though, imho, as the current documentation is either ambiguous or
> > > technically incorrect, depending on what we take "vector" to mean.
> > >
> > > Best,
> > > ~G
> > >
> > > On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 at gmail.com> wrote:
> > >
> > > > Also, the na.omit method for data.frame with list column seems to be
> > > > inconsistent with is.na,
> > > >
> > > > > L <- list(NULL, NA, 0)
> > > > > str(f <- data.frame(I(L)))
> > > > 'data.frame': 3 obs. of  1 variable:
> > > >  $ L:List of 3
> > > >   ..$ : NULL
> > > >   ..$ : logi NA
> > > >   ..$ : num 0
> > > >   ..- attr(*, "class")= chr "AsIs"
> > > > > is.na(f)
> > > >          L
> > > > [1,] FALSE
> > > > [2,]  TRUE
> > > > [3,] FALSE
> > > > > na.omit(f)
> > > >    L
> > > > 1
> > > > 2 NA
> > > > 3  0
> > > >
> > > > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 at gmail.com> wrote:
> > > >
> > > > > na.omit is documented as "na.omit returns the object with incomplete
> > > > > cases removed." and "At present these will handle vectors," so I
> > > > > expected that when it is used on a list, it should return the same
> > > > > thing as if we subset via is.na; however I observed the following,
> > > > >
> > > > > > L <- list(NULL, NA, 0)
> > > > > > str(L[!is.na(L)])
> > > > > List of 2
> > > > >  $ : NULL
> > > > >  $ : num 0
> > > > > > str(na.omit(L))
> > > > > List of 3
> > > > >  $ : NULL
> > > > >  $ : logi NA
> > > > >  $ : num 0
> > > > >
> > > > > Should na.omit be fixed so that it returns a result that is consistent
> > > > > with is.na? I assume that is.na is the canonical definition of what
> > > > > should be considered a missing value in R.
> > > > >
> > > >
> > > >         [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >
> > --
> > Iñaki Úcar
	[[alternative HTML version deleted]]
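
To make the ambiguity raised in the quoted discussion concrete, here is a
sketch comparing three possible definitions of a "missing" list element.
The names scalar_na, any_na and all_na are hypothetical and chosen for this
example; only the first corresponds to what is.na itself does on a list.

```r
# Illustrative list: empty, scalar NA, all-NA pair, partly-NA pair, complete.
L <- list(NULL, NA, c(NA, NA), c(NA, 5), 0)

# Candidate 1: the is.na() rule (length-one atomic NA or NaN only).
scalar_na <- is.na(L)

# Candidate 2: an element is missing if any of its values is NA.
any_na <- vapply(L, anyNA, logical(1))

# Candidate 3: an element is missing if it is empty or all its values are NA.
all_na <- vapply(L, function(e) length(e) == 0L || all(is.na(e)), logical(1))

# One row per list element, one column per candidate definition.
cbind(scalar_na, any_na, all_na)
```

The three columns disagree on NULL, c(NA, NA) and c(NA, 5), which is the
sense in which "complete" is not obviously defined for list elements.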

Gabriel Becker

2021-Aug-15 00:15 UTC

[Rd] na.omit inconsistent with is.na on list

I understand what is.na does; the issue I have is that its task is not
equivalent to the conceptual task na.omit is doing, in my opinion, as
illustrated by what the data.frame method does.

That is what I was getting at above: it is not clear that lst[!is.na(lst)]
is the correct thing for na.omit to do.

~G
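
The difference between the two operations can be sketched as follows. The
helper na_omit_list is hypothetical (base R has no na.omit method for plain
lists); it shows one possible definition, assuming is.na is taken as the
rule, and is not something anyone in the thread proposed verbatim. The data
frame f is likewise made up for illustration.

```r
L <- list(NULL, NA, 0)

# na.omit() on a plain list dispatches to the default method, which returns
# non-atomic objects unchanged.
identical(na.omit(L), L)    # TRUE

# Hypothetical list method: drop elements flagged by is.na() and record the
# dropped positions in an "na.action" attribute, as the default method does
# for atomic vectors.
na_omit_list <- function(x) {
  stopifnot(is.list(x))
  omit <- which(is.na(x))
  if (length(omit) == 0L) return(x)
  out <- x[-omit]
  attr(out, "na.action") <- structure(omit, class = "omit")
  out
}
length(na_omit_list(L))     # 2: NULL and 0 are kept

# The data.frame method works row-wise and skips non-atomic columns, so only
# the NA in the atomic column x removes a row here.
f <- data.frame(x = c(1, 2, NA), L = I(list(NULL, NA, 0)))
nrow(na.omit(f))            # 2: row 3 dropped, row 2 (list NA) kept
```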


On Sat, Aug 14, 2021, 1:49 PM Toby Hocking <tdhock5 at gmail.com> wrote:
> Some relevant information from ?is.na: the behavior for lists is
> documented,
>
>      For is.na, elementwise the result is false unless that element
>      is a length-one atomic vector and the single element of that
>      vector is regarded as NA or NaN (note that any is.na method
>      for the class of the element is ignored).
>
> Also there are other functions anyNA and is.na<- which are consistent with
> is.na. That is, anyNA only returns TRUE if the list has an element which is
> a scalar NA. And is.na<- sets list elements to logical NA to indicate
> missingness.
