thr3ads.net - R devel - [Rd] xyTable(x,y) versus table(x,y) with NAs [Apr 2023]

Viechtbauer, Wolfgang (NP)

2023-Apr-25 08:24 UTC

[Rd] xyTable(x,y) versus table(x,y) with NAs

Hi all,

Posted this many years ago
(https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html), but either
this slipped under the radar or my feeble mind is unable to understand what
xyTable() is doing here and nobody bothered to correct me. I now stumbled again
across this issue.

x <- c(1, 1, 2, 2,  2, 3)
y <- c(1, 2, 1, 3, NA, 3)
table(x, y, useNA="always")
xyTable(x, y)

Why does xyTable() report that there are NA instances of (2,3)? I could
understand the logic that the NA could be anything, including a 3, so the
$number value for (2,3) is therefore unknown, but then the same should apply to
(2,1), but here $number is 1, so the logic is then inconsistent.

I stared at the xyTable code for a while and I suspect this is coming from
order() using na.last=TRUE by default, but in any case, to me the behavior above
is surprising.

Best,
Wolfgang

Serguei Sokol

2023-Apr-25 09:30 UTC

On 25/04/2023 at 10:24, Viechtbauer, Wolfgang (NP) wrote:
> Hi all,
>
> Posted this many years ago
> (https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html), but either
> this slipped under the radar or my feeble mind is unable to understand what
> xyTable() is doing here and nobody bothered to correct me. I now stumbled again
> across this issue.
>
> x <- c(1, 1, 2, 2,  2, 3)
> y <- c(1, 2, 1, 3, NA, 3)
> table(x, y, useNA="always")
> xyTable(x, y)
>
> Why does xyTable() report that there are NA instances of (2,3)? I could
> understand the logic that the NA could be anything, including a 3, so the
> $number value for (2,3) is therefore unknown, but then the same should apply to
> (2,1), but here $number is 1, so the logic is then inconsistent.
>
> I stared at the xyTable code for a while and I suspect this is coming from
> order() using na.last=TRUE by default, but in any case, to me the behavior above
> is surprising.

Not really. The variable 'first' in xyTable() is supposed to detect the
positions of the first values in sequences of repeated pairs. It is then
used to retain only their indices in a vector of type 1:n. Finally, taking
diff() yields the number of repeats of each pair. However, as 'first' will
contain one NA for your example, the diff() call will produce two NAs,
from the differences with the preceding and the following index. Hence
the result.
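
To make this concrete, the intermediate values can be traced by hand on the
example data. This is only a sketch that mirrors the steps described above,
not the actual xyTable() source; the names 'o' and 'idx' are introduced here
purely for illustration.

```r
x <- c(1, 1, 2, 2,  2, 3)
y <- c(1, 2, 1, 3, NA, 3)
n <- length(x)

# Sort the pairs; order() places the NA in y last within x == 2
o <- order(x, y)
x <- x[o]
y <- y[o]

# TRUE where a new pair starts. Comparing 3 with NA gives NA, and
# FALSE | NA stays NA, whereas TRUE | NA is TRUE.
first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
print(first)

# Subsetting with a logical NA yields an NA index ...
idx <- (1L:n)[first]
print(idx)

# ... which diff() propagates into both neighbouring differences
number <- diff(c(idx, n + 1L))
print(number)
```

The single NA in 'first' sits at the (2,NA) position, so the counts for both
(2,3) and (2,NA) come out as NA, while (2,1) is unaffected because it is not
adjacent to the NA after sorting.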

Here is a slightly modified version of the xyTable code that handles NAs too.

xyTableNA <- function (x, y = NULL, digits)
{
    x <- xy.coords(x, y, setLab = FALSE)
    y <- signif(x$y, digits = digits)
    x <- signif(x$x, digits = digits)
    n <- length(x)
    number <- if (n > 0) {
        orderxy <- order(x, y)
        x <- x[orderxy]
        y <- y[orderxy]
        # TRUE where a new (x, y) pair starts; may be NA next to an NA
        first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
        # a switch between NA and non-NA in x or y also starts a new pair
        firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
                           xor(is.na(y[-1L]), is.na(y[-n])))
        first[firstNA] <- TRUE
        # any remaining NA compares NA with NA, i.e. the same pair
        first[is.na(first)] <- FALSE
        x <- x[first]
        y <- y[first]
        diff(c((1L:n)[first], n + 1L))
    }
    else integer()
    list(x = x, y = y, number = number)
}

Best,
Serguei.
>
> Best,
> Wolfgang
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Bill Dunlap

2023-Apr-25 15:39 UTC

> x <- c(1, 1, 2, 2,  2, 3)
> y <- c(1, 2, 1, 3, NA, 3)
> str(xyTable(x,y))
List of 3
 $ x     : num [1:6] 1 1 2 2 NA 3
 $ y     : num [1:6] 1 2 1 3 NA 3
 $ number: int [1:6] 1 1 1 NA NA 1


How many (2,3)s do we have?  At least one, the fourth entry, but the fifth
entry, (2,NA), is possibly a (2,3) so we don't know and make the count NA.
I suspect this is not the intended logic, but a byproduct of finding value
changes in a sorted vector with the idiom x[-1] != x[-length(x)].  Also, the
following does follow that logic:
> x <- c(1, 1, 2, 2,  5, 6)
> y <- c(2, 2, 2, 4, NA, 3)
> str(xyTable(x,y))
List of 3
 $ x     : num [1:5] 1 2 2 5 6
 $ y     : num [1:5] 2 2 4 NA 3
 $ number: int [1:5] 2 1 1 1 1



table() does not use this logic, as one NA in a vector would make all the
counts NA.  Should xyTable have a way to handle NAs the way table() does?
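
For comparison, here is a rough sketch of the counts one gets when NA is
treated as just another value, the way table() does with useNA. It uses base R
only; 'tab' and 'counts' are illustrative names, and this is not meant as a
proposal for how xyTable() itself should be changed.

```r
x <- c(1, 1, 2, 2,  2, 3)
y <- c(1, 2, 1, 3, NA, 3)

# Cross-tabulate, keeping NA as its own level
tab <- table(x, y, useNA = "ifany")

# Long format: one row per (x, y) cell, then drop the empty cells
counts <- as.data.frame(tab, stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
print(counts)
```

Each of the six observed pairs, including (2,NA), then gets a count of 1, and
no count is NA.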

-Bill

On Tue, Apr 25, 2023 at 1:26 AM Viechtbauer, Wolfgang (NP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> Hi all,
>
> Posted this many years ago (
> https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html), but
> either this slipped under the radar or my feeble mind is unable to
> understand what xyTable() is doing here and nobody bothered to correct me.
> I now stumbled again across this issue.
>
> x <- c(1, 1, 2, 2,  2, 3)
> y <- c(1, 2, 1, 3, NA, 3)
> table(x, y, useNA="always")
> xyTable(x, y)
>
> Why does xyTable() report that there are NA instances of (2,3)? I could
> understand the logic that the NA could be anything, including a 3, so the
> $number value for (2,3) is therefore unknown, but then the same should
> apply to (2,1), but here $number is 1, so the logic is then inconsistent.
>
> I stared at the xyTable code for a while and I suspect this is coming from
> order() using na.last=TRUE by default, but in any case, to me the behavior
> above is surprising.
>
> Best,
> Wolfgang
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
