thr3ads.net - R devel - [Rd] 'merge' function: behavior w.r.t. NAs in the key column [Mar 2008]

Simon Anders

2008-Mar-14 17:16 UTC

[Rd] 'merge' function: behavior w.r.t. NAs in the key column

Hi,

I recently ran into a problem with 'merge' that stems from the way
missing values in the key column (i.e., the column specified in the
"by" argument) are handled. I wonder whether the current behavior is
fully consistent.

Please have a look at this example:
> x <- data.frame( key = c(1:3,3,NA,NA), val = 10+1:6 )
> y <- data.frame( key = c(NA,2:5,3,NA), val = 20+1:7 )
> x
  key val
1   1  11
2   2  12
3   3  13
4   3  14
5  NA  15
6  NA  16
> y
  key val
1  NA  21
2   2  22
3   3  23
4   4  24
5   5  25
6   3  26
7  NA  27
> merge( x, y, by="key" )
  key val.x val.y
1   2    12    22
2   3    13    23
3   3    13    26
4   3    14    23
5   3    14    26
6  NA    15    21
7  NA    15    27
8  NA    16    21
9  NA    16    27

As one should expect, there are now four lines with key value '3',
because the key '3' appears twice both in x and in y. According to the
logic of merge, a row should be produced in the output for each pairing
of a row from x and a row from y where the values of 'key' are equal.
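
The pairing rule described above can be checked by counting: for each non-missing key present in both data frames, the number of output rows should be the product of its occurrence counts. A minimal sketch, assuming the x and y from the example (the names nx, ny, common and expected are only illustrative):

```r
x <- data.frame(key = c(1:3, 3, NA, NA), val = 10 + 1:6)
y <- data.frame(key = c(NA, 2:5, 3, NA), val = 20 + 1:7)

# Occurrences of each non-NA key (table() drops NA by default)
nx <- table(x$key)
ny <- table(y$key)

# Keys present in both data frames
common <- intersect(names(nx), names(ny))

# Expected number of merged rows per shared key: count in x times count in y
expected <- as.vector(nx[common]) * as.vector(ny[common])
names(expected) <- common
print(expected)
```

For key '3' this gives 2 * 2 = 4, which agrees with the four rows in the merge output.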

However, the 'NA' values are treated exactly the same way. It seems that
'merge' considers the pairing of lines with 'NA' in both 'key' columns
an allowed match. IMHO, this runs against the convention that two NAs
are not considered equal. ('NA==NA' does not evaluate to 'TRUE'.)
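
The two conventions can be compared directly at the prompt. As a sketch: '==' propagates the missing value, whereas 'match' and '%in%' treat NA as matching NA. That 'merge' follows match-like semantics for its keys is an assumption here, suggested by the output above rather than established in this message.

```r
# Comparison with '==' propagates the missing value
eq <- NA == NA
print(eq)                  # NA, not TRUE

# match() and %in% treat NA as equal to NA
pos <- match(NA, c(1, NA))
print(pos)                 # 2
print(NA %in% c(1, NA))    # TRUE
```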

It might be more consistent if merge did not include any rows with an
"NA" in the key column in the output.

Maybe one could add a flag argument to 'merge' to switch between this
behaviour and the current one? A note in the help page might be nice, too.
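
In the meantime, one possible workaround is to drop rows with a missing key before merging. This is only a sketch for a single key column, not a change to 'merge' itself; the helper name merge_drop_na_keys is hypothetical.

```r
# Hypothetical helper: merge on one key column, ignoring rows whose key is NA
merge_drop_na_keys <- function(x, y, by) {
  # Keep only rows whose key is not missing, then merge as usual
  x_ok <- x[!is.na(x[[by]]), , drop = FALSE]
  y_ok <- y[!is.na(y[[by]]), , drop = FALSE]
  merge(x_ok, y_ok, by = by)
}

x <- data.frame(key = c(1:3, 3, NA, NA), val = 10 + 1:6)
y <- data.frame(key = c(NA, 2:5, 3, NA), val = 20 + 1:7)
res <- merge_drop_na_keys(x, y, by = "key")
print(res)
```

With the example data this should leave only the rows for keys '2' and '3', with no NA keys in the result.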

Best regards
   Simon



+---
| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
| office phone +44-1223-494478, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de

Bill Dunlap

2008-Mar-14 23:57 UTC

[Rd] 'merge' function: behavior w.r.t. NAs in the key column

On Fri, 14 Mar 2008, Simon Anders wrote:
> I recently ran into a problem with 'merge' that stems from the way
> missing values in the key column (i.e., the column specified in the
> "by" argument) are handled. I wonder whether the current behavior is
> fully consistent.
> ...
> > x <- data.frame( key = c(1:3,3,NA,NA), val = 10+1:6 )
> > y <- data.frame( key = c(NA,2:5,3,NA), val = 20+1:7 )
> ...
> > merge( x, y, by="key" )
>    key val.x val.y
> 1   2    12    22
> 2   3    13    23
> 3   3    13    26
> 4   3    14    23
> 5   3    14    26
> 6  NA    15    21
> 7  NA    15    27
> 8  NA    16    21
> 9  NA    16    27
>
> As one should expect, there are now four lines with key value '3',
> because the key '3' appears twice both in x and in y. According to the
> logic of merge, a row should be produced in the output for each pairing
> of a row from x and a row from y where the values of 'key' are equal.
>
> However, the 'NA' values are treated exactly the same way. It seems that
> 'merge' considers the pairing of lines with 'NA' in both 'key' columns
> an allowed match. IMHO, this runs against the convention that two NAs
> are not considered equal. ('NA==NA' does not evaluate to 'TRUE'.)
>
> It might be more consistent if merge did not include any rows with an
> "NA" in the key column in the output.
>
> Maybe one could add a flag argument to 'merge' to switch between this
> behaviour and the current one? A note in the help page might be nice, too.

Splus (versions 8.0, 7.0, and 6.2) gives:
   > merge( x, y, by="key" )
     key val.x val.y
   1   2    12    22
   2   3    13    23
   3   3    14    23
   4   3    13    26
   5   3    14    26
Is that what you expect?  There is no argument
to Splus's merge to make it include the NA's
in the way R's merge does.  Should there be such
an argument?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and
 do not necessarily reflect Insightful Corporation policy or position."
