thr3ads.net - R devel - [Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y [Feb 2018]

Scott Ritchie

2018-Feb-15 22:08 UTC

[Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y

Hi,

I was unable to find a bug report for this with a cursory search, but would
like clarification if this is intended or unavoidable behaviour:

```{r}
# Create example data.frames
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex = c("F", "M", "F", "M"),
                      age = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name = c("Oliver", "Sebastian", "Kai-lee"),
                       sex = c("M", "M", "F"),
                       age = c(5, 8, 7))

# merge() creates a duplicated "name" column:
merge(parents, children, by.x = "name", by.y = "parent")
```

Output:
```
   name sex.x age.x      name sex.y age.y
1   Max     M    43 Sebastian     M     8
2   Qin     F    36   Kai-lee     F     7
3 Sarah     F    41    Oliver     M     5
Warning message:
In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
  column name ‘name’ is duplicated in the result
```
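
To illustrate why the duplicated name matters in practice, here is a minimal sketch. It does not reuse the merge() result above: the data frame `dup` is built by hand purely for illustration. It shows that looking a column up by name only reaches the first of two identically named columns:

```r
# Hand-built data frame with two columns called "name"; check.names = FALSE
# stops data.frame() from de-duplicating them.
dup <- data.frame(name = c("Max", "Qin"),
                  name = c("Sebastian", "Kai-lee"),
                  check.names = FALSE, stringsAsFactors = FALSE)

first_dup <- anyDuplicated(names(dup))  # position of the first repeated name
by_name <- dup[["name"]]                # name lookup returns the first match
by_pos <- dup[[2]]                      # the second column needs its position
fixed <- make.unique(names(dup))        # one way to repair the names afterwards

print(first_dup)  # 2
print(by_name)    # "Max" "Qin"
print(by_pos)     # "Sebastian" "Kai-lee"
print(fixed)      # "name" "name.1"
```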

Kind Regards,

Scott Ritchie

frederik at ofb.net

2018-Feb-16 16:53 UTC

Hi Scott,

It seems like reasonable behavior to me. What result would you expect?
That the second "name" should be called "name.y"?

The "merge" documentation says:

    If the columns in the data frames not used in merging have any
    common names, these have suffixes (".x" and ".y" by default)
    appended to try to make the names of the result unique.

Since the first "name" column was used in merging, leaving both
without a suffix seems consistent with the documentation...
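
As a rough check of that reading, the sketch below recomputes which names the quoted sentence covers for the data frames in the original post. The variables `by_x`, `by_y` and `shared_nonkey` are illustrative names, and whether the second "name" column is left bare may depend on the R version, so only the documented suffixes are checked:

```r
# Data frames from the original post.
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex = c("F", "M", "F", "M"),
                      age = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name = c("Oliver", "Sebastian", "Kai-lee"),
                       sex = c("M", "M", "F"),
                       age = c(5, 8, 7))

by_x <- "name"
by_y <- "parent"

# Names shared by the columns NOT used in merging: per the documentation,
# these are the ones that receive suffixes.
shared_nonkey <- intersect(setdiff(names(parents), by_x),
                           setdiff(names(children), by_y))
print(shared_nonkey)  # "sex" "age"

# "name" is a key in `parents` but an ordinary column in `children`,
# so it is not part of that set.
print("name" %in% shared_nonkey)  # FALSE

# The shared non-key columns come back suffixed in the merged result.
res <- suppressWarnings(merge(parents, children, by.x = by_x, by.y = by_y))
has_suffixed <- all(c("sex.x", "sex.y", "age.x", "age.y") %in% names(res))
print(has_suffixed)  # TRUE
```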

Frederick

Scott Ritchie

2018-Feb-17 00:15 UTC

Hi Frederick,

I would expect any duplicate names in the resulting data.frame to have
the suffixes appended, regardless of whether they are used as the join
key. So in my example I would expect "name.x" and "name.y", indicating
their source data.frame.

While a careful reading of the documentation shows that this is not the
current behaviour, I would argue that the intent of the suffixes
functionality applies equally to this case.

If you agree this would be useful, I'm happy to write a patch for
merge.data.frame that will add suffixes in this case - I intend to do the
same for merge.data.table in the data.table package where I initially
encountered the edge case.
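
In the meantime, the expected result can be approximated from user code. The following is only a workaround sketch, not the proposed patch: `merge_suffix_keys` is a hypothetical helper name, and it assumes the column names within each input are unique. It renames the clashing column in `y` before calling merge(), then suffixes the matching key column afterwards:

```r
# Hypothetical wrapper around merge(); not part of base R.
merge_suffix_keys <- function(x, y, by.x, by.y, suffixes = c(".x", ".y"), ...) {
  # Columns of y that are not join keys but share a name with a key of x.
  is_clash <- !(names(y) %in% by.y) & (names(y) %in% by.x)
  clash <- names(y)[is_clash]
  # Rename them in y before merging so merge() never sees the collision.
  names(y)[is_clash] <- paste0(clash, suffixes[2])
  res <- merge(x, y, by.x = by.x, by.y = by.y, suffixes = suffixes, ...)
  # Mark the matching key columns as coming from x.
  from_x <- names(res) %in% clash
  names(res)[from_x] <- paste0(names(res)[from_x], suffixes[1])
  res
}

parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex = c("F", "M", "F", "M"),
                      age = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name = c("Oliver", "Sebastian", "Kai-lee"),
                       sex = c("M", "M", "F"),
                       age = c(5, 8, 7))

res <- merge_suffix_keys(parents, children, by.x = "name", by.y = "parent")
print(names(res))
# "name.x" "sex.x" "age.x" "name.y" "sex.y" "age.y"
```

Because the clash is removed before merge() runs, the result should not depend on how merge() itself treats this case.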

Best,

Scott
