`merge` sorts its result as if the by-columns were character, not
according to their actual class.
> tmp <-
+   merge(data.frame(f=ordered(c("a","b","b","a","b"),
+                              levels=c("b","a")),
+                    x=1:5),
+         data.frame(f=ordered(c("a","b"),
+                              levels=c("b","a")),
+                    y=c(10,20)))
> tmp
  f x  y
1 a 1 10
2 a 4 10
3 b 2 20
4 b 3 20
5 b 5 20
> tmp[order(tmp$f),]
  f x  y
3 b 2 20
4 b 3 20
5 b 5 20
1 a 1 10
2 a 4 10
I expected the second order, not the first.
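As a stopgap, the merged result can be re-sorted on the key column itself. The following is only a sketch in base R: the names `left`, `right`, `merged` and `resorted` are made up for illustration, and `x` is used as a tie-breaker purely to make the row order deterministic.

```r
# Same data as in the example above, with illustrative names.
left  <- data.frame(f = ordered(c("a", "b", "b", "a", "b"),
                                levels = c("b", "a")),
                    x = 1:5)
right <- data.frame(f = ordered(c("a", "b"),
                                levels = c("b", "a")),
                    y = c(10, 20))

merged <- merge(left, right)

# order() ranks an ordered factor by its level codes, so "b" (level 1)
# comes before "a" (level 2); x breaks ties within each level.
resorted <- merged[order(merged$f, merged$x), ]
rownames(resorted) <- NULL
resorted
```

This does not change what `merge` does; it only restores the level order afterwards.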
I actually ran into this issue when merging on `yearmon` columns from
the zoo package, but that adds a package dependency. In that context, I
observed different behavior depending on whether I had one key or two:
> library(zoo)
> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, bar=10*1:6)
> merge(d1,d2)
      date icpn foo bar
1 Apr 2000  500   4  40
2 Feb 2000  500   2  20
3 Jan 2000  500   1  10
4 Jun 2000  500   6  60
5 Mar 2000  500   3  30
6 May 2000  500   5  50
> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), bar=10*1:6)
> merge(d1,d2)
      date foo bar
1 Jan 2000   1  10
2 Feb 2000   2  20
3 Mar 2000   3  30
4 Apr 2000   4  40
5 May 2000   5  50
6 Jun 2000   6  60
The two-key merge (the first zoo example) appears to sort by the printed
month name, not by the actual date value, while the one-key merge comes
out in chronological order.
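For the multi-key case, one option is to skip the sort inside `merge` and order the result on the key columns afterwards. The helper below is a hypothetical sketch, not part of base R or zoo: `merge_sorted` is an invented name, and the demo uses an ordered factor of month names in place of `yearmon` so that it runs without any package. The same idea should carry over to `yearmon` keys, though that is not tested here.

```r
# Hypothetical helper: merge without sorting, then order the rows by the
# key columns using each column's own ordering (via order()).
merge_sorted <- function(x, y, by = intersect(names(x), names(y)), ...) {
  res <- merge(x, y, by = by, sort = FALSE, ...)
  # Drop the names so that no column name can clash with an argument
  # of order() such as "decreasing" or "method".
  key_cols <- unname(as.list(res[by]))
  res <- res[do.call(order, key_cols), , drop = FALSE]
  rownames(res) <- NULL
  res
}

# Dependency-free stand-in for the two-key example.
months <- c("Jan", "Feb", "Mar", "Apr")
d1 <- data.frame(month = ordered(months, levels = months),
                 icpn = 500, foo = 1:4)
d2 <- data.frame(month = ordered(months, levels = months),
                 icpn = 500, bar = 10 * 1:4)

out <- merge_sorted(d1, d2)
out
```

Here the rows come back in level order (Jan, Feb, Mar, Apr) rather than in alphabetical order of the month names.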
The documentation of `merge` says the sort is "lexicographic", but I
assumed that meant comparing the key tuples column by column (the
Cartesian-product sense), not converting everything to character first.
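To make the two readings of "lexicographic" concrete, here is a small base R illustration. The data frame `keys` and its columns are invented for this purpose, and `method = "radix"` is used only so the string comparison does not depend on the locale.

```r
# Two numeric key columns; row 1 has the largest first key.
keys <- data.frame(k1 = c(10, 2, 2), k2 = c(1, 3, 1))

# Tuple sense: compare k1 numerically, then k2 to break ties.
# The sorted tuples are (2, 1), (2, 3), (10, 1).
tuple_order <- order(keys$k1, keys$k2)
tuple_order   # 3 2 1

# Character sense: paste the keys together and sort the strings.
# The sorted strings are "10 1", "2 1", "2 3".
string_order <- order(paste(keys$k1, keys$k2), method = "radix")
string_order  # 1 3 2
```

The two orderings disagree as soon as the character form of a key does not sort the same way as its value.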
Is this behavior expected?
Thanks,
Johann
P.S.
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] grid splines stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ggplot2_0.8.8 reshape_0.8.3 Rauto_1.0 plyr_1.1
[5] zoo_1.6-4 Hmisc_3.7-0 survival_2.35-8 ascii_0.7
[9] proto_0.3-8
loaded via a namespace (and not attached):
[1] cluster_1.12.1 digest_0.4.2 lattice_0.17-26 tools_2.10.1