Sean O'Riordain
2006-May-20 12:32 UTC
[R] merge problem... extra lines appear in the presence of NAs
Good morning!
I've searched the docs etc... Am I doing something wrong or is this a bug?
I'm doing a merge of two dataframes and getting extra rows in the
resulting dataframe - the dataframes being merged might have NAs...
count <- 10
nacount <- 3
a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
names(a1) <- "mdate"
a1$value <- runif(count)
a1[floor(runif(nacount)*count),]$value <- NA
a2 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
names(a2) <- "mdate"
a2$value2 <- runif(count)
#a2[floor(runif(nacount)*count),]$value2 <- NA
> a1
        mdate     value
1  2005-06-09        NA
2  2005-06-02 0.5287683
3  2005-06-03 0.7563833
4  2005-06-09        NA
5  2005-06-05 0.1027646
6  2005-06-06 0.7775884
7  2005-06-07 0.2993592
8  2005-06-09        NA
9  2005-06-09 0.7434682
10 2005-06-10 0.2096477
> a2
        mdate    value2
1  2005-06-01 0.5347852
2  2005-06-02 0.9322765
3  2005-06-03 0.9106499
4  2005-06-04 0.6810564
5  2005-06-05 0.5871867
6  2005-06-06 0.8123808
7  2005-06-07 0.9675379
8  2005-06-08 0.9470369
9  2005-06-09 0.7493767
10 2005-06-10 0.8864103
> atot <- merge(a1,a2,all=T)
However, I find the following results quite unintuitive. Are they
correct? May I draw your attention to lines 9:12... Should
lines 9:11 be there?
> atot
        mdate     value    value2
1  2005-06-01        NA 0.5347852
2  2005-06-02 0.5287683 0.9322765
3  2005-06-03 0.7563833 0.9106499
4  2005-06-04        NA 0.6810564
5  2005-06-05 0.1027646 0.5871867
6  2005-06-06 0.7775884 0.8123808
7  2005-06-07 0.2993592 0.9675379
8  2005-06-08        NA 0.9470369
9  2005-06-09        NA 0.7493767
10 2005-06-09        NA 0.7493767
11 2005-06-09        NA 0.7493767
12 2005-06-09 0.7434682 0.7493767
13 2005-06-10 0.2096477 0.8864103
Note: with no NAs, it works perfectly and as expected...
> a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> names(a1) <- "mdate"
> a1$value <- runif(count)
> #a1[floor(runif(nacount)*count),]$value <- NA
>
> atot <- merge(a1,a2,all=T)
>
> atot
        mdate      value    value2
1  2005-06-01 0.35002519 0.5347852
2  2005-06-02 0.76318940 0.9322765
3  2005-06-03 0.32759570 0.9106499
4  2005-06-04 0.47218729 0.6810564
5  2005-06-05 0.74435374 0.5871867
6  2005-06-06 0.81415290 0.8123808
7  2005-06-07 0.04774783 0.9675379
8  2005-06-08 0.21799101 0.9470369
9  2005-06-09 0.99472758 0.7493767
10 2005-06-10 0.41974293 0.8864103
R started in each case with --vanilla
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.0
year           2006
month          05
day            11
svn rev        38037
language       R
version.string Version 2.3.0 Patched (2006-05-11 r38037)
win-xp-pro sp2 - binary installs from CRAN
It behaves the same way if I say
atot <- merge(a1,a2,by.x="mdate",by.y="mdate",all=T)
or even
atot <- merge(a1,a2,by="mdate",all=T)
Also tested on versions 2.2.1 and 2.3.0.
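That the three calls behave the same is expected: when `by` is not supplied, `merge()` defaults to `by = intersect(names(x), names(y))`, which for these data frames is just `"mdate"`, and `by.x`/`by.y` default to `by`. A minimal sketch on small, made-up data (the names `x`, `y`, `m_default`, `m_by`, `m_byxy` and all values are hypothetical) should show identical results. It also shows that an NA in a non-key column such as `value` plays no part in the matching.

```r
# Hypothetical toy data: 3 dates in x (one NA value), 4 dates in y
x <- data.frame(mdate = as.Date("2005-06-01") + 0:2, value  = c(0.1, NA, 0.3))
y <- data.frame(mdate = as.Date("2005-06-01") + 0:3, value2 = c(1, 2, 3, 4))

# With no 'by', merge() matches on the common column names, here "mdate"
m_default <- merge(x, y, all = TRUE)
m_by      <- merge(x, y, by = "mdate", all = TRUE)
m_byxy    <- merge(x, y, by.x = "mdate", by.y = "mdate", all = TRUE)

identical(m_default, m_by)    # TRUE
identical(m_default, m_byxy)  # TRUE
nrow(m_default)               # 4: one row per distinct date
```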
cheers,
Sean O'Riordain
(P.S. Ctrl-V paste wouldn't work on the 2.4.0-dev build I downloaded this
morning, though I didn't try very hard.)
Prof Brian Ripley
2006-May-20 12:58 UTC
[R] (Nothing to do with) merge problem... extra lines appear in the presence of NAs
I think you forgot to read over your own message before sending it: take a look at a1, which has FOUR rows with mdate == 2005-06-09. Those correspond to rows 9:12 in the result, as you are merging on 'mdate'.

Your example is not reproducible, of course, since you used random values.

Perhaps you intended

a1[floor(runif(nacount)*count), "value"] <- NA

On Sat, 20 May 2006, Sean O'Riordain wrote:
> Good morning!

[Or afternoon in Europe, ....]

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
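A minimal sketch of both points in the reply, assuming the setup from the original post. Part 1 uses small hypothetical data to show that `merge()` returns one row per matching pair of keys, so repeated dates in `a1` produce repeated rows in the result. Part 2 is one possible way to make the example reproducible: `set.seed()` (the seed value 1 is arbitrary) plus `sample()` to pick distinct rows, followed by the column-indexed assignment suggested above. Using `sample()` instead of `floor(runif(nacount)*count)` is an editorial choice, not something from the thread, and the name `idx` is hypothetical.

```r
## Part 1: duplicated keys multiply rows (hypothetical toy data)
x <- data.frame(mdate = as.Date(c("2005-06-08", "2005-06-09", "2005-06-09")),
                value = c(1, NA, 3))
y <- data.frame(mdate = as.Date(c("2005-06-08", "2005-06-09")),
                value2 = c(10, 20))
m <- merge(x, y, all = TRUE)
print(m)   # the single y row for 2005-06-09 appears twice, once per x match

## Part 2: reproducible NA injection that leaves the key column alone
set.seed(1)                 # arbitrary seed, for reproducibility only
count   <- 10
nacount <- 3
a1 <- data.frame(mdate = as.Date("2005-06-01") + 0:(count - 1),
                 value = runif(count))
a2 <- data.frame(mdate = as.Date("2005-06-01") + 0:(count - 1),
                 value2 = runif(count))

# sample(count, nacount) gives nacount distinct row numbers in 1:count,
# whereas floor(runif(nacount) * count) can give 0 (selects no row) or repeats
idx <- sample(count, nacount)
a1[idx, "value"] <- NA      # only the 'value' column is modified

stopifnot(anyDuplicated(a1$mdate) == 0)  # dates are still unique keys
atot <- merge(a1, a2, all = TRUE)
nrow(atot)                  # 10: one row per date
sum(is.na(atot$value))      # 3: the injected NAs
```

Checking `anyDuplicated()` on the key column before merging is a cheap way to catch this kind of surprise early.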