Kari Ruohonen
2009-Sep-18 07:18 UTC
[R] merging data frames with matrix objects when missing cases
Hi,

I have faced a problem with the merge() function when trying to merge two data frames that have a common index, but the second one does not have cases for all indexes in the first one. With ordinary (vector) variables, R fills in the missing cases with NA if all=T is requested. But if the variable is a matrix, R seems to insert NA only into the first column of the matrix and fill in the rest of the columns by recycling the values. Here is a toy example:

> df1<-data.frame(a=1:3,X1=I(matrix(1:6,ncol=2)))
> df2<-data.frame(a=1:2,X2=I(matrix(11:14,ncol=2)))
> merge(df1,df2)
  a X1.1 X1.2 X2.1 X2.2
1 1    1    4   11   13
2 2    2    5   12   14
# no all=T, missing cases are dropped

> merge(df1,df2,all=T)
  a X1.1 X1.2 X2.1 X2.2
1 1    1    4   11   13
2 2    2    5   12   14
3 3    3    6   NA   13
# X2.1 set to NA correctly but X2.2 set to 13 by recycling.

Can I somehow get the behaviour that the third row of the second matrix X2 in the above example would be filled with NA for all columns? None of the merge() options seems to provide a solution.

regards, Kari
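A minimal sketch of where the stray value might come from, and of one after-the-fact workaround. It assumes, without checking the merge() source of any particular R version, that merge() pads unmatched rows with placeholder rows and then marks them missing via is.na<- using row positions; applied to a matrix, such a single index addresses individual elements, i.e. only the first column. Note that the 13 in row 3 also equals X2[1, 2], which would fit a copied placeholder row at least as well as recycling. The workaround sets the whole row of X2 to NA wherever the key a has no partner in df2; the names x, m and no_match are introduced only for this example.

```r
df1 <- data.frame(a = 1:3, X1 = I(matrix(1:6, ncol = 2)))
df2 <- data.frame(a = 1:2, X2 = I(matrix(11:14, ncol = 2)))

# is.na<- with a single index treats a matrix as a plain vector:
# index 3 is element [3, 1], so only the first column gets the NA.
x <- matrix(11:16, ncol = 2)
is.na(x) <- 3
print(x)  # only x[3, 1] is NA; x[3, 2] keeps 16

# Workaround after the merge: blank every column of X2 for keys
# that have no match in df2 (assumed cause, see lead-in).
m <- merge(df1, df2, all = TRUE)
no_match <- !(m$a %in% df2$a)
m$X2[no_match, ] <- NA
print(m)
```

This only covers keys missing from df2; for keys missing from df1, the same step would be applied to X1 using df1$a.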
johannes rara
2009-Sep-18 17:41 UTC
This has something to do with your data.frame structure; see:

> str(df1)
'data.frame': 3 obs. of 2 variables:
 $ a : int 1 2 3
 $ X1: 'AsIs' int [1:3, 1:2] 1 2 3 4 5 6
> str(df2)
'data.frame': 2 obs. of 2 variables:
 $ a : int 1 2
 $ X2: 'AsIs' int [1:2, 1:2] 11 12 13 14

This seems to work:

> df1<-data.frame(a=1:3, b = 1:3, c = 4:6)
> str(df1)
'data.frame': 3 obs. of 3 variables:
 $ a: int 1 2 3
 $ b: int 1 2 3
 $ c: int 4 5 6
> df2<-data.frame(a=1:2, d = 11:12, e = 13:14)
> str(df2)
'data.frame': 2 obs. of 3 variables:
 $ a: int 1 2
 $ d: int 11 12
 $ e: int 13 14
> merge(df1,df2)
  a b c  d  e
1 1 1 4 11 13
2 2 2 5 12 14
> merge(df1, df2, all=T)
  a b c  d  e
1 1 1 4 11 13
2 2 2 5 12 14
3 3 3 6 NA NA
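Building on the observation that ordinary columns are padded correctly, one possible sketch is to expand each matrix column into plain columns before merging and, if the matrix form is needed later, rebuild it afterwards. The helpers flatten() and rebuild() are hypothetical names introduced for this example, not base R functions; they assume the matrix columns were created with I() as in the original post and that names such as X2.1 are not already used by other columns.

```r
# The original data frames with AsIs matrix columns
df1 <- data.frame(a = 1:3, X1 = I(matrix(1:6, ncol = 2)))
df2 <- data.frame(a = 1:2, X2 = I(matrix(11:14, ncol = 2)))

# Hypothetical helper: replace matrix column `col` by ordinary
# columns named col.1, col.2, ...
flatten <- function(df, col) {
  mat <- unclass(df[[col]])  # drop the AsIs class, keep the matrix
  colnames(mat) <- paste(col, seq_len(ncol(mat)), sep = ".")
  cbind(df[setdiff(names(df), col)], as.data.frame(mat))
}

# Hypothetical helper: collect columns col.1, col.2, ... back into
# a single AsIs matrix column named `col`
rebuild <- function(df, col) {
  parts <- grep(paste0("^", col, "\\."), names(df), value = TRUE)
  mat <- as.matrix(df[parts])
  colnames(mat) <- NULL
  out <- df[setdiff(names(df), parts)]
  out[[col]] <- I(mat)
  out
}

# With ordinary columns, merge(all = TRUE) pads every column with NA
flat <- merge(flatten(df1, "X1"), flatten(df2, "X2"), all = TRUE)
print(flat)

# Optionally restore the matrix columns
merged <- rebuild(rebuild(flat, "X1"), "X2")
print(merged$X2)
```

Flattening keeps merge() on its well-trodden path for plain vectors, at the cost of two extra reshaping steps.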