Marc MarĂ Dell'Olmo
2013-Nov-08 00:01 UTC
[R] Merging two dataframes with a condition involving variables of both dataframes
Dear all, I would like to merge two dataframes using two conditions. For example, if I have the dataframes df1 and df2:> (df1 <- data.frame(var1=c("a","b","d","e","g"), var2=c(25,14,53,26,84),infodf1=c(1,1,1,1,1))) var1 var2 infodf1 1 a 25 1 2 b 14 1 3 d 53 1 4 e 26 1 5 g 84 1> (df2 <- data.frame(var1=c("a", "a", "c", "d", "e", "h","i"), var3=c(10,32, 14,55,2,53,6), var4=c(40,37, 54,70,30,98,10), infodf2=c(2,2,2,2,2,2,2))) var1 var3 var4 infodf2 1 a 10 40 2 2 a 32 37 2 3 c 14 54 2 4 d 55 70 2 5 e 2 30 2 6 h 53 98 2 7 i 6 10 2 I would like to obtain a new dataframe df3 merging df1 and df2 if var1(of df1)==var1(of df2) and if var3(of df2)<=var2(of df1)<=var4(of df2). Moreover, I would like to obtain a new data frame with all rows from df1 and df2 (i.e. full outer join: http://www.w3schools.com/sql/sql_join_full.asp) df3 should be: var1 var2 infodf1 var1 var3 var4 infodf2 1 a 25 1 a 10 40 2 2 e 26 1 e 2 30 2 3 b 14 1 NA NA NA NA 4 d 53 1 NA NA NA NA 5 g 84 1 NA NA NA NA 6 NA NA NA a 32 37 2 7 NA NA NA c 14 54 2 8 NA NA NA d 55 70 2 9 NA NA NA h 53 98 2 10 NA NA NA i 6 10 2 I cannot use "merge" because this function doesn't allow conditions. On the other hand, I have tried to use SQL code using the package sqldf with the follwing R syntax:> (df3 <- sqldf("select a.*, b.* FROM df1 a, df2 b WHERE a.var1 = b.var1AND b.var3 <= a.var2 AND a.var2 <= b.var4 ") ) var1 var2 infodf1 var1 var3 var4 infodf2 1 a 25 1 a 10 40 2 2 e 26 1 e 2 30 2 But sqldf doesn't support the use of the option: "full outer join" option.> (df3 <- sqldf("select a.*, b.* FROM df1 a full outer join df2 b WHEREa.var1 = b.var1 AND b.var3 <= a.var2 AND a.var2 <= b.var4 ") ) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: RIGHT and FULL OUTER JOINs are not currently supported) Does anyone know how I can solve my problem? thank you very much! Marc [[alternative HTML version deleted]]
Collin Lynch
2013-Nov-08 02:37 UTC
[R] Merging two dataframes with a condition involving variables of both dataframes
You might need to implement it as a nested pair of for loops using rbind. In essence iterate over the rows in df1 and each time find the matching row in df2. If none is found then add the df1 row by itself to the result. If one is then remove it from df2 and rbind both of them. Once done just merge in all rows that remain in df2. This would likely be slower than a sql-based method but is essentially the same algorithm. You can find advice on for-loops in R here: http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/ Best, Collin.