Marine Regis
2016-Jul-31 03:45 UTC
[R] Replace any values in a data frame based on another data frame
Hello, I have two data frames with different sizes but with the same number of columns.> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6))> df1col1 col2 col3 col4 1 1 a AA 1 2 2 a AA 8 3 3 a BB 6 4 4 b BB 9 5 5 b CC 7 6 6 b CC 6> df2< - data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9))> df2col1 col2 col3 col4 1 1 a AA 1 2 3 a EE 8 3 5 b FF 5 4 6 b CC 9 Based on col1 and col2, how can I replace any values in col3 and col4 of df1 which don't match with those of df2 by the values of df2 In this example, the result will be:> df1col1 col2 col3 col4 1 1 a AA 1 2 2 a AA 8 3 3 a EE 8 4 4 b BB 9 5 5 b FF 5 6 6 b CC 9 Thanks a lot for your help. Have a nice day Marine [[alternative HTML version deleted]]
Bert Gunter
2016-Jul-31 06:15 UTC
[R] Replace any values in a data frame based on another data frame
Marine: Thanks for the reproducible example. I would not have fooled with this otherwise! 1. First note that your specification that you wish to replace only those values in col3 and col4 of df1 that don't match with those of df2 is irrelevant: if you replace those that do match you don't change them. This means that you want to replace the rows of df1 for which the first 2 columns of df1 match those of df2 with the corresponding rows of df2. 2. You need to understand and deal with factors to do this. By default, non-numeric columns will be read into a data frame as factors, which is what happens in your example. You need to first convert all such columns to character to make things work properly (or at least, I had to doing it as shown below). Also note that there was a typo in your code: "df2< -" should be "df2<-" (you inserted an extra space). Anyway, here is one way to go about it, but I suggest that you wait for other solutions before accepting it, as I would not be surprised if there were better ways. However, this should be reasonably fast, as everything is vectorized.> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6)) > > df2<- data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9)) > > ## convert factors to character vectors > df1[,3] <- as.character(df1[,3]) > df2[,3] <- as.character(df2[,3]) > > ## concatenate first 2 columns of both df's to identify rows for matching > first <- with(df1,paste0(col1,col2)) > second <- with(df2,paste0(col1,col2)) > > ## now find the rows of the df1 that match those of df2 > wh <-match(first,second) > > ## Use indexing to substitute > df1[!is.na(wh),] <- df2[na.omit(wh),] > > df1col1 col2 col3 col4 1 1 a AA 1 2 2 a AA 8 3 3 a EE 8 4 4 b BB 9 5 5 b FF 5 6 6 b CC 9 Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Sat, Jul 30, 2016 at 8:45 PM, Marine Regis <marine.regis at hotmail.fr> wrote:> Hello, > > I have two data frames with different sizes but with the same number of columns. > >> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6)) > >> df1 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 2 a AA 8 > > 3 3 a BB 6 > > 4 4 b BB 9 > > 5 5 b CC 7 > > 6 6 b CC 6 > > > >> df2< - data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9)) > > > >> df2 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 3 a EE 8 > > 3 5 b FF 5 > > 4 6 b CC 9 > > > > > > Based on col1 and col2, how can I replace any values in col3 and col4 of df1 which don't match with those of df2 by the values of df2 > > > > In this example, the result will be: > > > >> df1 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 2 a AA 8 > > 3 3 a EE 8 > > 4 4 b BB 9 > > 5 5 b FF 5 > > 6 6 b CC 9 > > > > Thanks a lot for your help. > > Have a nice day > > Marine > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Jeff Newmiller
2016-Jul-31 06:38 UTC
[R] Replace any values in a data frame based on another data frame
Your use of HTML email corrupted your example slightly, but I was able to fix it. Please follow the Posting Guide and set your emails to Plain Text mode when posting to this mailing list in the future. Here is one way: # you have to be careful about mucking with factors # convert columns to factors after you have finished # changing values in them # start by not creating factors in the first place df1 <- data.frame( col1 = 1:6 , col2 = c( rep( "a", 3 ) , rep( "b", 3 ) ) , col3 = c( rep( "AA", 2 ) , rep( "BB", 2 ) , rep( "CC", 2 ) ) , col4 = c( 1, 8, 6, 9, 7, 6 ) , stringsAsFactors = FALSE ) df2 <- data.frame( col1 = c( 1, 3, 5, 6 ) , col2 = c( rep( "a", 2 ) , rep( "b", 2 ) ) , col3 = c( rep( "AA", 1 ) , rep( "EE", 1 ) , rep( "FF", 1 ) , rep( "CC", 1 ) ) , col4 = c( 1, 8, 5, 9 ) , stringsAsFactors = FALSE ) df3 <- merge( df1, df2, by = c( "col1", "col2" ), all = TRUE ) idx <- !is.na( df3$col3.y ) df3$col3.x[ idx ] <- df3$col3.y[ idx ] df3$col4.x[ idx ] <- df3$col4.y[ idx ] df3$col3.y <- df3$col4.y <- NULL names( df3 )[ 3:4 ] <- c( "col3", "col4" ) df3 On Sun, 31 Jul 2016, Marine Regis wrote:> Hello, > > I have two data frames with different sizes but with the same number of columns. > >> df1 <- data.frame(col1 = c(1:6), col2 = c(rep("a", 3), rep("b", 3)), col3 = c(rep("AA", 2), rep("BB", 2), rep("CC", 2)), col4=c(1,8,6,9,7,6)) > >> df1 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 2 a AA 8 > > 3 3 a BB 6 > > 4 4 b BB 9 > > 5 5 b CC 7 > > 6 6 b CC 6 > > > >> df2< - data.frame(col1 = c(1,3,5,6), col2 = c(rep("a", 2), rep("b", 2)), col3 = c(rep("AA", 1), rep("EE", 1), rep("FF", 1), rep("CC", 1)), col4=c(1,8,5,9)) > > > >> df2 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 3 a EE 8 > > 3 5 b FF 5 > > 4 6 b CC 9 > > > > > > Based on col1 and col2, how can I replace any values in col3 and col4 of df1 which don't match with those of df2 by the values of df2 > > > > In this example, the result will be: > > > >> df1 > > col1 col2 col3 col4 > > 1 1 a AA 1 > > 2 2 a AA 8 > > 3 3 a EE 8 > > 4 4 b BB 9 > > 5 5 b FF 5 > > 6 6 b CC 9 > > > > Thanks a lot for your help. > > Have a nice day > > Marine > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >--------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k