Jake Elmstedt
2021-Jun-14 10:42 UTC
[R] Need to compare two columns in two data.frames and return all rows from df where rows values are missing
set.seed(123) df1 <- data.frame(name = sample(letters[1:4], 12, TRUE), score = runif(12)) head(df1) #> name score #> 1 c 0.67757064 #> 2 c 0.57263340 #> 3 c 0.10292468 #> 4 b 0.89982497 #> 5 c 0.24608773 #> 6 b 0.04205953 table(df1[["name"]]) #> #> a b c d #> 1 5 5 1 df2 <- data.frame(name = sample(letters[3:6], 12, TRUE), score = runif(12)) head(df2) #> name score #> 1 c 0.7584595 #> 2 e 0.2164079 #> 3 f 0.3181810 #> 4 d 0.2316258 #> 5 e 0.1428000 #> 6 d 0.4145463 table(df2[["name"]]) #> #> c d e f #> 3 4 3 2 df3 <- rbind(df1[!df1[["name"]] %in% df2[["name"]], ], df2[!df2[["name"]] %in% df1[["name"]], ]) head(df3) #> name score #> 4 b 0.89982497 #> 6 b 0.04205953 #> 7 b 0.32792072 #> 8 b 0.95450365 #> 10 a 0.69280341 #> 12 b 0.99426978 table(df3[["name"]]) #> #> a b e f #> 1 5 3 2
Bert Gunter
2021-Jun-15 02:50 UTC
[R] Need to compare two columns in two data.frames and return all rows from df where rows values are missing
merge(..., all = TRUE) essentially does this for you. rows with NA's are the non-matches: merge(df1,df2,by = "name",all = TRUE) name score.x score.y 1 a 0.69280341 NA 2 b 0.04205953 NA 3 b 0.32792072 NA 4 b 0.89982497 NA 5 b 0.95450365 NA 6 b 0.99426978 NA 7 c 0.67757064 0.4659625 8 c 0.67757064 0.4137243 9 c 0.67757064 0.7584595 10 c 0.24608773 0.4659625 11 c 0.24608773 0.4137243 12 c 0.24608773 0.7584595 13 c 0.10292468 0.4659625 14 c 0.10292468 0.4137243 15 c 0.10292468 0.7584595 16 c 0.88953932 0.4659625 17 c 0.88953932 0.4137243 18 c 0.88953932 0.7584595 19 c 0.57263340 0.4659625 20 c 0.57263340 0.4137243 21 c 0.57263340 0.7584595 22 d 0.64050681 0.2316258 23 d 0.64050681 0.4145463 24 d 0.64050681 0.2330341 25 d 0.64050681 0.3688455 26 e NA 0.1428000 27 e NA 0.2164079 28 e NA 0.1524447 29 f NA 0.3181810 30 f NA 0.1388061 Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 14, 2021 at 7:09 PM Jake Elmstedt <jake.elmstedt at gmail.com> wrote:> set.seed(123) > > df1 <- data.frame(name = sample(letters[1:4], 12, TRUE), score = runif(12)) > head(df1) > #> name score > #> 1 c 0.67757064 > #> 2 c 0.57263340 > #> 3 c 0.10292468 > #> 4 b 0.89982497 > #> 5 c 0.24608773 > #> 6 b 0.04205953 > table(df1[["name"]]) > #> > #> a b c d > #> 1 5 5 1 > > df2 <- data.frame(name = sample(letters[3:6], 12, TRUE), score = runif(12)) > head(df2) > #> name score > #> 1 c 0.7584595 > #> 2 e 0.2164079 > #> 3 f 0.3181810 > #> 4 d 0.2316258 > #> 5 e 0.1428000 > #> 6 d 0.4145463 > table(df2[["name"]]) > #> > #> c d e f > #> 3 4 3 2 > > df3 <- rbind(df1[!df1[["name"]] %in% df2[["name"]], ], > df2[!df2[["name"]] %in% df1[["name"]], ]) > head(df3) > #> name score > #> 4 b 0.89982497 > #> 6 b 0.04205953 > #> 7 b 0.32792072 > #> 8 b 0.95450365 > #> 10 a 0.69280341 > #> 12 b 0.99426978 > table(df3[["name"]]) > #> > #> a b e f > #> 1 5 3 2 > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]