Hi all,

I'm working with two datasets in R, say data1 and data2. Both are data frames with several rows and columns, and some of the rows are identical in both datasets. Is there any way to remove from one set, say data1, the rows that also appear in the other set, say data2, using R?

Thanks in advance for any hint.

Christian
You have not given enough info. Do the data sets have the same columns? If not, you need to tell us more about how you can tell whether one row of a data frame is "identical" to some row of another.

Assuming the columns are the same between the two, the basic idea is to combine all columns into a single character vector for each data frame, then check which elements of one are in the other. Something like (code untested!):

    id1 <- do.call("paste", c(data1, sep=":"))
    id2 <- do.call("paste", c(data2, sep=":"))
    ## Rows of data1 that are in data2:
    r1 <- id1 %in% id2
    ## Remove (a logical index also works when no rows match):
    data1.reduced <- data1[!r1, ]

Andy

> From: Christian Mora
>
> I'm wondering if there is any way to remove from one set, say data1,
> the rows that are identical in the other set, say data2, using R?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
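As a small worked example of the paste-key idea in the reply above, the sketch below builds two toy data frames and drops the rows of data1 that also occur in data2. The contents of data1 and data2 are invented for illustration. It assumes that both data frames have the same columns in the same order and that the separator ":" does not occur in the data itself.

```r
# Toy data, invented for illustration only
data1 <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(x = c(2, 4, 5), y = c("b", "d", "e"),
                    stringsAsFactors = FALSE)

# One key string per row, built by pasting all columns together
id1 <- do.call("paste", c(data1, sep = ":"))
id2 <- do.call("paste", c(data2, sep = ":"))

# Keep the rows of data1 whose key does not appear in data2
data1.reduced <- data1[!(id1 %in% id2), , drop = FALSE]
print(data1.reduced)
```

A logical index is used here because a negative index built with which() selects zero rows, rather than all of them, when nothing matches.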
In short, merge() with all=FALSE, followed by removal of the redundant columns, might do the trick. Note that this returns the rows common to both data frames, which you would then drop from data1. If the row names serve as the common key, use the argument by=0.

See http://tolstoy.newcastle.edu.au/R/help/04/07/1250.html and many other hits on http://maths.newcastle.edu.au/~rking/R/

On Tue, 2004-08-10 at 23:44, Christian Mora wrote:
> I'm wondering if there is any way to remove from one set, say data1,
> the rows that are identical in the other set, say data2, using R?
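To make the merge() suggestion concrete, here is a minimal sketch on invented toy data. It assumes both data frames share the same column names, so merge() joins on all of them by default. The first call shows what all=FALSE returns (the common rows). The second part is one possible way to turn that into a removal: the helper column in.data2 is hypothetical, added only so that unmatched rows of data1 can be recognised after a left join with all.x=TRUE.

```r
# Toy data, invented for illustration only
data1 <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(x = c(2, 4, 5), y = c("b", "d", "e"),
                    stringsAsFactors = FALSE)

# all = FALSE (the default) keeps only rows present in both data frames
common <- merge(data1, data2, all = FALSE)

# Tag each distinct row of data2; in.data2 is a hypothetical helper column
data2.tagged <- cbind(unique(data2), in.data2 = TRUE)

# Left join: rows of data1 with no partner in data2 get NA in in.data2
merged <- merge(data1, data2.tagged, all.x = TRUE)

# Keep the unmatched rows and drop the helper column again
data1.reduced <- merged[is.na(merged$in.data2), names(data1), drop = FALSE]
print(data1.reduced)
```

Keep in mind that merge() sorts its result on the key columns by default, so the original row order and row names of data1 are not preserved.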