I would like to compare two data sets saved as text files (example below) to determine whether both sets are identical (or whether dat2 is missing information that is included in dat1) and, if they are not identical, list what information differs between the two sets (i.e., output "a1", "a3" as the differing information). The overall purpose would be to remove "a1" and "a3" from dat1 so that dat1 and dat2 are the same. My R abilities are somewhat limited, so any suggestions are greatly appreciated.

Alysta

dat1
a1
a2
a3
a4
a5
a6

dat2
a2
a4
a5
a6
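Getting the two text files into R as character vectors could look like the sketch below, assuming each file starts with a one-line header (dat1 or dat2) followed by the values, as in the example. The files are written to tempfile() paths only so the sketch is self-contained; the real file names are not given.

```r
# Write the example data to temporary files so this sketch runs as-is;
# with real data, replace f1 and f2 with your own file paths.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("dat1", paste('a', 1:6, sep='')), f1)
writeLines(c("dat2", paste('a', c(2,4:6), sep='')), f2)

# Read each file into a character vector, skipping the assumed header line.
# scan() splits on whitespace, so one value per line or several per line both work.
dat1 <- scan(f1, what = character(), skip = 1, quiet = TRUE)
dat2 <- scan(f2, what = character(), skip = 1, quiet = TRUE)

dat1
dat2

# A quick first check: same values in the same order?
identical(dat1, dat2)   # FALSE for the example data
```

read.table(f1, header = TRUE, stringsAsFactors = FALSE)[[1]] is an alternative that returns the same column as a character vector.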
Here is how to find the values in common:

> dat1 <- paste('a', 1:6, sep='')
> dat2 <- paste('a', c(2,4:6), sep='')
> # find the data in common
> intersect(dat1, dat2)
[1] "a2" "a4" "a5" "a6"

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
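intersect() gives the shared values; the values the question asks for ("a1", "a3") are the set difference. A minimal sketch using base R's setdiff() and setequal() on the same example vectors:

```r
dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6), sep='')

# Values in dat1 that are not in dat2 (order follows dat1)
only_in_dat1 <- setdiff(dat1, dat2)
print(only_in_dat1)
# [1] "a1" "a3"

# Values in dat2 that are missing from dat1 (none for this example)
only_in_dat2 <- setdiff(dat2, dat1)

# TRUE only if both vectors contain the same distinct values, ignoring order
setequal(dat1, dat2)   # FALSE

# Dropping the extra values from dat1 makes the two sets equal
dat1_trimmed <- setdiff(dat1, only_in_dat1)
setequal(dat1_trimmed, dat2)   # TRUE
```

setdiff() and intersect() drop duplicate values, so if a value can appear more than once and the repeats matter, logical indexing with %in% is the safer choice.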
<amarkey at uiuc.edu> wrote in news:20080325101909.BDK93111 at expms2.cites.uiuc.edu:

> I would like to compare two data sets saved as text files (example
> below) to determine whether both sets are identical (or whether dat2
> is missing information that is included in dat1) and, if they are not
> identical, list what information differs between the two sets (i.e.,
> output "a1", "a3" as the differing information). The overall purpose
> would be to remove "a1" and "a3" from dat1 so that dat1 and dat2 are
> the same. My R abilities are somewhat limited, so any suggestions are
> greatly appreciated.

I do not understand what it would mean to remove elements so "they would look the same". Why wouldn't you just use the smaller set?

You might want to look at the %in% function. These examples were created so that neither dat1 nor dat2 is a proper subset of the other.

dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')
dat1
#[1] "a1" "a2" "a3" "a4" "a5" "a6"
dat2
#[1] "a2"  "a4"  "a5"  "a6"  "a8"  "a9"  "a10"
dat2 %in% dat1
#[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
dat1 %in% dat2
#[1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE

### And then use the logical vectors as index arguments
### to first get the common elements
dat1[dat1 %in% dat2]
#[1] "a2" "a4" "a5" "a6"
dat2[dat2 %in% dat1]
#[1] "a2" "a4" "a5" "a6"

### And then to find the non-shared elements
dat2[!(dat2 %in% dat1)]
#[1] "a8"  "a9"  "a10"
dat1[!(dat1 %in% dat2)]
#[1] "a1" "a3"

-- 
David Winsemius
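Wrapping the %in% indexing from this reply in a small function gives one call that answers all parts of the question: do the two vectors hold the same values, and what is only in each? compare_sets() is a hypothetical helper name, not a base R function.

```r
# compare_sets() is a hypothetical helper, not part of base R.
# It reports whether x and y hold the same values and which values
# each one has that the other lacks, using %in% as a logical index.
compare_sets <- function(x, y) {
  list(
    same_values = all(x %in% y) && all(y %in% x),
    only_in_x   = x[!(x %in% y)],
    only_in_y   = y[!(y %in% x)]
  )
}

dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')

res <- compare_sets(dat1, dat2)
res$same_values   # FALSE
res$only_in_x     # "a1" "a3"
res$only_in_y     # "a8" "a9" "a10"

# Keep only the elements of dat1 that also occur in dat2
dat1_common <- dat1[dat1 %in% dat2]
```

Unlike setdiff(), this indexing keeps any duplicated values and preserves the original order of each vector.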