Dear R Gurus,

I am currently working on two data sets (A and B). They both have the same fields: ID, REGION, OFFICE, CSTART, CEND, NCYCLE, STATUS and CB.

I want to merge the two data sets by ID. The problem is that in data set A the IDs are unique, whereas in data set B they are not: some IDs repeat.

How do I merge or retrieve the common ones?

Please advise.

Kind Regards

Peter

South Africa
+27 12 422 7357
+27 82 456 4669

Please Note: This email and its contents are subject to our email legal notice which can be viewed at http://www.sars.gov.za/Email_Disclaimer.pdf
Try one of the following merge() calls:

merge(A, B, by = intersect(names(A), names(B)), all.x = FALSE, all.y = FALSE)

or

merge(A, B, by = "ID", all.x = FALSE, all.y = FALSE)

Dannemora

On Wed, Aug 25, 2010 at 5:35 AM, Mangalani Peter Makananisa <pmakananisa@sars.gov.za> wrote:
> How do I merge or retrieve the common ones?
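A sketch of what the two calls above actually do, on hypothetical toy data (the values are invented, and only three of the eight shared columns are used). For reference, all.x = FALSE and all.y = FALSE are merge()'s defaults, and by = intersect(names(x), names(y)) is its default key. So the first call joins on every shared column (a row survives only if all of them match), while the second joins on ID alone and pairs each A row with every B row that has the same ID.

```r
# Hypothetical toy data: three of the shared columns, with ID 2 repeated in B
A <- data.frame(ID = c(1, 2, 3),
                REGION = c("North", "South", "East"),
                STATUS = c("open", "closed", "open"))
B <- data.frame(ID = c(1, 2, 2),
                REGION = c("North", "South", "South"),
                STATUS = c("open", "open", "closed"))

# Join on every shared column (also merge()'s default 'by'):
# only rows where ID, REGION and STATUS all agree are kept
by_all <- merge(A, B, by = intersect(names(A), names(B)))
print(by_all)

# Join on ID only: A's ID-2 row is paired with both ID-2 rows of B,
# and the non-key columns present in both get .x / .y suffixes
by_id <- merge(A, B, by = "ID")
print(by_id)
```

When A and B share all their columns, the first form behaves like an intersection of identical records, and the second is the usual one-to-many join on ID; which one is wanted depends on what "common" should mean.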
First you need to clarify what you'd like to happen when an ID in B is not unique. What do you want the resulting data frame to look like?

Some possible answers involve using different options for merge(), or using unique() to remove duplicates from B before merging. But, at least to me, "merge or retrieve the common ones" isn't clear enough to say which.

Sarah

On Wed, Aug 25, 2010 at 5:35 AM, Mangalani Peter Makananisa <pmakananisa at sars.gov.za> wrote:
> How do I merge or retrieve the common ones?

--
Sarah Goslee
http://www.functionaldiversity.org
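A minimal sketch of the "remove duplicates from B before merging" route, on the same small toy data frames Sarah uses later in the thread. One caveat: unique() on a data frame drops only rows that are identical in every column, so by itself it will not make the IDs unique. The rule used here (keep the first B row for each ID, via duplicated()) is an assumption; any other rule, such as keeping the row with the latest CEND, could be substituted.

```r
A <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"))
B <- data.frame(ID = c(1, 2, 2), Y = c("x", "y", "z"))

# unique() compares whole rows: (2, "y") and (2, "z") differ in Y,
# so nothing is dropped and ID 2 is still repeated
print(nrow(unique(B)))  # 3

# duplicated(B$ID) is TRUE for every repeat of an ID after its first
# occurrence, so this keeps the first B row for each ID
# (an assumed rule; substitute whatever fits the real data)
B_first <- B[!duplicated(B$ID), ]

common <- merge(A, B_first, by = "ID")
print(common)
```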
What do you want to happen when there are duplicates?

A:
ID X
1  a
2  b
3  c

B:
ID Y
1  x
2  y
2  z

What should happen to IDs 1, 2 and 3 in your desired output? The all.x and all.y options to merge() might be of use.

Sarah

On Wed, Aug 25, 2010 at 8:00 AM, Mangalani Peter Makananisa <pmakananisa at sars.gov.za> wrote:
> I want to merge data sets A and B with merge(A, B, by = "ID"); however, I am getting error messages because some IDs in A repeat several times in data set B. Even if the IDs in B repeat, I want to be able to merge the two data sets and retrieve the intersection.
>
> Please help.
>
> -----Original Message-----
> From: Sarah Goslee [mailto:sarah.goslee at gmail.com]
> Sent: 25 August 2010 01:52 PM
> To: Mangalani Peter Makananisa
> Cc: r-help at r-project.org
> Subject: Re: [R] Merging two data set in R,
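Sarah's toy A and B can be pasted straight into R to compare the combinations of all.x and all.y. This is a quick sketch; the variable names are arbitrary.

```r
A <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"))
B <- data.frame(ID = c(1, 2, 2), Y = c("x", "y", "z"))

inner <- merge(A, B)                # only IDs present in both: 1, 2, 2 (3 rows)
left  <- merge(A, B, all.x = TRUE)  # also keeps ID 3 from A, with Y = NA (4 rows)
right <- merge(A, B, all.y = TRUE)  # every B row; same as inner here (3 rows)
outer <- merge(A, B, all = TRUE)    # every row of both (4 rows)

print(left)
```

In every variant, A's ID-2 row appears twice, because merge() pairs it with both ID-2 rows of B; the options only control whether unmatched rows (ID 3 here) are kept and padded with NA.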
Almost. You'll need to handle the duplicate ID yourself, since R has no way of knowing which one(s) to change to NA. As I already suggested, you can use unique() in conjunction with whatever logical rules you require for choosing those values. As I also already suggested, all.x and all.y are the options to merge() that you need to consider.

> A <- data.frame(ID = c(1, 2, 3), X = c('a', 'b', 'c'))
> B <- data.frame(ID = c(1, 2, 2), Y = c('x', 'y', 'z'))
> merge(A, B, all.x = FALSE, all.y = TRUE)
  ID X Y
1  1 a x
2  2 b y
3  2 b z

Just think how much easier this process would have been if your first message had included a clear question, toy data, and examples of what you'd tried.

Sarah

On Wed, Aug 25, 2010 at 8:24 AM, Mangalani Peter Makananisa <pmakananisa at sars.gov.za> wrote:
> A:
> ID X
> 1  a
> 2  b
> 3  c
>
> B:
> ID Y
> 1  x
> 2  y
> 2  z
>
> I would like to see something like this:
>
> Common = merge(A, B)
> Common
>
> ID  X   Y
> 1   a   x
> 2   b   y
> 2   NA  z
>
> if it is possible.
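One way to get from Sarah's merge() output to the layout Peter asked for is to blank X on repeated IDs after merging. This is only a sketch under an assumed rule: keep X on the first row for each ID and set it to NA on the rest, where "first" means first in merge()'s output order. Sort the result beforehand if "first" should mean something specific, such as the earliest CSTART.

```r
A <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"))
B <- data.frame(ID = c(1, 2, 2), Y = c("x", "y", "z"))

# Keep every B row; IDs found only in A (ID 3) are dropped
common <- merge(A, B, all.x = FALSE, all.y = TRUE)

# merge() repeats A's X for every matching B row. Assumed rule: keep X on
# the first row of each ID and set it to NA on later rows with the same ID
common$X[duplicated(common$ID)] <- NA
print(common)
```

This leaves the three rows from Sarah's output, with X set to NA on the second ID-2 row (the one with Y = "z"), which matches the table in Peter's message.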