Hello all,

I am fairly new to R and am trying to bring together data from multiple sources. Here is one problem that I cannot seem to crack – I hope somebody can help.

Let me simplify the problem: Let’s say I have two datasets, DATA1 and DATA2. I would like to work with all the cases in DATA2. I have additional variables on these cases in DATA1, which is a larger dataset with many additional cases. I know how to merge datasets if they contain the same cases. However, I want to eliminate all the cases from DATA1 that are not present in DATA2 and then merge. CASEID is my matching variable, and there are no duplicate variable names.

Any guidance would be greatly appreciated.

Thanks in advance,
Brian
Brian Perron wrote:
> Hello all,
>
> I am fairly new to R and am trying to bring together data from multiple sources. Here is one problem that I cannot seem to crack - I hope somebody can help. Let me simplify the problem: Let's say I have two datasets: DATA1 and DATA2. I would like to work with all the cases in DATA2. I have additional variables on these cases in DATA1, which is a larger data set with many additional cases. I know how to merge data sets if the datasets contain the same cases. However, I want to eliminate all the cases from DATA1 that are not present in DATA2 and then merge. The CASEID is my matching variable, and there are no duplicate variable names.
> Any guidance would be greatly appreciated.

Take a closer look at the all.x and all.y arguments in ?merge. Does this give what you want?

merge(DATA1, DATA2, by="CASEID", all.x=FALSE, all.y=TRUE)

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
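For anyone reading along, here is a minimal sketch of what the all.x / all.y suggestion does on toy data. The data frames, the column names age and score, and the values are all made up for illustration; only DATA1, DATA2 and CASEID come from the original question. One DATA2 case (CASEID 7) is deliberately missing from DATA1 to show how all.y=TRUE differs from the default.

```r
# Toy data (hypothetical values): DATA1 is the larger set,
# DATA2 holds the cases we actually want to keep.
DATA1 <- data.frame(CASEID = 1:6, age = c(23, 35, 41, 29, 52, 38))
DATA2 <- data.frame(CASEID = c(2, 4, 7), score = c(10, 20, 30))

# Keep every DATA2 case and drop DATA1 cases that are not in DATA2.
# A DATA2 case with no match in DATA1 is kept, with NA for the DATA1 variables.
kept <- merge(DATA1, DATA2, by = "CASEID", all.x = FALSE, all.y = TRUE)
print(kept)

# Default merge (all = FALSE): only cases present in BOTH data frames.
inner <- merge(DATA1, DATA2, by = "CASEID")
print(inner)
```

If every DATA2 case is guaranteed to appear in DATA1, the two calls should return the same rows, so the choice only matters when some DATA2 cases have no match.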
Something like this?

> data1 <- data.frame(id=c(1, 3, 5), x=runif(3))
> data2 <- data.frame(id=1:10, y=runif(10))
> data3 <- merge(data1, data2, by="id", all.x=TRUE, all.y=FALSE)
> data3
  id         x         y
1  1 0.9533341 0.1803271
2  3 0.9143624 0.5033228
3  5 0.2866931 0.4233733

Andy
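Note that in the example above the smaller data frame is passed first, so all.x=TRUE plays the role that all.y=TRUE plays when DATA2 is the second argument. The two-step approach described in the original question (drop the unwanted DATA1 cases first, then merge) can also be written out explicitly. The following is only a sketch on made-up data; the columns x and y and their values are assumptions for illustration.

```r
# Toy data (hypothetical values): DATA1 is the larger set.
DATA1 <- data.frame(CASEID = 1:10, x = (1:10) * 1.5)
DATA2 <- data.frame(CASEID = c(1, 3, 5), y = c("a", "b", "c"))

# Step 1: keep only the DATA1 rows whose CASEID also appears in DATA2.
DATA1_sub <- DATA1[DATA1$CASEID %in% DATA2$CASEID, ]

# Step 2: merge the reduced DATA1 with DATA2 on the matching variable.
merged <- merge(DATA1_sub, DATA2, by = "CASEID")
print(merged)

# For comparison: a plain merge does the filtering implicitly.
one_step <- merge(DATA1, DATA2, by = "CASEID")
```

Here the explicit subset is not strictly needed, since merge() with its defaults already discards unmatched rows, but the intermediate DATA1_sub can be handy for checking which cases were retained before merging.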