Chris Beeley
2011-Jun-30 13:35 UTC
[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe
Hello,

Sorry, this is a bit of a noob question, but I can't seem to progress it any further.

I have two dataframes which contain a series of strings which exactly match. The problem is that one has more rows than the other (more cases have been added), and they have been sorted so that they are not in the same order. The smaller dataframe, though, contains another column which has codes classifying the strings.

So, for every row of the larger dataframe, I want to look up the string in the smaller dataframe, and then use that row number to copy across the code for the string into the larger dataframe. Here's my idea so far:

# comments is the smaller dataframe with the codes, mydata is the
# larger dataframe to which I would like to copy it.

commvec=charmatch(comments$ImproveOne, mydata$Improve) # this is the match between the strings one way
datavec=charmatch(mydata$Improve, comments$ImproveOne) # this is the match the other way

mydata$ImproveCat1=NA # produce a variable to hold the copied codes

mydata$ImproveCat1[datavec[!is.na(datavec)]]=comments$ImproveCat[commvec[!is.na(commvec)]]
# for all the non-missing row numbers identified in the larger dataframe,
# copy the corresponding code from the smaller dataframe (which lives in comments$ImproveCat)

However, the last command doesn't work because the variables are not the same length. They nearly are though; not sure if that's coincidence or shows I'm close:

length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567
length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512

I'm sorry, I did try to construct an example dataframe, but ironically I can't make that work either! Sorry!

Any help gratefully received.

Many thanks!

Chris Beeley
Institute of Mental Health, UK
jim holtman
2011-Jun-30 15:36 UTC
[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe
?merge

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
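
For readers following up on the ?merge pointer, here is a minimal sketch of how the lookup could be done. The toy data is made up for illustration; only the column names (ImproveOne, ImproveCat, Improve, ImproveCat1) come from the original post. It assumes each string appears only once in the smaller dataframe. Both merge() and match() are base R; match() is shown as an alternative that was not part of the reply.

```r
# Hypothetical toy data standing in for the two dataframes in the post.
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "less noise"),
  ImproveCat = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  Improve = c("less noise", "more staff", "more activities",
              "better food", "more staff"),
  stringsAsFactors = FALSE
)

# Option 1: merge() as a left join. all.x = TRUE keeps every row of mydata,
# with NA codes where the string is not found in comments.
# Note that merge() sorts the result by the key column by default.
merged <- merge(mydata, comments,
                by.x = "Improve", by.y = "ImproveOne", all.x = TRUE)

# Option 2: match() gives, for each row of mydata, the row number of the
# identical string in comments (NA if absent). Indexing ImproveCat with
# that vector copies the codes across and keeps the original row order.
idx <- match(mydata$Improve, comments$ImproveOne)
mydata$ImproveCat1 <- comments$ImproveCat[idx]

print(merged)
print(mydata)
```

In the attempt from the original post, datavec holds row numbers of comments rather than of mydata, so it is better suited to indexing comments$ImproveCat than to indexing mydata$ImproveCat1. The two lengths probably differ because repeated strings in mydata each get their own match, while each row of comments is counted once. Also note that charmatch() does partial matching and returns 0 for ambiguous matches, whereas match() only accepts exact matches.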