Juliet Hannah
2010-Aug-10 18:39 UTC
[R] partial match of one column in data frame to another character vector
Here is some data (dput output below)

> myData
         id group
1      D599     A
2  002-0004     B
3    F01932     A
18      F16     B
19      F28     A
20      A94     B

and a vector of IDs (the full label).

> fullID
[1] "F16-284" "ACC-A94-AB" "ADAD599" "002-0004BCC"
    "CDCF01932.AB" "F28DDB" "NOMATCH-EX"

For each id in myData, there could be a partial match in fullID. For
example, D599 in myData matches ADAD599. I would like to add a column
to myData that contains the corresponding fullID, or NA if a match was
not found.

Thanks for your help.

Juliet

#
#Data
#

myData <- structure(list(id = structure(c(6L, 5L, 1L, 2L, 3L, 4L),
    .Label = c(" F01932", " F16 ", " F28 ", " A94", " 002-0004", " D599"),
    class = "factor"), group = structure(c(5L, 4L, 1L, 3L, 2L, 3L),
    .Label = c(" A", " A", " B", " B", " A"), class = "factor")),
    .Names = c("id", "group"), class = "data.frame",
    row.names = c("1", "2", "3", "18", "19 ", "20 "))

fullID <- c("F16-284", "ACC-A94-AB", "ADAD599", "002-0004BCC",
    "CDCF01932.AB", "F28DDB", "NOMATCH-EX")
Henrique Dallazuanna
2010-Aug-10 18:52 UTC
[R] partial match of one column in data frame to another character vector
Try this:

    myData$fullID <- sapply(gsub("^ +| +$", "", myData$id), grep, x = fullID, value = TRUE)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
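One caveat about the one-liner above: grep(..., value = TRUE) returns character(0) when nothing matches, so an id without a match would make sapply() return a list instead of a character vector containing NA. The sketch below is one possible variant, not the poster's method. It assumes the ids are meant as literal substrings (hence fixed = TRUE) and that the first hit is wanted when several labels match. The helper name first_match and the extra id "ZZZ9" are hypothetical, added only to exercise the no-match case; the data frame is rebuilt with clean ids rather than taken from the dput output.

```r
# Sample data rebuilt with clean ids; "ZZZ9" is a hypothetical extra id
# that matches nothing in fullID.
myData <- data.frame(
  id    = c("D599", "002-0004", "F01932", "F16", "F28", "A94", "ZZZ9"),
  group = c("A", "B", "A", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)
fullID <- c("F16-284", "ACC-A94-AB", "ADAD599", "002-0004BCC",
            "CDCF01932.AB", "F28DDB", "NOMATCH-EX")

# Hypothetical helper: return the first element of `candidates` that
# contains `id` as a literal substring, or NA if none does.
first_match <- function(id, candidates) {
  id <- trimws(as.character(id))               # drop stray whitespace
  hits <- grep(id, candidates, fixed = TRUE, value = TRUE)
  if (length(hits) == 0L) NA_character_ else hits[1L]
}

# vapply() guarantees a character vector with one entry per row.
myData$fullID <- vapply(myData$id, first_match, character(1),
                        candidates = fullID, USE.NAMES = FALSE)
print(myData)
```

If an id could legitimately match several labels, taking the first hit silently hides the ambiguity; checking length(hits) > 1 inside the helper and issuing a warning may be a safer choice.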