Juliet Hannah
2010-Aug-10 18:39 UTC
[R] partial match of one column in data frame to another character vector
Here is some data (dput output below)

> myData
         id group
1      D599     A
2  002-0004     B
3    F01932     A
18      F16     B
19      F28     A
20      A94     B

and a vector of IDs (the full label).

> fullID
[1] "F16-284" "ACC-A94-AB" "ADAD599" "002-0004BCC" "CDCF01932.AB" "F28DDB" "NOMATCH-EX"

For each id in myData, there could be a partial match in fullID. For
example, D599 in myData matches ADAD599. I would like to add a column
to myData that contains the corresponding fullID, or NA if a match was
not found.

Thanks for your help.

Juliet

#
# Data
#

myData <- structure(list(id = structure(c(6L, 5L, 1L, 2L, 3L, 4L),
    .Label = c(" F01932", " F16 ", " F28 ", " A94", " 002-0004", " D599"),
    class = "factor"),
    group = structure(c(5L, 4L, 1L, 3L, 2L, 3L),
    .Label = c(" A", " A", " B", " B", " A"), class = "factor")),
    .Names = c("id", "group"), class = "data.frame",
    row.names = c("1", "2", "3", "18", "19 ", "20 "))

fullID <- c("F16-284", "ACC-A94-AB", "ADAD599", "002-0004BCC",
    "CDCF01932.AB", "F28DDB", "NOMATCH-EX")
Henrique Dallazuanna
2010-Aug-10 18:52 UTC
[R] partial match of one column in data frame to another character vector
Try this:
myData$fullID <- sapply(gsub("^ +| +$", "", myData$id),
                        grep, x = fullID,
                        value = TRUE)
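
A possible variation, in case some ids have no match at all: sapply() with grep(value = TRUE) then returns a list rather than a character vector, because the unmatched ids yield character(0). The sketch below is only one way to get the NA the question asks for. The helper name first_match and the extra id "ZZZ" are hypothetical and not part of the original post. It also assumes the ids should be matched literally (fixed = TRUE) and that only the first hit is wanted when several labels match.

```r
# Example data; ids are written without the stray whitespace for clarity.
# "ZZZ" is a made-up id added only to show the no-match case.
myData <- data.frame(
  id = c("D599", "002-0004", "F01932", "F16", "F28", "A94", "ZZZ"),
  stringsAsFactors = FALSE
)
fullID <- c("F16-284", "ACC-A94-AB", "ADAD599", "002-0004BCC",
            "CDCF01932.AB", "F28DDB", "NOMATCH-EX")

# Hypothetical helper: return the first label that contains `id`,
# or NA when no label contains it.
first_match <- function(id, candidates) {
  hits <- grep(id, candidates, fixed = TRUE, value = TRUE)
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Strip leading/trailing spaces, as in the reply above, then look up each id.
ids <- gsub("^ +| +$", "", as.character(myData$id))
myData$fullID <- vapply(ids, first_match, character(1),
                        candidates = fullID, USE.NAMES = FALSE)

print(myData)
```

vapply() is used instead of sapply() so the result is always a character vector of the same length as the id column, which makes the assignment to a data frame column safe.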
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O