Hi! Does anybody know a string function that would calculate how many characters two strings share? I.e. ("Hello World","Hello Peter") would be 7. Thanks. Laetitia
On 1/9/10, Laetitia Schmid <laetitia.schmid at gmx.ch> wrote:
> Does anybody know a string function that would calculate how many characters two strings share? I.e. ("Hello World","Hello Peter") would be 7.

Perhaps package 'stringr' has something related?

Liviu
Laetitia,

One approach:

    lettermatch <- function(stringA, stringB) {
      sum(unique(unlist(strsplit(stringA, ""))) %in%
          unique(unlist(strsplit(stringB, ""))))
    }

lettermatch("Hello World","Hello Peter") yields 6, as the l is only counted once. This treats uppercase and lowercase as different letters and counts how many of the unique characters in stringA show up in stringB.

In another approach, letters are set to lowercase first, and a repeated character is counted as many times as it occurs in both strings. This, I think, gives you what you want:

    lettermatch2 <- function(stringA, stringB) {
      # Per-character frequency tables, joined on the characters both strings contain
      tb <- merge(as.data.frame(table(strsplit(tolower(stringA), ""))),
                  as.data.frame(table(strsplit(tolower(stringB), ""))),
                  by = "Var1")
      # For each shared character, take the smaller of the two counts
      sum(apply(tb[-1], 1, min))
    }

lettermatch2("Hello World","Hello Peter") yields 7.

Greg

On 1/9/10 1:51 PM, Laetitia Schmid wrote:
> Hi!
> Does anybody know a string function that would calculate how many
> characters two strings share? I.e. ("Hello World","Hello Peter") would
> be 7.
> Thanks.
> Laetitia
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Greg Hirson
ghirson at ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
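The counting rule behind the second approach (for each character both strings contain, take the smaller of its two frequencies, then add those up) can also be sketched in base R without merge(). This is only an illustration: the function name shared_chars and its ignore_case argument are made up here and do not come from the thread or from any package.

```r
# Hypothetical helper (name and arguments are assumptions, not from the thread):
# count the characters two strings share, respecting how often each occurs.
shared_chars <- function(a, b, ignore_case = TRUE) {
  if (ignore_case) {
    a <- tolower(a)
    b <- tolower(b)
  }
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]

  # Characters that occur in both strings
  common <- intersect(ca, cb)
  if (length(common) == 0) return(0L)

  # Tabulate both strings over the same character set so the counts line up
  count_a <- as.vector(table(factor(ca, levels = common)))
  count_b <- as.vector(table(factor(cb, levels = common)))

  # A character is shared as often as it occurs in the string that has fewer of it
  sum(pmin(count_a, count_b))
}

shared_chars("Hello World", "Hello Peter")  # 7
```

For this pair the space is one of the seven shared characters; whether it should count is a choice the original question leaves open.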
Maybe I don't understand the question. I can think of four ways to count, none of which give me 7:

    a <- "Hello World"
    b <- "Hello Peter"

    # counting duplicates and the space:
    sa <- strsplit(a, split="")[[1]]
    sb <- strsplit(b, split="")[[1]]
    sum(sb %in% sa)

    # counting the space but not the duplicates:
    sa <- unique(strsplit(a, split="")[[1]])
    sb <- unique(strsplit(b, split="")[[1]])
    sum(sb %in% sa)

    # counting the duplicates but not the space:
    # (logical indexing; sa[-which(sa == " ")] would return nothing if a had no space)
    sa <- strsplit(a, split="")[[1]]
    sa <- sa[sa != " "]
    sb <- strsplit(b, split="")[[1]]
    sb <- sb[sb != " "]
    sum(sb %in% sa)

    # not counting duplicates or the space (reuses sa and sb from the previous step):
    sa <- unique(sa)
    sb <- unique(sb)
    sum(sb %in% sa)

What am I missing?

On Sat, Jan 9, 2010 at 4:51 PM, Laetitia Schmid <laetitia.schmid at gmx.ch> wrote:
> Hi!
> Does anybody know a string function that would calculate how many characters
> two strings share? I.e. ("Hello World","Hello Peter") would be 7.
> Thanks.
> Laetitia

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
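For reference, the four counts above can be reproduced with one small helper. The name count_in and its two switches are hypothetical and introduced only for this sketch; the counting itself is the same %in% test used in the message.

```r
# Hypothetical helper (name and arguments are assumptions, not from the thread):
# how many characters of b occur anywhere in a, with optional
# de-duplication and optional removal of spaces.
count_in <- function(a, b, dedupe = FALSE, drop_space = FALSE) {
  sa <- strsplit(a, split = "")[[1]]
  sb <- strsplit(b, split = "")[[1]]
  if (drop_space) {
    sa <- sa[sa != " "]
    sb <- sb[sb != " "]
  }
  if (dedupe) {
    sa <- unique(sa)
    sb <- unique(sb)
  }
  # Each character of b counts once if it appears at least once in a
  sum(sb %in% sa)
}

a <- "Hello World"
b <- "Hello Peter"
count_in(a, b)                                    # 9
count_in(a, b, dedupe = TRUE)                     # 6
count_in(a, b, drop_space = TRUE)                 # 8
count_in(a, b, dedupe = TRUE, drop_space = TRUE)  # 5
```

One possible reading of the 7: %in% counts all three e's of "Hello Peter" even though "Hello World" has only one. If each character is instead counted only as often as it occurs in both strings, the first count of 9 loses the two surplus e's and becomes 7, with the space included.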