Rui Barradas
2022-Dec-03 08:48 UTC
[R] interval between specific characters in a string...
At 17:18 on 02/12/2022, Evan Cooch wrote:
> Was wondering if there is an 'efficient/elegant' way to do the following
> (without tidyverse). Take a string
>
> abaaabbaaaaabaaab
>
> It's easy enough to count the number of times the character 'b' shows up
> in the string, but...what I'm looking for is outputting the 'intervals'
> between occurrences of 'b' (starting the counter at the beginning of the
> string). So, for the preceding example, 'b' shows up in positions
>
> 2, 6, 7, 13, 17
>
> So, the interval data would be: 2, 4, 1, 6, 4
>
> My main approach has been to simply output positions (say, something
> like unlist(gregexpr('b', target_string))), and 'do the math' between
> successive positions. Can anyone suggest a more elegant approach?
>
> Thanks in advance...

Hello,

I don't find your solution inelegant; it's even easy to write it as a
one-line function.

char_interval <- function(x, s) {
  lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}

target_string <- "abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4

Hope this helps,

Rui Barradas
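A note on the one-liner above: when the pattern does not occur in a string, gregexpr() returns -1, so char_interval() would report -1 as if it were an interval. Below is a minimal sketch of a guarded variant. The name char_interval_safe is hypothetical (it is not from the thread), fixed = TRUE is an added assumption that the search term is a literal character rather than a regular expression, and NA inputs are not handled.

```r
# Hypothetical guarded variant of char_interval() (a sketch, not from the thread).
# Assumes x is a literal string (fixed = TRUE) and s contains no NA values.
char_interval_safe <- function(x, s) {
  lapply(gregexpr(x, s, fixed = TRUE), function(y) {
    # gregexpr() signals "no match" with a single -1
    if (y[1] == -1L) {
      integer(0)
    } else {
      # prepend 0 so the first interval is counted from the start of the string
      diff(c(0L, as.integer(y)))
    }
  })
}

res <- char_interval_safe("b", c("abaaabbaaaaabaaab", "aaa", "bb"))
res[[1]]  # 2 4 1 6 4
res[[2]]  # integer(0), since "aaa" contains no 'b'
```

Prepending 0 before diff() gives the same result as c(head(y, 1), diff(y)) whenever there is at least one match.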
Perhaps it is worth pointing out that looping constructs like lapply()
can be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:

## s is the string to be searched.
diff(c(0, grep('b', strsplit(s, '')[[1]])))

However, Martin's solution is simpler and likely even faster, as the
regex engine is unneeded:

diff(c(0, which(strsplit(s, "")[[1]] == "b")))  ## completely vectorized

This seems much preferable to me.

-- Bert
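To check that the approaches mentioned in this thread agree on the example string, and to illustrate how the which() version might be applied to several strings at once, here is a small sketch. The wrapper name intervals_which is hypothetical; handling more than one string still needs a loop over the strsplit() result, done here with lapply().

```r
s <- "abaaabbaaaaabaaab"
chars <- strsplit(s, "")[[1]]   # split into single characters

# Three ways to get the intervals between successive 'b's
via_gregexpr <- diff(c(0, unlist(gregexpr("b", s))))
via_grep     <- diff(c(0, grep("b", chars)))
via_which    <- diff(c(0, which(chars == "b")))

# All three should give the same answer: 2 4 1 6 4
stopifnot(identical(via_gregexpr, via_grep),
          identical(via_grep, via_which))

# Hypothetical wrapper for a character vector of strings (a sketch only).
# which() returns integer(0) when ch is absent, so no special case is needed.
intervals_which <- function(s, ch) {
  lapply(strsplit(s, ""), function(p) diff(c(0L, which(p == ch))))
}

out <- intervals_which(c(s, "aaa"), "b")
```

One design point: the which() form yields an empty result for a string without the character, whereas the gregexpr() form needs an explicit check for -1.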