Rui Barradas
2022-Dec-03 08:48 UTC
[R] interval between specific characters in a string...
At 17:18 on 02/12/2022, Evan Cooch wrote:
> Was wondering if there is an 'efficient/elegant' way to do the following
> (without tidyverse). Take a string
>
> abaaabbaaaaabaaab
>
> It's easy enough to count the number of times the character 'b' shows up
> in the string, but...what I'm looking for is outputting the 'intervals'
> between occurrences of 'b' (starting the counter at the beginning of the
> string). So, for the preceding example, 'b' shows up in positions
>
> 2, 6, 7, 13, 17
>
> So, the interval data would be: 2, 4, 1, 6, 4
>
> My main approach has been to simply output positions (say, something
> like unlist(gregexpr('b', target_string))), and 'do the math' between
> successive positions. Can anyone suggest a more elegant approach?
>
> Thanks in advance...
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

I don't find your solution inelegant; it's even easy to write it as a
one-line function.

char_interval <- function(x, s) {
  lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}

target_string <- "abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4

Hope this helps,

Rui Barradas
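One edge case worth keeping in mind with the gregexpr() approach: when the pattern does not occur at all, gregexpr() reports a single position of -1, so char_interval() would return -1 for that input rather than an empty result. A minimal sketch of a guarded variant follows. The name char_interval_safe is hypothetical, and fixed = TRUE is an assumption that x is a literal character rather than a regular expression.

```r
# Hypothetical variant of char_interval(): returns integer(0) when the
# pattern is absent, because gregexpr() signals "no match" with -1.
char_interval_safe <- function(x, s) {
  lapply(gregexpr(x, s, fixed = TRUE), function(pos) {
    pos <- as.integer(pos)                  # drop match.length etc.
    if (pos[1L] == -1L) return(integer(0))  # no occurrence of x in s
    diff(c(0L, pos))                        # count from the string start
  })
}

char_interval_safe("b", "abaaabbaaaaabaaab")
#> [[1]]
#> [1] 2 4 1 6 4

char_interval_safe("b", "aaaa")
#> [[1]]
#> integer(0)
```

Prepending 0 before diff() gives the same result as c(head(y, 1), diff(y)) whenever at least one match exists.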
Perhaps it is worth pointing out that looping constructs like lapply() can
be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:
## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))
However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:
diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized
This seems much preferable to me.
-- Bert
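
As a quick check, the three approaches in this thread can be run side by side on the example string. This is only a sketch; the variable names are illustrative and not from the original posts.

```r
s <- "abaaabbaaaaabaaab"
chars <- strsplit(s, "")[[1]]   # one element per character

# Positions from gregexpr(), then intervals counted from the string start
via_gregexpr <- diff(c(0L, as.integer(gregexpr("b", s, fixed = TRUE)[[1]])))
# grep() on the split characters (uses the regex engine)
via_grep <- diff(c(0, grep("b", chars)))
# Plain comparison with which(); no regex involved
via_which <- diff(c(0, which(chars == "b")))

stopifnot(all(via_gregexpr == via_which), all(via_grep == via_which))
via_which
#> [1] 2 4 1 6 4

# With no 'b' present, which() returns integer(0) and the result is empty
no_match <- diff(c(0, which(strsplit("aaaa", "")[[1]] == "b")))
length(no_match)
#> [1] 0
```

Unlike gregexpr(), which() needs no special handling when the character is absent, since there is no -1 sentinel to filter out.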