thr3ads.net - R help - [R] interval between specific characters in a string... [Dec 2022]


Bert Gunter

2022-Dec-03 15:21 UTC

[R] interval between specific characters in a string...

Perhaps it is worth pointing out that looping constructs like lapply() can
be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:

## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))

However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:

diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized

This seems much preferable to me.
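
As a quick check, here is a minimal sketch that applies the which()-based one-liner to the example string from the original question. The name `s` follows the comment above; `chars`, `pos` and `intervals` are illustrative names only, and the expected values are the ones stated in the question.

```r
# Example string from the original question.
s <- "abaaabbaaaaabaaab"

# Split into single characters, then find the positions of "b".
chars <- strsplit(s, "", fixed = TRUE)[[1]]
pos <- which(chars == "b")

# Prepend 0 so the first interval is counted from the start of the string.
intervals <- diff(c(0, pos))

print(pos)        # 2 6 7 13 17
print(intervals)  # 2 4 1 6 4
```

No explicit loop or apply-style call is involved: the comparison, which() and diff() each operate on the whole vector at once.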

-- Bert





On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> At 17:18 on 02/12/2022, Evan Cooch wrote:
> > Was wondering if there is an 'efficient/elegant' way to do the following
> > (without tidyverse). Take a string
> >
> > abaaabbaaaaabaaab
> >
> > It's easy enough to count the number of times the character 'b' shows up
> > in the string, but...what I'm looking for is outputting the 'intervals'
> > between occurrences of 'b' (starting the counter at the beginning of the
> > string). So, for the preceding example, 'b' shows up in positions
> >
> > 2, 6, 7, 13, 17
> >
> > So, the interval data would be: 2, 4, 1, 6, 4
> >
> > My main approach has been to simply output positions (say, something
> > like unlist(gregexpr('b', target_string))), and 'do the math' between
> > successive positions. Can anyone suggest a more elegant approach?
> >
> > Thanks in advance...
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> I don't find your solution inelegant, it's even easy to write it as a
> one-line function.
>
>
> char_interval <- function(x, s) {
>    lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
> }
>
> target_string <- "abaaabbaaaaabaaab"
> char_interval('b', target_string)
> #> [[1]]
> #> [1] 2 4 1 6 4
>
>
> Hope this helps,
>
> Rui Barradas
>

Hervé Pagès

2022-Dec-03 23:49 UTC


[R] interval between specific characters in a string...

On 03/12/2022 07:21, Bert Gunter wrote:
> Perhaps it is worth pointing out that looping constructs like lapply() can
> be avoided and the procedure vectorized by mimicking Martin Morgan's
> solution:
>
> ## s is the string to be searched.
> diff(c(0,grep('b',strsplit(s,'')[[1]])))
>
> However, Martin's solution is simpler and likely even faster as the regex
> engine is unneeded:
>
> diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized
>
> This seems much preferable to me.

Of all the proposed solutions, Andrew Hart's solution seems the most 
efficient:

    big_string <- strrep("abaaabbaaaaabaaabaaaaaaaaaaaaaaaaaaab", 500000)

    system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
    #    user  system elapsed
    #   0.736   0.028   0.764

    system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]] == "b"))))
    #    user  system elapsed
    #   2.100   0.356   2.455

The bigger the string, the bigger the gap in performance.

Also, the bigger the average gap between 2 successive b's, the bigger 
the gap in performance.
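
The split-based expression timed above can be wrapped in a small function. What follows is only a sketch, not code from the thread: the names `split_intervals` and `which_intervals` are hypothetical, and the trailing-piece check is an added assumption to cover strings that do not end in the target character. The benchmark string does end in "b", so the timed expression needs no such check.

```r
# Hypothetical helper: intervals obtained by splitting on the target character.
split_intervals <- function(s, ch = "b") {
  pieces <- strsplit(s, split = ch, fixed = TRUE)[[1]]
  out <- nchar(pieces) + 1L
  # If s does not end in ch, the last piece is the text after the final
  # match rather than an interval, so drop it.
  if (!endsWith(s, ch)) out <- head(out, -1L)
  out
}

# Hypothetical helper: the which()-based version, for comparison.
which_intervals <- function(s, ch = "b") {
  diff(c(0, which(strsplit(s, "", fixed = TRUE)[[1]] == ch)))
}

s <- "abaaabbaaaaabaaab"
print(split_intervals(s))                              # 2 4 1 6 4
print(split_intervals("abaa"))                         # 2
print(all(split_intervals(s) == which_intervals(s)))   # TRUE
```

Splitting on "b" produces one piece per interval, whereas splitting on "" produces one element per character. That is presumably why the gap widens as the b's become sparser.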

Finally: always use fixed=TRUE in strsplit() if you don't need to use 
the regex engine.
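
Besides speed, fixed=TRUE also guards against separators that happen to be regex metacharacters. Here is a small illustrative sketch; the input "a.b.c" and the variable names are made up for the example.

```r
x <- "a.b.c"

# Without fixed = TRUE, "." is a regular expression matching any character,
# so every character is treated as a separator.
regex_split <- strsplit(x, ".")[[1]]

# With fixed = TRUE, "." is matched literally.
fixed_split <- strsplit(x, ".", fixed = TRUE)[[1]]

print(regex_split)  # five empty strings
print(fixed_split)  # "a" "b" "c"
```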

Cheers,

H.

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github at gmail.com
