thr3ads.net - R help - [R] interval between specific characters in a string... [Dec 2022]

Evan Cooch

2022-Dec-02 17:18 UTC

[R] interval between specific characters in a string...

Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?

Thanks in advance...

Andrew Simmons

2022-Dec-03 00:01 UTC

try

gregexpr('b+', target_string)

which looks for one or more b characters, then get the attribute "match.length"

Martin Morgan

2022-Dec-03 00:39 UTC

You could split the string into letters and figure out which ones are "b"

which(strsplit(x, "")[[1]] == "b")

and then find the difference between each position, "anchoring" at position 0

> diff(c(0, which(strsplit(x, "")[[1]] == "b")))
[1] 2 4 1 6 4

Andrew Hart

2022-Dec-03 01:54 UTC

Here's a function that can get the interval sizes for you.

getStringSegmentLengths <- function(s, delim, ...) {
   nchar(unlist(strsplit(s, delim, ...))) + 1L
}

It uses strsplit to return a list of all the segments of the string 
separated by delim. delim can be a regular expression and with ..., you 
can pass all the extra options to strsplit in order to specify how to 
break up the string.
It then uses unlist to convert the list output of strsplit to a 
character vector. nchar then gives the lengths of all the elements of 
the character vector and finally a 1 is added to each of these in order 
to obtain the correct interval sizes.
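
A short usage sketch of the function above, with one caveat that seems worth checking: the "+ 1L" assumes the delimiter is a single character, and if the string does not end in the delimiter, the trailing piece is counted as well. The second input string is made up for illustration.

```r
# Function as given in the message above
getStringSegmentLengths <- function(s, delim, ...) {
  nchar(unlist(strsplit(s, delim, ...))) + 1L
}

res_full <- getStringSegmentLengths("abaaabbaaaaabaaab", "b")
print(res_full)   # 2 4 1 6 4

# String that does not end in "b": the trailing "aa" is counted too,
# so the final 3 is not an interval between two b's
res_tail <- getStringSegmentLengths("abaa", "b")
print(res_tail)   # 2 3
```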

Hth,
Andrew.

@vi@e@gross m@iii@g oii gm@ii@com

2022-Dec-03 04:01 UTC

Evan, there are oodles of ways to do many things in R, and much of what the
tidyverse supplies can often be done as easily, or easier, outside it.

Before presenting a solution, I need to make sure I am answering the same
question or problem you intend.

Here is the string you have as an example:

st <- "abaaabbaaaaabaaab"

Are you looking for occurrences of the single character "b" in a string
whose other characters are either just "a" or at least non-"b", with the
string being of any length but at least a few characters?

If so, ONE METHOD is to convert the string to a vector for reasons that will
become clear. For oddball reasons, this is a way to do it:

> unlist(strsplit(st,""))
[1] "a" "b" "a" "a" "a" "b" "b" "a" "a" "a" "a" "a" "b" "a" "a" "a" "b"

The result is a vector you can examine to see if they are equal to "b" or
not as a TRUE/FALSE vector:

> unlist(strsplit(st,"")) == "b"
 [1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[17]  TRUE

Now you can ask for the indices which are TRUE, meaning at what offset from
the beginning are there instances of the letter "b":

> which(unlist(strsplit(st,"")) == "b")
[1]  2  6  7 13 17

This shows that the integer offsets for the letter "b" are the second,
sixth and so on to the seventeenth. Again, if I understood you, you want a
measure of how far apart instances of "b" are, with adjacent ones being 1
apart. Again, there are many methods, but I chose one where I slid the
above values over by inserting a zero at the front and removing the last
entry.

So save that in a variable first:

indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))

The two contain:

> indices
[1]  2  6  7 13 17
> indices_shifted
[1]  0  2  6  7 13
> indices - indices_shifted
[1] 2 4 1 6 4

The above is the same as your intended result.

If you want to be cautious, handle edge cases like not having any "b" or an
empty string.

Here is the consolidated code:

st <- "abaaabbaaaaabaaab"
indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))
result <- indices - indices_shifted
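
As a rough sketch of the edge cases mentioned earlier, the consolidated steps can be wrapped in a function and tried on a string with no "b" and on an empty string. The function name char_gaps, its ch argument and the input checks are hypothetical, added only for illustration. It uses diff(c(0L, indices)), which computes the same thing as indices - indices_shifted.

```r
# Hypothetical helper wrapping the steps above (name and checks are illustrative)
char_gaps <- function(st, ch = "b") {
  stopifnot(is.character(st), length(st) == 1L, !is.na(st))
  indices <- which(unlist(strsplit(st, "")) == ch)
  # Same as indices - c(0, head(indices, -1)); zero-length if ch never occurs
  diff(c(0L, indices))
}

print(char_gaps("abaaabbaaaaabaaab"))   # 2 4 1 6 4
print(length(char_gaps("aaaa")))        # 0: no "b" in the string
print(length(char_gaps("")))            # 0: empty string
```

Both edge cases come back as a zero-length vector rather than an error, which may or may not be what you want downstream.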

There are many other ways to do this and of course some are more
straightforward and some more complex.

Consider a loop using a vector version of the string where, each time you see
a "b", you remember the last index you saw it and put out the number
representing the gap.

Fairly low tech.
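
One way the loop just described might look, as a sketch only. The variable names gaps and last_seen are made up here, and growing a vector inside a loop is fine for short strings but not something to do for very long ones.

```r
st <- "abaaabbaaaaabaaab"
chars <- unlist(strsplit(st, ""))

gaps <- integer(0)   # gaps found so far
last_seen <- 0L      # index of the most recent "b" (0 means start of string)

for (i in seq_along(chars)) {
  if (chars[i] == "b") {
    gaps <- c(gaps, i - last_seen)   # distance back to the previous "b"
    last_seen <- i
  }
}

print(gaps)   # 2 4 1 6 4
```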


Rui Barradas

2022-Dec-03 08:48 UTC

Hello,

I don't find your solution inelegant; it's even easy to write it as a
one-line function.


char_interval <- function(x, s) {
   lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}

target_string <-"abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4
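
A small usage sketch, under the assumption that you may want to pass several strings at once: gregexpr is vectorised over its text argument, so the function returns one list element per string. The input vector below is made up for illustration, and function(y) is used in place of the \(y) shorthand so the sketch also runs on R versions before 4.1. Note that gregexpr returns -1 when there is no match, so a string without 'b' yields -1 rather than an empty vector.

```r
char_interval <- function(x, s) {
  lapply(gregexpr(x, s), function(y) c(head(y, 1), diff(y)))
}

strings <- c("abaaabbaaaaabaaab", "bab", "aaa")
res <- char_interval("b", strings)

print(res[[1]])   # 2 4 1 6 4
print(res[[2]])   # 1 2
print(res[[3]])   # -1 (no 'b' in "aaa")
```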


Hope this helps,

Rui Barradas
