Was wondering if there is an 'efficient/elegant' way to do the following (without tidyverse). Take a string

    abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up in the string, but...what I'm looking for is outputting the 'intervals' between occurrences of 'b' (starting the counter at the beginning of the string). So, for the preceding example, 'b' shows up in positions

    2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something like unlist(gregexpr('b', target_string))), and 'do the math' between successive positions. Can anyone suggest a more elegant approach?

Thanks in advance...
Andrew Simmons
2022-Dec-03 00:01 UTC
[R] interval between specific characters in a string...
try gregexpr('b+', target_string) which looks for one or more b characters, then get the attribute "match.length"

On Fri, Dec 2, 2022, 18:56 Evan Cooch <evan.cooch at gmail.com> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
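A minimal sketch of the "match.length" idea above, using the string from the original question. As far as this sketch can tell, the pattern 'b+' reports the lengths of the runs of 'b' rather than the intervals asked for. The second pattern, '[^b]*b', is an assumption of this sketch and not part of the reply: each match is zero or more non-'b' characters followed by a 'b', so its match lengths should be the intervals.

```r
# Input copied from the original question.
target_string <- "abaaabbaaaaabaaab"

# Pattern from the reply: one or more consecutive "b" characters.
runs <- gregexpr("b+", target_string)[[1]]
run_lengths <- attr(runs, "match.length")   # length of each run of "b"

# Assumed variant (not from the reply): any non-"b" characters, then one "b".
m <- gregexpr("[^b]*b", target_string)[[1]]
intervals <- attr(m, "match.length")        # distance from the previous "b"

print(run_lengths)
print(intervals)
```

If the string contains no 'b' at all, gregexpr reports -1 for both the position and the match length, so that case would need separate handling.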
Martin Morgan
2022-Dec-03 00:39 UTC
[R] interval between specific characters in a string...
You could split the string into letters and figure out which ones are "b"

    which(strsplit(x, "")[[1]] == "b")

and then find the difference between each position, "anchoring" at position 0

    > diff(c(0, which(strsplit(x, "")[[1]] == "b")))
    [1] 2 4 1 6 4
Here's a function that can get the interval sizes for you.

    getStringSegmentLengths <- function(s, delim, ...) {
      nchar(unlist(strsplit(s, delim, ...))) + 1L
    }

It uses strsplit to return a list of all the segments of the string separated by delim. delim can be a regular expression and with ..., you can pass all the extra options to strsplit in order to specify how to break up the string. It then uses unlist to convert the list output of strsplit to a character vector. nchar then gives the lengths of all the elements of the character vector and finally a 1 is added to each of these in order to obtain the correct interval sizes.

Hth,
Andrew.
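A short usage sketch for getStringSegmentLengths as defined above. The inputs y and "a.aa." are hypothetical, chosen only to illustrate two points: a string that does not end in the delimiter has its trailing segment counted as well, and extra arguments such as fixed = TRUE are passed through to strsplit.

```r
getStringSegmentLengths <- function(s, delim, ...) {
  nchar(unlist(strsplit(s, delim, ...))) + 1L
}

# String from the original question; it ends in "b".
x <- "abaaabbaaaaabaaab"
res_x <- getStringSegmentLengths(x, "b")

# Hypothetical input that does NOT end in "b": the trailing "aa" is
# also counted, so the last value is not a gap between two "b"s.
y <- "abaa"
res_y <- getStringSegmentLengths(y, "b")

# fixed = TRUE is forwarded to strsplit, so "." is a literal dot here.
res_dot <- getStringSegmentLengths("a.aa.", ".", fixed = TRUE)

print(res_x)
print(res_y)
print(res_dot)
```

When the string may end in something other than the delimiter, dropping the last element (or working from match positions instead) may be the safer choice.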
@vi@e@gross m@iii@g oii gm@ii@com
2022-Dec-03 04:01 UTC
[R] interval between specific characters in a string...
Evan, there are oodles of ways to do many things in R, and much of what the tidyverse supplies can often be done as easily, or easier, outside it.

Before presenting a solution, I need to make sure I am answering the same question or problem you intend. Here is the string you have as an example:

    st <- "abaaabbaaaaabaaab"

Is the string a string testing for single characters called "b" with any other characters being either just "a" or at least non-"b" and of any length but at least a few?

If so, ONE METHOD is to convert the string to a vector for reasons that will become clear. For oddball reasons, this is a way to do it:

    > unlist(strsplit(st,""))
     [1] "a" "b" "a" "a" "a" "b" "b" "a" "a" "a" "a" "a" "b" "a" "a" "a" "b"

The result is a vector you can examine to see if they are equal to "b" or not as a TRUE/FALSE vector:

    > unlist(strsplit(st,"")) == "b"
     [1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
    [17]  TRUE

Now you can ask for the indices which are TRUE, meaning at what offset from the beginning are there instances of the letter "b":

    > which(unlist(strsplit(st,"")) == "b")
    [1]  2  6  7 13 17

This shows that the integer offsets for the letter "b" are the second, sixth and so on to the seventeenth.

Again, if I understood you, you want a measure of how far apart instances of "b" are, with adjacent ones being 1 apart. Again, many methods, but I chose one where I sort of slid over the above values by sliding in a zero from the front and removing the last entry. So save that in a variable first:

    indices <- which(unlist(strsplit(st,"")) == "b")
    indices_shifted <- c(0, head(indices, -1))

The two contain:

    > indices
    [1]  2  6  7 13 17
    > indices_shifted
    [1]  0  2  6  7 13
    > indices - indices_shifted
    [1] 2 4 1 6 4

The above is the same as your intended result. If you want to be cautious, handle edge cases like not having any "b" or an empty string.
Here is the consolidated code:

    st <- "abaaabbaaaaabaaab"
    indices <- which(unlist(strsplit(st,"")) == "b")
    indices_shifted <- c(0, head(indices, -1))
    result <- indices - indices_shifted

There are many other ways to do this and of course some are more straightforward and some more complex. Consider a loop using a vector version of the string where each time you see a "b", you remember the last index you saw it and put out the number representing the gap. Fairly low tech.
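The low-tech loop described above might be sketched as follows. The function name gap_loop and its arguments are hypothetical, introduced only for illustration. An empty string, or a string with no "b", simply yields an empty result, which covers the edge cases mentioned earlier.

```r
# Hypothetical helper: walk the characters one by one and record the
# distance from the previously seen target (or from position 0).
gap_loop <- function(st, target = "b") {
  chars <- strsplit(st, "")[[1]]   # vector version of the string
  last_seen <- 0L                  # "anchor" before the first character
  gaps <- integer(0)
  for (i in seq_along(chars)) {
    if (chars[i] == target) {
      gaps <- c(gaps, i - last_seen)   # gap since the last target
      last_seen <- i
    }
  }
  gaps
}

print(gap_loop("abaaabbaaaaabaaab"))
print(gap_loop("aaa"))   # no "b": empty result
print(gap_loop(""))      # empty string: empty result
```

Growing gaps with c() inside the loop is fine for short strings; for very long inputs the vectorised which()/diff() approach is likely to be faster.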
Rui Barradas
2022-Dec-03 08:48 UTC
[R] interval between specific characters in a string...
Hello,

I don't find your solution inelegant; it's even easy to write it as a one-line function.

    char_interval <- function(x, s) {
      lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
    }

    target_string <- "abaaabbaaaaabaaab"
    char_interval('b', target_string)
    #> [[1]]
    #> [1] 2 4 1 6 4

Hope this helps,

Rui Barradas
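One case the function above does not cover is a string with no match, where gregexpr returns -1 instead of a position. A hedged variant, under the hypothetical name char_interval_safe, returns an empty vector in that case; it is a sketch for illustration, not a replacement proposed in the thread.

```r
# Hypothetical variant of char_interval: same idea, but a string
# without any match gives integer(0) rather than -1.
char_interval_safe <- function(x, s) {
  lapply(gregexpr(x, s), function(y) {
    pos <- as.integer(y)                    # drop the match attributes
    if (pos[1] == -1L) return(integer(0))   # gregexpr's "no match" marker
    diff(c(0L, pos))                        # anchor the counter at position 0
  })
}

# One list element per input string.
res <- char_interval_safe("b", c("abaaabbaaaaabaaab", "aaa"))
print(res)
```

Because gregexpr is vectorised over its second argument, the same call handles a whole character vector of strings at once.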