Jonathan
2009-Dec-20 06:43 UTC
[R] how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Last one for you guys:

The command:

length(gregexpr('cus','hocus pocus')[[1]])
[1] 2

returns the number of times the substring 'cus' appears in 'hocus pocus' (which is two).

It's returning the number of **disjoint** matches. So:

length(gregexpr('aa','aaa')[[1]])
[1] 1

returns 1.

**What I want to do:**
I'm looking for a way to count all occurrences of the substring, including overlapping sets (so 'aa' would be found in 'aaa' two times, because the middle 'a' gets counted twice).

Any ideas would be much appreciated!!

Signing off and thanks for all the great assistance,
Jonathan
Gabor Grothendieck
2009-Dec-20 10:33 UTC
[R] how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Use a zero-width lookaround expression. It does not consume the text it matches, so the next match can begin at the very next character. See ?regexp

> gregexpr("a(?=a)", "aaa", perl = TRUE)[[1]]
[1] 1 2
attr(,"match.length")
[1] 1 1

On Sun, Dec 20, 2009 at 1:43 AM, Jonathan <jonsleepy at gmail.com> wrote:
> Last one for you guys:
>
> The command:
>
> length(gregexpr('cus','hocus pocus')[[1]])
> [1] 2
>
> returns the number of times the substring 'cus' appears in 'hocus pocus'
> (which is two)
>
> It's returning the number of **disjoint** matches. So:
>
> length(gregexpr('aa','aaa')[[1]])
> [1] 1
>
> returns 1.
>
> **What I want to do:**
> I'm looking for a way to count all occurrences of the substring, including
> overlapping sets (so 'aa' would be found in 'aaa' two times, because the
> middle 'a' gets counted twice).
>
> Any ideas would be much appreciated!!
>
> Signing off and thanks for all the great assistance,
> Jonathan
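
The lookahead idea from the reply can be wrapped in a small helper for arbitrary literal substrings. The following is only a sketch under a few assumptions: count_overlapping is a hypothetical name (not a base R function), the whole pattern is placed inside the lookahead rather than only its tail, and \Q...\E is used to treat the substring literally, which assumes the substring itself does not contain "\E". It also guards against the no-match case, where gregexpr returns -1 and length() alone would report 1.

```r
# Count occurrences of a literal substring, including overlapping ones.
# count_overlapping is a hypothetical helper name, not part of base R.
count_overlapping <- function(pattern, text) {
  # Put the whole pattern inside a zero-width lookahead so that no
  # characters are consumed and the search resumes one character later.
  # \Q...\E quotes the pattern literally (assumes it contains no "\E").
  regex <- paste0("(?=\\Q", pattern, "\\E)")
  m <- gregexpr(regex, text, perl = TRUE)[[1]]
  # gregexpr returns -1 when nothing matches, so count only real positions.
  sum(m > 0)
}

count_overlapping("aa", "aaa")           # 2 (positions 1 and 2)
count_overlapping("cus", "hocus pocus")  # 2 (no overlap, same as before)
count_overlapping("zz", "aaa")           # 0 (length() alone would give 1)
```

Counting with sum(m > 0) instead of length(m) is a deliberate choice here, since the original length(gregexpr(...)[[1]]) idiom cannot distinguish one match from none.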