Is there a function to efficiently search for a subsequence within a vector? For example, with x <- 1:100 I'd like to search for the sequence c(49,50,51), and be told that it occurs exactly once, starting at location 49. (The items in the vectors might be numeric or character, and there might be repetitions within the search pattern or within the vector I'm searching.) Duncan Murdoch
On Aug 28, 2012, at 4:05 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> Is there a function to efficiently search for a subsequence within a vector? > > For example, with > > x <- 1:100 > > I'd like to search for the sequence c(49,50,51), and be told that it occurs exactly once, starting at location 49. (The items in the vectors might be numeric or character, and there might be repetitions within the search pattern or within the vector I'm searching.) > > Duncan MurdochThere is a thread here: http://tolstoy.newcastle.edu.au/R/e17/help/12/02/4201.html that has various solutions and towards the end has a post with some timings. Regards, Marc Schwartz
Hello, Try the following. occur1 <- function(pat, vec){ m <- length(pat) n <- length(vec) candidate <- seq.int(length=n-m+1) for (i in seq.int(length=m)) candidate <- candidate[pat[i] == vec[candidate + i - 1]] candidate } patrn <- c(1,2,3,4) exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) occur1(patrn, exmpl) [1] 6 13 23 This was posted some time ago. And seems to be the fastest of the 7 solutions. https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html Hope this helps, Rui Barradas Em 28-08-2012 22:05, Duncan Murdoch escreveu:> Is there a function to efficiently search for a subsequence within a > vector? > > For example, with > > x <- 1:100 > > I'd like to search for the sequence c(49,50,51), and be told that it > occurs exactly once, starting at location 49. (The items in the > vectors might be numeric or character, and there might be repetitions > within the search pattern or within the vector I'm searching.) > > Duncan Murdoch > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
On 12-08-28 5:45 PM, Rui Barradas wrote:> Hello, > > Try the following. > > occur1 <- function(pat, vec){ > m <- length(pat) > n <- length(vec) > candidate <- seq.int(length=n-m+1) > for (i in seq.int(length=m)) > candidate <- candidate[pat[i] == vec[candidate + i - 1]] > candidate > } > > patrn <- c(1,2,3,4) > exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4) > > occur1(patrn, exmpl) > [1] 6 13 23 > > This was posted some time ago. And seems to be the fastest of the 7 > solutions. > https://stat.ethz.ch/pipermail/r-help/2012-February/303756.htmlThanks! Duncan Murdoch
Hello, Function Biostrings::matchPattern can be called with an algorithm = "boyer-moore" argument. I've never used it, this is the return value of library(sos) r1 <- findFn('boyer') r2 <- findFn('moore') r1 & r2 I have implemented the Boyer-Moore algorithm a couple of times, the first(!) of all in 8086 assembly, but I'm seeing a difficulty regarding your original request. A Boyer-Moore algorithm to search for subsequences of character vectors all of which such that nchar(x) is 1 should be very easy to implement using the .Call interface, but for integer vectors I am not seeing how to implement the bad character shift table. What would be the alphabet? The set of 32-bit integers? In this case the table length would be prohibitive... Ideas anyone? Rui Barradas Em 28-08-2012 22:05, Duncan Murdoch escreveu:> Is there a function to efficiently search for a subsequence within a > vector? > > For example, with > > x <- 1:100 > > I'd like to search for the sequence c(49,50,51), and be told that it > occurs exactly once, starting at location 49. (The items in the > vectors might be numeric or character, and there might be repetitions > within the search pattern or within the vector I'm searching.) > > Duncan Murdoch > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.