Hi All,

I've been trawling through the documentation and listserv archives on this topic, but as yet have not found a solution. I'm sure this is pretty simple with R, but I cannot work out how to do it without resorting to ugly nested loops.

As far as I can tell, grep, match, and %in% are not the correct tools.

Question: given these vectors --

    patrn <- c(1,2,3,4)
    exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)

how do I find every occurrence of the pattern and return the starting indices? The desired answer is:

    6, 13, 23

Suggestions very much appreciated!

Kind regards,

Matt Redding, Ph.D.
Principal Scientist
Geochemist/Soil Chemist
Queensland Primary Industries & Fisheries
DEEDI
PO Box 102, Toowoomba, 4350, Qld
This is ugly, but...

    l  <- length(patrn)
    l2 <- length(exmpl)
    out <- vector("list")
    for (i in 1:(l2-l+1)) {
      # compare the window of length l that starts at i with the pattern
      if (all(patrn == exmpl[i:(i+l-1)])) {
        out[[i]] <- i
      } else {
        out[[i]] <- "NA"
      }
    }
    # flatten to a character vector, drop the "NA" markers, convert back
    out <- do.call(c, out)
    as.numeric(out[which(out != "NA")])

Cheers and HTH

Redding, Matthew-2 wrote
> how do I get the desired answer by finding the occurrence of the pattern
> and returning the starting indices:
> 6, 13, 23
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/matching-a-sequence-in-a-vector-tp4389523p4389560.html
Sent from the R help mailing list archive at Nabble.com.
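The same window-by-window comparison can be written without building a list of "NA" strings. The following is only a sketch of that idea, not the poster's code; the names `starts` and `hit` are made up for illustration.

```r
patrn <- c(1, 2, 3, 4)
exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34, 4, 3, 2, 1, 1, 2, 3, 4)

l  <- length(patrn)
l2 <- length(exmpl)

# Candidate start positions. seq_len() yields an empty vector when the
# pattern is longer than the data, so nothing is tested in that case.
starts <- seq_len(max(l2 - l + 1, 0))

# TRUE where the window of length l beginning at i equals the pattern
hit <- vapply(starts,
              function(i) all(exmpl[i:(i + l - 1)] == patrn),
              logical(1))

starts[hit]
# [1]  6 13 23
```

Keeping a logical vector and subsetting the start positions with it avoids the round trip through character values.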
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
> how do I get the desired answer by finding the occurrence of the pattern and returning the starting indices:
> 6, 13, 23

Hi.

If the pattern is not too long, try

    m <- length(patrn)
    n <- length(exmpl)
    # all possible start positions of the pattern in exmpl
    ind <- seq.int(length.out=n-m+1)
    occur <- rep(TRUE, times=n-m+1)
    # one vectorised comparison per pattern element
    for (i in seq.int(length.out=m)) {
      occur <- occur & (patrn[i] == exmpl[ind + i - 1])
    }
    which(occur)

    [1]  6 13 23

Hope this helps.

Petr Savicky.
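For reuse, the mask-based loop can be wrapped in a function. This is a minimal sketch under my own assumptions: the name `find_pattern` and the early return are not from the thread. The guard is there because a pattern longer than the vector would otherwise request a negative sequence length.

```r
# Hypothetical helper: start indices of every occurrence of `pattern` in `x`
find_pattern <- function(pattern, x) {
  m <- length(pattern)
  n <- length(x)
  # nothing can match an empty pattern or one longer than the data
  if (m == 0 || m > n) return(integer(0))
  ind <- seq_len(n - m + 1)       # all possible start positions
  occur <- rep(TRUE, n - m + 1)   # mask of positions still matching
  for (i in seq_len(m)) {
    occur <- occur & (pattern[i] == x[ind + i - 1])
  }
  which(occur)
}

exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34, 4, 3, 2, 1, 1, 2, 3, 4)
find_pattern(c(1, 2, 3, 4), exmpl)   # 6 13 23
find_pattern(c(8, 8), exmpl)         # 10
find_pattern(1:30, exmpl)            # integer(0)
```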
On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
> how do I get the desired answer by finding the occurrence of the pattern and returning the starting indices:
> 6, 13, 23

Hi.

A more efficient version of the previous suggestion is as follows.

    m <- length(patrn)
    n <- length(exmpl)
    # start with every possible start position as a candidate
    candidate <- seq.int(length.out=n-m+1)
    for (i in seq.int(length.out=m)) {
      # keep only the candidates whose i-th element matches the pattern
      candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    candidate

    [1]  6 13 23

In this solution, the set of candidate start indices shrinks with each iteration. If short prefixes of the pattern are rare in the vector, the set is cut down within the first few iterations and the remaining iterations become faster.

Hope this helps.

Petr Savicky.
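To see the shrinking candidate set on the example data, the loop can record how many candidates survive each iteration. This is an illustrative sketch; the `sizes` vector is an addition of mine and not part of the suggestion above.

```r
patrn <- c(1, 2, 3, 4)
exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34, 4, 3, 2, 1, 1, 2, 3, 4)

m <- length(patrn)
n <- length(exmpl)

candidate <- seq_len(n - m + 1)   # 23 possible start positions
sizes <- integer(m)               # survivors after each iteration

for (i in seq_len(m)) {
  candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  sizes[i] <- length(candidate)
}

sizes
# [1] 4 3 3 3
candidate
# [1]  6 13 23
```

After the first pattern element only 4 of the 23 start positions remain (6, 13, 22, 23), so the later comparisons touch very few elements.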
On 15/02/12 05:17, Redding, Matthew wrote:
> I've been trawling through the documentation and listserv archives
> on this topic -- but as yet have not found a solution. I'm sure
> this is pretty simple with R, but I cannot work out how without
> resorting to ugly nested loops.

No actual solution, but this sounds to me like a "moving window" statistic. I googled for "moving window R" and found, among others, the following:

http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html

Maybe this can give you some additional ideas. For your question, you would not calculate, e.g., the mean of the moving window, but check whether the sequence in the window is equal to the one you are looking for.

Cheers,

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
On 15/02/12 05:17, Redding, Matthew wrote:
> I've been trawling through the documentation and listserv archives
> on this topic -- but as yet have not found a solution. I'm sure
> this is pretty simple with R, but I cannot work out how without
> resorting to ugly nested loops.

Just another idea: what about collapsing each vector into a single character string

    exmplstr <- paste(exmpl, collapse="")
    patrnstr <- paste(patrn, collapse="")

and then searching for patrnstr in exmplstr?

Rainer
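One caveat with the string idea: collapsing with an empty separator loses the element boundaries, and a character position in the string is not a vector index. The sketch below is one possible way around both, assuming the values are non-negative integers so that no regex metacharacters appear in the strings. The separator, the lookahead pattern, and the names `pos` and `elem_start` are my own choices, not part of the suggestion above.

```r
patrn <- c(1, 2, 3, 4)
exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34, 4, 3, 2, 1, 1, 2, 3, 4)

# Without a separator, different vectors collapse to the same string
paste(c(12, 34), collapse = "") == paste(c(1, 2, 3, 4), collapse = "")
# [1] TRUE

# Delimit every element on both sides so only whole elements can match
sep <- ","
exmplstr <- paste0(sep, paste(exmpl, collapse = sep), sep)
patrnstr <- paste0(sep, paste(patrn, collapse = sep), sep)

# A lookahead consumes no characters, so matches that share a separator
# are all reported
pos <- gregexpr(paste0("(?=", patrnstr, ")"), exmplstr, perl = TRUE)[[1]]
pos <- as.integer(pos[pos > 0])   # gregexpr() returns -1 when nothing matches

# Character position of the separator that precedes each element
elem_start <- cumsum(c(1, nchar(as.character(exmpl)) + 1))[seq_along(exmpl)]

# Translate character positions back to vector indices
idx <- match(pos, elem_start)
idx
# [1]  6 13 23
```

For numeric data the index-based approaches in this thread are simpler. The string route mainly pays off when the data are already text.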
On 15-02-2012, at 05:17, Redding, Matthew wrote:
> how do I get the desired answer by finding the occurrence of the pattern and returning the starting indices:
> 6, 13, 23

    # embed() returns each window in reverse order, so reverse the pattern
    patrn.rev <- rev(patrn)
    # one row per window of length(patrn) consecutive elements
    w <- embed(exmpl, length(patrn))
    w.pos <- apply(w, 1, function(r) all(r == patrn.rev))
    which(w.pos)

You can nest the last three lines into a single expression to get a one-liner.

Berend
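To make the role of rev() concrete, here is a small illustration of what embed() returns, followed by the nested one-liner form of the embed() approach. The toy input `1:5` and the name `res` are only for illustration.

```r
# Each row is one window of 3 consecutive elements, newest element first
embed(1:5, 3)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

patrn <- c(1, 2, 3, 4)
exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34, 4, 3, 2, 1, 1, 2, 3, 4)

# Row i of the embedded matrix is the window starting at element i,
# so the matching row numbers are the starting indices
res <- which(apply(embed(exmpl, length(patrn)), 1,
                   function(r) all(r == rev(patrn))))
res
# [1]  6 13 23
```

Note that embed() materialises a matrix with length(patrn) columns, so memory use grows with the pattern length.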
On Tue, Feb 14, 2012 at 11:17 PM, Redding, Matthew <Matthew.Redding at deedi.qld.gov.au> wrote:
> how do I get the desired answer by finding the occurrence of the pattern and returning the starting indices:
> 6, 13, 23

Here is a one-liner:

    library(zoo)
    which(rollapply(exmpl, 4, identical, patrn, fill = FALSE, align = "left"))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
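Two small points when adapting the one-liner: the window width 4 can be written as length(patrn), and identical() compares storage type as well as value. The sketch below uses base R only (it does not call zoo) and simply shows how identical() behaves. Whether this matters in practice depends on the storage types of the two vectors being compared.

```r
win_int <- c(1L, 2L, 3L, 4L)   # integer storage
patrn   <- c(1, 2, 3, 4)       # double storage

identical(win_int, patrn)               # FALSE: same values, different type
all(win_int == patrn)                   # TRUE:  == compares values only
identical(as.numeric(win_int), patrn)   # TRUE once the types agree
```

If the data might be stored as integer (for example when created with 1:4), converting both vectors with as.numeric() first keeps the comparison consistent.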