Dear All,
Consider a simple example

a<-c(1,4,3,0,4,5,6,9,3,4)
b<-c(0,4,5)
c<-c(5,4,0)

I would like to be able to tell whether a sequence is contained in another one (the order of the elements does matter). For example, in the example above b is a subsequence of a, whereas c is not. Since the order matters, I cannot treat the sequences above as sets (also, elements are repeated).
Does anyone know a smart way of achieving that?
Many thanks

Lorenzo
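Convert to strings and use the grep functions. Using c as a variable name is a bad idea, since c() is the base R function for combining values. For that reason, the second pattern is stored in d here.

d <- c(5,4,0)   # same values as c in the question
a <- paste(a, collapse="")
b <- paste(b, collapse="")
d <- paste(d, collapse="")
# fixed=TRUE matches the pattern as a literal string, not a regular expression
grepl(b, a, fixed=TRUE)   # TRUE:  "045" occurs in "1430456934"
grepl(d, a, fixed=TRUE)   # FALSE: "540" does not

Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.list at gmail.com

On Sep 21, 2010, at 6:31 AM, Lorenzo Isella wrote:

> a<-c(1,4,3,0,4,5,6,9,3,4)
> b<-c(0,4,5)
> c<-c(5,4,0)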
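The collapse="" approach has one caveat. Once the elements are glued together, their boundaries are lost, so a run of single digits can match inside a multi-digit number. The vectors x and y below are hypothetical and do not come from the thread; they are a minimal sketch of the false positive:

x <- c(12, 5)                      # contains the two-digit number 12
y <- c(1, 2)                       # 1 followed by 2 is NOT a run in x
x_str <- paste(x, collapse = "")   # "125"
y_str <- paste(y, collapse = "")   # "12"
grepl(y_str, x_str, fixed = TRUE)  # TRUE, a false positive

This only matters when some elements print with more than one character. For single-digit data such as the question's, the glued strings are unambiguous.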
This function might be helpful. It returns a list with the positions of every occurrence of b in a, or an empty list if there is none:

bleh <- function(a, b) {
  where <- list()
  matches <- 0
  # candidate starting points: positions where a equals the first element of b
  first <- which(a == b[1])
  for (i in first) {
    # indices of the window of length(b) that starts at i; past the end of a,
    # a[] returns NA, so identical() is FALSE there
    seq.to.match <- seq(i, length.out = length(b))
    # identical() also requires the same type (double vs integer)
    if (identical(a[seq.to.match], b)) {
      matches <- matches + 1
      where[[matches]] <- seq.to.match
    }
  }
  return(where)
}

a<-c(3,4,3,0,4,5,6,9,3,4)
b<-c(0,4,5)
c<-c(5,4,0)
d<-c(3,4)
bleh(a, b)   # one match, at positions 4 5 6
bleh(a, c)   # no match: empty list
bleh(a, d)   # two matches, at positions 1 2 and 9 10

Cheers,
Gustavo.

On Tue, Sep 21, 2010 at 11:31 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
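If the explicit loop in bleh() is a concern, a vectorized variant can compare every window of a with b at once. This is a sketch and not something proposed in the thread. find_runs() is a hypothetical name, and the sketch assumes b has at least one element and that exact == comparison is appropriate (no floating-point tolerance). Unlike bleh(), it returns only the start position of each match:

# Hypothetical helper: start positions of every contiguous run of a equal to b
find_runs <- function(a, b) {
  m <- length(b)
  n <- length(a)
  stopifnot(m >= 1)
  if (n < m) return(integer(0))
  starts <- seq_len(n - m + 1)
  # row i of idx holds the indices i, i+1, ..., i+m-1 of one window
  idx <- outer(starts, seq_len(m) - 1L, "+")
  windows <- matrix(a[idx], nrow = length(starts))
  # compare column j of every window with b[j]; a full match has m TRUEs
  hits <- rowSums(sweep(windows, 2, b, "==")) == m
  which(hits)
}

a <- c(3,4,3,0,4,5,6,9,3,4)
find_runs(a, c(0,4,5))   # 4
find_runs(a, c(5,4,0))   # integer(0)
find_runs(a, c(3,4))     # 1 9

The window matrix has (length(a) - length(b) + 1) x length(b) entries. For long patterns, the loop in bleh() may therefore use less memory, because it only inspects windows that start where a equals b[1].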
On Sep 21, 2010, at 6:31 AM, Lorenzo Isella wrote:

> Does anyone know a smart way of achieving that?

> grep(paste(c, collapse="#"), paste(a, collapse="#"))
integer(0)
> grep(paste(b, collapse="#"), paste(a, collapse="#"))
[1] 1

Looking at that output, I wonder whether you also need markers around each element, so that a match cannot start or end in the middle of a multi-digit number:

> grep(paste("#",b,"#", collapse="#"), paste("#",a,"#", collapse="#"))
[1] 1
# To prevent a match like c(1,2,3) with c(101,2,303).

There is also a Biostrings package in the Bioconductor repository that provides more extensive string-matching facilities.

--
David Winsemius, MD
West Hartford, CT
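The marker idea can be wrapped in a small function. contains_seq() and its sep argument are hypothetical names for this sketch. The function differs from the "# x #" wrapping above in two ways:

* It brackets every element with a single separator via paste0(). Note that the spaces in "# 0 #" come from the default sep=" " of paste().
* It passes fixed=TRUE, so the separator is matched literally rather than as a regular expression.

# Hypothetical helper: TRUE if b occurs as a contiguous run inside a.
# Every element is bracketed by sep, so a match cannot start or end
# in the middle of a number (the c(1,2,3) vs c(101,2,303) case).
contains_seq <- function(a, b, sep = ",") {
  wrap <- function(v) paste0(sep, paste(v, collapse = sep), sep)
  grepl(wrap(b), wrap(a), fixed = TRUE)
}

a <- c(1,4,3,0,4,5,6,9,3,4)
contains_seq(a, c(0,4,5))              # TRUE
contains_seq(a, c(5,4,0))              # FALSE
contains_seq(c(101,2,303), c(1,2,3))   # FALSE

The sketch assumes sep never appears inside an element's printed form. Numbers are converted to text with at most 15 significant digits, so two doubles that differ only beyond that precision (e.g. 0.1+0.2 and 0.3) would compare as equal.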