I'd like to pick every overlapping (imbricated) run of five consecutive elements from a vector. I guess there is some efficient way to do this without loops... Here is a for-loop version and a model for the output:

VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);

ADDRESSES=c();
for(i in 1:(length(VECTOR)-4)){
  ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
}

> ADDRESSES
[1] "14265" "42650" "265011" "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"

Atte Tenkanen
University of Turku, Finland
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6)
> x <- lapply(seq(length(VECTOR)-4),function(z)paste(VECTOR[z:(z+4)],collapse=''))
> unlist(x)
[1] "14265" "42650" "265011" "6501110" "5011104" "0111043" "1110436" "104368" "43686"

On 8/22/06, kone <attenka@utu.fi> wrote:
> I'd like to pick every imbricated five character long subsets from a
> vector. [...]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
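A minimal sketch of the same lapply() idea written with vapply(), which declares that each window yields exactly one character string, so no unlist() step is needed. It also swaps seq() for seq_len(): seq(length(VECTOR)-4) returns c(1, 0) when VECTOR has only four elements, whereas seq_len(0) is empty. The name `width` is introduced here for illustration and is assumed to be no larger than length(VECTOR); this still calls paste() once per window.

```r
# vapply() variant of the lapply() answer; `width` is an illustrative name
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
width <- 5                                   # window length, assumed <= length(VECTOR)
starts <- seq_len(length(VECTOR) - width + 1) # first index of each window
ADDRESSES <- vapply(starts,
                    function(z) paste(VECTOR[z:(z + width - 1)], collapse = ""),
                    character(1))            # each window gives one string
ADDRESSES
```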
embed(VECTOR, 5)[, 5:1] gives the subsets, so something like

apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")

does the job. The following is more efficient (see the timings below):

ind <- 1:(length(VECTOR)-4)
do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))

though, judging by how embed() works, the embed() version could be made just as efficient.

Larger example:

VECTOR <- sample(1:10, 1e5, replace=TRUE)
> system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse=""))
[1] 5.73 0.05 5.81 NA NA
> system.time({ind <- 1:(length(VECTOR)-4)
+ do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
+ })
[1] 1.00 0.01 1.01 NA NA

The loop method took 195 secs. Just creating the answer at its correct length before the loop reduced this to 5 secs, e.g. use

ADDRESSES <- character(length(VECTOR)-4)

Moral: don't grow vectors repeatedly.

On Tue, 22 Aug 2006, kone wrote:
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>
> ADDRESSES=c();

You do not need the semicolons, and they just confuse readers.

> [...]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
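A sketch of two points from the reply above, run on the original 13-element VECTOR: the loop with its result allocated at the final length before it is filled, and one possible reading of the remark about embed(), namely pasting the five columns of the embed() matrix in a single vectorised paste() call instead of calling paste() once per row through apply(). The second part is an interpretation rather than code from the thread, and no timings are claimed for it.

```r
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
n <- length(VECTOR) - 4  # number of five-element windows

# Loop version: ADDRESSES is created at its final length and then filled,
# so it is never grown inside the loop.
ADDRESSES <- character(n)
for (i in seq_len(n)) {
  ADDRESSES[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}

# embed() version: row i of m holds VECTOR[i:(i + 4)]; the five columns
# are pasted together in one vectorised call instead of row by row.
m <- embed(VECTOR, 5)[, 5:1]
cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
ADDRESSES2 <- do.call(paste, c(cols, sep = ""))

identical(ADDRESSES, ADDRESSES2)  # [1] TRUE
```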
Like this:

> do.call( paste, c( list(sep=""), lapply(1:5,function(x) VECTOR[x:(length(VECTOR)-5+x)]) ))
[1] "14265" "42650" "265011" "6501110" "5011104" "0111043" "1110436" "104368" "43686"
>

HTH,

Chuck

On Tue, 22 Aug 2006, kone wrote:
> I'd like to pick every imbricated five character long subsets from a
> vector. [...]

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu
UC San Diego
http://biostat.ucsd.edu/~cberry/
La Jolla, San Diego 92093-0717
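The same column-shifting trick can be wrapped in a small function for any window length. sliding_paste() is a hypothetical helper, not something from the thread; it assumes k is a positive whole number and returns character(0) when the vector is shorter than one window. Piece x of every window is v[x:(length(v)-k+x)], exactly as in the k = 5 call above.

```r
# Hypothetical helper: paste every run of k consecutive elements of v,
# generalising the do.call(paste, ...) answer from k = 5 to any k >= 1.
sliding_paste <- function(v, k) {
  n <- length(v) - k + 1           # number of windows
  if (n < 1) return(character(0))  # v is shorter than one window
  # Piece x of every window: v[x], v[x + 1], ..., v[x + n - 1]
  cols <- lapply(seq_len(k), function(x) v[x:(x + n - 1)])
  do.call(paste, c(cols, sep = ""))
}

VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
sliding_paste(VECTOR, 5)
sliding_paste(1:4, 2)  # "12" "23" "34"
```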
Here is a solution that uses gsub with a (positive) lookahead perl-style regexp to do it:

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
e <- "([[:digit:]]+),(?=([[:digit:]]+),([[:digit:]]+),([[:digit:]]+),([[:digit:]]+))"
out <- gsub(e, "\\1\\2\\3\\4\\5 ", paste(VECTOR, collapse = ","), perl = TRUE)
head(strsplit(out, " ")[[1]], -1) # uses head from R 2.4.0

Each match consumes one number and its comma, while the lookahead captures the next four numbers without consuming them, so successive matches overlap. The last four numbers are never matched and remain as one unsplit piece, which head(..., -1) drops.

On 8/22/06, kone <attenka at utu.fi> wrote:
> I'd like to pick every imbricated five character long subsets from a
> vector. [...]
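A sketch generalising the regexp to a window of k numbers by building the pattern and replacement strings with paste0(). regex_windows() is a hypothetical name. It assumes the vector holds non-negative whole numbers that paste() prints as plain digits (negative numbers, decimals, or values printed as 1e+05 would not match [[:digit:]]+), and that 2 <= k <= 9, since a gsub() replacement string only understands the backreferences \\1 to \\9.

```r
# Hypothetical helper: the gsub()/lookahead answer for a window of k numbers.
# Assumes v holds non-negative whole numbers and 2 <= k <= 9.
regex_windows <- function(v, k) {
  stopifnot(k >= 2, k <= 9)
  num <- "([[:digit:]]+)"
  # One consumed number plus a lookahead over the next k - 1 numbers
  pattern <- paste0(num, ",(?=", paste(rep(num, k - 1), collapse = ","), ")")
  # "\\1\\2...\\k" followed by a space that separates the windows
  replacement <- paste0(paste0("\\", seq_len(k), collapse = ""), " ")
  out <- gsub(pattern, replacement, paste(v, collapse = ","), perl = TRUE)
  # Drop the unmatched tail (the final k - 1 numbers)
  head(strsplit(out, " ")[[1]], -1)
}

VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
regex_windows(VECTOR, 5)
regex_windows(c(1, 2, 3, 4), 3)  # "123" "234"
```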