I'd like to pick every overlapping (imbricated) subset of five consecutive
elements from a vector. I guess there is some efficient way to do this
without loops... Here is a for-loop version and a model for the output:
VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
ADDRESSES=c();
for(i in 1:(length(VECTOR)-4)){
ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
}
> ADDRESSES
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"
Atte Tenkanen
University of Turku, Finland
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6)
> x <- lapply(seq(length(VECTOR)-4),function(z)paste(VECTOR[z:(z+4)],collapse=''))
> unlist(x)
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"

On 8/22/06, kone <attenka@utu.fi> wrote:
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
embed(VECTOR, 5)[, 5:1]
gives the subsets, so something like
apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")
does the job.
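To see why this works, it may help to look at the intermediate matrix on a small input. The following is only a sketch; the short vector v is an illustrative assumption, not data from the original post.

```r
# Illustrative vector (an assumption, not from the original post)
v <- c(1, 4, 2, 6, 5, 0, 11)

# embed() puts one window in each row, most recent value first,
# so reversing the columns with 5:1 restores the original order
m <- embed(v, 5)[, 5:1]
print(m)

# Paste each row into a single string
res <- apply(m, 1, paste, collapse = "")
print(res)
```

Each row of m is one window of five consecutive elements, which is why apply() over rows gives one string per window.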
The following is a bit more efficient
ind <- 1:(length(VECTOR)-4)
do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
but by looking at how embed() works, the embed()-based version could be
made as efficient.
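The shifted-index idea above might be wrapped up for an arbitrary window width. This is a minimal sketch: the helper name window_paste and its arguments are hypothetical, introduced here only for illustration, and it assumes the vector has at least k elements.

```r
# Hypothetical helper (name and signature are assumptions for illustration)
window_paste <- function(x, k) {
  stopifnot(k >= 1, length(x) >= k)
  # Starting positions of all windows of width k
  ind <- seq_len(length(x) - k + 1)
  # One shifted copy of x per position inside the window
  shifted <- lapply(0:(k - 1), function(j) x[ind + j])
  # Paste the shifted copies together element-wise
  do.call(paste, c(shifted, sep = ""))
}

VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
result <- window_paste(VECTOR, 5)
print(result)
```

The work is done by k vectorised subsetting operations and a single call to paste(), rather than one call to paste() per window.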
Larger example:
> VECTOR <- sample(1:10, 1e5, replace=TRUE)
> system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse=""))
[1] 5.73 0.05 5.81   NA   NA
> system.time({ind <- 1:(length(VECTOR)-4)
+ do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
+ })
[1] 1.00 0.01 1.01   NA   NA
The loop method took 195 secs. Just preallocating the answer at the correct
length before the loop reduced this to 5 secs, e.g. use
ADDRESSES <- character(length(VECTOR)-4)
Moral: don't grow vectors repeatedly.
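
For concreteness, the original loop with a preallocated result could look something like the sketch below. It assumes VECTOR has at least four elements; the use of seq_len() is a stylistic choice here, not something from the original posts.

```r
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)

# Allocate the full result once instead of growing it inside the loop
n <- length(VECTOR) - 4
ADDRESSES <- character(n)

# seq_len(n) is also safe when n is 0, unlike 1:n
for (i in seq_len(n)) {
  ADDRESSES[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}
print(ADDRESSES)
```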
On Tue, 22 Aug 2006, kone wrote:
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>
> ADDRESSES=c();
You do not need the semicolons, and they just confuse readers.
> for(i in 1:(length(VECTOR)-4)){
> ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
> }
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Like this:

> do.call( paste, c( list(sep=""), lapply(1:5,function(x) VECTOR[x:(length(VECTOR)-5+x)]) ))
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"

HTH,

Chuck

Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu        UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717
Here is a solution that uses gsub with a positive lookahead perl-style
regexp to do it:

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
e <- "([[:digit:]]+),(?=([[:digit:]]+),([[:digit:]]+),([[:digit:]]+),([[:digit:]]+))"
out <- gsub(e, "\\1\\2\\3\\4\\5 ", paste(VECTOR, collapse = ","), perl = TRUE)
head(strsplit(out, " ")[[1]], -1) # uses head from R 2.4.0
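
The lookahead matters because it lets successive matches overlap: the text inside (?=...) is checked and captured but not consumed. A small sketch on a shortened, illustrative string (the string s and the pair-sized pattern are assumptions made for brevity):

```r
s <- "1,4,2,6"

# Without lookahead each match consumes both numbers, so the pairs
# cannot overlap and the separating comma is left behind
consumed <- gsub("([[:digit:]]+),([[:digit:]]+)", "\\1\\2 ", s, perl = TRUE)
print(consumed)

# With lookahead only the first number and its comma are consumed,
# so the next match can start at the second number
overlapping <- gsub("([[:digit:]]+),(?=([[:digit:]]+))", "\\1\\2 ", s, perl = TRUE)
print(overlapping)
```

The trailing unmatched remainder in the second result is the reason the solution above drops the last element with head(..., -1).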