thr3ads.net - R help - [R] Successive subsets from a vector? [Aug 2006]


kone

2006-Aug-22 09:31 UTC

[R] Successive subsets from a vector?

I'd like to pick every imbricated (i.e. overlapping) five-element subset from a 
vector. I guess there is some efficient way to do this without loops...
Here is a for-loop version and a model for the output:

VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);

ADDRESSES=c();
for(i in 1:(length(VECTOR)-4)){
	ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")	
}

> ADDRESSES
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"
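One caveat with the loop above: `1:(length(VECTOR)-4)` only behaves for vectors of length 5 or more. For a shorter vector the colon operator counts downwards, so the loop body runs with zero or negative indices instead of not running at all. A minimal illustration, with `seq_len()` as the safe alternative (the names `n`, `bad` and `good` are just for illustration):

```r
n <- 3                          # length of a vector too short for a 5-element window
bad  <- 1:(n - 4)               # counts down: c(1, 0, -1)
good <- seq_len(max(0, n - 4))  # integer(0), so a loop over it never runs
print(bad)
print(good)
```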


Atte Tenkanen
University of Turku, Finland


jim holtman

2006-Aug-22 10:08 UTC


[R] Successive subsets from a vector?

> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6)
> x <- lapply(seq(length(VECTOR)-4),function(z)paste(VECTOR[z:(z+4)],
+ collapse=''))
> unlist(x)
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
[8] "104368"  "43686"
>
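In current R, `vapply()` does the work of `lapply()` plus `unlist()` in one step and also checks that every call returns exactly one string. A sketch along the lines of Jim's version (the name `starts` is only illustrative):

```r
VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
starts <- seq_len(length(VECTOR) - 4)           # first index of each window
ADDRESSES <- vapply(starts,
                    function(z) paste(VECTOR[z:(z + 4)], collapse = ""),
                    character(1))               # each call must return one string
print(ADDRESSES)
```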



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Prof Brian Ripley

2006-Aug-22 10:13 UTC


[R] Successive subsets from a vector?

embed(VECTOR, 5)[, 5:1]

gives the subsets, so something like

    apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")

does the job.
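To see why the columns are reversed: each row of `embed(x, 5)` holds one window with the latest element first, so `[, 5:1]` restores the original order. The sketch below shows this on `1:7` and adds one guard that is not in the reply above: when the vector has exactly 5 elements, `embed()` returns a single row, `[, 5:1]` then drops it to a plain vector, and `apply()` rejects that. Passing `drop = FALSE` keeps the matrix shape.

```r
x <- 1:7
m <- embed(x, 5)
print(m)            # 3 rows: 5 4 3 2 1 / 6 5 4 3 2 / 7 6 5 4 3
print(m[, 5:1])     # each row now reads x[i], ..., x[i + 4]

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
# drop = FALSE keeps a one-row result as a matrix for apply()
ADDRESSES <- apply(embed(VECTOR, 5)[, 5:1, drop = FALSE], 1, paste, collapse = "")
print(ADDRESSES)

# Length-5 input gives a single window and still works thanks to drop = FALSE
one <- apply(embed(1:5, 5)[, 5:1, drop = FALSE], 1, paste, collapse = "")
print(one)          # "12345"
```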

The following is a bit more efficient

    ind <- 1:(length(VECTOR)-4)
    do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))

but by looking at how embed() works it could be made as efficient.
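The index-shifting idea generalises to any window width. Below is a sketch of a hypothetical helper, `windows_paste()`, that builds the k shifted copies and pastes them element-wise. It assumes `k >= 1` and uses `seq_len()` so that a vector shorter than `k` yields `character(0)` rather than garbage.

```r
# Hypothetical helper: all overlapping windows of width k, each pasted into
# one string, following the do.call(paste, ...) approach above.
windows_paste <- function(v, k = 5) {
  ind <- seq_len(max(0, length(v) - k + 1))             # window start positions
  shifted <- lapply(0:(k - 1), function(j) v[ind + j])  # j-th element of every window
  do.call(paste, c(shifted, sep = ""))
}

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
print(windows_paste(VECTOR))      # same nine strings as the loop
print(windows_paste(1:4, k = 2))  # "12" "23" "34"
```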

Larger example:

> VECTOR <- sample(1:10, 1e5, replace=TRUE)
> system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste,
+ collapse=""))
[1] 5.73 0.05 5.81   NA   NA
> system.time({ind <- 1:(length(VECTOR)-4)
+ do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
+ })
[1] 1.00 0.01 1.01   NA   NA

The loop method took 195 secs.  Just preallocating an answer of the correct 
length reduced this to 5 secs, e.g. use

    ADDRESSES <- character(length(VECTOR)-4)

Moral: don't grow vectors repeatedly.
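Spelled out, the preallocated version of the original loop might look like this (a sketch; `seq_along()` replaces the `1:(length(VECTOR)-4)` index):

```r
VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
ADDRESSES <- character(length(VECTOR) - 4)  # allocate the full result once
for (i in seq_along(ADDRESSES)) {
  # fill slot i in place instead of extending ADDRESSES on every pass
  ADDRESSES[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}
print(ADDRESSES)
```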

On Tue, 22 Aug 2006, kone wrote:
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
> 
> ADDRESSES=c();
You do not need the semicolons, and they just confuse readers.
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Charles C. Berry

2006-Aug-22 17:13 UTC


[R] Successive subsets from a vector?

Like this:

> do.call( paste, c( list(sep=""), lapply(1:5,function(x)
+ VECTOR[x:(length(VECTOR)-5+x)]) ))
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
[8] "104368"  "43686"
>

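Chuck's version slices the whole vector five times: slice x runs from `VECTOR[x]` to `VECTOR[length(VECTOR)-5+x]`, so it holds the x-th element of every window. One way to check that it agrees with the window-at-a-time answers is to compare it with Jim's `lapply()` version on random input (the seed and sample size are arbitrary choices):

```r
set.seed(42)                                  # arbitrary, for reproducibility
v <- sample(0:12, 200, replace = TRUE)
n <- length(v)

# Chuck's column-slice approach
chuck <- do.call(paste, c(list(sep = ""),
                          lapply(1:5, function(x) v[x:(n - 5 + x)])))
# Jim's one-window-at-a-time approach
jim <- unlist(lapply(seq(n - 4), function(z) paste(v[z:(z + 4)], collapse = "")))

print(identical(chuck, jim))                  # TRUE
```
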
HTH,

Chuck

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717

Gabor Grothendieck

2006-Aug-22 18:29 UTC


[R] Successive subsets from a vector?

Here is a solution that uses gsub with a positive lookahead perl-style
regexp to do it:

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
e <-
"([[:digit:]]+),(?=([[:digit:]]+),([[:digit:]]+),([[:digit:]]+),([[:digit:]]+))"
out <- gsub(e, "\\1\\2\\3\\4\\5 ", paste(VECTOR, collapse =
","), perl = TRUE)
head(strsplit(out, " ")[[1]], -1)  # uses head from R 2.4.0
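The same regular expression can be generated for other window widths. The helper below, `windows_regex()`, is a hypothetical sketch that builds Gabor's pattern programmatically. It assumes the vector holds non-negative integers (the pattern only matches digits) and that `2 <= k <= 9`, because `gsub()` replacement strings only understand the back-references `\1` to `\9`.

```r
# Hypothetical helper: Gabor's lookahead trick for a window width k.
windows_regex <- function(v, k = 5) {
  d <- "([[:digit:]]+)"
  # consume one number and its comma, then look ahead at the next k - 1 numbers
  e <- paste0(d, ",(?=", paste(rep(d, k - 1), collapse = ","), ")")
  # replacement "\1\2...\k " glues the k captured numbers and appends a space
  repl <- paste0(paste0("\\", seq_len(k), collapse = ""), " ")
  out <- gsub(e, repl, paste(v, collapse = ","), perl = TRUE)
  # the unmatched tail after the last space is not a window, so drop it
  head(strsplit(out, " ")[[1]], -1)
}

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
print(windows_regex(VECTOR))          # same nine strings as the loop
print(windows_regex(c(1, 22, 3), 2))  # "122" "223"
```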


