Hello R-helpers!

I have a question concerning extracting sequence information from a vector. I have a vector (representing the bins of a time series where the frequency of occurrences is greater than some threshold) where I would like to extract the min, median and max of each group of consecutive numbers.

For example:

    tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

I would like to have the max, min and median of the following groups:

    24,25
    29
    35,36,37,38,39,40,41,42,43,44,45,46,47
    68,69,70,71

I would like to be able to perform this for many time series, so an automated process would be nice. I am hoping to use this as a peak detection protocol.

Any advice would be greatly appreciated,
Kevin

-----
Kevin J Emerson
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, OR 97403
USA
kemerson at uoregon.edu
Look at "index vectors" in the R intro.

Best,
Niels

On Mon, 24 Jul 2006, Kevin J Emerson wrote:
> I have a vector (representing the bins of a time series where
> the frequency of occurrences is greater than some threshold) where I
> would like to extract the min, median and max of each group of
> consecutive numbers.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Dear Kevin,

Try something like

    groups <- cut(tmp, c(-Inf, tmp[which(diff(tmp) > 1)] + 0.5, Inf))

This puts a break just above the last value before each gap, so every group of consecutive numbers gets its own factor level.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com
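To make the cut() idea concrete, here is a minimal sketch that turns the grouping factor into the requested statistics. It assumes the breaks are placed on the value scale, halfway into each gap, and that tmp is sorted and contains integers only; the names gap_after, breaks and stats are just illustrative.

```r
tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

# Positions whose successor is not the next integer (i.e. a gap follows)
gap_after <- which(diff(tmp) > 1)

# Assumption of this sketch: breaks live on the value scale, 0.5 above
# the last value before each gap
breaks <- c(-Inf, tmp[gap_after] + 0.5, Inf)
groups <- cut(tmp, breaks)

# One row per group, with columns min, median and max
stats <- t(sapply(split(tmp, groups),
                  function(g) c(min = min(g), median = median(g), max = max(g))))
print(stats)
```

The factor returned by cut() can be passed to split() or tapply() directly, so any other summary function can be swapped in for the anonymous one.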
Let me clarify one thing that I don't think I made clear in my posting. I am looking for the max, min and median of the indices, not of the time series frequency counts. I am looking to find the max, min, and median time of peaks in a time series, so I am looking for the information concerning that.

So mostly my question is how to extract the max, min, and median of sequential numbers in a vector. I will reword my original posting below.

> > Hello R-helpers!
> >
> > I have a question concerning extracting sequence information from a
> > vector. I have a vector (representing the bins of a time series where
> > the frequency of occurrences is greater than some threshold) where I
> > would like to extract the min, median and max of each group of
> > consecutive numbers in the index vector.
> >
> > For example:
> >
> > tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)
> >
> > I would like to have the max, min and median of the following groups:
> >
> > 24,25 - max = 25, min = 24, median = 24.5
> > 29 - max = min = median = 29
> > 35,36,37,38,39,40,41,42,43,44,45,46,47 - max = 47, min = 35, etc.
> > 68,69,70,71
> >
> > I would like to be able to perform this for many time series, so an
> > automated process would be nice. I am hoping to use this as a peak
> > detection protocol.
> >
> > Any advice would be greatly appreciated,
> > Kevin
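The reworded request can be checked against a small sketch. This one relies on a common trick rather than anything proposed in the thread: within a run of consecutive integers, the value minus its position is constant, so that difference can label the runs. It assumes tmp is strictly increasing and integer-valued; run_id, runs and run_stats are illustrative names.

```r
tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

# Value minus position stays constant inside a run of consecutive integers
# (assumes tmp is strictly increasing and holds whole numbers)
run_id <- tmp - seq_along(tmp)
runs <- split(tmp, run_id)

# One row per run, in the order the runs occur in tmp
run_stats <- data.frame(
  min    = sapply(runs, min),
  median = sapply(runs, median),
  max    = sapply(runs, max),
  row.names = NULL
)
print(run_stats)
```

For the example vector this gives four rows, one per group listed above, including the single-element group 29.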
As you do not seem to have received what you consider to be a satisfactory reply, here is a function that I **think** does what you want:

    sequences <- function(x, incr = 1) {
      # TRUE where an element continues a run from its predecessor
      in_run <- c(FALSE, diff(x) == incr)
      # a change in that flag marks the first or last index of a run
      ix <- which(abs(diff(in_run)) == 1)
      # a run that reaches the end of x has no closing change, so add it
      if (length(ix) %% 2) c(ix, length(x)) else ix
    }

This function gives successive pairs of the first and last indices of runs within x whose successive values differ by incr. You can then process these pairs however you like to summarize statistics on the indices and/or the values of the sequences.

Examples:

    > sequences(c(1:5,50,3:7))
    [1]  1  5  7 11
    > sequences(c(10,1:5,50,3:7))
    [1]  2  6  8 12
    > sequences(c(1:5,50,3:7,10))
    [1]  1  5  7 11
    > sequences(c(10,1:5,50,3:7,10))
    [1]  2  6  8 12

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kevin J Emerson
> Sent: Monday, July 24, 2006 9:20 AM
> To: Niels Vestergaard Jensen
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] grouping by consecutive integers
>
> Let me clarify one thing that I dont think I made clear in my posting.
> I am looking for the max, min and median of the indicies, not of the
> time series frequency counts.
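One way to process the start/end pairs returned by sequences() is sketched below. The function is copied here so the block runs on its own, with incr applied to the differences; bounds and run_stats are illustrative names, not part of the original post.

```r
# Copy of the function from the message above (incr applied to diff(x))
sequences <- function(x, incr = 1) {
  ix <- which(abs(diff(c(FALSE, diff(x) == incr))) == 1)
  if (length(ix) %% 2) c(ix, length(x)) else ix
}

tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)
ix <- sequences(tmp)

# Reshape the flat index vector into one row per run: start, end
bounds <- matrix(ix, ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("start", "end")))

# Summarize the values covered by each run
run_stats <- t(apply(bounds, 1, function(b) {
  v <- tmp[b["start"]:b["end"]]
  c(min = min(v), median = median(v), max = max(v))
}))
print(cbind(bounds, run_stats))
```

Note that a run needs at least two elements to be detected this way, so the isolated value 29 in tmp does not get a row; if single-element groups matter, a grouping approach based on cumsum() or cut() may be a better fit.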
This might work:

    > tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)
    > # generate breaks
    > group <- c(0, cumsum(diff(tmp) != 1))
    > tapply(tmp, group, summary)
    $`0`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      24.00   24.25   24.50   24.50   24.75   25.00

    $`1`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         29      29      29      29      29      29

    $`2`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         35      38      41      41      44      47

    $`3`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      68.00   68.75   69.50   69.50   70.25   71.00

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
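Since the original question asks for an automated process over many time series, the cumsum() grouping can be wrapped in a function and applied to a list. This is only a sketch: summarise_runs is a hypothetical helper name, the sort(unique(x)) step is an added assumption to guard against unsorted or repeated bins, and the series list holds made-up data.

```r
# Hypothetical helper: min, median and max for each run of consecutive integers
summarise_runs <- function(x) {
  x <- sort(unique(x))                      # assumption: order and duplicates do not matter
  group <- c(0, cumsum(diff(x) != 1))       # a new group starts after every gap
  do.call(rbind, tapply(x, group, function(g)
    c(min = min(g), median = median(g), max = max(g))))
}

# Made-up index vectors standing in for several thresholded time series
series <- list(
  a = c(24, 25, 29, 35:47, 68:71),
  b = c(3, 4, 5, 10, 11),
  c = 7
)

results <- lapply(series, summarise_runs)
print(results$a)
```

Each element of results is a matrix with one row per group, so the per-series tables can be inspected individually or stacked later with rbind().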