Ed Holdgate
2007-Jan-30 01:49 UTC
[R] How to find series of small numbers in a big vector?
Hello:

I have a vector with 120,000 reals between 0.00000 and 0.9999.

They are not sorted, but the vector index is the time-order of my measurements, and therefore cannot be lost.

How do I use R to find the starting and ending index of ANY and ALL the "series" or "sequences" in that vector wherever there are 5 or more members in a row between 0.021 and 0.029?

For example:

search_range <- c(0.021, 0.029) # inclusive searching
search_length <- 5 # find ALL series of 5 members within search_range
my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.028, 0.024, 0.027, 0.023,
             0.022, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.024, 0.029, 0.023,
             0.025, 0.026, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.022,
             0.023, 0.025, 0.333, 0.027, 0.028,
             0.900, 0.900, 0.900, 0.900, 0.900)

I seek the R program to report:
start_index of 12 and an end_index of 16
-- and also --
start_index of 23 and an end_index of 27
because that is where there happen to be search_length numbers within my search_range.

It should _not_ report the series at start_index 40, because that 0.333 in there violates the search_range.

I could brute-force hard-code an R program, but perhaps an expert can give me a tip for an easy, elegant existing function or a tactic to approach?

Execution speed or algorithm performance is not, for me in this case, important. Rather, I seek an easy R solution to find the time windows (starting & ending indices) where 5 or more small numbers in my search_range were measured all in a row.

Advice welcome and many thanks in advance.

Ed Holdgate
Vladimir Eremeev
2007-Jan-30 11:34 UTC
[R] How to find series of small numbers in a big vector?
I would try using na.contiguous from package stats. R.utils has seqToIntervals.default, which "Gets all contiguous intervals of a vector of indices". (I have not used the latter; help.search("contiguous") gave me that name.)
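To make the "contiguous intervals of a vector of indices" idea concrete, here is a minimal base-R sketch. It is not the R.utils implementation; the helper name indices_to_intervals and its from/to column names are hypothetical, chosen only for illustration. It assumes the indices are sorted in increasing order, as which() returns them.

```r
# Hypothetical helper: collapse a sorted vector of indices into
# contiguous intervals, one row per interval.
indices_to_intervals <- function(idx) {
  if (length(idx) == 0) {
    return(matrix(integer(0), ncol = 2,
                  dimnames = list(NULL, c("from", "to"))))
  }
  # A new interval begins wherever the step from the previous index is not 1.
  run_id <- cumsum(c(1, diff(idx) != 1))
  from <- as.vector(tapply(idx, run_id, min))
  to   <- as.vector(tapply(idx, run_id, max))
  cbind(from = from, to = to)
}

# Indices of the in-range values in the example data from the question.
idx <- c(12:16, 23:27, 40:42, 44:45)
intervals <- indices_to_intervals(idx)

# Keep only intervals covering at least 5 consecutive positions.
run_len   <- intervals[, "to"] - intervals[, "from"] + 1
long_runs <- intervals[run_len >= 5, , drop = FALSE]
print(long_runs)
```

With the example data this leaves the intervals 12 to 16 and 23 to 27; the shorter stretches around index 40 are dropped by the length filter.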
I suggest the following approach.

This gives TRUE for all data within the search_range (inclusive, as requested):

A1 = my_data >= search_range[1] & my_data <= search_range[2]

which() gives us the indices of those values:

A2 = which(A1)

and diff() gives the gaps between successive indices:

A3 = diff(A2)

A gap of 1 means two in-range values are adjacent. Hence a run starts wherever the gap before it is larger than 1, and ends wherever the gap after it is larger than 1:

starts = A2[c(TRUE, A3 > 1)]
ends = A2[c(A3 > 1, TRUE)]

Finally, keep the runs with at least search_length members. Is this what you wanted to know?

keep = (ends - starts + 1) >= search_length
cbind(start_index = starts[keep], end_index = ends[keep])

On Mon, 2007-01-29 at 17:49 -0800, Ed Holdgate wrote:
> How do I use R to find the starting and ending
> index of ANY and ALL the "series" or "sequences"
> in that vector where ever there are 5 or more
> members in a row between 0.021 and 0.029 ?
Michael Dewey
2007-Jan-30 12:19 UTC
[R] How to find series of small numbers in a big vector?
At 01:49 30/01/2007, Ed Holdgate wrote:
>How do I use R to find the starting and ending
>index of ANY and ALL the "series" or "sequences"
>in that vector where ever there are 5 or more
>members in a row between 0.021 and 0.029 ?

You could look at rle, which encodes a vector as runs of equal values.

Michael Dewey
http://www.aghmed.fsnet.co.uk
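Following up on the rle suggestion, here is a minimal sketch of how it might be applied to the example from the question. The variable names in_range, runs, starts, ends, keep and result are only illustrative; the range test is assumed to be inclusive on both ends, as the original post asks.

```r
search_range  <- c(0.021, 0.029)  # inclusive bounds
search_length <- 5                # minimum run length to report
my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.028, 0.024, 0.027, 0.023,
             0.022, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.024, 0.029, 0.023,
             0.025, 0.026, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.022,
             0.023, 0.025, 0.333, 0.027, 0.028,
             0.900, 0.900, 0.900, 0.900, 0.900)

# TRUE where a measurement lies inside the search range.
in_range <- my_data >= search_range[1] & my_data <= search_range[2]

# rle() collapses the logical vector into runs of TRUE and FALSE.
runs <- rle(in_range)

# The cumulative run lengths give each run's last index; the first
# index follows from the run length.
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the in-range runs that are long enough.
keep   <- runs$values & runs$lengths >= search_length
result <- data.frame(start_index = starts[keep], end_index = ends[keep])
print(result)
```

Because the indices are recovered from the run lengths, the original time order is never disturbed, and the run at index 40 is excluded since the 0.333 splits it into runs of 3 and 2.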