Ed Holdgate
2007-Jan-30 01:49 UTC
[R] How to find series of small numbers in a big vector?
Hello:
I have a vector with 120,000 reals
between 0.00000 and 0.9999
They are not sorted but the vector index is the
time-order of my measurements, and therefore
cannot be lost.
How do I use R to find the starting and ending
index of ANY and ALL the "series" or "sequences"
in that vector wherever there are 5 or more
members in a row between 0.021 and 0.029 ?
For example:
search_range <- c (0.021, 0.029) # inclusive searching
search_length <- 5 # find ALL series of 5 members within search_range
my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
0.900, 0.900, 0.900, 0.900, 0.900,
0.900, 0.028, 0.024, 0.027, 0.023,
0.022, 0.900, 0.900, 0.900, 0.900,
0.900, 0.900, 0.024, 0.029, 0.023,
0.025, 0.026, 0.900, 0.900, 0.900,
0.900, 0.900, 0.900, 0.900, 0.900,
0.900, 0.900, 0.900, 0.900, 0.022,
0.023, 0.025, 0.333, 0.027, 0.028,
0.900, 0.900, 0.900, 0.900, 0.900)
I seek the R program to report:
start_index of 12 and an end_index of 16
-- and also --
start_index of 23 and an end_index of 27
because that is where there happens to be
search_length numbers within my search_range.
It should _not_ report the series at start_index 40
because that 0.333 in there violates the search_range.
I could brute-force hard-code an R program, but
perhaps an expert can give me a tip for an
easy, elegant existing function or a tactic
to approach?
Execution speed or algorithm performance is not,
for me in this case, important. Rather, I
seek an easy R solution to find the time windows
(starting & ending indices) where 5 or more
small numbers in my search_range were measured
all in a row.
Advice welcome and many thanks in advance.
Ed Holdgate
Vladimir Eremeev
2007-Jan-30 11:34 UTC
[R] How to find series of small numbers in a big vector?
I would try using na.contiguous from package stats.
R.utils has seqToIntervals.default,
which "Gets all contiguous intervals of a vector of indices".
(I didn't use the latter; help.search("contiguous") gave me that
name.)
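A minimal sketch of how the seqToIntervals() suggestion might be applied to the example from the question. It assumes the R.utils package is installed and that seqToIntervals() returns a two-column matrix of interval starts and ends; the names idx, iv, len and result are only illustrative, and my_data is the same 50-value vector as in the question, written compactly with rep().

```r
library(R.utils)  # assumption: R.utils is installed

search_range  <- c(0.021, 0.029)  # inclusive bounds
search_length <- 5                # minimum number of members in a row

# Same 50 values as in the question, written compactly
my_data <- c(rep(0.900, 11), 0.028, 0.024, 0.027, 0.023, 0.022,
             rep(0.900, 6),  0.024, 0.029, 0.023, 0.025, 0.026,
             rep(0.900, 12), 0.022, 0.023, 0.025, 0.333, 0.027, 0.028,
             rep(0.900, 5))

# Indices of all values inside the search range
idx <- which(my_data >= search_range[1] & my_data <= search_range[2])

# Collapse the indices into contiguous intervals (one row per interval)
iv <- seqToIntervals(idx)

# Keep only the intervals that are long enough
len    <- iv[, 2] - iv[, 1] + 1
result <- iv[len >= search_length, , drop = FALSE]
print(result)
```

For the example data this should leave the two intervals 12 to 16 and 23 to 27; the short runs around index 40 are dropped by the length filter.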
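I suggest the following approach.
This gives TRUE for all data within the search_range (bounds inclusive, as requested):
A1 = my_data >= search_range[1] & my_data <= search_range[2]
which() gives us the indices:
A2 = which(A1)
and diff() gives the steps between those indices; any step other than 1 marks a break between two runs:
A3 = diff(A2)
Each run starts right after a break and ends right before one:
run_start = A2[c(TRUE, A3 != 1)]
run_end = A2[c(A3 != 1, TRUE)]
Hence, if run_end - run_start + 1 >= search_length, we have enough consecutive numbers within
the search range.
Finally, this should be what you wanted to know:
keep = run_end - run_start + 1 >= search_length
cbind(start_index = run_start[keep], end_index = run_end[keep])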
Michael Dewey
2007-Jan-30 12:19 UTC
[R] How to find series of small numbers in a big vector?
At 01:49 30/01/2007, Ed Holdgate wrote:
>How do I use R to find the starting and ending
>index of ANY and ALL the "series" or "sequences"
>in that vector where ever there are 5 or more
>members in a row between 0.021 and 0.029 ?

You could look at rle, which codes a vector into runs.

Michael Dewey
http://www.aghmed.fsnet.co.uk
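
The rle() suggestion could be sketched roughly as follows, using only base R and the example data from the question. The helper names in_range, runs, run_start, run_end, keep and result are made up for illustration; this is one possible way to use rle(), not necessarily what was meant.

```r
# Data and parameters from the original question
search_range  <- c(0.021, 0.029)  # inclusive bounds
search_length <- 5                # minimum number of members in a row
my_data <- c(0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.028, 0.024, 0.027, 0.023,
             0.022, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.024, 0.029, 0.023,
             0.025, 0.026, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.900,
             0.900, 0.900, 0.900, 0.900, 0.022,
             0.023, 0.025, 0.333, 0.027, 0.028,
             0.900, 0.900, 0.900, 0.900, 0.900)

# TRUE where a value lies inside the search range
in_range <- my_data >= search_range[1] & my_data <= search_range[2]

# Run-length encode the TRUE/FALSE vector
runs <- rle(in_range)

# Last and first index of every run, recovered from the run lengths
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only runs of TRUE that are at least search_length long
keep   <- runs$values & runs$lengths >= search_length
result <- data.frame(start_index = run_start[keep],
                     end_index   = run_end[keep])
print(result)
```

On the example vector this reports the runs at 12 to 16 and 23 to 27. The stretch starting at index 40 is split by the 0.333 into runs of length 3 and 2, so it is not reported.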