thr3ads.net - R help - [R] efficiency when processing ordered data frames [May 2009]

Brigid Mooney

2009-May-20 12:54 UTC

[R] efficiency when processing ordered data frames

Hoping for a little insight into how to make sure I have R running as
efficiently as possible.

Suppose I have a data frame, A, with n rows and m columns, where col1
is a date time stamp.  Also suppose that when this data is imported
(from a csv or SQL), that the data is already sorted such that the
time stamp in col1 is in ascending (or descending) order.

If I then want to select only the rows of A where col1 <= a certain
time, I am wondering whether R has to read through the entirety of col1
(all n rows) to select them.  Is it possible for R to recognize (or
somehow be told) that these rows are already in order, so that the
selection could be completed in ~log(n) row reads instead?

Thanks!

jim holtman

2009-May-20 13:27 UTC

How much time is the selection currently costing you? Is it having a
large impact on your program? Is it really the part that consumes most of
the overall time?  What is your concern in this area? Here is the time it
takes to select, from 10M values, those that are less than a specific
value.  It takes about 0.2 seconds on unsorted data, and sorting the
values first makes little difference:
> x <- runif(1e7)
> system.time(y <- x < .5)
   user  system elapsed
   0.15    0.05    0.20
> x <- sort(x)
> system.time(y <- x < .5)
   user  system elapsed
   0.11    0.03    0.14
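
On the ~log(n) question itself: a comparison such as x < .5 builds a
logical vector of length n, so it is necessarily a full pass over the
column, sorted or not. A minimal sketch of the alternative, assuming col1
is sorted in ascending order and contains no NAs, is a hand-written binary
search that counts the rows at or below the cutoff. The helper name
count_leq and the example data frame are hypothetical, not part of the
original posts.

```r
# Hypothetical helper: number of elements of the ascending-sorted
# vector `v` that are <= `cutoff`, found with about log2(n) comparisons.
count_leq <- function(v, cutoff) {
  lo <- 0L          # invariant: every element in v[1..lo] is <= cutoff
  hi <- length(v)   # invariant: every element after v[hi] is > cutoff
  while (lo < hi) {
    # Take the upper middle so that `lo` always advances.
    mid <- lo + (hi - lo + 1L) %/% 2L
    if (v[mid] <= cutoff) {
      lo <- mid
    } else {
      hi <- mid - 1L
    }
  }
  lo
}

# Example data (assumed): col1 already sorted ascending, as in the question.
set.seed(1)
A <- data.frame(col1 = sort(runif(1e5)), col2 = seq_len(1e5))

k <- count_leq(A$col1, 0.5)              # rows 1..k satisfy col1 <= 0.5
selected <- A[seq_len(k), , drop = FALSE]

# Sanity check against the full-scan approach.
stopifnot(k == sum(A$col1 <= 0.5))
```

The same comparison works for POSIXct time stamps. Note that extracting
A[seq_len(k), ] still copies k rows, so the saving is only in locating the
cutoff, which the timings above suggest is rarely the bottleneck.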

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
