thr3ads.net - R help - [R] need technique for speeding up R dataframe individual element insertion (no deletion though) [Aug 2009]


Ishwor

2009-Aug-13 12:07 UTC

[R] need technique for speeding up R dataframe individual element insertion (no deletion though)

Hi fellas,

I am working on a dataframe cam and it involves comparison within the
2 columns - t1 and t2 on about 20K rows and 14 columns.

###
cap = cam; # this doesn't take long. ~1 secs.


for( i in 1:length(cam$end_date))
  {
    x1=strptime(cam$end_date[i], "%d/%m/%Y");
    x2=strptime(cam$end_date[i+1], "%d/%m/%Y");

    t1= cam$vol[i];
    t2= cam$vol[i+1];

    if(!is.na(x2) && !is.na(x1) && !is.na(t1) && !is.na(t2))
    {
      if( (x2>=x1) && (t1==t2) ) # date and vol
      {
        cap$levels[i]=1; #make change to specific dataframe cell
        cap$levels[i+1]=1;
      }
    }
  }
###

Having coded that, I ran a timing profile on this section, and every
1,000 row comparisons take ~1.1 minutes on a 2.8 GHz dual-core
box (which is a test box we use).
This works out to ~21 minutes for 20k rows, which is definitely not
where we want it headed. I believe an optimisation (or even a different way
to address indexing inside the dataframe) can be had inside the innermost
`if', specifically in `cap$levels[i]=1;', but I am at a loss,
having scoured the documentation without finding anything of value.
So my question remains: are there any general or specific changes I can
make to speed up the code execution dramatically?

Thanks folks.

-- 
Regards,
Ishwor Gurung

jim holtman

2009-Aug-13 12:25 UTC


First of all, do the strptime conversions one time outside the loop.
I would guess that if you ran Rprof on the code, most of the time is
in that routine -- did you run Rprof?
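For readers who have not used Rprof, here is a minimal sketch of how the per-row conversions could be profiled. The data frame below is a made-up stand-in for the poster's `cam` (same column names, random contents), and `prof_file` is just a temporary file name chosen for illustration.

```r
# Hypothetical stand-in for the poster's data: 20000 rows of dd/mm/YYYY strings.
set.seed(1)
n <- 20000
cam <- data.frame(
  end_date = format(as.Date("2009-01-01") + sample(0:364, n, replace = TRUE),
                    "%d/%m/%Y"),
  vol = sample(1:5, n, replace = TRUE),
  stringsAsFactors = FALSE
)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                      # start the sampling profiler
for (i in seq_len(nrow(cam) - 1)) {
  # Same per-row conversions as in the original loop.
  x1 <- strptime(cam$end_date[i], "%d/%m/%Y")
  x2 <- strptime(cam$end_date[i + 1], "%d/%m/%Y")
}
Rprof(NULL)                           # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)                   # functions ranked by total time sampled
```

Rprof samples the call stack at fixed intervals, so the exact numbers vary from run to run; what matters is which functions dominate the table.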

Also, you are going through the loop one too many times: your ending
value is 'length(cam$end_date)', and then you index one greater
than that inside the loop with 'x2=strptime(cam$end_date[i+1], "%d/%m/%Y");'.

FYI -- you don't need the semicolons at the end of the statements.
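Put together, the suggestions above might look roughly like the sketch below. It is not code from the thread: the sample data is invented, `dates`, `vol`, `n` and `lev` are illustrative names, and it assumes `cam` has no existing `levels` column (otherwise `lev` would be initialised from `cam$levels`). Collecting results in a plain vector and attaching it once at the end is an extra change beyond the advice above, made because element-wise assignment into a data frame column is usually much slower than into a vector.

```r
# Hypothetical sample data standing in for the poster's 'cam' data frame.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009", NA,
               "05/08/2009", "04/08/2009"),
  vol      = c(10, 10, 12, 12, 7, 7),
  stringsAsFactors = FALSE
)

# Convert every date once, outside the loop.
dates <- as.POSIXct(strptime(cam$end_date, "%d/%m/%Y", tz = "UTC"))
vol   <- cam$vol
n     <- nrow(cam)
lev   <- rep(NA_real_, n)   # assumption: no pre-existing 'levels' column

# Stop at n - 1 so that i + 1 never runs past the last row.
for (i in seq_len(n - 1)) {
  x1 <- dates[i]
  x2 <- dates[i + 1]
  t1 <- vol[i]
  t2 <- vol[i + 1]
  if (!is.na(x1) && !is.na(x2) && !is.na(t1) && !is.na(t2)) {
    if (x2 >= x1 && t1 == t2) {   # date and vol
      lev[i] <- 1
      lev[i + 1] <- 1
    }
  }
}

cap <- cam
cap$levels <- lev           # attach the result in one assignment
```

With this sample data only rows 1 and 2 are flagged: the date does not go backwards and the volume is unchanged between them.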



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Bill.Venables at csiro.au

2009-Aug-13 12:44 UTC


Why do you need an explicit loop at all?

(Also, your loop goes over i in 1:length(cam$end_date) but your code refers to
cam$end_date[i+1] -->||<--!!)


Here is a suggestion.  You want to identify places where the date increases but
the volume does not change.  OK, where?

ind <- with(cam, {
           dx <- as.numeric(diff(strptime(end_date, "%d/%m/%Y")))
           dt <- diff(vol)
           which(dx > 0 & dt == 0)
})

Now adjust the new data frame

cap <- within(cam, {
             levels[ind] <- 1
             levels[ind+1] <- 1
})

Of course this is untested code, so caveat emptor!

Bill Venables.
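Since the code above is flagged as untested, here is a small variant that has been checked on made-up data. It is a sketch under a few assumptions rather than a drop-in fix: the sample `cam` is invented, `cam` is assumed to have no existing `levels` column (so the column is created explicitly instead of via `within`, where a bare `levels` would otherwise resolve to the base function), and the comparison uses `>=` to mirror the `x2>=x1` test in the original loop instead of the strict `> 0` used above.

```r
# Hypothetical sample data standing in for the poster's 'cam' data frame.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009", NA,
               "05/08/2009", "04/08/2009"),
  vol      = c(10, 10, 12, 12, 7, 7),
  stringsAsFactors = FALSE
)

# One vectorised conversion of all dates.
dates <- as.POSIXct(strptime(cam$end_date, "%d/%m/%Y", tz = "UTC"))

dx <- diff(as.numeric(dates))   # seconds between consecutive rows (NA if a date is missing)
dv <- diff(cam$vol)             # change in volume between consecutive rows

# which() drops NA comparisons, so rows with missing values are skipped,
# as in the is.na() checks of the original loop.
ind <- which(dx >= 0 & dv == 0)

cap <- cam
cap$levels <- NA_real_                        # assumption: column did not exist before
cap$levels[unique(c(ind, ind + 1))] <- 1      # flag both rows of each matching pair
```

On the sample data `ind` contains only row 1, so rows 1 and 2 end up with `levels` equal to 1 and the rest stay `NA`.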

