thr3ads.net - R help - [R] speed up a function [Jul 2013]


Santiago Guallar

2013-Jul-02 17:47 UTC

[R] speed up a function

Hi,

I have written a function to assign the values of a certain variable
'wd' from one dataset to another. Both contain data from the
same time period but differ in the length of their time intervals: 'GPS'
has regular 10-minute intervals whereas 'xact' has irregular intervals.
I attached simplified text versions from write.table. You can also get a dput of
'xact' at this address:
http://www.megafileupload.com/en/file/431569/xact-dput.html
The original objects are large and the function takes almost one hour to finish.
Here's the function:

fxG <- function(xact, GPS){
  l <- rep('A', nrow(GPS))
  v <- unique(GPS$Ring)  # the process is carried out for several individuals identified by 'Ring'
  for(k in 1:length(v)){
    I <- v[k]
    df <- xact[xact$Ring == I, ]
    for(i in 1:nrow(GPS)){
      # the code runs along the whole data.frame for each i;
      # it'd save time to make it stop with the last record of each i instead
      if(GPS[i, ]$Ring == I){
        u <- df$timepos <= GPS[i, ]$timepos
        # fill vector l for each interval t from xact <= each interval from GPS
        # (take the max if there's > 1 interval)
        l[i] <- df[max(which(u == TRUE)), ]$wd
      }
    }
  }
  return(l)
}

vwd <- fxG(xact, GPS)


My question is: how can I speed up (optimize) this function?

Thank you for your help
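A minimal reproducible version of the problem can be sketched from the head() listings posted later in this thread. The toy objects below hold one bird only (8 'xact' rows, 16 'GPS' rows); the column types follow the str() output Petr posted, and everything else about the real data is an assumption. Running the original fxG logic on them gives "dry" for all 16 GPS fixes, which matches the desired result Santiago posted.

```r
# Toy data rebuilt from the head() output posted in this thread (one bird only).
xact <- data.frame(
  Ring = factor(rep("6106933", 8)),
  jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4),
  stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L,
  jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

# The original fxG logic, only re-laid out.
fxG <- function(xact, GPS){
  l <- rep('A', nrow(GPS))
  v <- unique(GPS$Ring)
  for(k in 1:length(v)){
    I <- v[k]
    df <- xact[xact$Ring == I, ]
    for(i in 1:nrow(GPS)){
      if(GPS[i, ]$Ring == I){
        # last xact record at or before this GPS time
        u <- df$timepos <= GPS[i, ]$timepos
        l[i] <- df[max(which(u == TRUE)), ]$wd
      }
    }
  }
  return(l)
}

vwd <- fxG(xact, GPS)
print(vwd)
```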

David Winsemius

2013-Jul-02 18:24 UTC



On Jul 2, 2013, at 10:47 AM, Santiago Guallar wrote:
> Hi,
> 
> I have written a function to assign the values of a certain variable
> 'wd' from one dataset to another. Both contain data from the same
> time period but differ in the length of their time intervals: 'GPS' has
> regular 10-minute intervals whereas 'xact' has irregular intervals. I
> attached simplified text versions from write.table. You can also get a dput of
> 'xact' at this address:
> http://www.megafileupload.com/en/file/431569/xact-dput.html
> The original objects are large and the function takes almost one hour to
> finish.
> Here's the function:
> 
> fxG= function(xact, GPS){
> l <- rep( 'A', nrow(GPS) )
> v <- unique(GPS$Ring) # the process is carried out for several individuals identified by 'Ring'
> for(k in v ){
>    df <- xact[xact$Ring == k,]

Simplified a bit, this is starting to look like a case for the split function.

>    for(i in 1:nrow(GPS)){
>          if(GPS[i,]$Ring== k){# the code runs along the whole data.frame for each i;

After doing the simplification I must ask how GPS[i,]$Ring could not == k (or I).

>           u <- df$timepos <= GPS[i,]$timepos
>           # fill vector l for each interval t from xact <= each interval from GPS (take the max if there's > 1 interval)
>           l[i] <- df[max( which(u == TRUE) ),]$wd

Perhaps tail(df[which(u), 'wd'], 1)?

>          }
>    }
> }
> return(l)}

This looks like it will be overwriting the l-object with every iteration of 'k'.
> vwd <- fxG(xact, GPS)
> 
> 
> My question is: how can I speed up (optimize) this function?

The first thing you should do is describe in natural language what should be
done with the objects 'xact' and 'GPS' (which have not yet been described),
rather than asking for simplification of obscure nested for-loops with probably
redundant assignments and extraneous conditions. Make a simple example of such
objects and repost.

-- 

David Winsemius
Alameda, CA, USA
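David's split() suggestion could be sketched as follows. fxG_split is a hypothetical name, not code from the thread: xact is split once per bird, and for each bird only that bird's GPS rows are visited, so the inner loop no longer walks the whole GPS data frame for every Ring. Following David's tail() idea, it assumes each bird's xact records are sorted by timepos, and it returns NA when no xact record precedes a GPS fix. The toy data are rebuilt from the listings posted in this thread.

```r
xact <- data.frame(
  Ring = factor(rep("6106933", 8)), jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4), stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L, jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

fxG_split <- function(xact, GPS) {
  l <- rep(NA_character_, nrow(GPS))
  # one data frame per bird, built only once
  by_ring <- split(xact, as.character(xact$Ring))
  gps_ring <- as.character(GPS$Ring)
  for (r in unique(gps_ring)) {
    df <- by_ring[[r]]
    if (is.null(df)) next                 # no xact records for this bird
    rows <- which(gps_ring == r)          # only this bird's GPS rows
    l[rows] <- vapply(rows, function(i) {
      hits <- which(df$timepos <= GPS$timepos[i])
      # assumes df is sorted by timepos, so the last hit is the latest record
      if (length(hits) == 0) NA_character_ else tail(as.character(df$wd[hits]), 1)
    }, character(1))
  }
  l
}

vwd2 <- fxG_split(xact, GPS)
print(vwd2)
```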

Jim Holtman

2013-Jul-03 09:42 UTC



The first thing to do when trying to speed up a function is to see where it is
spending its time. Take a subset of the data and use Rprof to profile the code.
My guess is that a lot of time is taken up by the use of data frames. See if you
can use matrices instead.
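In the spirit of Jim's advice, the sketch below keeps fxG's rule (the last xact record at or before each GPS time, for the same Ring) but pulls the columns out of the data frames once and loops over plain vectors, because GPS[i, ] builds a new one-row data frame on every iteration. fxG_vec is a hypothetical name, and the toy data are rebuilt from the listings posted in this thread.

```r
xact <- data.frame(
  Ring = factor(rep("6106933", 8)), jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4), stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L, jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

fxG_vec <- function(xact, GPS) {
  # extract each column once as an atomic vector
  x_ring <- as.character(xact$Ring)
  x_time <- as.numeric(xact$timepos)
  x_wd   <- as.character(xact$wd)
  g_ring <- as.character(GPS$Ring)
  g_time <- as.numeric(GPS$timepos)
  l <- rep(NA_character_, length(g_ring))
  for (i in seq_along(g_ring)) {
    # same bird and not later than this GPS fix
    hits <- which(x_ring == g_ring[i] & x_time <= g_time[i])
    if (length(hits) > 0) l[i] <- x_wd[max(hits)]
  }
  l
}

vwd3 <- fxG_vec(xact, GPS)
print(vwd3)
```

To see where the time actually goes, a call on a realistic subset can be wrapped between Rprof("fxG.out") and Rprof(NULL) and then inspected with summaryRprof("fxG.out")$by.self; on data as small as this toy set the profiler may not record any samples.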


On Jul 2, 2013, at 13:47, Santiago Guallar <sguallar at yahoo.com> wrote:
> My question is: how can I speed up (optimize) this function?

PIKAL Petr

2013-Jul-08 09:34 UTC



Hi

It seems to me that you basically want merge, but I may be missing the point. Try
posting

dput(head(xact))
dput(head(GPS))

together with the desired result based on those two datasets.

Regards
Petr

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Santiago Guallar
> Sent: Tuesday, July 02, 2013 7:47 PM
> To: r-help
> Subject: [R] spped up a function
> 
> My question is: how can I speed up (optimize) this function?

PIKAL Petr

2013-Jul-10 06:35 UTC



Hi Santiago

Please keep the conversation on the list; others may have better ideas.

I am still missing the reasoning.

Merge seems to me to be the solution, but I am lost in your reasoning about what
to keep and what to discard from the resulting object.

After the merge I have this:

result <- structure(list(Ring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("6106933", "6134701", "6140497",
"6140719", "6140756",
"6140855", "6143070", "6143090",
"6143093", "6175711", "6175726",
"6175730", "6175769", "6175776",
"6175784", "6188609", "6188705",
"6195159", "6195171", "6198153",
"6198154", "6198156", "6198157",
"6198172"), class = "factor"), jul = c(15135, 15135, 15135,
15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135), timepos = structure(c(1307680575, 1307680740,
1307681040, 1307681340, 1307681640, 1307681940, 1307682240, 1307682540,
1307682780, 1307683080, 1307683380, 1307683680, 1307683980, 1307684280,
1307684397, 1307684424, 1307684484, 1307684490, 1307684580, 1307684880,
1307685180, 1307685243, 1307685321, 1307685336), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), act = c(3822L, NA, NA, NA, NA,
NA,
NA, NA, NA, NA, NA, NA, NA, NA, 27L, 60L, 6L, 753L, NA, NA, NA,
78L, 15L, 18L), wd = c("dry", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, "wet", "dry", "wet",
"dry", NA, NA, NA, "wet",
"dry", "wet")), .Names = c("Ring",
"jul", "timepos", "act", "wd"
), row.names = c(NA, -24L), class = "data.frame")
> result
      Ring   jul             timepos  act   wd
1  6106933 15135 2011-06-10 04:36:15 3822  dry
2  6106933 15135 2011-06-10 04:39:00   NA <NA>
3  6106933 15135 2011-06-10 04:44:00   NA <NA>
4  6106933 15135 2011-06-10 04:49:00   NA <NA>
5  6106933 15135 2011-06-10 04:54:00   NA <NA>
6  6106933 15135 2011-06-10 04:59:00   NA <NA>
7  6106933 15135 2011-06-10 05:04:00   NA <NA>
8  6106933 15135 2011-06-10 05:09:00   NA <NA>
9  6106933 15135 2011-06-10 05:13:00   NA <NA>
10 6106933 15135 2011-06-10 05:18:00   NA <NA>
11 6106933 15135 2011-06-10 05:23:00   NA <NA>
12 6106933 15135 2011-06-10 05:28:00   NA <NA>
13 6106933 15135 2011-06-10 05:33:00   NA <NA>
14 6106933 15135 2011-06-10 05:38:00   NA <NA>
15 6106933 15135 2011-06-10 05:39:57   27  wet
16 6106933 15135 2011-06-10 05:40:24   60  dry
17 6106933 15135 2011-06-10 05:41:24    6  wet
18 6106933 15135 2011-06-10 05:41:30  753  dry
19 6106933 15135 2011-06-10 05:43:00   NA <NA>
20 6106933 15135 2011-06-10 05:48:00   NA <NA>
21 6106933 15135 2011-06-10 05:53:00   NA <NA>
22 6106933 15135 2011-06-10 05:54:03   78  wet
23 6106933 15135 2011-06-10 05:55:21   15  dry
24 6106933 15135 2011-06-10 05:55:36   18  wet

I understand you want to keep only the time values from the GPS data.frame. OK,
this can be done in the last step. But I am a bit lost in the logic for discarding
lines 15-18. Anyway, this may be what you want:

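library(zoo)
# na.rm = FALSE keeps leading NAs, so the filled vector keeps the length of result$wd
result$wd <- na.locf(result$wd, na.rm = FALSE)
final <- result[is.na(result$act), ]
> final
      Ring   jul             timepos act  wd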
2  6106933 15135 2011-06-10 04:39:00  NA dry
3  6106933 15135 2011-06-10 04:44:00  NA dry
4  6106933 15135 2011-06-10 04:49:00  NA dry
5  6106933 15135 2011-06-10 04:54:00  NA dry
6  6106933 15135 2011-06-10 04:59:00  NA dry
7  6106933 15135 2011-06-10 05:04:00  NA dry
8  6106933 15135 2011-06-10 05:09:00  NA dry
9  6106933 15135 2011-06-10 05:13:00  NA dry
10 6106933 15135 2011-06-10 05:18:00  NA dry
11 6106933 15135 2011-06-10 05:23:00  NA dry
12 6106933 15135 2011-06-10 05:28:00  NA dry
13 6106933 15135 2011-06-10 05:33:00  NA dry
14 6106933 15135 2011-06-10 05:38:00  NA dry
19 6106933 15135 2011-06-10 05:43:00  NA dry
20 6106933 15135 2011-06-10 05:48:00  NA dry
21 6106933 15135 2011-06-10 05:53:00  NA dry
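Petr's merge-and-fill approach can also be written in base R and kept safe for several birds. In this sketch, locf is a hypothetical base-R stand-in for zoo::na.locf(x, na.rm = FALSE), and the marker column from_gps is an assumption added to identify GPS rows instead of relying on act being NA. The fill is done within each Ring (via ave), so one bird's last 'wd' value cannot leak into the next bird's first GPS fixes. The toy data are rebuilt from the listings posted in this thread.

```r
xact <- data.frame(
  Ring = factor(rep("6106933", 8)), jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4), stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L, jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

# Hypothetical base-R stand-in for zoo::na.locf(x, na.rm = FALSE)
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # 0 -> NA, k -> k-th non-NA value
}

xact$Ring <- as.character(xact$Ring)   # same key type on both sides
GPS$Ring  <- as.character(GPS$Ring)
GPS$from_gps <- TRUE                   # hypothetical marker for GPS rows

result <- merge(xact, GPS, all = TRUE)   # joins on Ring, jul, timepos
result <- result[order(result$Ring, result$timepos), ]
# carry the last known wd forward within each bird only
result$wd <- ave(as.character(result$wd), result$Ring, FUN = locf)
final <- result[!is.na(result$from_gps), c("Ring", "jul", "timepos", "wd")]
print(final)
```

If an xact record has exactly the same Ring, jul and timepos as a GPS fix, merge() puts both in one row, so that fix keeps the xact value, consistent with the <= comparison in fxG.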
Regards
Petr

From: Santiago Guallar [mailto:sguallar@yahoo.com]
Sent: Tuesday, July 09, 2013 10:02 PM
To: PIKAL Petr
Subject: Re: [R] spped up a function

Dear Petr,

I wanted the two data sets merged in such a way that the values of the
'wd' vector (from the intervals t of 'xact') are assigned to the
corresponding intervals of 'GPS'. If there is more than one value (i.e.
if there is more than one interval of 'xact' for the corresponding
interval of 'GPS'), then take the maximum (i.e. the value of the
interval of 'xact' closest to, and not after, the corresponding interval of
'GPS'). This is why the output for the particular sequence of the result
I copied in the previous message contains only 'dry'.

Santi
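The rule Santiago describes (for each GPS fix, take 'wd' from the latest 'xact' record of the same bird at or before that time) is what base R's findInterval() computes, so the inner loop can be replaced by one vectorized call per bird. A sketch, assuming timepos is POSIXct without NAs in both data frames; fxG_fi is a hypothetical name and the toy data are rebuilt from the listings in this thread:

```r
xact <- data.frame(
  Ring = factor(rep("6106933", 8)), jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4), stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L, jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

fxG_fi <- function(xact, GPS) {
  l <- rep(NA_character_, nrow(GPS))
  x_ring <- as.character(xact$Ring)
  g_ring <- as.character(GPS$Ring)
  for (r in unique(g_ring)) {
    x <- xact[x_ring == r, ]
    if (nrow(x) == 0) next                  # no xact records for this bird
    x <- x[order(x$timepos), ]              # findInterval needs sorted breakpoints
    g <- which(g_ring == r)
    # index of the last xact record with timepos <= each GPS time (0 = none)
    pos <- findInterval(as.numeric(GPS$timepos[g]), as.numeric(x$timepos))
    l[g[pos > 0]] <- as.character(x$wd)[pos[pos > 0]]
  }
  l
}

vwd4 <- fxG_fi(xact, GPS)
print(vwd4)
```

findInterval() returns 0 when a GPS time precedes every xact record of that bird, and those fixes are left as NA. For sorted data with tied xact times it returns the last of them, as max(which(...)) does in fxG. Per bird the work is a sort plus a binary search per GPS fix, rather than a scan of all rows for every fix.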


From: PIKAL Petr <petr.pikal@precheza.cz>
To: Santiago Guallar <sguallar@yahoo.com>; r-help <r-help@r-project.org>
Sent: Tuesday, July 9, 2013 11:19 AM
Subject: RE: [R] spped up a function

Hi Santiago

I am a bit confused about how your result is organised: why are there only "dry"
values regardless of the timepos values?

It is not necessary to attach files resulting from dput. Just paste the output into
your mail and anybody can copy it directly into R.

Ring is a factor in xact but numeric in GPS:

> str(xact)
'data.frame':   8 obs. of  5 variables:
 $ Ring   : Factor w/ 24 levels "6106933","6134701",..: 1 1 1 1 1 1 1 1
 $ jul    : num  15135 15135 15135 15135 15135 ...
 $ timepos: POSIXct, format: "2011-06-10 04:36:15" "2011-06-10 05:39:57" ...
 $ act    : int  3822 27 60 6 753 78 15 18
 $ wd     : chr  "dry" "wet" "dry" "wet" ...
> str(GPS)
'data.frame':   16 obs. of  3 variables:
 $ Ring   : int  6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 ...
 $ jul    : num  15135 15135 15135 15135 15135 ...
 $ timepos: POSIXct, format: "2011-06-10 04:39:00" "2011-06-10 04:44:00" ...

So I first converted it to a factor in GPS as well, so both are factors:

GPS$Ring <- factor(GPS$Ring)

After that I merged both data frames:

result <- merge(xact, GPS, all = TRUE)
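The same merge with the join columns spelled out might look like this. Converting Ring to character on both sides (rather than to factor) is an assumption that sidesteps any mismatch in factor levels, and by = makes explicit which columns are matched. In the toy data rebuilt from this thread no xact time coincides with a GPS time, so the merged result has 8 + 16 = 24 rows, as in Petr's output.

```r
xact <- data.frame(
  Ring = factor(rep("6106933", 8)), jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:36:15", "05:39:57", "05:40:24", "05:41:24",
      "05:41:30", "05:54:03", "05:55:21", "05:55:36")), tz = "GMT"),
  act = c(3822L, 27L, 60L, 6L, 753L, 78L, 15L, 18L),
  wd = rep(c("dry", "wet"), 4), stringsAsFactors = FALSE
)
GPS <- data.frame(
  Ring = 6106933L, jul = 15135,
  timepos = as.POSIXct(paste("2011-06-10",
    c("04:39:00", "04:44:00", "04:49:00", "04:54:00", "04:59:00", "05:04:00",
      "05:09:00", "05:13:00", "05:18:00", "05:23:00", "05:28:00", "05:33:00",
      "05:38:00", "05:43:00", "05:48:00", "05:53:00")), tz = "GMT")
)

# give the key column the same type in both data frames
xact$Ring <- as.character(xact$Ring)
GPS$Ring  <- as.character(GPS$Ring)

# full outer join on the three shared columns; sort = TRUE (the default)
# orders the result by Ring, jul and timepos
result <- merge(xact, GPS, by = c("Ring", "jul", "timepos"), all = TRUE)
str(result)
```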

And here is the result:

dput(result)
structure(list(Ring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("6106933", "6134701", "6140497",
"6140719", "6140756",
"6140855", "6143070", "6143090",
"6143093", "6175711", "6175726",
"6175730", "6175769", "6175776",
"6175784", "6188609", "6188705",
"6195159", "6195171", "6198153",
"6198154", "6198156", "6198157",
"6198172"), class = "factor"), jul = c(15135, 15135, 15135,
15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135), timepos = structure(c(1307680575, 1307680740,
1307681040, 1307681340, 1307681640, 1307681940, 1307682240, 1307682540,
1307682780, 1307683080, 1307683380, 1307683680, 1307683980, 1307684280,
1307684397, 1307684424, 1307684484, 1307684490, 1307684580, 1307684880,
1307685180, 1307685243, 1307685321, 1307685336), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), act = c(3822L, NA, NA, NA, NA,
NA,
NA, NA, NA, NA, NA, NA, NA, NA, 27L, 60L, 6L, 753L, NA, NA, NA,
78L, 15L, 18L), wd = c("dry", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, "wet", "dry", "wet",
"dry", NA, NA, NA, "wet",
"dry", "wet")), .Names = c("Ring",
"jul", "timepos", "act", "wd"
), row.names = c(NA, -24L), class = "data.frame")

There are empty (NA) values in the act and wd columns. You can fill them, e.g. with
the na.locf function from the zoo package.

> result$wd
 [1] "dry" NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
[13] NA    NA    "wet" "dry" "wet" "dry" NA    NA    NA    "wet" "dry" "wet"
> na.locf(result$wd)
 [1] "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry"
[13] "dry" "dry" "wet" "dry" "wet" "dry" "dry" "dry" "dry" "wet" "dry" "wet"
Is this what you want?

Regards
Petr


From: Santiago Guallar [mailto:sguallar@yahoo.com]
Sent: Tuesday, July 09, 2013 8:53 AM
To: PIKAL Petr; r-help
Subject: Re: [R] spped up a function

Hi Petr, yes, the function basically consists of merging two time series with
different time intervals: one regular ('GPS') and one irregular
('xact'), the latter containing the binary variable 'wd' that I
want to add to 'GPS'.
Apparently my attachments did not go through. Here you have the data you
requested plus the desired result based on them:

head(xact)
   Ring   jul             timepos   act  wd
6106933 15135 2011-06-10 04:36:15  3822 dry
6106933 15135 2011-06-10 05:39:57    27 wet
6106933 15135 2011-06-10 05:40:24    60 dry
6106933 15135 2011-06-10 05:41:24     6 wet
6106933 15135 2011-06-10 05:41:30   753 dry
6106933 15135 2011-06-10 05:54:03    78 wet
6106933 15135 2011-06-10 05:55:21    15 dry
6106933 15135 2011-06-10 05:55:36    18 wet

head(GPS1, 16) with the desired result (added column wd):
      Ring   jul             timepos wd
5  6106933 15135 2011-06-10 04:39:00 dry
6  6106933 15135 2011-06-10 04:44:00 dry
7  6106933 15135 2011-06-10 04:49:00 dry
8  6106933 15135 2011-06-10 04:54:00 dry
9  6106933 15135 2011-06-10 04:59:00 dry
10 6106933 15135 2011-06-10 05:04:00 dry
11 6106933 15135 2011-06-10 05:09:00 dry
12 6106933 15135 2011-06-10 05:13:00 dry
13 6106933 15135 2011-06-10 05:18:00 dry
14 6106933 15135 2011-06-10 05:23:00 dry
15 6106933 15135 2011-06-10 05:28:00 dry
16 6106933 15135 2011-06-10 05:33:00 dry
17 6106933 15135 2011-06-10 05:38:00 dry
18 6106933 15135 2011-06-10 05:43:00 dry
19 6106933 15135 2011-06-10 05:48:00 dry
20 6106933 15135 2011-06-10 05:53:00 dry

Santi
________________________________
From: PIKAL Petr <petr.pikal@precheza.cz>
To: Santiago Guallar <sguallar@yahoo.com>; r-help <r-help@r-project.org>
Sent: Monday, July 8, 2013 11:34 AM
Subject: RE: [R] spped up a function

Hi

It seems to me that you basically want merge, but I may be missing the point. Try
posting

dput(head(xact))
dput(head(GPS))

together with the desired result based on those two datasets.

Regards
Petr
