I have a For loop that is quite slow and am wondering if there is a faster option:

df <- data.frame(TreeID=rep(1:500,each=20), Age=rep(seq(1,20,1),500))
df$Height <- exp(-0.1 + 0.2*df$Age)
df$HeightGrowth <- NA  # initialize with NA
for (i in 2:nrow(df)) {
  if (df$TreeID[i] == df$TreeID[i-1]) {
    df$HeightGrowth[i] <- df$Height[i] - df$Height[i-1]
  }
}

Trevor Walker
Email: trevordaviswalker@gmail.com
Hello,

One way to speed it up is to use a matrix instead of a data.frame. Because a data.frame can hold columns of different classes, access to its elements is slow. Your data is all numeric, so it can be held in a matrix. The second version below gave me a speedup by a factor of 50.

system.time({
  for (i in 2:nrow(df)) {
    if (df$TreeID[i] == df$TreeID[i-1]) {
      df$HeightGrowth[i] <- df$Height[i] - df$Height[i-1]
    }
  }
})

system.time({
  df2 <- data.matrix(df)
  for (i in seq_len(nrow(df2))[-1]) {
    if (df2[i, "TreeID"] == df2[i - 1, "TreeID"])
      df2[i, "HeightGrowth"] <- df2[i, "Height"] - df2[i - 1, "Height"]
  }
})

all.equal(df, as.data.frame(df2))  # TRUE

Hope this helps,

Rui Barradas

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
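A related sketch of the same point, offered as an illustration rather than as Rui's method: since the cost comes from indexing into the data.frame on every iteration, the loop can also work on plain vectors pulled out once and write the result back once. The names id, h and growth are made up for this example.

```r
# Rebuild the example data from the original question.
df <- data.frame(TreeID = rep(1:500, each = 20), Age = rep(1:20, 500))
df$Height <- exp(-0.1 + 0.2 * df$Age)

# Pull the columns out once so the loop indexes plain vectors.
id <- df$TreeID
h <- df$Height
growth <- rep(NA_real_, nrow(df))

for (i in seq_len(nrow(df))[-1]) {
  # Only take a difference when the previous row is the same tree.
  if (id[i] == id[i - 1]) growth[i] <- h[i] - h[i - 1]
}

# Write the result back to the data.frame in a single assignment.
df$HeightGrowth <- growth
```

The loop logic is unchanged; only the container being indexed differs, so the first row of every tree still stays NA.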
How about

for (ir in unique(df$TreeID)) {
  in.ir <- df$TreeID == ir
  df$HeightGrowth[in.ir] <- c(NA, diff(df$Height[in.ir]))
}

Seemed fast enough to me.

In R, it is generally good to look for ways to operate on entire vectors or arrays, rather than element by element within them. The diff() function does that in this example.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
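To illustrate the whole-vector idea, here is a minimal sketch that drops the loop over trees as well. It assumes the rows are sorted by TreeID and then by Age, as in the original data; the names same_tree and growth are made up for this example.

```r
# Rebuild the example data from the original question.
df <- data.frame(TreeID = rep(1:500, each = 20), Age = rep(1:20, 500))
df$Height <- exp(-0.1 + 0.2 * df$Age)

n <- nrow(df)

# TRUE where a row belongs to the same tree as the row before it.
same_tree <- c(FALSE, df$TreeID[-1] == df$TreeID[-n])

# Difference of every row from the previous row, computed in one call.
growth <- c(NA, diff(df$Height))

# Blank out differences that straddle two different trees.
growth[!same_tree] <- NA

df$HeightGrowth <- growth
```

Because the tree boundary is detected from TreeID itself, this does not depend on how the heights happen to behave at the boundary.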
Avoid tests with if(){}else{}. Use vectorized code, possibly with 'ifelse', but in this case you need a function that does calculations within groups. The ave() function with diff() will do it compactly and efficiently:

> df <- data.frame(TreeID=rep(1:5,each=4), Age=rep(seq(1,4,1),5))
> df$Height <- exp(-0.1 + 0.2*df$Age)
> df$HeightGrowth <- NA  # initialize with NA
> df$HeightGrowth <- ave(df$Height, df$TreeID, FUN = function(vec) c(NA, diff(vec)))
> df
   TreeID Age   Height HeightGrowth
1       1   1 1.105171           NA
2       1   2 1.349859    0.2446879
3       1   3 1.648721    0.2988625
4       1   4 2.013753    0.3650314
5       2   1 1.105171           NA
6       2   2 1.349859    0.2446879
7       2   3 1.648721    0.2988625
8       2   4 2.013753    0.3650314
9       3   1 1.105171           NA
10      3   2 1.349859    0.2446879
11      3   3 1.648721    0.2988625
12      3   4 2.013753    0.3650314
13      4   1 1.105171           NA
14      4   2 1.349859    0.2446879
15      4   3 1.648721    0.2988625
16      4   4 2.013753    0.3650314
17      5   1 1.105171           NA
18      5   2 1.349859    0.2446879
19      5   3 1.648721    0.2988625
20      5   4 2.013753    0.3650314

(On my machine it was over six times as fast as the if-based code from Arun.)

--
David Winsemius
Alameda, CA, USA
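A small sketch of what ave() is doing here, using made-up toy vectors (height and tree are not from the thread): it splits the first argument by the grouping variable, applies FUN to each piece, and puts the results back in the original row order. That means the rows of one group do not even have to be adjacent.

```r
# Toy data: two trees whose rows are interleaved.
height <- c(1, 10, 3, 15, 6)
tree <- c("a", "b", "a", "b", "a")

# Within each tree, the first value has no predecessor, so it gets NA.
growth <- ave(height, tree, FUN = function(v) c(NA, diff(v)))

print(growth)
# [1] NA NA  2  5  3
```

Tree "a" has heights 1, 3, 6 (differences 2 and 3) and tree "b" has heights 10, 15 (difference 5); each result lands back in the row its height came from.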
Hi,

Some speed comparisons:

df <- data.frame(TreeID=rep(1:6000,each=20), Age=rep(seq(1,20,1),6000))
df$Height <- exp(-0.1 + 0.2*df$Age)
df1 <- df
df3 <- df
library(data.table)
dt1 <- data.table(df)
df$HeightGrowth <- NA

system.time({  # Rui's 2nd function
  df2 <- data.matrix(df)
  for (i in seq_len(nrow(df2))[-1]) {
    if (df2[i, "TreeID"] == df2[i - 1, "TreeID"])
      df2[i, "HeightGrowth"] <- df2[i, "Height"] - df2[i - 1, "Height"]
  }
})
#   user  system elapsed
#  1.108   0.000   1.109

system.time({
  for (ir in unique(df$TreeID)) {  # Don's first function
    in.ir <- df$TreeID == ir
    df$HeightGrowth[in.ir] <- c(NA, diff(df$Height[in.ir]))
  }
})
#    user  system elapsed
# 100.004   0.704 100.903

system.time({  # Don's 2nd function
  df3$delta <- c(NA, diff(df3$Height))
  # Relies on Height increasing within each tree, so that a negative
  # difference can only occur at a tree boundary.
  df3$delta[df3$delta < 0] <- NA
})  ##### winner
#   user  system elapsed
#  0.016   0.000   0.014

system.time(df1$HeightGrowth <- ave(df1$Height, df1$TreeID, FUN = function(vec) c(NA, diff(vec))))  # David's
#   user  system elapsed
#  0.136   0.000   0.137

system.time(dt1[, HeightGrowth := c(NA, diff(Height)), by = TreeID])
#   user  system elapsed
#  0.076   0.000   0.079

identical(df1, as.data.frame(dt1))
# [1] TRUE
identical(df1, df)
# [1] TRUE

head(df1, 2)
#   TreeID Age   Height HeightGrowth
# 1      1   1 1.105171           NA
# 2      1   2 1.349859    0.2446879
head(df2, 2)
#      TreeID Age   Height HeightGrowth
# [1,]      1   1 1.105171           NA
# [2,]      1   2 1.349859    0.2446879

A.K.

----- Original Message -----
From: Trevor Walker <trevordaviswalker at gmail.com>
To: r-help at r-project.org
Sent: Monday, June 10, 2013 1:28 PM
Subject: [R] Speed up or alternative to 'For' loop
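A caution on the fastest timing above, sketched with made-up toy data (toy, by_sign and by_group are hypothetical names): blanking out negative differences finds the tree boundaries only when heights rise within a tree and drop from one tree to the next. The grouped ave() version does not depend on that.

```r
# Toy data: tree 2 starts taller than tree 1 ended, then shrinks.
toy <- data.frame(TreeID = c(1, 1, 2, 2), Height = c(1, 2, 5, 4))

# Sign-based approach: treat any negative difference as a tree boundary.
by_sign <- c(NA, diff(toy$Height))
by_sign[by_sign < 0] <- NA

# Group-based approach: difference within each TreeID.
by_group <- ave(toy$Height, toy$TreeID, FUN = function(v) c(NA, diff(v)))

print(by_sign)
# [1] NA  1  3 NA
print(by_group)
# [1] NA  1 NA -1
```

Here the sign-based result reports a growth of 3 across the boundary between the two trees and hides the real within-tree change of -1. On the simulated data in this thread both approaches agree, because every tree follows the same increasing curve.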