ssobek at gwdg.de
2011-Jul-27 17:38 UTC
[R] Executing for loop by grouping variable within dataframe
Dear list,

I have a large dataset which is structured as follows:

    locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
               "USC00020958", "USC00021001", "USC00021001", "USC00021001",
               "USC00021001", "USC00021001", "USC00021001")
    temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0)
    month=c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
    day=c(27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6)
    df=data.frame(locality,temp.a,month,day)

          locality temp.a month day
    1  USC00020958   -1.2    12  27
    2  USC00020958   -1.2    12  28
    3  USC00020958   -1.2    12  29
    4  USC00020958   -1.2    12  30
    5  USC00020958   -1.1    12  31
    6  USC00021001   -2.2    11   1
    7  USC00021001   -2.4    11   2
    8  USC00021001   -2.6    11   3
    9  USC00021001   -2.7    11   4
    10 USC00021001   -2.8    11   5
    11 USC00021001   -3.0    11   6

I would like to calculate a 5th variable, temp.t, based on temp.a, and temp.t for the preceding time step. I successfully created a for loop as follows:

    temp.t=list()
    for(i in 2:nrow(df)){
      k=0.8
      temp.t[1]=df$temp.a[1]
      temp.t[i]=(as.numeric(temp.t[i-1]))+k*(as.numeric(df$temp.a[i])-(as.numeric(temp.t[i-1])))
    }
    temp.t <- unlist(temp.t)
    df["temp.t"] <- round(temp.t,1)
    df

          locality temp.a month day temp.t
    1  USC00020958   -1.2    12  27   -1.2
    2  USC00020958   -1.2    12  28   -1.2
    3  USC00020958   -1.2    12  29   -1.2
    4  USC00020958   -1.2    12  30   -1.2
    5  USC00020958   -1.1    12  31   -1.1
    6  USC00021001   -2.2    11   1   -2.0
    7  USC00021001   -2.4    11   2   -2.3
    8  USC00021001   -2.6    11   3   -2.5
    9  USC00021001   -2.7    11   4   -2.7
    10 USC00021001   -2.8    11   5   -2.8
    11 USC00021001   -3.0    11   6   -3.0

This worked fine as long as I was dealing with datasets that only contained one locality. However, as you can see above, my current dataset contains more than one locality, and I need to execute my loop for each locality separately. What is the best approach to do this?

I have tried repeatedly to put the loop into a command using either ave, by or tapply and to specify locality as the grouping variable, but no matter what I try, nothing works, because I am unable to specify my loop as a function within ave, by, or tapply.

I don't know if I am just doing it wrong (likely!), since I have no experience working with loops/functions, or if this is simply not the right approach to solve my problem. I was also considering using a nested for loop, but failed at setting it up. I would greatly appreciate it if someone could point me in the right direction.

Thanks a lot,
Stephanie

Dennis Murphy
2011-Jul-27 23:44 UTC
[R] Executing for loop by grouping variable within dataframe
Hi:

I don't get exactly the same results as you did in the second group (how does temp.t[1] = -2.0 instead of -2.2?) but try this:

    locality=c("USC00020958", "USC00020958", "USC00020958", "USC00020958",
               "USC00020958", "USC00021001", "USC00021001", "USC00021001",
               "USC00021001", "USC00021001", "USC00021001")
    temp.a=c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0)
    month=c(12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11)
    day=c(27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6)
    df=data.frame(locality, temp.a, month, day)

    # f() receives the rows of one locality and returns them with temp.t added
    f <- function(d) {
        k <- 0.8
        # A single-row group has no preceding step: temp.t is just temp.a
        if(nrow(d) == 1L) {return(data.frame(d, temp.t = d$temp.a))} else {
            tmp <- rep(NA, nrow(d))
            tmp[1] <- d[1, 'temp.a']    # start each group from its own first value
            for(j in 2:length(tmp))
                tmp[j] <- tmp[j - 1] + k * (d$temp.a[j] - tmp[j - 1])
            data.frame(d, temp.t = tmp)
        }
    }

    library(plyr)
    ddply(df, 'locality', f)    # split by locality, apply f, recombine

          locality temp.a month day    temp.t
    1  USC00020958   -1.2    12  27 -1.200000
    2  USC00020958   -1.2    12  28 -1.200000
    3  USC00020958   -1.2    12  29 -1.200000
    4  USC00020958   -1.2    12  30 -1.200000
    5  USC00020958   -1.1    12  31 -1.120000
    6  USC00021001   -2.2    11   1 -2.200000
    7  USC00021001   -2.4    11   2 -2.360000
    8  USC00021001   -2.6    11   3 -2.552000
    9  USC00021001   -2.7    11   4 -2.670400
    10 USC00021001   -2.8    11   5 -2.774080
    11 USC00021001   -3.0    11   6 -2.954816

If you want to round the result, substitute the last line in the function with

    data.frame(d, temp.t = round(tmp, 1))

Related functions are ceiling() and floor() in case they are of interest.

HTH,
Dennis
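
For comparison, here is a minimal base-R sketch of the same per-locality calculation using ave(), which the original question mentions trying. The helper name smooth_temp and its use of Reduce() are illustrative assumptions, not code from the thread; the recurrence and k = 0.8 are taken from the original loop.

```r
# Sample data from the original post
df <- data.frame(
  locality = rep(c("USC00020958", "USC00021001"), times = c(5, 6)),
  temp.a   = c(-1.2, -1.2, -1.2, -1.2, -1.1, -2.2, -2.4, -2.6, -2.7, -2.8, -3.0),
  month    = rep(c(12, 11), times = c(5, 6)),
  day      = c(27:31, 1:6)
)

# Hypothetical helper (not from the thread): applies the recurrence
#   t[1] = a[1];  t[i] = t[i-1] + k * (a[i] - t[i-1])
# to one numeric vector and returns every intermediate value.
smooth_temp <- function(a, k = 0.8) {
  Reduce(function(prev, cur) prev + k * (cur - prev), a, accumulate = TRUE)
}

# ave() calls the helper once per locality and puts the results
# back in the original row order.
df$temp.t <- ave(df$temp.a, df$locality, FUN = smooth_temp)

# Running the recurrence over the whole column without grouping
# carries the last value of the first locality into the second one.
ungrouped <- smooth_temp(df$temp.a)
```

With this data, ungrouped[6] is -1.12 + 0.8 * (-2.2 - (-1.12)) = -1.984, which rounds to the -2.0 shown in row 6 of the original output; the grouped version restarts at -2.2 for the second locality.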