Nerak
2012-Feb-15 16:02 UTC
[R] function similar to ddply? + calculations based on previous row
Hi all,

I was wondering whether there is a function similar to ddply that splits a data frame, applies a function to each row, and returns the result as a data frame. I know ddply, but it doesn't seem useful in this situation.

I have a data frame with values for each day (rows) for different objects (columns), covering several years. I want to do calculations on the data of one year at a time. With ddply you can pass Year as the second argument to split the data frame into years. But the function you supply is applied to that whole piece, so you get only one output value per year (for example, the sum of all the values in that column belonging to that year). I want to calculate a new value for each day of that year (which would be possible with apply if I had data for only one year). I found another way to do this with a for loop (y in Year[,]:Year[length(Year)]) and test2.dataframe<-test.dataframe[which(year==y)] to select the rows belonging to that year, on which I then do the calculations. The problem is that for loops take a lot of time to run, and I'm trying to avoid them whenever possible. (An almost reproducible example script is below.)

I'm also wondering whether it's possible to refer to a value in the row below, from another data vector or data frame or ... The line I mean in the script below is this one (it is the reason the script doesn't work, because n is not known):

    test.number$numberb[y-Year[1]+1]<-length(which(test.starty==1 & test.f[(n+1)]== 1 ))

For a given row n, I want the value of test.starty in that same row to be 1, and the value of test.f in the row below row n to be 1. How can I do this without looping (which I want to learn to avoid as much as possible)? I already searched the R-help forum and found:
http://r.789695.n4.nabble.com/How-to-calc-ratios-base-on-current-and-previous-row-td2341407.html
My n+1 is based on the original value, so there should be a solution without looping, but I don't understand how I should index.

To illustrate what I mean, here is a loop that solves this kind of problem (in a different script):

    test[1,]<-ifelse(AAA[1,]>1,1,0)
    for (t in 2:10)
    {
      test[t,] <- ifelse(AAA[t,]>1 & AAA[t-1,]==0,1,0)
    }

Below you can see how I did it with the for loop and what I want to create:

    Year<-data.frame(Date=c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983))
    test.b<-data.frame(C=c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),B=c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),F=c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0))
    test.start<-data.frame(C=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),B=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),F=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0))

    test.2b<-test.b>1
    test.number<-data.frame(c(1980:1983))
    for (l in 1:nrow(test.b))
    {
      for (y in 1980:1983)
      {
        test.f<-test.2b[which(Year == y),l]
        test.starty<-test.start[which(Year ==y),l]
        test.number$numberb[y-Year[1]+1]<-length(which(test.starty==1 & test.f[(n-1)]== 1 ))
      }
      test.number[,l+1]<-cbind(test.number$numberb)
    }

If someone knows a way to get rid of the loops, let me know! I want to make this script as fast as possible for larger datasets. I'm working through the apply family to find solutions, but it's a hard issue.

Many thanks in advance,
Kind regards,
Nerak
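The previous-row loop in the post can usually be replaced by lining up a shifted copy of the data with the original, so that row t-1 sits next to row t. The following is only a sketch of that idea: AAA is not defined anywhere in the post, so a small hypothetical matrix stands in for it, and the name prev is likewise introduced here for illustration.

```r
# Hypothetical stand-in for AAA (the post never defines it): 6 rows, 2 columns.
AAA <- matrix(c(2, 0, 3, 3, 0, 2,
                0, 0, 2, 0, 5, 0),
              ncol = 2, dimnames = list(NULL, c("a", "b")))

# Shifted copy: row t of prev holds row t-1 of AAA; the first row has no
# predecessor, so it is padded with NA.
prev <- rbind(NA, AAA[-nrow(AAA), , drop = FALSE])

# Vectorised form of: ifelse(AAA[t,] > 1 & AAA[t-1,] == 0, 1, 0) for t >= 2.
test <- (AAA > 1 & prev == 0) * 1

# Row 1 follows the special rule from the post: only the > 1 condition.
test[1, ] <- (AAA[1, ] > 1) * 1

print(test)
```

With this stand-in data, column a comes out as 1, 0, 1, 0, 0, 1 and column b as 0, 0, 1, 0, 1, 0. The whole matrix is handled in one comparison, so no loop over rows or columns is needed.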
Nerak
2012-Feb-15 21:25 UTC
[R] function similar to ddply? + calculations based on previous row
I saw that I made a small mistake in the loop. In the line

    test.number$numberb[y-Year[1]+1]<-length(which(test.starty==1 & test.f[(n+1)]== 1 ))

it should be n+1 instead of n-1 (as I wrote at the beginning of my message). The question itself is still the same. My apologies.

    Year<-data.frame(Date=c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983))
    test.b<-data.frame(C=c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),B=c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),F=c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0))
    test.start<-data.frame(C=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),B=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),F=c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0))

    test.2b<-test.b>1
    test.number<-data.frame(c(1980:1983))
    for (l in 1:nrow(test.b))
    {
      for (y in 1980:1983)
      {
        test.f<-test.2b[which(Year == y),l]
        test.starty<-test.start[which(Year ==y),l]
        test.number$numberb[y-Year[1]+1]<-length(which(test.starty==1 & test.f[(n+1)]== 1 ))
      }
      test.number[,l+1]<-cbind(test.number$numberb)
    }
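One possible loop-free reading of the corrected line is sketched below in base R. It assumes the intent is: for each year and each column, count the rows n where test.start is 1 and test.2b is TRUE in row n+1 of the same year. The helper names below, same_year, hit and counts are introduced here for illustration, and Year is kept as a plain vector rather than a one-column data frame to simplify indexing.

```r
Year <- c(1980,1980,1980,1980,1981,1981,1981,1981,
          1982,1982,1982,1982,1983,1983,1983,1983)
test.b <- data.frame(C = c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),
                     B = c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),
                     F = c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0))
test.start <- data.frame(C = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),
                         B = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0),
                         F = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0))
test.2b <- test.b > 1                      # logical matrix, 16 x 3
n_rows  <- nrow(test.2b)

# Row n of 'below' holds row n+1 of test.2b; the last row is padded with FALSE.
below <- rbind(test.2b[-1, , drop = FALSE], FALSE)

# TRUE where row n and row n+1 belong to the same year (never for the last row),
# so a "row below" is not borrowed from the following year.
same_year <- c(Year[-1] == Year[-n_rows], FALSE)

# Both conditions at once, for every row and every column.
hit <- (test.start == 1) & below & same_year

# Count the hits per year and per column.
counts <- rowsum(hit * 1, group = Year)
print(counts)
```

For the example data this gives 0, 1, 1, 0 for the years 1980 to 1983 in each of the columns C, B and F, since the start flags in rows 5 and 10 are each followed, within the same year, by a value greater than 1.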
Ista Zahn
2012-Feb-16 12:53 UTC
[R] function similar to ddply? + calculations based on previous row
Hi,

On Wednesday, February 15, 2012 08:02:44 AM Nerak wrote:
> Hi all,
>
> I was wondering if there is a function kind of similar that splits a
> dataframe, applies a function to each row and returns in a data frame. I
> know ddply but this one isn't useful in this situation.

Why not? Sounds like a description of exactly the thing ddply is designed to do!

> I have a dataframe with values for each day (rows) for different objects
> (columns). I have values for several years. Now, I want to do calculations
> on only the data of that year. With the ddply function you can use as second
> argument Year to split the data frame into years. But the function you use
> is for that whole part so you get only one output value for that year (for
> example the sum all the values in that column belonging to that year).

Not true. ddply is designed to return a data.frame, not a single value. I think if you read the JSS article (http://www.jstatsoft.org/v40/i01) you'll find that ddply will do exactly what you want.

Best,
Ista

> I want to calculate a new value for each day of that year (what would be
> possible with apply if you have only data for one year). I found another
> way to do this by using a for loop (y in Year[,]:Year[length(Year)] and
> test2.dataframe<-test.dataframe[which(year==y)] to select the rows
> belonging to that year on which I make to calculations. The problem is that
> for loops take a lot of time to run and I'm trying to avoid using them
> whenever possible. (Example almost reproducible script below)
>
> I'm also wondering if it's possible to refer to a value of the row below
> from another data vector or data frame or ...
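As a rough illustration of the point that a per-year split can still return one value per day, here is a minimal sketch using the data from the original post. The cumulative sum is only a placeholder calculation, and the column names C_cum and C_cum_plyr are made up for this example. The ddply call assumes the plyr package is installed, so it is guarded; the base R line with ave() gives the same per-row result without it.

```r
Year <- c(1980,1980,1980,1980,1981,1981,1981,1981,
          1982,1982,1982,1982,1983,1983,1983,1983)
test.b <- data.frame(C = c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),
                     B = c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),
                     F = c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0))
d <- cbind(Year = Year, test.b)

# Base R: ave() applies cumsum within each year and returns one value per row.
d$C_cum <- ave(d$C, d$Year, FUN = cumsum)

# plyr (assumed installed): split by Year, run transform() on each piece,
# and bind the pieces back into a data frame with as many rows as the input.
if (requireNamespace("plyr", quietly = TRUE)) {
  per_day <- plyr::ddply(d, "Year", transform, C_cum_plyr = cumsum(C))
  print(per_day)
}
```

Because the function handed to ddply returns a data frame for each year, the result keeps all 16 daily rows. A summarising function would collapse each year to a single row instead, which may be the behaviour described in the original question.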