Hi,

I have a data.frame that is date ordered by row number - earliest date first and most current last. I want to create a couple of new columns that show the max and min values from other columns *so far* - not for the whole data.frame.

It seems this sort of question is really coming from my lack of understanding about how R intends me to limit myself to portions of a data.frame. I get the impression from the help files that the generic way is that if I'm on the 500th row of a 1000 row data.frame and want to limit the search max does to rows 1:500 I should use something like [1:row], but it's not working inside my function. The idea works outside the function, in the sense that I can create temp1[1:7] and the max function returns what I expect. How do I do this with row?

Simple example attached. hp should be 'highest p', ll should be 'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"

Thanks,
Mark

AddCols = function (MyFrame) {
  MyFrame$p<-0
  MyFrame$l<-0
  MyFrame$pc<-0
  MyFrame$lc<-0
  MyFrame$pwin<-0
  MyFrame$hp<-0
  MyFrame$ll<-0
  return(MyFrame)
}

BinPosNeg = function (MyFrame) {

  ## Positive y in p column, negative y in l column
  pos <- MyFrame$y > 0
  MyFrame$p[pos] <- MyFrame$y[pos]
  MyFrame$l[!pos] <- MyFrame$y[!pos]
  return(MyFrame)
}

RunningCount = function (MyFrame) {
  ## Running count of p & l events

  pos <- (MyFrame$p > 0)
  MyFrame$pc <- cumsum(pos)
  pos <- (MyFrame$l < 0)
  MyFrame$lc <- cumsum(pos)

  return(MyFrame)
}

PercentWins = function (MyFrame) {

  MyFrame$pwin <- round((MyFrame$pc / (MyFrame$pc+MyFrame$lc)),2)

  return(MyFrame)
}

HighLow = function (MyFrame) {
  temp1 <- MyFrame$p[1:row]
  MyFrame$hp <- max(temp1) ## Highest p
  temp1 <- MyFrame$l[1:row]
  MyFrame$ll <- min(temp1) ## Lowest l

  return(MyFrame)
}

F1 <- data.frame(x=1:10, y=2*(-4:5) )
F1 <- AddCols(F1)
F1 <- BinPosNeg(F1)
F1 <- RunningCount(F1)
F1 <- PercentWins(F1)
F1
F1 <- HighLow(F1)
F1

temp1<-F1$p[1:5]
max(temp1)
temp1<-F1$p[1:7]
max(temp1)
temp1<-F1$p[1:10]
max(temp1)

On 01/07/2009 11:49 AM, Mark Knecht wrote:
> [...]
> HighLow = function (MyFrame) {
>   temp1 <- MyFrame$p[1:row]
>   MyFrame$hp <- max(temp1) ## Highest p
>   temp1 <- MyFrame$l[1:row]
>   MyFrame$ll <- min(temp1) ## Lowest l
>
>   return(MyFrame)
> }
> [...]

You get an error in this function because you didn't define row, so R assumes you mean the function in the base package, and 1:row doesn't make sense.

What you want for the "highest so far" is the cummax (for "cumulative maximum") function. See ?cummax.

Duncan Murdoch

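To make the cummax suggestion concrete, here is a minimal sketch of how HighLow might be rewritten. It assumes the p and l columns hold the values produced by the original example; building them with ifelse and using cummin for the lowest l are shortcuts added for this illustration, not something stated in the reply above.

```r
# Rebuild the p and l columns of the original example in a compact way
# (illustrative shortcut; the original post used AddCols and BinPosNeg)
F1 <- data.frame(x = 1:10, y = 2 * (-4:5))
F1$p <- ifelse(F1$y > 0, F1$y, 0)  # positive y values, otherwise 0
F1$l <- ifelse(F1$y < 0, F1$y, 0)  # negative y values, otherwise 0

# `row` is never defined inside the original HighLow, so R finds the
# base function row() instead of a number, which is why 1:row fails
row_is_function <- is.function(row)

# Vectorised replacement: no row index is needed at all
HighLow <- function(MyFrame) {
  MyFrame$hp <- cummax(MyFrame$p)  # highest p so far
  MyFrame$ll <- cummin(MyFrame$l)  # lowest l so far
  MyFrame
}

F1 <- HighLow(F1)
F1
```

Each element of hp is the maximum of p over rows 1 up to the current row, which is what the 1:row indexing was trying to express one row at a time.
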
> More generally, you can always write a loop. They aren't necessarily fast
> or elegant, but they're pretty general. For example, to calculate the max
> of the previous 50 observations (or fewer near the start of a vector), you
> could do
>
> x <- ... some vector ...
>
> result <- numeric(length(x))
> for (i in seq_along(x)) {
>   result[i] <- max( x[ max(1, i-49):i ])
> }
>
> Duncan Murdoch

You should be able to do the same as that loop with one of the *apply functions as well, which would be cleaner and faster (usually). Something like (this isn't real code and won't work)

x<-lapply(function(i,j) max(foo[i:j]), seq(1,N-50),seq(50,N))

(where foo is your original vector of length N)

Carl

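For readers who want a version of the *apply idea that runs, the following is one possible sketch rather than the poster's own code. The data in foo, the seed, and the names width, loop_result, apply_result and full_windows are all made up for illustration. It uses vapply for the one-index form and mapply for the two-index form, since lapply expects the data first and a single-argument function. Note that the two index sequences must have the same length, so the start indices run to N - 49, not N - 50.

```r
set.seed(1)        # arbitrary seed so the illustration is repeatable
foo <- rnorm(200)  # placeholder data; substitute your own vector
width <- 50        # window length from the example above
N <- length(foo)

# Reference: the explicit loop from the quoted message
loop_result <- numeric(N)
for (i in seq_along(foo)) {
  loop_result[i] <- max(foo[max(1, i - width + 1):i])
}

# Same calculation with vapply: one index per element, and the window
# start is clipped at 1 so the first 49 positions use shorter windows
apply_result <- vapply(
  seq_along(foo),
  function(i) max(foo[max(1, i - width + 1):i]),
  numeric(1)
)

# Two-index form in the spirit of the sketch above, using mapply;
# this covers full 50-element windows only, so it has N - 49 results
full_windows <- mapply(
  function(i, j) max(foo[i:j]),
  seq(1, N - width + 1),
  seq(width, N)
)
```

Whether either form is actually faster than the preallocated loop depends on the data and the R version, so it is worth timing on your own vector (for example with system.time) before assuming a speed-up.
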