Hi,
I have a data.frame that is date ordered by row number - earliest
date first and most current last. I want to create a couple of new
columns that show the max and min values from other columns *so far* -
not for the whole data.frame.
It seems this sort of question is really coming from my lack of
understanding about how R intends me to limit myself to portions of a
data.frame. I get the impression from the help files that the generic
way is that if I'm on the 500th row of a 1000 row data.frame and want
to limit the search that max does to rows 1:500, I should use something
like [1:row], but it's not working inside my function. The idea works
outside the function, in the sense that I can create temp1 from F1$p[1:7]
and the max function returns what I expect. How do I do this with row?
Simple example attached. hp should be 'highest p', ll should be
'lowest l'. I get an error message "Error in 1:row : NA/NaN argument".
Thanks,
Mark
AddCols = function (MyFrame) {
    MyFrame$p<-0
    MyFrame$l<-0
    MyFrame$pc<-0
    MyFrame$lc<-0
    MyFrame$pwin<-0
    MyFrame$hp<-0
    MyFrame$ll<-0
    return(MyFrame)
}

BinPosNeg = function (MyFrame) {
    ## Positive y in p column, negative y in l column
    pos <- MyFrame$y > 0
    MyFrame$p[pos] <- MyFrame$y[pos]
    MyFrame$l[!pos] <- MyFrame$y[!pos]
    return(MyFrame)
}

RunningCount = function (MyFrame) {
    ## Running count of p & l events
    pos <- (MyFrame$p > 0)
    MyFrame$pc <- cumsum(pos)
    pos <- (MyFrame$l < 0)
    MyFrame$lc <- cumsum(pos)
    return(MyFrame)
}

PercentWins = function (MyFrame) {
    MyFrame$pwin <- round((MyFrame$pc / (MyFrame$pc+MyFrame$lc)),2)
    return(MyFrame)
}

HighLow = function (MyFrame) {
    temp1 <- MyFrame$p[1:row]
    MyFrame$hp <- max(temp1) ## Highest p
    temp1 <- MyFrame$l[1:row]
    MyFrame$ll <- min(temp1) ## Lowest l
    return(MyFrame)
}

F1 <- data.frame(x=1:10, y=2*(-4:5) )
F1 <- AddCols(F1)
F1 <- BinPosNeg(F1)
F1 <- RunningCount(F1)
F1 <- PercentWins(F1)
F1
F1 <- HighLow(F1)
F1
temp1<-F1$p[1:5]
max(temp1)
temp1<-F1$p[1:7]
max(temp1)
temp1<-F1$p[1:10]
max(temp1)
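The [1:row] idea itself is sound; the problem is only that row is never given a value. A minimal sketch, assuming the hp and ll columns already exist (as AddCols() creates them), loops over an explicit row index i and takes max/min over rows 1:i. The function name HighLowLoop and the standalone stand-in data are hypothetical, not from the thread.

```r
# Hypothetical loop version of Mark's HighLow(): the [1:row] idea works
# once the upper bound is a real index (i) instead of the undefined 'row'.
# Assumes the hp and ll columns already exist, as AddCols() creates them.
HighLowLoop <- function(MyFrame) {
  for (i in seq_len(nrow(MyFrame))) {
    MyFrame$hp[i] <- max(MyFrame$p[1:i])  # highest p in rows 1..i
    MyFrame$ll[i] <- min(MyFrame$l[1:i])  # lowest l in rows 1..i
  }
  MyFrame
}

# Stand-in for F1 after AddCols() and BinPosNeg(): pmax() and pmin()
# reproduce the p/l split that BinPosNeg() makes.
y  <- 2 * (-4:5)
F2 <- data.frame(x = 1:10, y = y, p = pmax(y, 0), l = pmin(y, 0),
                 hp = 0, ll = 0)
F2 <- HighLowLoop(F2)
F2[, c("p", "hp", "l", "ll")]
```

seq_len(nrow(...)) is used rather than 1:nrow(...) so an empty frame gives zero iterations. Each step rescans rows 1:i, so the work grows quadratically with the number of rows.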
On 01/07/2009 11:49 AM, Mark Knecht wrote:
> HighLow = function (MyFrame) {
> temp1 <- MyFrame$p[1:row]
> MyFrame$hp <- max(temp1) ## Highest p
> temp1 <- MyFrame$l[1:row]
> MyFrame$ll <- min(temp1) ## Lowest l
>
> return(MyFrame)
> }

You get an error in this function because you didn't define row, so R assumes you mean the function in the base package, and 1:row doesn't make sense. What you want for the "highest so far" is the cummax (for "cumulative maximum") function. See ?cummax.

Duncan Murdoch

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
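Following Duncan's pointer, a minimal sketch of HighLow() rebuilt on base R's cummax() and its counterpart cummin(): element i of cummax(v) is max(v[1:i]), which is exactly the "so far" value. It also shows that the unassigned name row resolves to base::row(), a function. The stand-in data frame replaces the full pipeline so the block runs on its own.

```r
# Sketch: HighLow() rebuilt on base R's cumulative extremes. Element i of
# cummax(v) is max(v[1:i]); cummin() is the same for the minimum.
HighLow <- function(MyFrame) {
  MyFrame$hp <- cummax(MyFrame$p)  # highest p so far
  MyFrame$ll <- cummin(MyFrame$l)  # lowest l so far
  MyFrame
}

# Why 1:row failed: 'row' was never assigned inside HighLow, so R found
# base::row(), which is a function, and ':' cannot use a function as a bound.
is.function(row)  # TRUE

# Small standalone check of the "so far" behaviour on non-monotone input.
cummax(c(3, 1, 4, 1, 5))     # 3 3 4 4 5
cummin(c(0, -2, 0, -5, -1))  # 0 -2 -2 -5 -5

# Stand-in for F1 after BinPosNeg(): pmax()/pmin() give the same p/l split.
y  <- 2 * (-4:5)
F1 <- data.frame(x = 1:10, y = y, p = pmax(y, 0), l = pmin(y, 0))
F1 <- HighLow(F1)
F1[, c("p", "hp", "l", "ll")]
```

Unlike the loop over 1:i, cummax() and cummin() make a single pass over the column.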
> More generally, you can always write a loop. They aren't necessarily fast
> or elegant, but they're pretty general. For example, to calculate the max
> of the previous 50 observations (or fewer near the start of a vector), you
> could do
>
> x <- ... some vector ...
>
> result <- numeric(length(x))
> for (i in seq_along(x)) {
>     result[i] <- max( x[ max(1, i-49):i ])
> }
>
> Duncan Murdoch

You should be able to do the same as that loop with one of the *apply functions as well, which would be cleaner and faster (usually). Something like (this isn't real code and won't work)

x<-lapply(function(i,j) max(foo[i:j]), seq(1,N-50),seq(50,N))

(where foo is your original vector of length N)

Carl
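Carl's line is flagged as not working. Two things would need to change: lapply() iterates over a single sequence, so two index vectors call for mapply(), and seq(1, N-50) is one element shorter than seq(50, N). Below is a sketch of both variants under those assumptions. running_window_max() reproduces Duncan's loop, including the shorter windows at the start; with k = 50 its lower bound max(1, i - k + 1) is his max(1, i-49). full_window_max() follows Carl's two-sequence idea and returns only complete windows. Both function names are hypothetical.

```r
# Hypothetical helper: trailing-window max that matches Duncan's loop,
# i.e. max of the last k values, or of fewer values near the start.
running_window_max <- function(x, k = 50) {
  vapply(seq_along(x),
         function(i) max(x[max(1, i - k + 1):i]),
         numeric(1))
}

# Hypothetical helper: Carl's two-index idea made runnable with mapply().
# One max per complete window foo[i:j]; assumes length(foo) >= k, and
# returns length(foo) - k + 1 values.
full_window_max <- function(foo, k = 50) {
  N <- length(foo)
  mapply(function(i, j) max(foo[i:j]), seq(1, N - k + 1), seq(k, N))
}

x <- c(5, 3, 8, 1, 9, 2, 7)
running_window_max(x, k = 3)  # 5 5 8 8 9 9 9
full_window_max(x, k = 3)     # 8 8 9 9 9
```

The complete-window result equals the tail of the running result once the window is full. Whether the *apply form beats a preallocated loop on speed is not guaranteed; the clearer gain is brevity.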