Pooya Lalehzari
2014-Dec-23 22:57 UTC
[R] Carrying a value down a data.frame conditionally
Hello,
I have a data.frame (below) containing the two fields "Value" and "Signal", and I need to create the third field, "To_Be_Produced". The condition for producing the third field is to carry the 1 in the "Signal" field down until "Value" is below 40.
Do I have to create a for-loop to do this, or will I be able to do anything else more efficient?

df <- data.frame( Value=c(0,0,100,85,39,1,30,40,20,20,0,0),
                  Signal=c(0,1,0,0,0,0,0,0,0,1,0,0),
                  To_Be_Produced=c(0,1,1,1,0,0,0,0,0,1,0,0)
                )

Thank you,
Pooya.
> On 23 Dec 2014, at 23:57 , Pooya Lalehzari <plalehzari at platinumlp.com> wrote:
>
> Do I have to create a for-loop to do this or will I be able to do anything else more efficient?

I'd go with the for loop, unless you _really_ need the efficiency. And if you do need efficiency that badly, it is probably better to code up the for loop in C/C++. (An Rcpp evangelist is likely to chime in any moment now.)

If you want a vectorized solution just for the academic exercise, I think you can do something with ave(), grouping by cumsum(Signal) and within groups doing cumprod(Value >= 40), except that you need to skip the first element of each group. And be careful that the first group (the rows before the first 1 in Signal) is different. This seems to do it:

> with(df, ave(Value, cumsum(Signal), FUN=function(x) c(0,cumprod(x[-1]>=40))) + Signal)
 [1] 0 1 1 1 0 0 0 0 0 1 0 0

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
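For reference, the for-loop route recommended above might look like the sketch below. The names carry_signal and carry_signal_ave are hypothetical (they do not appear in the thread), and the threshold argument is an assumption added for illustration. The second function is a guarded variant of the ave() idea: as far as I can tell, the one-liner gives a 1 on rows before the first signal whenever such a row (other than the very first) has Value >= 40, so this variant zeroes out everything in the group where cumsum(Signal) is 0.

```r
# Hypothetical helper (not from the thread): carry a 1 down from each
# Signal == 1 row for as long as Value stays at or above the threshold.
carry_signal <- function(value, signal, threshold = 40) {
  out <- integer(length(value))
  active <- FALSE
  for (i in seq_along(value)) {
    if (signal[i] == 1) {
      active <- TRUE        # a signal row always starts (or continues) a run
    } else if (active && value[i] < threshold) {
      active <- FALSE       # the run ends once Value drops below the threshold
    }
    out[i] <- as.integer(active)
  }
  out
}

# Guarded variant of the ave() approach: rows before the first signal
# (group 0 of cumsum(signal)) are forced to 0.
carry_signal_ave <- function(value, signal, threshold = 40) {
  grp <- cumsum(signal)
  carried <- ave(value, grp,
                 FUN = function(x) c(0, cumprod(x[-1] >= threshold)))
  as.integer((carried + signal) * (grp > 0))
}

df <- data.frame(
  Value  = c(0, 0, 100, 85, 39, 1, 30, 40, 20, 20, 0, 0),
  Signal = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
)
result <- carry_signal(df$Value, df$Signal)
result
#  [1] 0 1 1 1 0 0 0 0 0 1 0 0
```

On the posted data both functions reproduce To_Be_Produced. They differ from the one-liner only when a high Value precedes the first signal, for example Value = c(50, 60, 10) with Signal = c(0, 0, 1).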
A while ago I wrote for a questioner on this list a function, 'f1', below, that would give the start and stop times of runs of data that started when the data went above a threshold and stopped when it first dropped below a different (lower) threshold. It used no loops and was pretty quick. With your data you could use it as

> ss <- with(df, f1( (Value>=40)+Signal*2, start=2, stop=1))
> ss
  start stop
1     2    4
2    10   10

You can convert those start and stop times to a vector with 1's in the runs and 0's outside of the runs with something like

> v <- integer(length(df$Value))
> v[ss$start] <- 1
> v[pmin(ss$stop+1, length(v))] <- -1
> cumsum(v)
 [1] 0 1 1 1 0 0 0 0 0 1 0 0

f1 would be trivial to write in C/C++. It needs a better name.

f1 <- function(x, startThreshold, stopThreshold, plot=FALSE) {
    # find intervals that
    #   start when x goes above startThreshold and
    #   end when x goes below stopThreshold.
    stopifnot(startThreshold > stopThreshold)
    isFirstInRun <- function(x)c(TRUE, x[-1] != x[-length(x)])
    isLastInRun <- function(x)c(x[-1] != x[-length(x)], TRUE)
    isOverStart <- x >= startThreshold
    isOverStop <- x >= stopThreshold
    possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
    possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
    pts <- c(possibleStartPt, possibleStopPt)
    names(pts) <- rep(c("start","stop"),
                      c(length(possibleStartPt), length(possibleStopPt)))
    pts <- pts[order(pts)]
    tmp <- isFirstInRun(names(pts))
    start <- pts[tmp & names(pts)=="start"]
    stop <- pts[tmp & names(pts)=="stop"]
    # Remove case where first downcrossing happens
    # before first upcrossing.
    if (length(stop) > length(start)) stop <- stop[-1]
    if (plot) {
        plot(x, cex=.5)
        abline(h=c(startThreshold, stopThreshold))
        abline(v=start, col="green")
        abline(v=stop, col="red")
    }
    data.frame(start=start, stop=stop)
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com
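The step that turns the start/stop table into a 0/1 vector can also be sketched as a small stand-alone helper. The name runs_to_indicator is hypothetical (not from the thread), and the start/stop values below are simply the ones reported in the output above. The sketch assumes the runs do not overlap. It uses a difference vector one element longer than the data, so a run that ends on the last row needs no clipping of stop+1; with clipping to length(v), a run ending on the final row would, as far as I can tell, have its last element overwritten.

```r
# Hypothetical helper (not from the thread): build a 0/1 indicator of length n
# from run start and stop positions (1-based, inclusive, non-overlapping).
runs_to_indicator <- function(start, stop, n) {
  delta <- integer(n + 1L)                       # one spare slot for stop == n
  delta[start] <- delta[start] + 1L              # +1 where a run begins
  delta[stop + 1L] <- delta[stop + 1L] - 1L      # -1 just past where it ends
  cumsum(delta)[seq_len(n)]                      # running sum, spare slot dropped
}

# Start/stop positions as reported for the posted data.
ss <- data.frame(start = c(2, 10), stop = c(4, 10))
indicator <- runs_to_indicator(ss$start, ss$stop, n = 12)
indicator
#  [1] 0 1 1 1 0 0 0 0 0 1 0 0
```

Adding to and subtracting from delta, rather than assigning 1 and -1, also keeps the result correct if one run ends on the row immediately before the next one begins.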