I have a data set similar to the one below, where 1 and 2 indicate test results and 0 indicates time points in between where there are no test results. I would like to assign each time point leading up to a test result the value of that test result.

What I have:     What I want:
1                1
0                1
0                1
0                1
1                1
0                2
0                2
2                2
0                1
0                1
1                1
0                2
2                2

I have attempted methods that create a data.frame of the breaks/changes in values from 0 to 1 or to 2.

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
x1 <- which(diff(x) == 1)
x2 <- which(diff(x) == 2)

Whatever the solution, it can't be entered by hand due to the size of the dataset (>10 million and change). Any ideas? This is my first time posting to this forum and I am relatively new to R, so please don't flame me too hard. Desperate times call for desperate measures. Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/foreloop-aggregating-time-series-data-into-groups-tp3022667p3022667.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2010-Nov-01 20:32 UTC
[R] foreloop? aggregating time series data into groups
On Nov 1, 2010, at 3:34 PM, blurg wrote:

> I have a data set similar to the one below, where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results. I would like to assign each time point leading up to a test
> result the value of that test result.
>
> What I have:     What I want:
> 1                1
> 0                1
> 0                1
> 0                1
> 1                1
> 0                2
> 0                2
> 2                2
> 0                1
> 0                1
> 1                1
> 0                2
> 2                2
>
> I have attempted methods that create a data.frame of the breaks/changes
> in values from 0 to 1 or to 2.
> x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)

Not sure how long your longest run of zeros is, but applying this method repeatedly (n times for a longest run of n zeros) will fill in the values in the backward direction:

> xna <- x
> xna[xna==0] <- NA
> xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
> xna
 [1]  2  2  1  1 NA NA NA  1  1  1  1 NA NA  2  2  1 NA NA  2  2 NA NA  1  1
> xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
> xna
 [1]  2  2  1  1 NA NA  1  1  1  1  1 NA  2  2  2  1 NA  2  2  2 NA  1  1  1
> xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
> xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
> xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
> xna
 [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1

I'm not sure that conversion to NA is needed. The indexing with which(x==0) and which(x==0)+1 might work as well. Yep... that works too:

> x
 [1] 0 2 0 1 0 0 0 0 1 0 1 0 0 0 2 1 0 0 0 2 0 0 0 1
> x[which(x==0)] <- x[which(x==0)+1]
> x
 [1] 2 2 1 1 0 0 0 1 1 1 1 0 0 2 2 1 0 0 2 2 0 0 1 1
> x[which(x==0)] <- x[which(x==0)+1]
> x[which(x==0)] <- x[which(x==0)+1]
> x[which(x==0)] <- x[which(x==0)+1]
> x
 [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1

--
David

> Whatever the solution, it can't be entered by hand due to the size of
> the dataset (>10 million and change). Any ideas? This is my first time
> posting to this forum and I am relatively new to R, so please don't
> flame me too hard. Desperate times call for desperate measures. Thanks.

--
David Winsemius, MD
West Hartford, CT
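The repeated manual application above can be wrapped in a loop that stops on its own, so the length of the longest zero run does not need to be known in advance. What follows is only a minimal sketch under stated assumptions: zeros are the only placeholder, the vector contains no NA, and trailing zeros (which have no later test result) are left as 0. The function name fill_by_shifting is made up for illustration and does not appear in the thread.

```r
# Hypothetical helper: repeat the shift-by-one fill until no more zeros can be filled.
fill_by_shifting <- function(x) {
  n <- length(x)
  repeat {
    idx <- which(x == 0)
    idx <- idx[idx < n]                  # the last element has no right neighbour
    # stop when there is nothing left to fill, or no zero has a non-zero neighbour
    if (length(idx) == 0 || all(x[idx + 1] == 0)) break
    x[idx] <- x[idx + 1]                 # one pass of the shift from the message above
  }
  x
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
fill_by_shifting(x)
# 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
```

Each pass scans the whole vector and the number of passes equals the longest run of zeros, so on 10 million elements with long gaps this may be noticeably slower than a single-pass approach.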
Hi,

Welcome to R and the help list!

On Mon, Nov 1, 2010 at 12:34 PM, blurg <ian.jhsph at gmail.com> wrote:

> I have a data set similar to the one below, where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results. I would like to assign each time point leading up to a test
> result the value of that test result.
>
> What I have:     What I want:
> 1                1
> 0                1
> 0                1
> 0                1
> 1                1
> 0                2
> 0                2
> 2                2
> 0                1
> 0                1
> 1                1
> 0                2
> 2                2
>
> I have attempted methods that create a data.frame of the breaks/changes
> in values from 0 to 1 or to 2.
> x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)

## A function that *I think* does what you want
myfun <- function(x) {
  dat <- rle(x)                      # run-length encode the vector
  i <- which(dat$values == 0)        # positions of the zero runs
  ## add each zero run's length to the run that follows it
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  ## drop the zero runs and expand what is left
  return(with(dat, rep(values[-i], lengths[-i])))
}

## Three test pieces of data
x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
y <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)

## your example, works
myfun(x)
## test case 2 (begins with a number), works
myfun(y)
## test case 3 (ends with 0), fails
myfun(z)

So, if things work how I think they do, that function should do what you need as long as the last value is not 0, which kind of makes sense, because what value would be assigned anyway? Side note: I created a sample vector with 10 million elements, and it took about 9 seconds to run it through my function.

@list members, I welcome someone checking my work; I'm uneasy about a couple of aspects generalizing properly.

> Whatever the solution, it can't be entered by hand due to the size of the
> dataset (>10 million and change). Any ideas? This is my first time posting
> to this forum and I am relatively new to R, so please don't flame me too
> hard.

Although this list can certainly be tough at times, for your peace of mind you pretty much did everything right as far as I am concerned. You described your problem, included a small set of sample data that was easily read into R (for future reference, say you have a more complex object that is not as easy to create: dput() will save you and us trouble), and even showed what you tried to do. Finally, in your explanation you gave both sample data AND the desired outcome. This gives us a "gold standard" to test our code against, rather than hoping our results match what you described you want. I am always thrilled when I'm not left re-reading a paragraph-long English explanation of something that can be shown nicely with a few numbers.

> Desperate times call for desperate measures.

and assuming you have put forth some effort trying to solve it yourself and took the time to help us answer your question (as you clearly did here), the help list should not be a desperate measure :)

Cheers,

Josh

> Thanks.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
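On the generalization worry raised above: besides the trailing-zero case, indexing with values[-i] returns an empty vector when i is empty, so a vector with no zeros at all would come back empty. One possible variant, offered only as a sketch and not as the poster's method, relabels each zero run with the value of the run that follows it and then expands the runs with inverse.rle(). The name fill_rle is hypothetical; the sketch assumes no NA values and leaves a trailing run of zeros untouched.

```r
# Hypothetical variant: give each zero run the value of the following run.
fill_rle <- function(x) {
  runs <- rle(x)
  k <- length(runs$values)
  zero <- which(runs$values == 0)
  zero <- zero[zero < k]               # a final zero run has nothing after it
  # adjacent runs always differ, so the run after a zero run is non-zero
  runs$values[zero] <- runs$values[zero + 1]
  inverse.rle(runs)
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)
fill_rle(x)
# 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
fill_rle(z)   # trailing zeros are kept as 0 rather than raising an error
# 1 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 0 0 0 0
```

Because nothing is removed by negative indexing, the no-zero case falls through unchanged: when zero is empty the assignment is a no-op.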
You can use na.locf in the zoo package:

> require(zoo)
> x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> # replace zeros with NA
> x[x == 0] <- NA
> x
 [1] NA  2 NA  1 NA NA NA NA  1 NA  1 NA NA NA  2  1 NA NA NA  2 NA NA NA  1
> na.locf(x, fromLast = TRUE)
 [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
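For readers who would rather not add a package dependency, the same "carry the next observation backward" idea can be sketched in base R with rev() and cummax(). This is an illustration under assumptions, not part of the thread: the name fill_backward is invented, the input is assumed to contain no NA, and trailing zeros with no later result are returned as NA.

```r
# Hypothetical base-R helper: carry the next non-zero value backward.
fill_backward <- function(x) {
  r <- rev(x)
  # for each position in the reversed vector, the index of the most recent
  # non-zero element seen so far (0 if none has been seen yet)
  idx <- cummax(seq_along(r) * (r != 0))
  idx[idx == 0] <- NA                  # trailing zeros of x have no later result
  rev(r[idx])
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
fill_backward(x)
# 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
```

The whole fill is a single vectorized pass, so its cost does not depend on how long the runs of zeros are.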