thr3ads.net - R help - [R] foreloop? aggregating time series data into groups [Nov 2010]

blurg

2010-Nov-01 19:34 UTC

[R] foreloop? aggregating time series data into groups

I have a data set similar to the set below where 1 and 2 indicate test
results and 0 indicates time points in between where there are no test
results.  I would like to allocate the time points leading up to a test
result with the value of the test result. 

What I have:     What I want:
1                     1
0                     1
0                     1
0                     1
1                     1
0                     2
0                     2
2                     2
0                     1
0                     1
1                     1
0                     2
2                     2

I have attempted methods creating a data.frame of the breaks/changes in
values from 0 to 1 or to 2.
x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
x1 <- which(diff(x) == 1) 
x2 <- which(diff(x) == 2)

Whatever the solution, it can't be entered by hand due to the size of the
dataset (>10 million and change). Any ideas?  This is my first time posting
to this forum and I am relatively new to R, so please don't flame me too
hard.  Desperate times call for desperate measures.  Thanks.
-- 
View this message in context:
http://r.789695.n4.nabble.com/foreloop-aggregating-time-series-data-into-groups-tp3022667p3022667.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius

2010-Nov-01 20:32 UTC

On Nov 1, 2010, at 3:34 PM, blurg wrote:
>
> I have a data set similar to the set below where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results.  I would like to allocate the time points leading up to a test
> result with the value of the test result.
>
> What I have:     What I want:
> 1                     1
> 0                     1
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 0                     2
> 2                     2
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 2                     2
>
> I have attempted methods creating a data.frame of the breaks/changes in
> values from 0 to 1 or to 2.
> x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)

Not sure how long your longest run of zeros is, but repeated applications
of this method (as many times as the longest run is long) will fill in
in the backward direction:

 > xna <- x
 > xna[xna==0] <- NA
 > xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
 > xna
  [1]  2  2  1  1 NA NA NA  1  1  1  1 NA NA  2  2  1 NA NA  2  2 NA NA  1  1
 > xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
 > xna
  [1]  2  2  1  1 NA NA  1  1  1  1  1 NA  2  2  2  1 NA  2  2  2 NA  1  1  1
 > xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
 > xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
 > xna[which(is.na(xna))] <- xna[which(is.na(xna))+1]
 > xna
  [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1

I'm not sure that conversion to NA is needed. The indexing with
which(x==0) and which(x==0)+1 might work as well. Yep... that works too:

 > x
  [1] 0 2 0 1 0 0 0 0 1 0 1 0 0 0 2 1 0 0 0 2 0 0 0 1
 > x[which(x==0)] <- x[which(x==0)+1]
 > x
  [1] 2 2 1 1 0 0 0 1 1 1 1 0 0 2 2 1 0 0 2 2 0 0 1 1
 > x[which(x==0)] <- x[which(x==0)+1]
 > x[which(x==0)] <- x[which(x==0)+1]
 > x[which(x==0)] <- x[which(x==0)+1]
 > x
  [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
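
The repeated manual step above can be wrapped in a loop that stops on its own. What follows is only a sketch, not code from the thread: the name fill_back is hypothetical, and it assumes the vector holds no NA values. It also restricts the fill to zeros that have a non-zero right-hand neighbour, so a run of zeros at the very end is left as 0 instead of indexing past the end of the vector.

```r
# Hypothetical helper (not from the thread): repeat the shift-by-one
# fill until no zero is left that has a non-zero value to its right.
# Assumes x contains no NA values.
fill_back <- function(x) {
  n <- length(x)
  repeat {
    # positions 1..(n-1) that are 0 and whose right neighbour is non-zero
    idx <- which(x[-n] == 0 & x[-1] != 0)
    if (length(idx) == 0) break
    x[idx] <- x[idx + 1]
  }
  x
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
fill_back(x)
```

The loop runs once per element of the longest run of zeros, so on a 10-million-element vector with long gaps it may be noticeably slower than a single-pass approach.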


-- 
David

>
> Whatever the solution, it can't be entered by hand due to the size of the
> dataset (>10 million and change). Any ideas?  This is my first time posting
> to this forum and I am relatively new to R, so please don't flame me too
> hard.  Desperate times call for desperate measures.  Thanks.
> -- 

David Winsemius, MD
West Hartford, CT

Joshua Wiley

2010-Nov-01 21:04 UTC

Hi,

Welcome to R and the help list!

On Mon, Nov 1, 2010 at 12:34 PM, blurg <ian.jhsph at gmail.com> wrote:
>
> I have a data set similar to the set below where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results.  I would like to allocate the time points leading up to a test
> result with the value of the test result.
>
> What I have:     What I want:
> 1                     1
> 0                     1
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 0                     2
> 2                     2
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 2                     2
>
> I have attempted methods creating a data.frame of the breaks/changes in
> values from 0 to 1 or to 2.
> x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)
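
## Function that *I think* does what you want
myfun <- function(x) {
  dat <- rle(x)                 # run-length encode: runs of equal values
  i <- which(dat$values == 0)   # positions of the zero runs
  ## add each zero run's length to the run that follows it
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  ## drop the zero runs and expand back to a full-length vector
  return(with(dat, rep(values[-i], lengths[-i])))
}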

## Three test pieces of data
x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
y <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)

## your example, works
myfun(x)
## test case 2 (begins with a number), works
myfun(y)
## test case 3 (ends with 0), fails
myfun(z)

So, if things work how I think they do, that function should do what
you need as long as the last value is not 0, which kind of makes sense
because what value would be assigned anyways?
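
For readers who do need to cope with a trailing run of zeros, here is one possible variant on the same rle() idea. It is a sketch under my own assumptions, not Josh's function: the name fill_back_rle is hypothetical, and it chooses to leave a trailing zero run as 0 since there is no later test result to copy. Instead of merging run lengths it overwrites the value of each zero run with the value of the run that follows, then rebuilds the vector with inverse.rle().

```r
# Hypothetical variant (not from the thread): relabel each zero run with
# the value of the following run; a zero run at the very end stays 0.
fill_back_rle <- function(x) {
  r <- rle(x)
  zero <- which(r$values == 0)
  # only zero runs that are followed by another run can be filled
  fillable <- zero[zero < length(r$values)]
  r$values[fillable] <- r$values[fillable + 1]
  inverse.rle(r)
}

z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)
fill_back_rle(z)
```

Because rle() never produces two adjacent runs with the same value, the run after a zero run is always non-zero, so a single pass is enough. A vector with no zeros at all passes through unchanged.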

Side note, I created a sample vector with 10 million elements, and it
took about 9 seconds to run it through my function.

@list members, I welcome someone checking my work, I'm uneasy about a
couple aspects generalizing properly.
>
> Whatever the solution, it can't be entered by hand due to the size of the
> dataset (>10 million and change). Any ideas?  This is my first time posting
> to this forum and I am relatively new to R, so please don't flame me too
> hard.
Although this list can certainly be tough at times, for your peace of
mind you pretty much did everything right as far as I am concerned.
You described your problem, included a small set of sample data that
was easily read into R (for future reference say you have a more
complex object that is not as easy to create, dput() will save you and
us trouble), and even showed what you tried to do.
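
As a small illustration of the dput() suggestion, the sketch below uses a made-up data frame (the object d and its column names are purely illustrative, not from the original post). dput() writes out R code that recreates the object, so the text can be pasted into a message and evaluated by whoever reads it.

```r
# Made-up example object; the column names are illustrative only.
d <- data.frame(time = 1:4, result = c(0, 2, 0, 1))

# Print R code that rebuilds the object; paste this into your post.
dput(d)

# A reader can rebuild the object from that text.
txt <- capture.output(dput(d))
d2 <- eval(parse(text = txt))
all.equal(d, d2)
```

For large objects, something like dput(head(d, 20)) keeps the posted text short while preserving column types.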

Finally, in your explanation you gave both sample data AND desired
outcome.  This gives us a "gold standard" to test our code against,
rather than hoping our results match what you described you want.  I
am always thrilled when I'm not left re-reading a paragraph-long
English explanation that can be shown nicely with a few numbers.
> Desperate times call for desperate measures.
and assuming you have put forth some effort trying to solve it
yourself and took the time to help us answer your question (as you
clearly did here), the help list should not be a desperate measure :)

Cheers,

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

jim holtman

2010-Nov-01 21:15 UTC

you can use na.locf in the zoo package:

> require(zoo)
> x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> # replace zeros with NA
> x[x == 0] <- NA
> x
 [1] NA  2 NA  1 NA NA NA NA  1 NA  1 NA NA NA  2  1 NA NA NA  2 NA NA NA  1
> na.locf(x, fromLast = TRUE)
 [1] 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1

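If installing zoo is not an option, the same "next observation carried backward" fill can be sketched in base R in a single vectorised pass. This is an illustrative alternative, not something posted in the thread: the name fill_back_base is hypothetical, it assumes there are no NA values in the input, and it leaves trailing zeros as 0.

```r
# Hypothetical base-R sketch: for each position, find the index of the
# nearest non-zero value at or after it, then look that value up.
fill_back_base <- function(x) {
  n <- length(x)
  # own index where x is non-zero, otherwise a sentinel index n + 1
  idx <- ifelse(x != 0, seq_len(n), n + 1L)
  # running minimum from the right = nearest non-zero index at or after i
  nxt <- rev(cummin(rev(idx)))
  # the sentinel index points at an appended 0, so trailing zeros stay 0
  c(x, 0)[nxt]
}

x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
fill_back_base(x)
```

When using na.locf() on data that ends in missing values, it is worth checking the na.rm argument in the zoo documentation: as far as I know, the default drops NAs that cannot be filled, which would shorten the result.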

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
