Greetings,

I have a vector of the form:

[10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]

That is, a combination of sequences of non-missing values and missing values, with each sequence possibly of a different length.

I'd like to create another vector which will help me pick out the sequences of non-missing values. For the example above, this would be:

[1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]

The goal ultimately is to calculate means separately for each sequence.

Your help is appreciated. If I'm making this more complicated than necessary, I'd appreciate knowing that as well!

Many thanks.
Dear Krishna,

Here is one way. It is not very elegant, but seems to work:

  # x is the vector you want to change
  foo <- function(x){
    R1 <- rle(!is.na(x))            # runs of non-missing / missing values
    R2 <- rle(is.na(x))             # same runs, with the flags inverted
    len <- R1$lengths[!R2$values]   # lengths of the non-missing runs only
    # seq_along() rather than 1:length(len), so an all-NA vector does not fail
    x[!is.na(x)] <- rep(seq_along(len), len)
    x
  }

  # Example
  x <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)
  foo(x)
  #  [1]  1  1  1  1  1  1 NA NA NA NA  2  2  2 NA NA NA  3  3  3  3

HTH,
Jorge

On Tue, Jul 7, 2009 at 5:08 PM, Krishna Tateneni <tateneni@gmail.com> wrote:
> Greetings, I have a vector of the form:
> [10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]
> That is, a combination of sequences of non-missing values and missing
> values, with each sequence possibly of a different length.
>
> I'd like to create another vector which will help me pick out the sequences
> of non-missing values. For the example above, this would be:
> [1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...].
> The goal ultimately is to calculate means separately for each sequence.
>
> Your help is appreciated. If I'm making this more complicated than
> necessary, I'd appreciate knowing that as well!
>
> Many thanks.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
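Once an indicator vector of this kind exists, one way to reach the stated goal (a mean per sequence) might be tapply(), which skips positions whose group label is NA. This is only a sketch: run_id() is a hypothetical helper name that does not appear in the thread, and it uses a single rle() call rather than two.

```r
x <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

# run_id() is a hypothetical helper (not from the thread): it labels each
# run of non-missing values 1, 2, 3, ... and leaves NA at missing positions.
run_id <- function(x) {
  runs <- rle(!is.na(x))                  # TRUE runs are the non-missing ones
  ids <- rep(NA_integer_, length(x))
  ids[!is.na(x)] <- rep(seq_len(sum(runs$values)),
                        runs$lengths[runs$values])
  ids
}

ids <- run_id(x)

# tapply() drops elements whose group label is NA, so only the
# non-missing runs contribute a mean.
run_means <- tapply(x, ids, mean)
print(run_means)   # means 5, 3 and 3.75 for runs 1, 2 and 3
```

If the labels themselves are not needed afterwards, the two steps can be collapsed into tapply(x, run_id(x), mean).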
Here's one possibility:

  > vv <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
  > (1+cumsum(diff(is.na(c(vv[1],vv)))==1)) * !is.na(vv)
   [1] 1 1 1 1 1 1 0 0 0 0 2 2 2 0 0 0 3 3 3 3
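The cumsum() one-liner marks missing positions with 0 rather than NA. A minimal sketch of turning those zeros into NA and then taking the per-sequence means follows; the names idx and seq_means are illustrative and not from the post, and split() with sapply() is just one of several ways to do the grouping.

```r
vv <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

# The cumsum() one-liner from the thread: 1 plus the number of NA runs
# started so far, zeroed out wherever vv itself is missing.
idx <- (1 + cumsum(diff(is.na(c(vv[1], vv))) == 1)) * (!is.na(vv))

# Replace the zeros at missing positions with NA, as the question asked.
idx[is.na(vv)] <- NA

# split() drops elements whose group label is NA; sapply() then takes
# the mean of each remaining piece.
seq_means <- sapply(split(vv, idx), mean)
print(seq_means)   # means 5, 3 and 3.75 for sequences 1, 2 and 3
```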
On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:
> I'd like to create another vector which will help me pick out the
> sequences of non-missing values. [...] The goal ultimately is
> to calculate means separately for each sequence.

Here is one possibility:

  > Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
  > Vec
   [1] 10  8  1  3  0  8 NA NA NA NA  2  1  6 NA NA NA  0  5  1  9

Use rle() to get the runs of NA and non-NA values. See ?rle

  > Runs <- rle(is.na(Vec))
  > Runs
  Run Length Encoding
    lengths: int [1:5] 6 4 3 3 4
    values : logi [1:5] FALSE TRUE FALSE TRUE FALSE

Create grouping values for each run:

  > Grps <- rep(seq(length(Runs$lengths)), Runs$lengths)
  > Grps
   [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5

Now get the means for each run, split by Grps. See ?aggregate

  > aggregate(Vec, list(Grps = Grps), mean)
    Grps    x
  1    1 5.00
  2    2   NA
  3    3 3.00
  4    4   NA
  5    5 3.75

If you don't want the NA runs included in the result, you could use subset():

  > subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
    Grps    x
  1    1 5.00
  3    3 3.00
  5    5 3.75

HTH,

Marc Schwartz
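The group numbers in the rle()/aggregate() approach come out as 1, 3, 5 because the NA runs are counted too. If consecutive labels 1, 2, 3 are preferred, as in the original question, one possible variation is to number only the non-NA runs. This is a sketch under that assumption; Seq, keep and Res are illustrative names, not part of the posted code.

```r
Vec <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

Runs <- rle(is.na(Vec))

# cumsum(!Runs$values) counts the non-NA runs seen so far; ifelse() blanks
# out the NA runs, and rep() expands one label per run to one per element.
Seq <- rep(ifelse(Runs$values, NA, cumsum(!Runs$values)), Runs$lengths)

# Drop the missing positions explicitly before aggregating, so the result
# does not depend on how aggregate() treats NA group labels.
keep <- !is.na(Vec)
Res <- aggregate(Vec[keep], list(Seq = Seq[keep]), mean)
print(Res)   # three rows: Seq 1, 2, 3 with means 5, 3 and 3.75
```

Seq is also the indicator vector asked for in the original post, so it can be kept for later grouping steps.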