Suppose I have a long format for a longitudinal data id time x 1 1 10 1 2 11 1 3 23 1 4 23 2 2 12 2 3 13 2 4 14 3 1 11 3 3 15 3 4 18 3 5 21 4 2 22 4 3 27 4 6 29 I want to select the x values for each ID when time is equal to 3. When that observation is not observed, then I want to replace it with the obervation at time equal to 4. otherwise just use NA. How can I implement this with a quick command? [[alternative HTML version deleted]]
Bill.Venables at csiro.au
2009-Mar-29 09:45 UTC
[R] select observations from longitudinal data
Let's tackle the bigger problem of doing this not just for time = 3 but for all times. First we start with your data frame:> datid time x 1 1 1 10 2 1 2 11 3 1 3 23 4 1 4 23 5 2 2 12 6 2 3 13 7 2 4 14 8 3 1 11 9 3 3 15 10 3 4 18 11 3 5 21 12 4 2 22 13 4 3 27 14 4 6 29>### Now put the data into an id x time matrix, with gaps:> mat <- with(dat, {+ lev_id <- sort(unique(id)) + lev_tm <- sort(unique(time)) + M <- matrix(NA, length(lev_id), length(lev_tm)) + dimnames(M) <- list(id = lev_id, time = lev_tm) + M[cbind(match(id, lev_id), match(time, lev_tm))] <- x + M + })> mattime id 1 2 3 4 5 6 1 10 11 23 23 NA NA 2 NA 12 13 14 NA NA 3 11 NA 15 18 21 NA 4 NA 22 27 NA NA 29>### Now do the replacements ### (this is a very questionable dodge, by the way!)> for(i in 2:nrow(mat))+ if(any(k <- is.na(mat[i-1, ]))) + mat[i-1, k] <- mat[i, k]> > mattime id 1 2 3 4 5 6 1 10 11 23 23 NA NA 2 11 12 13 14 21 NA 3 11 22 15 18 21 29 4 NA 22 27 NA NA 29>### some gaps cannot be filled. ### now turn it back into a data frame; ### this is a splendid trick that no one knows about:> dat <- as.data.frame(as.table(mat), responseName = "x") > dat <- with(dat, dat[order(id, time), ]) >### this will look OK, but the first two columns are factors ### as we started with numeric variables (if we did) then ### it might be useful to turn them back to numerical variables ### again:> dat <- within(dat, {+ id <- as.numeric(as.character(id)) + time <- as.numeric(as.character(time)) + })> > datid time x 1 1 1 10 5 1 2 11 9 1 3 23 13 1 4 23 17 1 5 NA 21 1 6 NA 2 2 1 11 6 2 2 12 10 2 3 13 14 2 4 14 18 2 5 21 22 2 6 NA 3 3 1 11 7 3 2 22 11 3 3 15 15 3 4 18 19 3 5 21 23 3 6 29 4 4 1 NA 8 4 2 22 12 4 3 27 16 4 4 NA 20 4 5 NA 24 4 6 29>### As many gaps have been filled as can be filled (with fake data!). ### If you want to remove those still missing, you can use> dat <- na.omit(dat)Bill Venables http://www.cmis.csiro.au/bill.venables/ -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of gallon li Sent: Sunday, 29 March 2009 6:51 PM To: r-help Subject: [R] select observations from longitudinal data Suppose I have a long format for a longitudinal data id time x 1 1 10 1 2 11 1 3 23 1 4 23 2 2 12 2 3 13 2 4 14 3 1 11 3 3 15 3 4 18 3 5 21 4 2 22 4 3 27 4 6 29 I want to select the x values for each ID when time is equal to 3. When that observation is not observed, then I want to replace it with the obervation at time equal to 4. otherwise just use NA. How can I implement this with a quick command? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
gallon li wrote:> Suppose I have a long format for a longitudinal data > > id time x > 1 1 10 > 1 2 11 > 1 3 23 > 1 4 23 > 2 2 12 > 2 3 13 > 2 4 14 > 3 1 11 > 3 3 15 > 3 4 18 > 3 5 21 > 4 2 22 > 4 3 27 > 4 6 29 > > I want to select the x values for each ID when time is equal to 3. When that > observation is not observed, then I want to replace it with the obervation > at time equal to 4. otherwise just use NA. >with this dummy data: data = read.table(header=TRUE, textConnection(open='r', ' id time x 2 2 2 2 3 3 2 4 4 2 5 5 3 3 3 3 4 4 3 5 5 4 4 4 4 5 5 5 5 5')) you seem to expect the result to be like # id time x # 2 3 3 # 3 3 3 # 4 4 4 # 5 NA NA one way to hack this is: # the time points you'd like to use, in order of preference times = 3:4 # split the data by id, # for each subset, find values of x for the first time found, or use NA # combine the subsets back into a single data frame do.call(rbind, by(data, data$id, function(data) with(data, { rows = (time == times[which(times %in% time)[1]]) if (is.na(rows[1])) data.frame(id=id, time=NA, x=NA) else data[rows,] }))) # id time x # 2 2 3 3 # 3 3 3 3 # 4 4 4 4 # 5 5 NA NA with your original data: data = read.table(header=TRUE, textConnection(open='r', ' id time x 1 1 10 1 2 11 1 3 23 1 4 23 2 2 12 2 3 13 2 4 14 3 1 11 3 3 15 3 4 18 3 5 21 4 2 22 4 3 27 4 6 29')) times = 3:4 do.call(rbind, by(data, data$id, function(data) with(data, { rows = (time == times[which(times %in% time)[1]]) if (is.na(rows[1])) data.frame(id=id, time=NA, x=NA) else data[rows,] }))) # id time x # 1 1 3 23 # 2 2 3 13 # 3 3 3 15 # 4 4 3 27 is this what you wanted? vQ