I have the following longitudinal data:

    id time  y
     1    1 10
     1    2 12
     1    3 15
     1    6 18
     2    1  8
     2    3  9
     2    4 11
     2    5 12
     3    1  8
     3    4 16
     4    1  9
     4    5 13
     5    1  7
     5    2  9
     5    6 11
    ....

I want to select the observation at time 4 for each id. If the observation at time 4 is missing, then I want to select the observation at time 3. If the observation at time 3 is also missing, then I want to select the observation at time 5. Otherwise I will put a missing value there. The selected set looks like this:

    id time  y
     1    3 15
     2    4 11
     3    4 16
     4    5 13
     5    4 NA
    ...

So the rule is: (1) obs at time 4 for each id; (2) if no such obs, then look for obs at time 3; (3) if no such obs, then look for obs at time 5; (4) otherwise, NA.
Dimitris Rizopoulos
2009-Jan-18 09:54 UTC
[R] select observations from longitudinal data set
One way is the following:

    dat <- read.table(textConnection("id time y
    1 1 10
    1 2 12
    1 3 15
    1 6 18
    2 1 8
    2 3 9
    2 4 11
    2 5 12
    3 1 8
    3 4 16
    4 1 9
    4 5 13
    5 1 7
    5 2 9
    5 6 11"), header = TRUE)
    closeAllConnections()

    val <- 4
    # one placeholder row per id at the target time
    dat. <- data.frame(id = unique(dat$id), time = val)
    # all = TRUE keeps every row from both; ids with no obs at time 4 get y = NA
    out <- merge(dat, dat., all = TRUE)

    do.call("rbind", lapply(split(out, out$id), function (d) {
        x <- d[d$time == val, ]      # the row at the target time
        ind <- is.na(x$y)            # TRUE if that observation is missing
        if (ind && any(ii <- d$time == val - 1)) {
            x$y <- d$y[ii]           # fall back to time 3
        } else if (ind && any(ii <- d$time == val + 1)) {
            x$y <- d$y[ii]           # then to time 5
        }
        x
    }))

If you want the output to be a matrix (and not a data.frame), then you could replace the do.call("rbind", lapply(...)) part with t(sapply(...)).

I hope it helps.

Best,
Dimitris

gallon li wrote:
> so the rule is (1) obs at time 4 for each id; (2) if no such obs, then look
> for obs at time 3; (3) if no such obs, then look for obs at time 5; (4)
> otherwise, NA.
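Note that the code above leaves the time column at 4 even when y is taken from time 3 or 5, whereas the requested output lists the time that was actually used. A possible variant is sketched below. It is not part of the original reply: the helper name pick_obs and its priority argument are made up for illustration, and it assumes at most one observation per id and time and an R version whose read.table accepts a text argument.

```r
dat <- read.table(text = "id time y
1 1 10
1 2 12
1 3 15
1 6 18
2 1 8
2 3 9
2 4 11
2 5 12
3 1 8
3 4 16
4 1 9
4 5 13
5 1 7
5 2 9
5 6 11", header = TRUE)

# Hypothetical helper: for one subject's rows, return the first observation
# found when the candidate times are tried in priority order, or a row with
# y = NA at the preferred time if none of them is present.
pick_obs <- function(d, priority = c(4, 3, 5)) {
  hit <- match(priority, d$time)   # row index for each candidate time, NA if absent
  hit <- hit[!is.na(hit)]
  if (length(hit) > 0) {
    d[hit[1], ]
  } else {
    data.frame(id = d$id[1], time = priority[1], y = NA)
  }
}

sel <- do.call(rbind, lapply(split(dat, dat$id), pick_obs))
sel
```

Changing the rule then only means changing the priority vector, e.g. c(4, 5, 3) to prefer time 5 over time 3.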
Gabor Grothendieck
2009-Jan-18 13:29 UTC
[R] select observations from longitudinal data set
Try this. 'by' splits the data frame into one data frame per id, and then f acts separately on each such sub-dataframe, returning a ts series with NAs for the missing time points. cbind'ing those all together gives us this series with one column per id:

    > tt
    Time Series:
    Start = 1
    End = 6
    Frequency = 1
       1  2  3  4  5
    1 10  8  8  9  7
    2 12 NA NA NA  9
    3 15  9 NA NA NA
    4 NA 11 16 NA NA
    5 NA 12 NA 13 NA
    6 18 NA NA NA 11

and finally we use a string of ifelse's to choose the correct values.

    > library(zoo)
    > f <- function(d) as.ts(zoo(d$y, d$time, freq = 1))
    > tt <- do.call(cbind, by(dat, dat$id, f))
    > ifelse(is.na(tt[4,]), ifelse(is.na(tt[3,]), tt[5,], tt[3,]), tt[4,])
     1  2  3  4  5
    15 11 16 13 NA

As in the example data, we have assumed that at least one of the sub-dataframes has a point at time 1 and at least one has a point at time 5, so that rows 3, 4 and 5 of tt correspond to times 3, 4 and 5.

On Sun, Jan 18, 2009 at 2:42 AM, gallon li wrote:
> so the rule is (1) obs at time 4 for each id; (2) if no such obs, then look
> for obs at time 3; (3) if no such obs, then look for obs at time 5; (4)
> otherwise, NA.
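If the assumption about the time range is a concern, one possible workaround is to look the times up by label rather than by row position. The sketch below is not from the original reply: it uses base R's tapply to build an id-by-time matrix, the names wide and get_time are made up for illustration, and it assumes at most one observation per id and time.

```r
dat <- read.table(text = "id time y
1 1 10
1 2 12
1 3 15
1 6 18
2 1 8
2 3 9
2 4 11
2 5 12
3 1 8
3 4 16
4 1 9
4 5 13
5 1 7
5 2 9
5 6 11", header = TRUE)

# id x time matrix of y; combinations that do not occur are NA
wide <- tapply(dat$y, list(dat$id, dat$time), function(v) v[1])

# Hypothetical helper: the column for time t, or all NA if no subject
# was observed at that time (so the lookup never fails).
get_time <- function(w, t) {
  key <- as.character(t)
  if (key %in% colnames(w)) {
    w[, key]
  } else {
    setNames(rep(NA_real_, nrow(w)), rownames(w))
  }
}

y4 <- get_time(wide, 4)
y3 <- get_time(wide, 3)
y5 <- get_time(wide, 5)

# same nested ifelse idea: time 4 first, then time 3, then time 5
res <- ifelse(!is.na(y4), y4, ifelse(!is.na(y3), y3, y5))
res
```

Because the columns are addressed by their time label, the result does not depend on which time happens to come first in the data.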