I have a large longitudinal data set. The number of observations per subject is not the same across the sample: the largest number for a subject is 5 and the smallest is 1.

Now I want every subject to have the same number of observations by filling in zeros. For example, my original sample is

id   x
001  10
001  30
001  20
002  10
002  20
002  40
002  80
002  70
003  20
003  40
004  ......

and I wish to make the data look like

id   x
001  10
001  30
001  20
001  0
001  0
002  10
002  20
002  40
002  80
002  70
003  20
003  40
003  0
003  0
003  0
004  ......

so that each id has exactly 5 observations. Is there a function that lets me do this quickly?
Chuck Cleland
2007-Jan-27 10:58 UTC
[R] unequal number of observations for longitudinal data
gallon li wrote:
> [...]
> so that each id has exactly 5 observations. is there a function which can
> allow me do this quickly?

Filling in with zeros seems like a bad idea, but here is an approach to filling in with NAs. I will leave replacing the NAs with zeros to you.

    df.long <- data.frame(id = c(1,1,1,2,2,2,2,2,3,3),
                          x = runif(10),
                          time = c(1,2,5,1,2,3,4,5,2,4))

    df.long
       id          x time
    1   1 0.72888215    1
    2   1 0.60893548    2
    3   1 0.41347690    5
    4   2 0.79388248    1
    5   2 0.05810054    2
    6   2 0.02451654    3
    7   2 0.85464775    4
    8   2 0.15970365    5
    9   3 0.22856183    2
    10  3 0.38291471    4

    # x is random (runif), and the values printed below come from a
    # different draw than the listing above.
    df.wide <- reshape(df.long, idvar = "id", timevar = "time",
                       v.names = "x", direction = "wide")

    df.wide
      id       x.1       x.2       x.5       x.3       x.4
    1  1 0.6375135 0.1651258 0.3210223        NA        NA
    4  2 0.9878134 0.8909020 0.9853269 0.7747615 0.3834130
    9  3        NA 0.3586109        NA        NA 0.8310539

    df.long2 <- reshape(df.wide, direction = "long")

    df.long2
        id time         x
    1.1  1    1 0.6375135
    2.1  2    1 0.9878134
    3.1  3    1        NA
    1.2  1    2 0.1651258
    2.2  2    2 0.8909020
    3.2  3    2 0.3586109
    1.5  1    5 0.3210223
    2.5  2    5 0.9853269
    3.5  3    5        NA
    1.3  1    3        NA
    2.3  2    3 0.7747615
    3.3  3    3        NA
    1.4  1    4        NA
    2.4  2    4 0.3834130
    3.4  3    4 0.8310539

This assumes that your data in the "long" format has a time variable.
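The data in the original question has no time column, so here is a minimal sketch of how the remaining steps might look. It assumes the rows are already in observation order within each id, so that a running counter per id can stand in for time. The names DF, wide and long are only illustrative.

```r
# Example data from the original question (no time column).
DF <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
                 x  = c(10, 30, 20, 10, 20, 40, 80, 70, 20, 40))

# Assumption: rows are in observation order within each id, so a
# per-id running counter (1, 2, 3, ...) serves as the time variable.
DF$time <- ave(seq_along(DF$id), DF$id, FUN = seq_along)

# Long -> wide: one row per id, one x.<time> column per occasion;
# occasions an id never reached become NA.
wide <- reshape(DF, idvar = "id", timevar = "time",
                v.names = "x", direction = "wide")

# Wide -> long again: every id now has one row per occasion.
long <- reshape(wide, direction = "long")

# Replace the NAs introduced by padding with zeros, then sort.
long$x[is.na(long$x)] <- 0
long <- long[order(long$id, long$time), ]
```

Keeping the NAs instead of the zeros only requires dropping the second-to-last statement, which is usually the safer choice for modelling.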
See the help page for reshape() for more details.

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Gabor Grothendieck
2007-Jan-27 11:15 UTC
[R] unequal number of observations for longitudinal data
merge.zoo in the zoo package has an n-way merge supporting zero fill:

    library(zoo)
    DF <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
                     x = c(10, 30, 20, 10, 20, 40, 80, 70, 20, 40))
    as.data.frame(do.call(merge, c(lapply(unstack(DF, x ~ id), zoo), fill = 0)))

    # last line can alternately be
    f <- function(DF) zoo(DF$x)
    as.data.frame(do.call(merge, c(by(DF, DF$id, f), fill = 0)))
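For readers who would rather not load zoo, the same zero-padded layout (one column per id) can probably be had in base R. This is only a sketch of that idea, not a description of what merge.zoo does internally; pad_zero, n_max, wide and long are hypothetical names introduced for illustration.

```r
# Example data from the original question.
DF <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
                 x  = c(10, 30, 20, 10, 20, 40, 80, 70, 20, 40))

# Largest number of observations held by any single id (5 here).
n_max <- max(table(DF$id))

# Hypothetical helper: append zeros until the vector has length n.
pad_zero <- function(v, n) c(v, rep(0, n - length(v)))

# Split x by id, pad each piece, and bind the pieces as columns.
# Result: an n_max-row matrix with one column per id.
wide <- sapply(split(DF$x, DF$id), pad_zero, n = n_max)

# Back to the long layout asked for in the question. The ids come from
# the column names, so they are character strings at this point.
long <- data.frame(id = rep(colnames(wide), each = n_max),
                   x  = as.vector(wide))
```

Because the ids are recovered from column names, labels such as "001" keep their leading zeros only if id was stored as character or factor in the first place.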