Hello, I have an unbalanced panel data set that looks like:

ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72

That is, James, Jack, and Jordan are each missing a YEAR.

Is there any command that will "fill in" the missing YEAR so that the end
result is balanced and looks like:

ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
James,2008,NA
Jack,2007,70
Jack,2008,NA
Jordan,2007,NA
Jordan,2008,72

Thank you.

Geoff

-- 
Geoffrey Smith
Visiting Assistant Professor
Department of Finance
W. P. Carey School of Business
Arizona State University
On Apr 2, 2010, at 3:39 PM, Geoffrey Smith wrote:

> Hello, I have an unbalanced panel data set that looks like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> Jack,2007,70
> Jordan,2008,72
>
> That is, James, Jack, and Jordan are missing a YEAR.
>
> Is there any command that will "fill in" the missing YEAR such that
> the end result will be balanced and look like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> James,2008,NA
> Jack,2007,70
> Jack,2008,NA
> Jordan,2007,NA
> Jordan,2008,72

It's not "one command", but here is an approach. It assumes you have the
data in a data frame named ftbl:

> fexp <- expand.grid(ID=unique(ftbl$ID), YEAR=unique(ftbl$YEAR))
> merge(fexp, ftbl, all=TRUE)
       ID YEAR HEIGHT
1   Harry 2007     62
2   Harry 2008     62
3    Jack 2007     70
4    Jack 2008     NA
5   James 2007     68
6   James 2008     NA
7  Jordan 2007     NA
8  Jordan 2008     72
9    Mary 2007     45
10   Mary 2008     50
11    Tom 2007     65
12    Tom 2008     66

-- 
David Winsemius, MD
West Hartford, CT
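For anyone who wants to run the expand.grid()/merge() approach end to end, here is a minimal self-contained sketch. It rebuilds the posted data with read.csv(text = ...), which is only an assumption about how the data get loaded. The name `balanced` and the final reordering step are illustrative additions, not part of the reply above.

```r
# Rebuild the posted data (the data frame name ftbl follows the reply above).
ftbl <- read.csv(text = "ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72", stringsAsFactors = FALSE)

# Every ID x YEAR combination that a balanced panel should contain.
fexp <- expand.grid(ID = unique(ftbl$ID), YEAR = unique(ftbl$YEAR),
                    stringsAsFactors = FALSE)

# Full outer join on the shared columns (ID and YEAR);
# combinations absent from ftbl get HEIGHT = NA.
balanced <- merge(fexp, ftbl, all = TRUE)

# Optional (illustrative): restore the original ID order, YEAR within ID.
balanced <- balanced[order(match(balanced$ID, unique(ftbl$ID)),
                           balanced$YEAR), ]
rownames(balanced) <- NULL

print(balanced)
```

merge() sorts its result by the key columns, which is why the output in the reply is alphabetical by ID; the last step is only needed if the original row order matters.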
Try this:

> as.data.frame.table(tapply(DF[,3], DF[2:1], c), responseName = names(DF)[3])
   YEAR     ID HEIGHT
1  2007  Harry     62
2  2008  Harry     62
3  2007   Jack     70
4  2008   Jack     NA
5  2007  James     68
6  2008  James     NA
7  2007 Jordan     NA
8  2008 Jordan     72
9  2007   Mary     45
10 2008   Mary     50
11 2007    Tom     65
12 2008    Tom     66

On Fri, Apr 2, 2010 at 3:39 PM, Geoffrey Smith <gps at asu.edu> wrote:
> Hello, I have an unbalanced panel data set that looks like:
> [...]
> Is there any command that will "fill in" the missing YEAR such that the end
> result will be balanced and look like:
> [...]
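The tapply() one-liner at the top of this reply can be unpacked into steps. The sketch below assumes the data are in a data frame named DF with columns ID, YEAR, HEIGHT, and that each ID/YEAR pair occurs at most once (with duplicates, `c` would return vectors and the result would no longer be a simple matrix). The names `wide` and `long`, and the type conversions at the end, are illustrative additions.

```r
# Rebuild the posted data (the data frame name DF follows the reply above).
DF <- read.csv(text = "ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72", stringsAsFactors = FALSE)

# Cross-tabulate HEIGHT by YEAR (rows) and ID (columns).
# Cells with no observation are filled with NA.
wide <- tapply(DF$HEIGHT, DF[c("YEAR", "ID")], c)

# Unroll the YEAR x ID matrix back into long form, one row per cell.
long <- as.data.frame.table(wide, responseName = "HEIGHT")

# as.data.frame.table() returns YEAR and ID as factors; convert them back
# and put the columns in the original order.
long$ID <- as.character(long$ID)
long$YEAR <- as.integer(as.character(long$YEAR))
long <- long[c("ID", "YEAR", "HEIGHT")]

print(long)
```

Compared with the merge() approach, this route goes through a wide matrix, so it only suits a single value column.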