Steven Archambault
2015-Mar-10 22:53 UTC
[R] Panel Data--filling in missing dates in a span only
Hi folks,

I have this panel data (below), with observations missing in each of the panels. I want to fill in years for the missing data, but only those years within the span of the existing data. For instance, BC-0002 needs one year, 1995. I do not want any years after the last observation.

structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("BC-0002",
"BC-0003", "BC-0004"), class = "factor"), Date = c(1989L, 1990L,
1991L, 1992L, 1993L, 1994L, 1996L, 1989L, 1990L, 1991L, 1992L,
1993L, 1994L, 1996L, 1995L, 1996L, 1997L, 1998L, 2000L, 1994L,
1993L, 1999L, 1998L), DepthtoWater_bgs = c(317.85, 317.25, 321.25,
312.31, 313.01, 330.41, 321.01, 166.58, 167.55, 168.65, 168.95,
169.25, 168.85, 169.75, 260.6, 261.65, 262.15, 265.45, 266.15,
265.25, 265.05, 266.95, 267.75)), .Names = c("ID", "Date",
"DepthtoWater_bgs"
), class = "data.frame", row.names = c(NA, -23L))

I have been using this code to expand the entire panels, but it is not exactly what I want.

fexp <- expand.grid(ID=unique(wells$ID), Date=unique(wells$Date))
merge(fexp, wells, all=TRUE)

Any help would be much appreciated!

Thanks,
Steve
Steve,

Here is one approach that works. I am calling your first data frame "df".

# list all years from min to max observed in each ID
years <- tapply(df$Date, df$ID, function(x) min(x):max(x))

# create a data frame based on the observed range of years
fulldf <- data.frame(ID=rep(names(years), sapply(years, length)),
  Date=unlist(years))

# merge the data frame of observations with the data frame with all years
merge(fulldf, df, all=TRUE)

Jean
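For readers who want to check the approach end to end, here is a minimal, self-contained sketch that applies the same three steps to the data from the original post. The data frame is rebuilt with data.frame() rather than the posted structure() call, and the name "filled" is only an illustrative choice; neither comes from the thread.

```r
# Rebuild the posted data (same IDs, years and depths as the structure() call).
df <- data.frame(
  ID = rep(c("BC-0002", "BC-0003", "BC-0004"), times = c(7, 7, 9)),
  Date = c(1989:1994, 1996L,
           1989:1994, 1996L,
           1995L, 1996L, 1997L, 1998L, 2000L, 1994L, 1993L, 1999L, 1998L),
  DepthtoWater_bgs = c(317.85, 317.25, 321.25, 312.31, 313.01, 330.41, 321.01,
                       166.58, 167.55, 168.65, 168.95, 169.25, 168.85, 169.75,
                       260.6, 261.65, 262.15, 265.45, 266.15, 265.25, 265.05,
                       266.95, 267.75)
)

# Step 1: for each ID, every year from its first to its last observation.
years <- tapply(df$Date, df$ID, function(x) min(x):max(x))

# Step 2: one row per ID/year combination inside each ID's own span.
fulldf <- data.frame(ID = rep(names(years), sapply(years, length)),
                     Date = unlist(years, use.names = FALSE))

# Step 3: merge on the shared columns (ID, Date); years with no
# observation get NA in DepthtoWater_bgs.
filled <- merge(fulldf, df, all = TRUE)

# Rows that were added to fill gaps.
filled[is.na(filled$DepthtoWater_bgs), c("ID", "Date")]
```

With this data the only filled rows should be 1995 for BC-0002 and 1995 for BC-0003, and no ID is extended past its last observed year. Note that BC-0004 has two observations for 1998 in the posted data; merge() keeps both, so the result has 25 rows rather than 24.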