Kunzler, Andreas
2009-Apr-06 11:13 UTC
[R] time-series data and time-invariant missing values
Dear list,

I have some problems with time-series data and missing values of time-invariant information like sex or birth date.

Assume a data frame (d) structured like

    id  birth       sex   year of observation
    1   NA          NA    2006
    1   01.01.1976  male  2007
    1   NA          NA    2008

I am looking for a way to replace the missing values. Right now my solution to this problem slows R down:

    d$birth2 <- as.Date(NA)                    # initialise birth2 as a Date column
    for (i in seq_len(nrow(d))) {              # for all observations
        if (!is.na(d$birth[i])) {              # birth of observation i is present
            d$birth2[i] <- as.Date(d$birth[i], "%d.%m.%Y")
        } else {                               # birth of observation i is missing:
            # take the value observed in another year for the same id
            d$birth2[i] <- as.Date(d$birth[d$id == d$id[i] & !is.na(d$birth)], "%d.%m.%Y")[1]
        }
    }

Result:

    id  birth       sex   year of observation  birth2
    1   NA          NA    2006                 1976-01-01
    1   01.01.1976  male  2007                 1976-01-01
    1   NA          NA    2008                 1976-01-01

Unfortunately, the data consist of over 20,000 observations a year.

Does anybody know a better way?

Thanks

Kind regards

Andreas Kunzler
____________________________
Bundeszahnärztekammer (BZÄK)
Chausseestraße 13
10115 Berlin

Tel.: 030 40005-113
Fax: 030 40005-119

E-Mail: a.kunzler at bzaek.de
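A loop-free variant of the same idea, offered as a sketch rather than anything posted in the thread: for each row, look up the first non-missing value recorded for the same id with match(). It assumes each id has only one distinct birth date and sex; the data frame (including the second person with id 2) and the helper name fill_by_id are hypothetical.

```r
# Hypothetical example data in the layout from the question;
# birth is stored as text in day.month.year form.
d <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  birth = c(NA, "01.01.1976", NA, NA, "15.03.1980"),
  sex   = c(NA, "male", NA, NA, "female"),
  year  = c(2006, 2007, 2008, 2006, 2007),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every row, return the first non-missing
# value of x observed for the same id (NA if that id has none).
fill_by_id <- function(id, x) {
  known <- !is.na(x)
  # match() gives, for each id, the position of its first known row
  x[known][match(id, id[known])]
}

d$birth2 <- as.Date(fill_by_id(d$id, d$birth), format = "%d.%m.%Y")
d$sex2   <- fill_by_id(d$id, d$sex)
print(d)
```

Because match() and the indexing are vectorised, this does a single pass over the data instead of one subset per row, which should matter with tens of thousands of observations.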
Gabor Grothendieck
2009-Apr-06 11:52 UTC
[R] time-series data and time-invariant missing values
Check out na.locf in the zoo package. Here we fill in NAs going forward and, just in case there were NAs right at the beginning, we fill them in backward as well.

    library(zoo)
    x <- as.Date(c(NA, "2000-01-01", NA))
    x2 <- na.locf(x, na.rm = FALSE)
    x2 <- na.locf(x2, fromLast = TRUE, na.rm = FALSE)

gives:

    > x2
    [1] "2000-01-01" "2000-01-01" "2000-01-01"
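For the data in the question the fill has to happen separately for each id, otherwise a value could be carried from one person into the next. One way to do that, sketched here as an assumption rather than taken from the thread, is to wrap the two na.locf calls in a hypothetical helper fill_both and apply it per id with base R's ave(). The example data frame is hypothetical, and the rows are sorted by id and year first.

```r
library(zoo)

# Hypothetical example data (birth already converted to Date).
d <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  year  = c(2006, 2007, 2008, 2006, 2007),
  birth = as.Date(c(NA, "1976-01-01", NA, NA, "1980-03-15"), format = "%Y-%m-%d"),
  sex   = c(NA, "male", NA, NA, "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last observation forward, then
# backward, so NAs at the start of a group are filled as well.
fill_both <- function(x) {
  x <- na.locf(x, na.rm = FALSE)
  na.locf(x, fromLast = TRUE, na.rm = FALSE)
}

# Sort by id and year, then apply the fill within each id only.
d <- d[order(d$id, d$year), ]
d$birth <- ave(d$birth, d$id, FUN = fill_both)
d$sex   <- ave(d$sex,   d$id, FUN = fill_both)
print(d)
```

An id with no observed value at all simply stays NA, since na.locf with na.rm = FALSE leaves all-NA groups unchanged.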