Hi, I know this can be done in Stata (which is quite messy) but I wanted to know if it can be done in R. So lets say I have a merged data set (I used the merge function by date for the attached two files), where all the missing values are filled with NAs (which is what the all.x=TRUE does). Is there any way to replace those NAs with the value of the latest row that contains a value? For example: > Date<-read.table("Desktop/R/Testdate.txt", head=T, sep="\t") > Data<-read.table("Desktop/R/Testinput.txt", head=T, sep="\t") > Merged<-merge(Date, Data, all.x=TRUE) > Merged Date France Germany 1 3/10/07 2 4 2 3/11/07 NA NA 3 3/12/07 NA NA 4 3/13/07 NA NA 5 3/14/07 NA NA 6 3/15/07 1 2 Given this Merged data, is there a way to replace every NA value from 3/11 to 3/14 with that of 3/15? But then say there are multiple intervals with NAs that I want to fill with the last given value? Technically, I know I could just manually edit the data, but the real files I need to work with has thousands of rows, so I was wondering if there was a way to do this (probably a loop?). Thanks a bunch. Jia Ying Mei -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Testdate.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080723/aa0e5d29/attachment.txt> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Testinput.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080723/aa0e5d29/attachment-0001.txt>
Hello - Jia Ying Mei wrote:> Hi, > > I know this can be done in Stata (which is quite messy) but I wanted to > know if it can be done in R. So lets say I have a merged data set (I > used the merge function by date for the attached two files), where all > the missing values are filled with NAs (which is what the all.x=TRUE does). > > Is there any way to replace those NAs with the value of the latest row > that contains a value? > > For example: > > > Date<-read.table("Desktop/R/Testdate.txt", head=T, sep="\t") > > Data<-read.table("Desktop/R/Testinput.txt", head=T, sep="\t") > > Merged<-merge(Date, Data, all.x=TRUE) > > Merged > Date France Germany > 1 3/10/07 2 4 > 2 3/11/07 NA NA > 3 3/12/07 NA NA > 4 3/13/07 NA NA > 5 3/14/07 NA NA > 6 3/15/07 1 2 > > Given this Merged data, is there a way to replace every NA value from > 3/11 to 3/14 with that of 3/15? But then say there are multiple > intervals with NAs that I want to fill with the last given value? >Yes, no loop necessary. # begin R code install.packages("zoo") library(zoo) Merged$France <- na.locf(Merged$France, fromLast = TRUE) # end R code This will of course work on the 'France' column. Use of lapply in conjuction with this idea will lead to solving this problem for N columns in a couple lines of R. Not messy at all! Best Regards, Erik Iverson
This can be done readily with zoo. See zoo's 3 vignettes and its help files for commands used below. # setwd("...directory where we have write permission...") # can alternately manually download these two files download.file("https://stat.ethz.ch/pipermail/r-help/attachments/20080723/aa0e5d29/attachment.txt", "Testdate.txt", method = "wget") download.file("https://stat.ethz.ch/pipermail/r-help/attachments/20080723/aa0e5d29/attachment-0001.txt", "Testinput.txt", method = "wget") fmt <- "%m/%d/%y" dd <- read.csv("Testdate.txt") dd$Date <- as.Date(dd$Date, fmt) library(zoo) z <- read.zoo("Testinput.txt", header = TRUE, format = fmt) z2 <- merge(z, zoo(,dd$Date)) na.locf(z2, fromLast = TRUE) # alternately if its only needed to fill in the gaps in z # then we can eliminate Testdate.txt and calculate z2 # (call it z2a this time) directly from z as: rng <- range(time(z)) z2a <- merge(z, zoo(, seq(rng[1], rng[2], "day"))) # continue as previously 2008/7/23 Jia Ying Mei <jiamei at princeton.edu>:> Hi, > > I know this can be done in Stata (which is quite messy) but I wanted to know > if it can be done in R. So lets say I have a merged data set (I used the > merge function by date for the attached two files), where all the missing > values are filled with NAs (which is what the all.x=TRUE does). > > Is there any way to replace those NAs with the value of the latest row that > contains a value? > > For example: > >> Date<-read.table("Desktop/R/Testdate.txt", head=T, sep="\t") >> Data<-read.table("Desktop/R/Testinput.txt", head=T, sep="\t") >> Merged<-merge(Date, Data, all.x=TRUE) >> Merged > Date France Germany > 1 3/10/07 2 4 > 2 3/11/07 NA NA > 3 3/12/07 NA NA > 4 3/13/07 NA NA > 5 3/14/07 NA NA > 6 3/15/07 1 2 > > Given this Merged data, is there a way to replace every NA value from 3/11 > to 3/14 with that of 3/15? But then say there are multiple intervals with > NAs that I want to fill with the last given value? > > Technically, I know I could just manually edit the data, but the real files > I need to work with has thousands of rows, so I was wondering if there was a > way to do this (probably a loop?). Thanks a bunch. > > Jia Ying Mei > > Date > 3/15/07 > 3/14/07 > 3/13/07 > 3/12/07 > 3/11/07 > 3/10/07 > > Date France Germany > 3/15/07 1 2 > 3/10/07 2 4 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >