Hello, I have a data frame as below. Where a value is N.A., I want to replace it with the average of the values for the previous date and the next date. Any help?

13/10/2010      A       23
13/10/2010      B       12
13/10/2010      C       124
14/10/2010      A       43
14/10/2010      B       54
14/10/2010      C       65
15/10/2010      A       43
15/10/2010      B       N.A.
15/10/2010      C       65

Thanks R-Helpers.
If I understand correctly, you can use approxfun:

DF <- read.table(textConnection("
13/10/2010 A 23
13/10/2010 B 12
13/10/2010 C 124
14/10/2010 A 43
14/10/2010 B 54
14/10/2010 C 65
15/10/2010 A 43
15/10/2010 B N.A.
15/10/2010 C 65"), na.strings = "N.A.")

# approxfun drops the NA row and interpolates linearly over the row index
f <- approxfun(1:nrow(DF), DF$V3)
# f is vectorized, so it can be evaluated at every row index in one call
DF$V4 <- f(seq_len(nrow(DF)))

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
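One thing to keep in mind with the approxfun call above: it interpolates over the row index, so the neighbours of a missing value are the adjacent rows, whatever letter they carry in the second column. If A, B and C are meant to be separate series, a base R sketch along the following lines may be closer to the request. The helper name fill_na is hypothetical, the rows are assumed to be in date order within each series, and rule = 2 (carry the nearest observed value into a leading or trailing gap) is a choice made here, not something the original post asked for.

```r
# Same sample data as in the thread; "N.A." is read as NA.
DF <- read.table(text = "
13/10/2010 A 23
13/10/2010 B 12
13/10/2010 C 124
14/10/2010 A 43
14/10/2010 B 54
14/10/2010 C 65
15/10/2010 A 43
15/10/2010 B N.A.
15/10/2010 C 65", na.strings = "N.A.", stringsAsFactors = FALSE)

# Hypothetical helper: linearly interpolate the NAs of a single series.
# rule = 2 fills leading/trailing NAs with the nearest observed value;
# with rule = 1 those positions would stay NA.
fill_na <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)   # approx() needs at least two observed points
  approx(which(ok), x[ok], xout = seq_along(x), rule = 2)$y
}

# ave() applies fill_na separately within each group of column V2
# (assumes rows are in date order within each group).
DF$V4 <- ave(DF$V3, DF$V2, FUN = fill_na)
DF
```

In this sample the missing B value falls on the last date, so there is no "next date" to average with; under rule = 2 it is filled with the previous B value, 54.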
On Thu, Oct 14, 2010 at 4:17 AM, Santosh Srinivas <santosh.srinivas at gmail.com> wrote:
> Hello, I have a data frame as below ... in cases where I have N.A. I want
> to use an average of the past date and next date .. any help?
>
> 13/10/2010      A       23
> 13/10/2010      B       12
> 13/10/2010      C       124
> 14/10/2010      A       43
> 14/10/2010      B       54
> 14/10/2010      C       65
> 15/10/2010      A       43
> 15/10/2010      B       N.A.
> 15/10/2010      C       65

Assuming A, B and C refer to separate time series, you can use na.approx in zoo.

Lines <- "13/10/2010 A 23
13/10/2010 B 12
13/10/2010 C 124
14/10/2010 A 43
14/10/2010 B 54
14/10/2010 C 65
15/10/2010 A 43
15/10/2010 B N.A.
15/10/2010 C 65"

library(zoo)
# z <- read.zoo("myfile.dat", format = "%d/%m/%Y", split = 2, na.strings = "N.A.")
z <- read.zoo(textConnection(Lines), format = "%d/%m/%Y", split = 2, na.strings = "N.A.")
na.approx(z)
# or
na.approx(z, rule = 2)

which gives this multivariate time series in zoo:

> na.approx(z)
            A  B   C
2010-10-13 23 12 124
2010-10-14 43 54  65
2010-10-15 43 NA  65
> # or
> na.approx(z, rule = 2)
            A  B   C
2010-10-13 23 12 124
2010-10-14 43 54  65
2010-10-15 43 54  65

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Wow! That's amazing! Many thanks!

When I do the below, why do the column names get thrown off? Ticker is a factor / character; I tried both.

> temp <- head(MF_Data_Sub)
> temp
        Date Ticker   Price
1 2008-04-01 106270 10.3287
2 2008-04-01 106269 10.3287
3 2008-04-01 102767 12.6832
4 2008-04-01 102766 10.5396
5 2008-04-01 102855  9.7833
6 2008-04-01 102856 12.1485
> tZoo <- read.zoo(temp, split = 2)
> tZoo
           X102766 X102767 X102855 X102856 X106269 X106270
2008-04-01 10.5396 12.6832  9.7833 12.1485 10.3287 10.3287

Also, is there an easy way to do a return profile on the data below after it is transformed?

Thanks very much!
S
On Thu, Oct 14, 2010 at 9:59 AM, Santosh Srinivas <santosh.srinivas at gmail.com> wrote:
> When I do the below ... why do the column names get thrown off? Ticker is a
> factor / character ... I tried both
>
>> tZoo <- read.zoo(temp,split=2)
>> tZoo
>            X102766 X102767 X102855 X102856 X106269 X106270
> 2008-04-01 10.5396 12.6832  9.7833 12.1485 10.3287 10.3287

read.zoo automatically converts the names into valid R variable names, as data.frame does. If you do not want that behavior, add the check.names = FALSE argument to read.zoo.

> Also, is there an easy way to do a return profile on the data below after it
> is transformed?

What is a "profile on the data"? These all work: str(z); summary(z); View(z)
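The name conversion described in this reply can be seen in base R without zoo. The sketch below only illustrates the data.frame behaviour the reply compares read.zoo to; the ticker values are taken from the thread, and the object names (tickers, df_checked, df_unchecked) are made up for the example.

```r
# make.names() is what turns a name starting with a digit into a
# syntactically valid one by prefixing "X".
tickers <- c("102766", "102767", "A")
make.names(tickers)            # "X102766" "X102767" "A"

# data.frame() applies the same conversion unless check.names = FALSE.
df_checked   <- data.frame(`102766` = 10.5396, `102767` = 12.6832)
df_unchecked <- data.frame(`102766` = 10.5396, `102767` = 12.6832,
                           check.names = FALSE)

names(df_checked)              # "X102766" "X102767"
names(df_unchecked)            # "102766"  "102767"

# Non-syntactic names need backticks (or [[ ]]) when used with $.
df_unchecked$`102766`
```

Keeping the raw ticker codes as names is convenient for display, at the cost of needing backticks or character indexing whenever a column is referenced.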
On Thu, Oct 14, 2010 at 6:59 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> Also, is there an easy way to do a return profile on the data below after it
>> is transformed?
>
> What is a "profile on the data"? These all work: str(z); summary(z); View(z)

Based on offline discussion, what was wanted was the returns, i.e. diff(log(z))

Also, for other manipulations of financial data see the xts, quantmod and PerformanceAnalytics packages.
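For readers unfamiliar with the idiom, diff(log(z)) computes log returns column by column. Here is a small base R sketch on a plain matrix; the prices are invented for illustration and are not from the thread. On a zoo object the same expression should behave the same way while keeping the date index.

```r
# Hypothetical prices: one column per series, one row per date.
prices <- cbind(A = c(100, 110, 121),
                B = c(50, 45, 54))

# Log returns: log(p_t) - log(p_{t-1}), computed down each column.
log_ret <- diff(log(prices))

# Simple returns, p_t / p_{t-1} - 1, for comparison.
simple_ret <- prices[-1, ] / prices[-nrow(prices), ] - 1

# The two are linked by simple = exp(log) - 1.
all.equal(exp(log_ret) - 1, simple_ret)   # TRUE
```

Log returns add up across periods, which is one reason they are often preferred for this kind of series; simple returns are easier to read as percentages.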