Hi everybody,

I'm a new French R user. Sorry if my English is not perfect; I hope you'll understand my problem ;)

I have to work on temperature data (35000 lines in one file) containing some missing data (NA). Sometimes I have only 2 or 3 NAs in a row, but sometimes I also have 100 or 200 NAs in a row. Here's an example of my data with only a small gap of missing data (2 or 3 NAs):

09/01/2008 12:00  2  1.93    2.93  4.56  5.43
09/01/2008 12:15  2  *3.93*  3.25  4.93  5.56
09/01/2008 12:30  2  NA      3.5   5.06  5.56
09/01/2008 12:45  2  NA      3.68  5.25  5.68
09/01/2008 13:00  2  *4.93*  3.87  5.56  5.93
09/01/2008 13:15  2  5.93    4.25  5.75  6.06
09/01/2008 13:30  2  3.93    4.56  5.93  6.18

My question is: how can I replace these small gaps of NAs with numeric values? I would like a function which only replaces the small gaps (2 or 3 NAs) in my data, but not the big gaps (more than 5 NAs in a row).

For the moment, I'm trying to do it by working with the time gap between the 2 numeric values surrounding the NAs, as follows:

imputation <- function(x){
  met = NULL
  temp <- met[1] <- x[1]
  ind_temp <- 1
  tps <- time(x)
  for (i in 2:(length(x))){
    if ((tps[i] - tps[ind_temp] > 1) & (tps[i] - tps[ind_temp] < 4) & (is.na(x[i]))) {
      met[i] <- na.approx(x)
    }
    else {
      temp <- met[i] <- x[i]
      ind_temp <- i
    }
  }
  return(met)
}

In this example, I would like to apply the function na.approx(x) to my NAs, but only when there are at most 4 NAs in a row. There's no error, but it doesn't work (a similar approach worked the other way around, when I had to detect aberrant data and replace it with NA, but not now). Maybe this is not the right way to solve the problem; I don't have a lot of experience in R. Maybe there is an easier way to do it...

Does anybody have an idea to help me?
Thanks a lot!

--
View this message in context: http://r.789695.n4.nabble.com/filling-small-gaps-of-N-A-tp4528184p4528184.html
Sent from the R help mailing list archive at Nabble.com.
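In the loop above, ind_temp is reset to i whenever the condition fails, so tps[i] - tps[ind_temp] always spans a single time step and never measures how long a gap is. A different way to find the "small" gaps is to measure the length of every run of consecutive NAs with rle(). A minimal base-R sketch, assuming one sensor's readings sit in a plain numeric vector at a regular 15-minute step; na_run_length() is a hypothetical helper name and the toy vector is made up for illustration:

```r
# Hypothetical helper (not from the thread): for each element of x, return the
# length of the run of consecutive NAs it belongs to, or 0 for observed values.
na_run_length <- function(x) {
  runs <- rle(is.na(x))                          # alternating runs of NA / non-NA
  lens <- ifelse(runs$values, runs$lengths, 0L)  # keep lengths of NA runs only
  rep(lens, runs$lengths)                        # expand back to one value per element
}

# Toy series: one gap of 2 NAs and one gap of 6 NAs (made-up values)
x <- c(1.93, 3.93, NA, NA, 4.93, 5.93, NA, NA, NA, NA, NA, NA, 3.93)

gap <- na_run_length(x)
print(gap)            # 0 0 2 2 0 0 6 6 6 6 6 6 0

# Positions that sit inside a "small" gap of at most 4 NAs
small <- gap > 0 & gap <= 4
print(which(small))   # 3 4
```

Once the small-gap positions are known, only those elements need to be replaced by interpolated values.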
It seems like you could benefit from using a zoo [time series] object to hold your data; then you have a variety of NA-filling functions which work for arbitrarily long gaps. E.g.,

library(zoo)
x <- zoo(1:100, Sys.Date() + 1:100)
x[2:60] <- NA

# Most of these look the same because the data is simple;
# they will give different results for more complicated examples
na.approx(x)
na.locf(x)
na.spline(x)
na.aggregate(x)
na.fill # Takes more arguments

Hope this helps,
Michael

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
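To see what na.approx() does to Michael's example without loading zoo, here is a base-R sketch of the same idea using approx(). It is an illustration under assumptions, not zoo's implementation: it interpolates along element positions (so it assumes equally spaced observations, whereas zoo uses the series' time index), and fill_linear() is a hypothetical name.

```r
# Base-R sketch of linear NA interpolation, the idea behind zoo's na.approx().
# Assumes equally spaced observations: element positions act as the time axis.
fill_linear <- function(x) {
  pos <- seq_along(x)
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)   # approx() needs at least two observed points
  # rule = 1: positions outside the observed range (leading/trailing NAs) stay NA
  approx(pos[ok], x[ok], xout = pos, rule = 1)$y
}

# Same gap as Michael's zoo example, but on a plain vector
x <- 1:100
x[2:60] <- NA
y <- fill_linear(x)
print(all.equal(y, as.numeric(1:100)))  # TRUE: the straight line 1:100 is recovered
```

Like the zoo functions in Michael's example, this fills gaps of any length; restricting the fill to short gaps needs an extra step.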
On Tue, Apr 3, 2012 at 4:52 AM, jeff6868 <geoffrey_klein at etu.u-bourgogne.fr> wrote:
> My question is: how can I replace these small gaps of NAs with numeric values?
> I would like a function which only replaces the small gaps (2 or 3 NAs) in my
> data, but not the big gaps (more than 5 NAs in a row).

Try na.locf, na.approx or na.spline in the zoo package, noting the maxgap= argument on each.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
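To make the maxgap= idea concrete: per the zoo documentation, maxgap is the maximum number of consecutive NAs to fill, and longer gaps are left unchanged. The base-R sketch below imitates that for linear interpolation only. fill_small_gaps() is a hypothetical helper; it interpolates by element position (equally spaced data assumed) and leaves leading and trailing NAs as NA, so its output is not guaranteed to match zoo's in every case.

```r
# Hypothetical base-R analogue of na.approx(x, maxgap = ...): linearly
# interpolate runs of at most `maxgap` consecutive NAs, leave longer runs
# (and any leading/trailing NAs) as NA. Assumes equally spaced observations.
fill_small_gaps <- function(x, maxgap = 4) {
  pos <- seq_along(x)
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)                    # nothing to interpolate between
  filled <- approx(pos[ok], x[ok], xout = pos, rule = 1)$y
  runs <- rle(is.na(x))
  run_len <- rep(runs$lengths, runs$lengths)    # length of the run each element is in
  filled[is.na(x) & run_len > maxgap] <- NA     # undo the fill inside long gaps
  filled
}

x <- c(1, NA, NA, 4, NA, NA, NA, NA, NA, NA, 11)  # a 2-NA gap and a 6-NA gap
print(fill_small_gaps(x, maxgap = 4))  # 1 2 3 4 NA NA NA NA NA NA 11
print(fill_small_gaps(x, maxgap = 6))  # 1 2 3 ... 11 (both gaps filled)
```

The design is "interpolate everything, then put the NAs back in long gaps", which keeps the code short at the cost of computing a few values that are thrown away.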
Wow, thank you for all your answers. You were completely right, Michael. Well, it's my fault: I didn't understand your 2nd reply, when you were talking about arguments for larger gaps. I thought it was for deleting big gaps too. I apologize. It was actually very easy. I also hadn't noticed the "maxgap" argument of the function. In the end, it works perfectly with just this:

require(zoo)
# Interpolate NAs linearly, but only in runs of at most 4 consecutive NAs;
# longer runs are left as NA
imputation <- function(x){
  met <- na.approx(x, maxgap = 4)
  return(met)
}
data <- myts[,2:5]
myts[,2:5] <- apply(data, 2, imputation)  # apply column by column

Sorry for my stupidity. I'll try to be more careful next time with such small problems (I thought it would be a big one) ;). Well, thank you very much, Michael and the other repliers, and thank you for taking a bit of your time to help me!

--
View this message in context: http://r.789695.n4.nabble.com/filling-small-gaps-of-N-A-tp4528184p4531224.html
Sent from the R help mailing list archive at Nabble.com.
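The apply() call above works because the selected columns are all numeric; apply() first converts its input to a matrix. A sketch of a column-by-column alternative with lapply() on a data frame, using the sample rows from the question. Assumptions: the column names (id, t1 to t4) are hypothetical, the timestamp is left out so that columns 2:5 are the four temperature series as in myts[,2:5], and fill_small_gaps() is a base-R stand-in for na.approx(x, maxgap = 4), not part of zoo. If you stay with zoo, note that as far as I know na.approx() drops leading and trailing NAs by default (na.rm = TRUE), so na.rm = FALSE may be needed to keep every column the same length.

```r
# Base-R stand-in for zoo's na.approx(x, maxgap = 4) (hypothetical helper):
# fill runs of <= maxgap NAs by linear interpolation on element positions.
fill_small_gaps <- function(x, maxgap = 4) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)
  out <- approx(which(ok), x[ok], xout = seq_along(x), rule = 1)$y
  r <- rle(!ok)
  out[!ok & rep(r$lengths, r$lengths) > maxgap] <- NA  # keep long gaps as NA
  out
}

# Sample rows from the question (timestamp omitted; column names hypothetical)
myts <- data.frame(
  id = rep(2, 7),
  t1 = c(1.93, 3.93, NA, NA, 4.93, 5.93, 3.93),
  t2 = c(2.93, 3.25, 3.5, 3.68, 3.87, 4.25, 4.56),
  t3 = c(4.56, 4.93, 5.06, 5.25, 5.56, 5.75, 5.93),
  t4 = c(5.43, 5.56, 5.56, 5.68, 5.93, 6.06, 6.18)
)

# lapply() works on the data frame's columns directly, with no matrix conversion
myts[2:5] <- lapply(myts[2:5], fill_small_gaps, maxgap = 4)
print(myts$t1)  # the two NAs become about 4.263 and 4.597
```

With only numeric columns both approaches give the same values; lapply() simply leaves each column's class untouched, which matters if a non-numeric column ever ends up in the selection.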