Paul Bernal
2017-Mar-27 14:15 UTC
[R] Handling nonexistent observations in R for time series analysis and forecasting
Dear friends,

I hope you are all doing well. I am trying to model historical data on transits. The dates are in the following format: 1985-10-01 00:00:00.000 (this would be October 1985). The data come from a SQL Server database, and several observations are missing. The problem is that for dates on which no transit was recorded (because no transit took place), the date does not appear at all instead of being recorded with an NA value. The result is a sequence like this:

1985-01-01 00:00:00.000, 1985-02-01 00:00:00.000, 1985-05-01 00:00:00.000

In this example the series starts in January 1985, continues with February 1985, and the next available observation is May 1985. I know R's tsclean(data) function takes care of missing values, but that only works if the unavailable dates are at least recorded with a value of NA. What can I do when those rows are absent altogether?

Any help will be greatly appreciated.

Best regards,
Paul
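The situation described above can be reproduced in a few lines of base R. This is only a sketch: the vector `raw` is a hypothetical stand-in for the date column exported from the database, and it assumes every timestamp falls at midnight on the first of a month, as in the question.

```r
# Hypothetical sample mimicking the SQL Server export described above
raw <- c("1985-01-01 00:00:00.000",
         "1985-02-01 00:00:00.000",
         "1985-05-01 00:00:00.000")

# Keep only the date part; the time component is always midnight here
obs_dates <- as.Date(substr(raw, 1, 10))

# Every month between the first and last observation
all_months <- seq(min(obs_dates), max(obs_dates), by = "month")

# Months that have no row in the data
missing_months <- all_months[!all_months %in% obs_dates]
format(missing_months, "%Y-%m")
# [1] "1985-03" "1985-04"
```

Listing the absent months first makes it easy to check how many gaps there are before deciding how to fill them.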
Bert Gunter
2017-Mar-27 14:33 UTC
[R] Handling nonexistent observations in R for time series analysis and forecasting
This is a statistics question, not really an R programming one, so I believe it is off-topic here. But:

1. See the CRAN Time Series task view for what's available: https://cran.r-project.org/web/views/TimeSeries.html
2. stats.stackexchange.com is a good site for statistical questions.

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jeff Newmiller
2017-Mar-27 16:55 UTC
[R] Handling nonexistent observations in R for time series analysis and forecasting
Actually, I think his question is about R, because one answer that has been mentioned is to use the merge function. However, I haven't felt the urge to create a reprex for him (see the Posting Guide), and he keeps posting in HTML, so it would have been corrupted even if he had. Someone else also pointed out that irregular time series analysis is an option.

--
Sent from my phone. Please excuse my brevity.
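As a rough illustration of the irregular time series option, the observations can be stored against their actual dates without padding the gaps. The sketch below uses the zoo package, which is an assumption: the thread does not name a package, and zoo must be installed separately. The counts are made-up values.

```r
library(zoo)  # assumed to be installed; not named in the thread

dates  <- as.Date(c("1985-01-01", "1985-02-01", "1985-05-01"))
counts <- c(12, 15, 9)  # hypothetical transit counts

# An irregular series stores each value against its own date
z <- zoo(counts, order.by = dates)

# Spacing between consecutive observations, in days
gaps <- diff(index(z))
```

Here `gaps` shows uneven spacing between observations, which is what distinguishes an irregular series from a regular monthly `ts` object.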
David Winsemius
2017-Mar-27 22:51 UTC
[R] Handling nonexistent observations in R for time series analysis and forecasting
> On Mar 27, 2017, at 7:15 AM, Paul Bernal <paulbernal07 at gmail.com> wrote:
>
> Any help will be greatly appreciated,

And the other readers of this list will greatly appreciate a working example and plain-text postings.

Assuming you have these date-times in a data frame named dat, in a column named `time`:

    > merge(x = data.frame(time = seq(min(dat$time), max(dat$time), by = "month")),
    +       y = dat, all.x = TRUE, by = "time")
            time X.placeholder.
    1 1985-01-01    placeholder
    2 1985-02-01    placeholder
    3 1985-03-01           <NA>
    4 1985-04-01           <NA>
    5 1985-05-01    placeholder

David Winsemius
Alameda, CA, USA
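A self-contained version of the merge approach, carried through to a regular monthly series, might look like the sketch below. The data frame `dat`, its `transits` column, and the counts are hypothetical. Replacing NA with 0 is an assumption based on the original statement that no transit took place in the absent months; if the values are unknown rather than zero, leave the NA in place for tsclean() or another imputation method.

```r
# Hypothetical data: monthly transit counts with March and April absent
dat <- data.frame(
  time     = as.Date(c("1985-01-01", "1985-02-01", "1985-05-01")),
  transits = c(12, 15, 9)
)

# Complete monthly calendar spanning the observed range
full <- data.frame(time = seq(min(dat$time), max(dat$time), by = "month"))

# Left join: months without a row in dat get NA
padded <- merge(full, dat, by = "time", all.x = TRUE)

# Assumption: an absent month means zero transits, not an unknown value
padded$transits[is.na(padded$transits)] <- 0

# Regular monthly series starting at the first observed month
first <- as.POSIXlt(min(dat$time))
y <- ts(padded$transits,
        start = c(first$year + 1900, first$mon + 1),
        frequency = 12)
```

The resulting `y` is an ordinary `ts` object with frequency 12, so it can be passed to the usual time series and forecasting functions.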