Mike Williamson
2010-Dec-17 01:35 UTC
[R] how to convert "sloppy data" into a time series?
Hi All,

First let me state that I did search for a while on r-help, Google, and using the "sos" package inside of R, without much luck. I want to know how to create a univariate time series from a set of data that has huge time gaps in it. For instance, here is a snapshot of a piece of data that I would like to analyze:

Row  queued_time              processTime
50   2010-06-15 21:50:42.443  6.399989e-02 secs
63   2010-06-15 21:51:57.347  6.300020e-02 secs
156  2010-06-29 14:53:26.073  3.011863e+06 secs
175  2010-07-22 10:14:57.503  4.334879e+06 secs
278  2010-08-05 11:29:56.713  6.155674e+06 secs
509  2010-08-05 11:29:57.443  3.120779e+06 secs
531  2010-08-05 11:29:57.543  3.120779e+06 secs
555  2010-08-05 11:29:57.647  3.120779e+06 secs
190  2010-08-05 11:29:57.943  3.120778e+06 secs
230  2010-08-05 11:29:58.047  3.120778e+06 secs
211  2010-08-05 11:29:58.917  3.120777e+06 secs
251  2010-08-05 11:29:59.077  3.120777e+06 secs
298  2010-08-05 11:29:59.297  3.120777e+06 secs
320  2010-08-05 11:29:59.397  3.120777e+06 secs
366  2010-08-05 11:29:59.707  3.120777e+06 secs
342  2010-08-05 11:30:00.987  3.120775e+06 secs
380  2010-08-05 11:30:01.200  3.120775e+06 secs
120  2010-08-19 09:31:47.207  2.358866e+06 secs
141  2010-08-19 09:31:47.500  2.358866e+06 secs
842  2010-09-03 13:58:21.463  3.641194e+06 secs

I would like to take the second column, "processTime", and put it into a time series using the first column as the key for when each value occurred. But everything I could find, such as ts(), assumes that I already have a regularly spaced univariate series and only need to set the frequency and start date (in the case of ts()).

I can adjust the "queued_time" precision arbitrarily as needed, so that if the data set would end up far too sparse and empty by keeping the current precision, I could cut "queued_time" down to just the year, month, day, and hour. But in that case, how would the time series handle the fact that there are several (varying numbers of) entries sharing the same timestamp?

The reason I want to do this is that I next want to use all the very nice modeling capabilities that a univariate time series allows, such as arima(), etc.

Thanks in advance!
Mike

"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleonic war:
The most exciting frontier is charting what's already here."
    -- xkcd
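P.S. In the spirit of the posting guide, here is a rough, reproducible sketch of the first few rows above as an R data frame, in case anyone wants to experiment (the %OS part of the format string is what parses the fractional seconds; only a subset of the rows is shown):

## Reconstruct the first few rows of the data shown above.
## options(digits.secs = 3) only affects printing of the
## fractional seconds, not the parsing.
options(digits.secs = 3)
df <- data.frame(
  queued_time = as.POSIXct(
    c("2010-06-15 21:50:42.443",
      "2010-06-15 21:51:57.347",
      "2010-06-29 14:53:26.073",
      "2010-07-22 10:14:57.503",
      "2010-08-05 11:29:56.713"),
    format = "%Y-%m-%d %H:%M:%OS"),
  processTime = c(6.399989e-02, 6.300020e-02,
                  3.011863e+06, 4.334879e+06, 6.155674e+06)
)
str(df)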
David Winsemius
2010-Dec-17 03:57 UTC
[R] how to convert "sloppy data" into a time series?
On Dec 16, 2010, at 8:35 PM, Mike Williamson wrote:

> [snip -- original question quoted above]

Information on package 'its':

  Package:     its
  Version:     1.1.8
  Date:        2009-09-06
  Title:       Irregular Time Series
  Author:      Portfolio & Risk Advisory Group, Commerzbank Securities
  Maintainer:  Whit Armstrong <armstrong.whit at gmail.com>

--
David Winsemius, MD
West Hartford, CT
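Following up on that pointer, a minimal sketch with 'its' might look like the following. It assumes that its() accepts a numeric matrix together with a POSIXct vector of dates (check ?its after installing for the authoritative signature), and it reuses the df object sketched under the original question:

## A sketch only -- see ?its for the actual constructor arguments.
## install.packages("its")
library(its)

## its() is assumed here to want a numeric matrix; one column,
## named after the variable of interest.
x <- matrix(df$processTime, ncol = 1,
            dimnames = list(NULL, "processTime"))

## Build the irregular time series keyed by the queueing timestamps.
pt.its <- its(x, dates = df$queued_time)
summary(pt.its)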
Hi:

As you mentioned at the outset, you have a very irregular time series, for which David has given you one reasonable suggestion; another is the zoo package. Those are the standard R packages for dealing with irregular time series. There may be others of which I am unaware, though - there may be something in the Rmetrics suite that pertains, for example. Check the Time Series task view on CRAN for possible alternatives: cran.r-project.org/web/views

ARIMA modeling, OTOH, assumes that the data are equally spaced and stationary (perhaps after suitable differencing or detrending). Consequently, I think you may need to rethink your strategy for modeling these data. One possibility is to aggregate the data to an appropriate interval (a sketch of this idea with zoo follows below this message), but you're the one who has to decide what that interval should be and what difficulties might ensue (e.g., unequal sample sizes per time interval). This is not a simple problem, and the best strategy may be to start with description and gradually work your way toward a reasonable, scientifically plausible model. A sensible question to ask is: what is the largest time unit I can use without losing vital information? That might be a place to start.

Trying to model a time series with very large time gaps is a little like having several stills from a movie and trying to reconstruct the movie and its plot without having seen it beforehand. You'll need to use every bit of knowledge you have about the underlying process to aid in the analysis.

HTH,
Dennis

On Thu, Dec 16, 2010 at 5:35 PM, Mike Williamson <this.is.mvw@gmail.com> wrote:

> [snip -- original question quoted above]
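To make the aggregation idea concrete, here is a sketch with zoo, again reusing the df object from the original post. The daily mean is only an example; the right interval and summary function depend on the process being measured:

library(zoo)

## An irregular series: one observation per queued_time stamp.
z <- zoo(df$processTime, order.by = df$queued_time)

## Collapse to (at most) one value per day, e.g. the daily mean,
## so that repeated timestamps within a day are no longer a problem.
daily <- aggregate(z, as.Date(index(z)), mean)

## Once the series is regular (after filling any remaining gaps,
## e.g. with na.approx() or na.locf()), arima() could be applied:
## fit <- arima(coredata(daily), order = c(1, 0, 0))
daily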