thr3ads.net - R help - [R] Classifying time series by shape over time [Mar 2006]

Andreas Neumann

2006-Mar-21 16:08 UTC

[R] Classifying time series by shape over time

Dear all,

I have hundreds of thousands of univariate time series of the form:
character "seriesid", vector of Date, vector of integer
(some example data is at the end of the mail)

I am trying to find the ones which somehow "have a shape" over time
that looks like the histogram of a (skewed) normal distribution:

> hist(rnorm(200,10,2))

The "mean" is not interesting, i.e. it does not matter if the first
nonzero observation happens in the 2nd or the 40th month of observation.
So all that matters is: They should start sometime, the hits per month
increase, at some point they decrease and then they more or less
disappear.

Short Example (hits at consecutive months (Dates omitted)):
1. series: 0 0 0 2 5 8 20 42 30 19 6 1 0 0 0                -> Good
2. series: 0 3 8 9 20 6 0 3 25 67 7 1 0 4 60 20 10 0 4      -> Bad

Series 1 would be an ideal case of what I am looking for.

Graphical inspection would be easy but is not an option due to the huge
number of series.

Questions:

1. Which (if any) of the many packages that handle time series is
appropriate for my problem?

2. Which general approach seems to be the most straightforward and best
supported by R?
- Is there a way to test the time series directly (preferably)?
- Or do I need to "type-cast" them as some kind of histogram
  data and then test against the pdf of e.g. a normal distribution (but
  how)?
- Or something totally different?


Thank you for your time,

     Andreas Neumann




Data Examples (id1 is good, id2 is bad):
> id1
        dates       hits
1  2004-12-01         3
2  2005-01-01         4
3  2005-02-01        10
4  2005-03-01         6
5  2005-04-01        35
6  2005-05-01        14
7  2005-06-01        33
8  2005-07-01        13
9  2005-08-01         3
10 2005-09-01         9
11 2005-10-01         8
12 2005-11-01         4
13 2005-12-01         3

> id2
        dates       hits
1  2001-01-01         6
2  2001-02-01         5
3  2001-03-01         5
4  2001-04-01         6
5  2001-05-01         2
6  2001-06-01         5
7  2001-07-01         1
8  2001-08-01         6
9  2001-09-01         4
10 2001-10-01        10
11 2001-11-01         0
12 2001-12-01         3
13 2002-01-01         6
14 2002-02-01         5
15 2002-03-01         1
16 2002-04-01         2
17 2002-05-01         4
18 2002-06-01         4
19 2002-07-01         0
20 2002-08-01         1
21 2002-09-01         0
22 2002-10-01         2
23 2002-11-01         2
24 2002-12-01         2
25 2003-01-01         2
26 2003-02-01         3
27 2003-03-01         7

Gabor Grothendieck

2006-Mar-21 16:41 UTC

[R] Classifying time series by shape over time

If it's good enough just to examine the number of strictly positive runs, then

sum(rle(sign(id1$hits))$values == 1)

will give 1 in the good case (one run) and > 1 in the bad case (multiple
runs).
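
As an illustration, here is a minimal sketch of the run count applied to the
two short example series from the question. The helper name
count_positive_runs and the list series_list are assumptions made for this
example and do not appear in the thread.

```r
# Short example series from the original question (dates omitted).
good <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
bad  <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)

# Count maximal runs of strictly positive values.
# count_positive_runs is a hypothetical helper name, not from the thread.
count_positive_runs <- function(hits) {
  runs <- rle(sign(hits))   # run-length encoding of the signs (0 or 1 here)
  sum(runs$values == 1)     # number of runs made up of positive values
}

count_positive_runs(good)  # 1
count_positive_runs(bad)   # 4

# Applied to many series at once (series_list is a hypothetical container).
series_list <- list(s1 = good, s2 = bad)
n_runs <- vapply(series_list, count_positive_runs, integer(1))
names(n_runs)[n_runs == 1]  # "s1"
```

Note that this check only looks at where the zeros fall. A series that never
drops to zero always yields one run, however irregular it is, so the check
says nothing about whether the hits actually rise and then fall.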


Kjetil Brinchmann Halvorsen

2006-Mar-21 19:09 UTC

[R] Classifying time series by shape over time

Does the function turnpoints() in package pastecs help?

Kjetil
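
To make the turning-point idea concrete, the following is a rough base R
sketch that counts local maxima in a series. It does not call pastecs, and
the helper name count_peaks is an assumption made for this example. A series
with the shape asked for in the question would have exactly one peak.

```r
# Short example series and the id1 hits from the original question.
good     <- c(0, 0, 0, 2, 5, 8, 20, 42, 30, 19, 6, 1, 0, 0, 0)
bad      <- c(0, 3, 8, 9, 20, 6, 0, 3, 25, 67, 7, 1, 0, 4, 60, 20, 10, 0, 4)
id1_hits <- c(3, 4, 10, 6, 35, 14, 33, 13, 3, 9, 8, 4, 3)

# Count local maxima (peaks) in a series.
# count_peaks is a hypothetical helper, not part of pastecs.
count_peaks <- function(hits) {
  # Collapse runs of equal values so that a plateau counts as one point.
  x <- rle(hits)$values
  # Pad with -Inf so that a maximum at either end is also counted.
  d <- sign(diff(c(-Inf, x, -Inf)))
  # A peak is a rise (+1) immediately followed by a fall (-1).
  sum(d[-length(d)] == 1 & d[-1] == -1)
}

count_peaks(good)      # 1
count_peaks(bad)       # 4
count_peaks(id1_hits)  # 4
```

The raw peak count is strict: the id1 data, labelled good in the question,
has four local maxima because of month-to-month noise. Some smoothing or a
tolerance for small dips would probably be needed before counting. A constant
series also counts as one peak under this sketch, so it may be worth
combining it with a check that the series contains nonzero values.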
