Hi
I have data that is sampled (in time) with a certain frequency and I would
like to express this time series as a time series of a higher (or lower)
frequency with the newly added time points being filled in with NA, 0, or
perhaps interpolated. My data might be regularly or irregularly spaced. For
example, I might have quarterly data that I would like to handle as a
monthly time series with NAs filled in for the missing months.
RSiteSearch("upsample") gave one link to a function in the "waveslim"
package, which I'm not familiar with. This seems to me a fairly common time
series task, so I am hoping to find something in the more common time series
packages/classes such as ts, zoo, tseries, etc.
I will now give some example code.
If I am "lucky" enough that my data is irregularly spaced, then a
combination of zoo and ts already accomplishes this task.
> require(zoo)
[1] TRUE
> dt <- sample(c(1,3,9), 20, replace=TRUE)
> t <- zoo(dt, as.yearmon(Sys.Date()) + cumsum(dt)/12)
> t
Jan 2007 Feb 2007 Nov 2007 Feb 2008 Nov 2008 Dec 2008 Mar 2009 Apr 2009 Jul 2009 Aug 2009
       3        1        9        3        9        1        3        1        3        1
Nov 2009 Feb 2010 Nov 2010 Aug 2011 May 2012 Jun 2012 Jul 2012 Oct 2012 Jul 2013 Aug 2013
       3        3        9        9        9        1        1        3        9        1
> as.ts(t)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2007   3   1  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2008  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9   1
2009  NA  NA   3   1  NA  NA   3   1  NA  NA   3  NA
2010  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2011  NA  NA  NA  NA  NA  NA  NA   9  NA  NA  NA  NA
2012  NA  NA  NA  NA   9   1   1  NA  NA   3  NA  NA
2013  NA  NA  NA  NA  NA  NA   9   1
> plot(t)
However if the data happens to be regularly spaced, upsampling it isn't
quite as straightforward.
> t2 <- zoo(sample(1:3, 20, replace=TRUE), as.yearmon(seq(2000, by=0.5, length=20)))
> t2
Jan 2000 Jul 2000 Jan 2001 Jul 2001 Jan 2002 Jul 2002 Jan 2003 Jul 2003 Jan 2004 Jul 2004
       3        3        2        2        1        3        1        2        3        3
Jan 2005 Jul 2005 Jan 2006 Jul 2006 Jan 2007 Jul 2007 Jan 2008 Jul 2008 Jan 2009 Jul 2009
       2        2        3        3        2        3        3        2        1        3
> (t2.ts <- as.ts(t2))
Time Series:
Start = c(2000, 1)
End = c(2009, 2)
Frequency = 2
 [1] 3 3 2 2 1 3 1 2 3 3 2 2 3 3 2 3 3 2 1 3
> plot(t2)
I would expect this to be as simple as changing the frequency attribute of
t2.ts to 12, but I couldn't find out how to do this, or whether it is
possible at all.
So far, the only way around this that I have found is doing it "manually" in
the following way:
> t2.monthly <- zoo(NA, as.yearmon(seq(from=2000, to=2009.5, by=1/12)))
> window(t2.monthly, as.numeric(time(t2)) ) <- as.numeric(t2)  # can this be done using "[]" indexing?
> t2.monthly
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000 Jul 2000 Aug 2000 Sep 2000 Oct 2000
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
Nov 2000 Dec 2000 Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001 Aug 2001
      NA       NA        2       NA       NA       NA       NA       NA        2       NA
Sep 2001 Oct 2001 Nov 2001 Dec 2001 Jan 2002 Feb 2002 Mar 2002 Apr 2002 May 2002 Jun 2002
      NA       NA       NA       NA        1       NA       NA       NA       NA       NA
Jul 2002 Aug 2002 Sep 2002 Oct 2002 Nov 2002 Dec 2002 Jan 2003 Feb 2003 Mar 2003 Apr 2003
       3       NA       NA       NA       NA       NA        1       NA       NA       NA
May 2003 Jun 2003 Jul 2003 Aug 2003 Sep 2003 Oct 2003 Nov 2003 Dec 2003 Jan 2004 Feb 2004
      NA       NA        2       NA       NA       NA       NA       NA        3       NA
Mar 2004 Apr 2004 May 2004 Jun 2004 Jul 2004 Aug 2004 Sep 2004 Oct 2004 Nov 2004 Dec 2004
      NA       NA       NA       NA        3       NA       NA       NA       NA       NA
Jan 2005 Feb 2005 Mar 2005 Apr 2005 May 2005 Jun 2005 Jul 2005 Aug 2005 Sep 2005 Oct 2005
       2       NA       NA       NA       NA       NA        2       NA       NA       NA
Nov 2005 Dec 2005 Jan 2006 Feb 2006 Mar 2006 Apr 2006 May 2006 Jun 2006 Jul 2006 Aug 2006
      NA       NA        3       NA       NA       NA       NA       NA        3       NA
Sep 2006 Oct 2006 Nov 2006 Dec 2006 Jan 2007 Feb 2007 Mar 2007 Apr 2007 May 2007 Jun 2007
      NA       NA       NA       NA        2       NA       NA       NA       NA       NA
Jul 2007 Aug 2007 Sep 2007 Oct 2007 Nov 2007 Dec 2007 Jan 2008 Feb 2008 Mar 2008 Apr 2008
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
May 2008 Jun 2008 Jul 2008 Aug 2008 Sep 2008 Oct 2008 Nov 2008 Dec 2008 Jan 2009 Feb 2009
      NA       NA        2       NA       NA       NA       NA       NA        1       NA
Mar 2009 Apr 2009 May 2009 Jun 2009 Jul 2009
      NA       NA       NA       NA        3
> points(t2.monthly, type="p", col="blue")
> lines(na.locf(t2.monthly), col="blue")  # as an example of why I might want to do this.
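One alternative that might be tidier is to merge the series with an empty zoo object on the full monthly grid. This is only a sketch on made-up half-yearly data, not the series above. It assumes that merge() with a zero-width zoo(, grid) pads the new time points with NA, and the names half, grid and monthly are purely illustrative.

```r
library(zoo)

# Made-up half-yearly series on a yearmon index (illustrative values only)
half <- zoo(c(3, 3, 2, 2), as.yearmon(2000 + (0:3) / 2))

# Number of monthly steps between the first and last observation
n <- round(12 * (as.numeric(end(half)) - as.numeric(start(half))))

# Full monthly grid spanning the series
grid <- as.yearmon(as.numeric(start(half)) + (0:n) / 12)

# Merging with a zero-width zoo object on the grid fills new months with NA
monthly <- merge(half, zoo(, grid))

# Fill the gaps by carrying the last observation forward
filled <- na.locf(monthly)
```

Replacing na.locf(monthly) with na.approx(monthly) would interpolate linearly between the observed points instead.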
Similarly, it would be nice if one could conveniently downsample a time
series, choosing to keep only every Nth point, or the sum or the average of
the previous N points, etc. I can see how that particular application
could probably be accomplished relatively easily using rapply and a
subsetting operation. However, it might be nice to have a convenient wrapper
for this.
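For the downsampling case, here is a minimal sketch on made-up monthly data. It assumes that aggregate() on a zoo object accepts a function of the index (here as.yearqtr) as the grouping argument; the names m, q.mean, q.sum and every3 are only for illustration.

```r
library(zoo)

# Made-up monthly series: values 1..24 starting in Jan 2000
m <- zoo(1:24, as.yearmon(2000 + (0:23) / 12))

# Average and sum of the three months in each quarter
q.mean <- aggregate(m, as.yearqtr, mean)
q.sum  <- aggregate(m, as.yearqtr, sum)

# Keep only every 3rd point (the first month of each quarter)
every3 <- m[seq(1, length(m), by = 3)]
```

The first two results are indexed by quarter, while every3 keeps the original monthly time stamps of the points that survive.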
Any help would be appreciated. Thanks in advance.
Tobias