On Mon, Dec 3, 2012 at 8:30 PM, Andrew Freedman
<andrewf at hotsprings.com.au> wrote:
> Hi List,
>
> I have weekly sales observations for several products drawn via ODBC.
> Source data is available at
> https://www.dropbox.com/s/78vxae5ic8tnutf/asr.csv.
>
> This is retail sales data, so it will contain seasonality and trend
> information. I expect to see 52 or 53 observations per year, each
> observation occurring on the same day of the week (Saturday). Ultimately
> I'm looking to feed these series into forecasting models for demand
> planning.
>
> The data has issues with internal gaps, so while I've been able to
> create a ts that appears to respect the frequency and period, I suspect
> that a zoo is going to be a better data container. Unfortunately, I
> don't understand how to use zoo() to describe frequency/period/deltat.
>
> In the example below I use sales[,16] (aka $p) as it has several
> periods (data between 2004 and 2012). I've tried using frequency=52, =7
> and =1, but get the same result each time; every data point ends up in
> cycle 1 and I don't have the periodicity needed to find seasonality.
>
>> sales <- read.csv("asr.csv")
>> library(zoo)
>
> Attaching package: 'zoo'
>
> The following object(s) are masked from 'package:base':
>
> as.Date, as.Date.numeric
>
>> sales.zoo <- zoo(subset(sales, select=c(2:length(sales))),
> + order.by = sales$date_end, frequency = 52)
>> sales.zoo.i <- na.approx(sales.zoo) # interpolate internal NA values
>> frequency(sales.zoo.i) # 52, which seems right
> [1] 52
>> cycle(sales.zoo.i[1:20,16]) # everything is in the same cycle...
> 2004-08-14 2004-08-21 2004-08-28 2004-09-04 2004-09-11 2004-09-18
> 1 1 1 1 1 1
> 2004-09-25 2004-10-02 2004-10-09 2004-10-16 2004-10-23 2004-10-30
> 1 1 1 1 1 1
> 2004-11-06 2004-11-13 2004-11-20 2004-11-27 2004-12-04 2004-12-11
> 1 1 1 1 1 1
> 2004-12-18 2004-12-25 2005-01-01 2005-01-08 2005-01-15 2005-01-22
> 1 1 1 1 1 1
> 2005-01-29 2005-02-05 2005-02-12 2005-02-19 2005-02-26 2005-03-05
> 1 1 1 1 1 1
>>
>
> Doubtless it's some facile error that will make me feel sheepish, but
> I've been staring at this for a bit now and just getting nowhere. Any
> pointers would be greatly appreciated.
>
A complete cycle is always represented by one time unit, so if you want
a complete cycle to be a year you need to represent time in years and
fractions of a year, not as "Date" class, whose time unit is one day.
That is how the "ts" class works too.
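A minimal base-R sketch of the point above, on toy data rather than asr.csv: with frequency 52 and time measured in years, cycle() counts the weeks of each year. Weekly "Date" times, by contrast, are whole numbers of days, so their position within a one-unit cycle never changes. The second half only mimics that idea with plain arithmetic; it is not zoo's actual cycle() code.

```r
# Toy weekly series (hypothetical values 1:60) starting in week 33 of 2004.
# With frequency = 52, time() is measured in years, so one cycle = one year.
x <- ts(1:60, start = c(2004, 33), frequency = 52)
cycle(x)[18:23]  # 50 51 52 1 2 3: the cycle restarts when the year does

# With "Date" times the unit is one day. Weekly dates are whole numbers of
# days, so their fractional part is always 0; if one cycle is one time unit,
# every observation lands in the same position of its cycle.
d <- as.Date("2004-08-14") + 7 * (0:5)   # six consecutive Saturdays
t_days <- as.numeric(d)                   # days since 1970-01-01
pos <- round((t_days - floor(t_days)) * 52) + 1
pos  # 1 1 1 1 1 1
```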
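Since weeks don't evenly divide years, you will have to approximate
this in order to have a frequency of 52. There are many ways to do
this, but below we drop week 00 (as numbered by "%W", i.e. the
Saturdays that fall before the first Monday of the year). Saturdays
never fall in week 53, so this reduces every 53-week year to 52 weeks.
Some 52-week years (here 2008, 2009 and 2010) also begin with a week 00
Saturday and so keep only 51 weeks; that appears as a one-week gap at
the end of the year, which a "zooreg" series allows.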
library(zoo)
# first column (the week-ending date) becomes the index
z <- read.zoo("asr.csv", sep = ",", header = TRUE)
# drop week "00"
z0 <- z[ format(time(z), "%W") != "00" ]
t0 <- time(z0)
# convert time to year + fraction
time(z0) <- as.numeric(format(t0, "%Y")) +
(as.numeric(format(t0, "%W")) - 1) / 52
# convert to zooreg class (almost regularly spaced)
zr <- as.zooreg(z0)
frequency(zr) # 52
head(cycle(zr))
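Since asr.csv isn't reproduced here, the week-00 rule can be checked in base R on a synthetic run of Saturdays. This assumes one Saturday per week from 2004-08-14 to 2012-12-01, roughly the span of the question's data. The sketch counts how many weeks each year keeps after dropping week 00 and confirms that the year-plus-fraction times map back to week numbers 1 to 52 as positions in the cycle.

```r
# Synthetic stand-in for the asr.csv dates (an assumption: one Saturday per
# week from the question's first date up to 2012-12-01).
sat <- seq(as.Date("2004-08-14"), as.Date("2012-12-01"), by = "week")
wk  <- as.numeric(format(sat, "%W"))  # Monday-based week of year, 00-53
yr  <- as.numeric(format(sat, "%Y"))

keep <- wk != 0                        # drop week 00
tt <- yr[keep] + (wk[keep] - 1) / 52   # year + fraction of a year

n_per_year <- table(yr[keep])
n_per_year       # no year exceeds 52; 2008, 2009 and 2010 keep only 51
range(wk[keep])  # 1 52

# Position within the year recovered from the fractional part of tt
pos <- round((tt - floor(tt)) * 52) + 1
```

Years that keep 51 weeks simply leave their last slot (week 52) empty, so the series stays on a 1/52-year grid with occasional gaps.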
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com