On Sat, Aug 7, 2010 at 4:49 PM, steven mosher <moshersteven at gmail.com> wrote:
> Given a data frame, or it could be a matrix if I choose to.
> The data consists of an ID, a year, and data for all 12 months.
> Missing values are a factor, and so are missing years.
>
> Id<-c(rep(67543,4),rep(12345,3),rep(89765,5))
> Years<-c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
> Values2<-c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
> Values<-c(12,14,34,21,54,65,23,12,13,13,13,14)
>
> Data<-data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,
>                  Apr=Values2,Jun=Values,July=Values/3,Aug=Values2,Sep=Values,
>                  Oct=Values,Nov=Values,Dec=Values2)
> Data
>    Index Year Jan  Feb Mar Apr Jun      July Aug Sep Oct Nov Dec
> 1  67543 1989  12  6.0  12  12  12  4.000000  12  12  12  12  12
> 2  67543 1990  14  7.0  NA  NA  14  4.666667  NA  14  14  14  NA
> 3  67543 1991  34 17.0  34  34  34 11.333333  34  34  34  34  34
> 4  67543 1992  21 10.5  21  21  21  7.000000  21  21  21  21  21
> 5  12345 1991  54 27.0  NA  NA  54 18.000000  NA  54  54  54  NA
> 6  12345 1993  65 32.5  65  65  65 21.666667  65  65  65  65  65
> 7  12345 1994  23 11.5  23  23  23  7.666667  23  23  23  23  23
> 8  89765 1991  12  6.0  NA  NA  12  4.000000  NA  12  12  12  NA
> 9  89765 1992  13  6.5  13  13  13  4.333333  13  13  13  13  13
> 10 89765 1993  13  6.5  NA  NA  13  4.333333  NA  13  13  13  NA
> 11 89765 1994  13  6.5  13  13  13  4.333333  13  13  13  13  13
> 12 89765 1995  14  7.0  14  14  14  4.666667  14  14  14  14  14
>
>
> The goal is to return a time series object for each ID. Alternatively, one
> could return a matrix that I can turn into a time series.
> The final structure would be something like this (done in matrix form for
> illustration):
>       1989.0 1989.083 1991 ...... 1992 .... 1993 ..... 1994 .... 1995
> 67543 12 6.0 12 12 12 4.000000 12 12 12 12 12 ... 34 ........... 21 .. NA ......... NA ........ NA
> 12345 NA, NA, NA, ............................................................. 54 27
>
> Basically the time series will have patches at the front, middle and end
> where you may have years of NA.
> They must be column ordered by time and aligned so that averages for all
> series can be computed per month.
>
> Now I have looping code to do this, where I loop through all the IDs and map
> the row of data into the correct column, and create column names based on
> the data and row names based on the ID, but it's painfully slow. Any
> wizardry would help.

Your email came out a bit garbled, so it's not clear what you want to
get out, but this code will produce a multivariate ts object, i.e. an
mts series, with one column for each series:

f <- function(x) ts(c(t(x[-(1:2)])), freq = 12, start = x$Year[1])
do.call(cbind, by(Data, Data$Index, f))
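
One possible caveat, offered as a sketch rather than a definitive fix: the
one-liner above flattens each ID's rows in order and starts at that ID's
first year, so it relies on the years within an ID being consecutive and on
there being 12 month columns. In the posted data, ID 12345 skips 1992 and
there is no May column, so values after a gap would land in the wrong slot.
One way around that, assuming a plain matrix is acceptable (as the question
allows), is to pad every ID to the full year range first. The names
all_years, n_per, pad_one, wide and month_means are made up for this
illustration.

```r
# Rebuild the example data from the question (it has no May column).
Id      <- c(rep(67543, 4), rep(12345, 3), rep(89765, 5))
Years   <- c(1989:1992, 1991, 1993, 1994, 1991:1995)
Values2 <- c(12, NA, 34, 21, NA, 65, 23, NA, 13, NA, 13, 14)
Values  <- c(12, 14, 34, 21, 54, 65, 23, 12, 13, 13, 13, 14)
Data <- data.frame(Index = Id, Year = Years, Jan = Values, Feb = Values / 2,
                   Mar = Values2, Apr = Values2, Jun = Values,
                   July = Values / 3, Aug = Values2, Sep = Values,
                   Oct = Values, Nov = Values, Dec = Values2)

# Full span of years across all IDs (hypothetical helper names throughout).
all_years <- seq(min(Data$Year), max(Data$Year))
# Values per year: 11 for the posted example because May is absent.
n_per <- ncol(Data) - 2

pad_one <- function(x) {
  # Left-join this ID's rows onto the full year range; absent years become NA.
  full <- merge(data.frame(Year = all_years), x[-1], by = "Year", all.x = TRUE)
  # Flatten year by year: all months of the first year, then the next, ...
  c(t(as.matrix(full[-1])))
}

# One row per ID, one column per (year, month) slot, aligned across IDs.
wide <- t(sapply(split(Data, Data$Index), pad_one))
colnames(wide) <- paste(rep(all_years, each = n_per),
                        names(Data)[-(1:2)], sep = "_")

# Per-slot average across all series, ignoring missing values.
month_means <- colMeans(wide, na.rm = TRUE)
```

Slots where no ID has a value come back as NaN from colMeans(). If a ts
object is wanted afterwards, something like
ts(t(wide), start = all_years[1], frequency = n_per) should give an mts with
one column per ID; with a complete set of month columns n_per would be 12.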