Denise Xifara
2008-Aug-19 10:40 UTC
[R] Converting monthly data to quarterly dataMonday, August 18, 2008 11:38 AM
Dear Gavin, This is really great, thank you! I created some long loops to get rid of extra months at the beginning and the end of my data but your code is great for putting it then together quarterly. thanks again, Denise On Mon, 2008-08-18 at 14:31 +0100, Denise Xifara wrote:> Thank you very much Stephen, but how will aggregate deal with months that > fall outside annual quarters? eg, one extra month at the end of thedataset? [Without your data I'm kind of guessing at the exact format and problem, but the example below shows one way to deal with quarters that span years so should be adaptable to your problem. You also have a different idea of quarters to my ecological one...] ## first some dummy data set.seed(12345) dat <- data.frame(price = cumsum(rnorm(450)), date = seq(from = as.Date("2008-01-01"), length = 450, by = "days")) ## get the months of observations dat$month <- factor(format(dat$date, format = "%b"), levels = month.abb) ## and then format this for the quarters dat$quarter <- character(length = NROW(dat)) ## I'm sure these 4 steps can be simplified but just ## how escapes me at the moment dat$quarter[dat$month %in% month.abb[c(12,1,2)]] <- "Winter" dat$quarter[dat$month %in% month.abb[c(3:5)]] <- "Spring" dat$quarter[dat$month %in% month.abb[c(6:8)]] <- "Summer" dat$quarter[dat$month %in% month.abb[c(9:11)]] <- "Autumn" dat$quarter <- factor(dat$quarter, levels = c("Spring","Summer","Autumn","Winter")) ## look at the fruits of our labour head(dat) ## create period runs <- rle(as.numeric(dat$quarter))$lengths dat$period <- factor(rep(seq_along(runs), times = runs)) ## aggregate with(dat, aggregate(price, list(quarter = quarter, period = period), FUN = mean)) I use the rle() function (run length encoding) calculate the number of observations where the 'quarter' remains the same:> rle(as.numeric(dat$quarter))Run Length Encoding lengths: int [1:6] 60 92 92 91 90 25 values : num [1:6] 4 1 2 3 4 1 The 'values' here are the numeric representation of the quarter factor. The most interesting for us is the second 4 - this is the winter 2008/9. I use the lengths to replicate a period number (1,2,...,n) the correct number of times. Now we have the period correctly calculated, we just aggregate by quarter and period to give the averages you want. If you are working on months 1-3 as quarter 1, 4-6 as quarter 2 etc, then it is much easier, just aggregate by quarter and year: ## copy the data above dat2 <- dat ## change meaning of quarter dat2$quarter <- character(length = NROW(dat2)) dat2$quarter[dat2$month %in% month.abb[c(1:3)]] <- "Q1" dat2$quarter[dat2$month %in% month.abb[c(4:6)]] <- "Q2" dat2$quarter[dat2$month %in% month.abb[c(7:9)]] <- "Q3" dat2$quarter[dat2$month %in% month.abb[c(10:12)]] <- "Q4" dat2$quarter <- factor(dat2$quarter, levels = c("Q1","Q2","Q3","Q4")) ## year variable dat2$year <- factor(format(dat2$date, format = "%Y")) ## drop the first 40 days to simulate a late starting record ## and aggregate with(dat2[-(1:40), ], aggregate(price, list(quarter = quarter, year year), FUN = mean)) Which gives: quarter year x 1 Q1 2008 13.58644 2 Q2 2008 24.16523 3 Q3 2008 28.56004 4 Q4 2008 32.60900 5 Q1 2009 44.86594 Do these examples help solve your problem? G> > 2008/8/18 stephen sefick <ssefick@gmail.com<http://us.mc339.mail.yahoo.com/mc/compose?to=ssefick@gmail.com> > > > > ?aggregate > > may do what you want > > > > On Mon, Aug 18, 2008 at 8:19 AM, Denise Xifara > > <dionysia-kiara.xifaras@st-hildas.ox.ac.uk<http://us.mc339.mail.yahoo.com/mc/compose?to=dionysia-kiara.xifaras@st-hildas.ox.ac.uk>>wrote:> > > Dear R users, > > > > > > I have a dataframe where column is has countries, column 2 is dates > > > (monthly) for each countrly, the next 10 columns are my factors whereI> > have > > > measurements for each country and for each date. I have attached a > > sample > > > of the data in csv format with the data for 3 countries. > > > > > > I would like to convert my monthly data into quarterly data, findingthe> > > mean over 3 month periods for factors a-i, and the sum for factor j.My> > > problem is that not all countries have starting date at the beginningof> > a > > > quarter for a particular year, ie some countries start in May or > > September, > > > and also some countries have one extra month, some have two extramonths> > so > > > there's no way of deleting some rows with a simple command (I want toget> > > rid of all extra data that does not fall into the quarters for each > > > country), since the amount of data to get rid of for each countryvaries.> > > > > > I tried for example: > > > i=1 > > > denise<-data[((data$country)==unique(data$country[i]),] > > > denise[,2]<- as.Date(denise$date, "%Y-%m-%d") > > > denise2<-denise[order(denise[,2],decreasing=FALSE),] > > > len<-length(denise[,1]) > > > limit<-floor(len/3)+1 > > > splitter<-rep(1:limit,each=3) > > > spl.dat<-split(denise2,splitter) > > > new.data<-as.matrix(lapply(spl.dat,FUN="mean")) > > > > > > This finds the mean every 3 rows but this doesnt consider the data > > quarterly > > > in a calendar sense. ie if the data starts in november, it doesnt > > discard > > > the data for november, december and start calculating the means from > > january > > > onwards, until the month where the last quarter finishes, discardingany> > > extra month, or two months at the end. > > > > > > I tried converting my data frame/matrix to a time series but the dates > > are > > > not kept. I got: > > > > > >>tser<-as.ts(denise) > > > Warning message: > > > In data.matrix(data) : class information lost from one or more columns > > > > > > and column 2 has become a list of numbers rather than dates. > > > > > > I tried: > > > > > >> library(fCalendar) > > >> den.tseries<-as.timeSeries(denise) > > > Warning messages: > > > 1: In .whichFormat(charvec, ...) : Could not determine time(date)format> > > 2: In .whichFormat(charvec, ...) : Could not determine time(date)format> > >> is.timeSeries(den.tseries) > > > [1] TRUE > > >> apply.quarterly(den.tseries,FUN="mean") > > > data > > > 1970-01-01 -2.425000000 > > > 1970-04-01 -0.557961111 > > > 1970-04-28 0.009814815 > > > > > > Here, it calculates things quarterly but the the as.timeSeries command > > has > > > assigned its own daily dates to the data, instead of keeping mymonthly> > > dates. Also, I don't understand how it deals with the extra dates. > > > > > > Sorry for the long email, > > > Any help would be very much appreciated, > > > Kind regards, > > > Denise > > > > > > ______________________________________________ > > > R-help@r-project.org<http://us.mc339.mail.yahoo.com/mc/compose?to=R-help@r-project.org>mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html><http://www.r-project.org/posting-guide.html>> > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > > > > > > > -- > > Let's not spend our time and resources thinking about things that are > > so little or so large that all they really do for us is puff us up and > > make us feel like gods. We are mammals, and have not exhausted the > > annoying little problems of being mammals. > > > > -K. Mullis > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org<http://us.mc339.mail.yahoo.com/mc/compose?to=R-help@r-project.org>mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>> and provide commented, minimal, self-contained, reproducible code.[[alternative HTML version deleted]]