Hi,

I am a total newbie to R, so I apologize if the answer to my question is too obvious. I have a data set of the following form:

Date                 V1  V...  VN  Region  Industry
22/03/1995 23:01:12  1   3     2   15      A
21/03/1995 21:01:12  3   3     1   9       C
1/04/1995 17:01:06   3   2     1   3       B

Now I would like to analyze the data in the data.frame by Region, Industry, Date (I would like to collapse the whole thing to weekly data) and by the three different answer options {1,2,3} in V1...VN. In Stata, which I used before, I did this step by step with a loop over all questions (V1...VN):

  egen pos_`X'=total(`X'==1), by(industry week_year)
  egen pos_`X'=total(`X'==2), by(industry week_year)

This step-by-step procedure works because Stata, even if the dates are displayed as weeks, doesn't aggregate the values immediately. Unfortunately there seems to be no command in R that works in exactly the same manner as by() in Stata. My most successful attempt so far to accomplish the task described above was:

  as.data.frame(tapply(euwifo[,1]==1, list(df$date, df$region, df$industry), mean))

(where date is formatted as a week number using %U)

Of course I would have to loop this over all questions (20) and all answer options (3), but at least it gives me output of this structure:

.        industry.region  industry.region  industry.region  industry.region
10-1995  32               45               10               9
15-1995  2                47               5                6

I could live with that, because I could recombine the resulting data frames afterwards. My problem, however, is that tapply doesn't preserve the data frame's format as a time series (xts). This means R aggregates by time (week), industry and region, but the weeks on the x-axis are not in the right order. I also tried apply.weekly(), but it doesn't seem to do what I want.

Could anyone give me a hint how I could do this? Maybe by formatting the data frame as time series data beforehand and preserving that format during the procedure. And maybe somebody also has an idea how I can avoid all this looping.
I would appreciate it very much if one of you could give me a hint!

Best regards,
Andreas

--
View this message in context: http://r.789695.n4.nabble.com/Splitting-up-large-set-of-survey-data-into-categories-tp4323327p4323327.html
Sent from the R help mailing list archive at Nabble.com.
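For readers following the thread, here is a minimal base-R sketch of the kind of aggregation described above. It is not a confirmed answer from the list. It assumes the three sample rows from the post, with V2 and V3 as made-up names for the elided question columns, and it groups by industry and week only, as in the Stata lines. A likely reason for the ordering problem is that week labels such as "10-1995" are character strings and sort as text; the sketch therefore keys each week by the date of its Monday, which sorts chronologically. ave() plays the role of Stata's egen ... by() (it adds a column without collapsing rows), and stacking the questions into a long table avoids the loop over questions and answer options.

```r
# Toy data: the three rows shown in the post. The names V2 and V3 are
# assumptions standing in for the elided question columns.
df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1       = c(1, 3, 3),
  V2       = c(3, 3, 2),
  V3       = c(2, 1, 1),
  Region   = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Parse the day (the time of day is ignored) and map each day to the
# Monday that starts its week. A Date key sorts chronologically.
day <- as.Date(df$Date, format = "%d/%m/%Y")
df$week <- day - (as.integer(format(day, "%u")) - 1L)

# egen-style: add a per-group total as a new column, rows are not collapsed.
df$pos_V1 <- ave(as.integer(df$V1 == 1), df$Industry, df$week, FUN = sum)

# Stack all question columns into one long table: one row per
# (respondent, question) pair.
questions <- c("V1", "V2", "V3")
long <- data.frame(
  week     = rep(df$week, times = length(questions)),
  Industry = rep(df$Industry, times = length(questions)),
  question = rep(questions, each = nrow(df)),
  answer   = unlist(df[questions], use.names = FALSE),
  n        = 1L,
  stringsAsFactors = FALSE
)

# Collapse: count answers per week, industry, question and answer option.
counts <- aggregate(
  long["n"],
  by  = long[c("week", "Industry", "question", "answer")],
  FUN = sum
)
counts <- counts[order(counts$week, counts$Industry,
                       counts$question, counts$answer), ]
print(counts)
```

Adding "Region" to the `by` list (and to `long`) would extend the grouping to regions. Because `counts$week` is a Date column, it can be used directly as a plot axis or as the index when building an xts object.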
Tal Galili
2012-Jan-24 16:00 UTC
[R] Splitting up large set of survey data into categories
Hi Andreas,

Please give a sample of your data and show how you want it to look after the manipulation. Consider using ?dput

Contact Details:
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Jan 24, 2012 at 11:54 AM, ak13 <andreas.karpf@gmail.com> wrote:
> [...]
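As an illustration of the dput() suggestion, here is a small sketch. The object name sample_df and its columns are assumptions, with values taken from the rows in the original post. dput() prints R code that recreates an object, so the output can be pasted into a message and run by anyone on the list.

```r
# Hypothetical sample built from the rows shown in the original post.
sample_df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1       = c(1, 3, 3),
  Region   = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Print code that recreates the object. For a large data set, passing
# head(sample_df, 20) instead keeps the output short.
dput(sample_df)

# A reader can rebuild the object by evaluating the printed text.
txt <- paste(capture.output(dput(sample_df)), collapse = "\n")
restored <- eval(parse(text = txt))
```

Posting the dput() output together with a hand-written example of the desired result gives respondents something they can run directly.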