Hi,
I am a total newbie to R, so I apologize if the answer to my question is too
obvious. I have a data set of the following form:
Date                 V1  V...  VN  Region  Industry
22/03/1995 23:01:12   1     3   2      15  A
21/03/1995 21:01:12   3     3   1       9  C
1/04/1995 17:01:06    3     2   1       3  B
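For reference, the three sample rows can be written as an R data frame. This is a minimal sketch, assuming one column per question: V1, V2 and V3 are hypothetical names standing in for V1...VN, and the timestamps are assumed to be day/month/year. The added week column is also an illustration, not part of the original data.

```r
# Hypothetical reconstruction of the three sample rows; V1..V3 stand in for V1...VN
df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1       = c(1, 3, 3),
  V2       = c(3, 3, 2),
  V3       = c(2, 1, 1),
  Region   = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Parse day/month/year timestamps into POSIXct so they compare and sort as times
# (leading zeros such as the "0" in "01/04" are optional when parsing)
df$Date <- as.POSIXct(df$Date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Year first, then week of the year (%U: weeks start on Sunday), so that the
# labels also sort chronologically as plain text
df$week <- format(df$Date, "%Y-%U")
str(df)
```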
Now I would like to analyze the data in the data frame by Region, Industry,
Date (I would like to collapse the whole thing to weekly data) and by the
three different answer options {1,2,3} in V1...VN. In Stata, which I used
before, I did this step by step with a loop over all questions (V1...VN):

    egen pos_`X' = total(`X'==1), by(industry week_year)
    egen pos_`X' = total(`X'==2), by(industry week_year)

This step-by-step procedure works because Stata, even if the dates are
displayed as weeks, doesn't aggregate the values immediately. Unfortunately,
there seems to be no command in R that works exactly like Stata's by(). My
most successful attempt so far at the task described above was:

    # sum() counts the TRUE values, like Stata's total(); mean() would give the share
    as.data.frame(tapply(df$V1 == 1, list(df$date, df$region, df$industry),
                         sum))

(where date has been formatted as the week of the year with %U; note that %U
counts weeks starting on Sunday, whereas ISO 8601 weeks are given by %V)
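Stata's egen total() with by() writes each group's total back onto every row instead of collapsing the data, and base R's ave() behaves the same way. A minimal sketch on made-up data; the columns industry, week, V1, V2 and the new pos_V1 and V1_is1 ... V2_is3 columns are hypothetical names.

```r
# Small made-up data; industry, week and V1/V2 are hypothetical column names
df <- data.frame(
  industry = c("A", "A", "B", "B", "A"),
  week     = c("1995-12", "1995-12", "1995-12", "1995-13", "1995-13"),
  V1       = c(1, 2, 1, 1, 3),
  V2       = c(3, 1, 1, 2, 1)
)

# Rough equivalent of:  egen pos_V1 = total(V1==1), by(industry week_year)
# ave() applies FUN within each industry x week group and returns a vector as
# long as the input, so every row keeps its group total (nothing is collapsed).
df$pos_V1 <- ave(as.integer(df$V1 == 1), df$industry, df$week, FUN = sum)

# The same for every question and answer code, mirroring the Stata loop
for (q in c("V1", "V2")) {
  for (a in 1:3) {
    df[[paste0(q, "_is", a)]] <- ave(as.integer(df[[q]] == a),
                                     df$industry, df$week, FUN = sum)
  }
}
df
```

Because the totals stay attached to the original rows, the data can still be collapsed to one row per group later, for example with unique() on the grouping and total columns.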
Of course, I would have to loop this over all questions (20) and all
answer options (3), but at least it gives me output with the following
structure:
         industry.region  industry.region  industry.region  industry.region
10-1995               32               45               10                9
15-1995                2               47                5                6
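A table with weeks as rows and industry.region combinations as columns can come from a single tapply() call if industry and region are first combined with interaction(). A sketch on made-up data; the column names and region labels are hypothetical.

```r
# Made-up data; week, industry, region and V1 are hypothetical column names
df <- data.frame(
  week     = c("1995-10", "1995-10", "1995-15", "1995-15", "1995-10"),
  industry = c("A", "B", "A", "B", "A"),
  region   = c("north", "north", "south", "north", "south"),
  V1       = c(1, 1, 2, 1, 1)
)

# Count answers equal to 1 in V1. Two grouping factors (week, and the combined
# industry.region label) give a 2-d matrix: weeks as rows, industry.region
# combinations as columns.
counts <- tapply(df$V1 == 1,
                 list(df$week, interaction(df$industry, df$region)),
                 sum)
counts[is.na(counts)] <- 0   # tapply leaves combinations with no rows as NA
counts
```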
I could live with that, because I could recombine the resulting data frames
afterwards. My problem, however, is that tapply doesn't preserve the data
frame's format as a time series (xts). R does aggregate by time (week),
industry and region, but the weeks on the x-axis are not in the right order.
I also tried apply.weekly(), but it doesn't seem to do what I want.
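One likely cause of the ordering problem (an assumption, since the code that builds the week labels isn't shown) is that the labels are character strings, which sort alphabetically: week-first labels put January 1996 ahead of 1995. The base R sketch below (not xts) uses made-up timestamps and shows two alternatives: year-first labels, or a Date for the first day (Sunday, as with %U) of each week.

```r
# Made-up timestamps spanning a year boundary
dt <- as.POSIXct(c("1995-12-28 10:00:00", "1996-01-03 09:30:00",
                   "1995-03-22 23:01:12", "1995-04-01 17:01:06"), tz = "UTC")

# Week-first text labels sort alphabetically, so the 1996 week comes first
wk_text <- format(dt, "%U-%Y")
sort(unique(wk_text))

# Alternative 1: put the year first so that text order equals time order
wk_ymd <- format(dt, "%Y-%U")
sort(unique(wk_ymd))

# Alternative 2: label each week by the Date of its Sunday (matching %U);
# Date values sort chronologically and can be used directly on a plot axis
d <- as.Date(dt)
week_start <- d - as.POSIXlt(d)$wday
sort(unique(week_start))
```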
Could anyone give me a hint on how I could do this? Maybe by formatting the
data frame as time-series data beforehand and preserving that format through
the procedure. And maybe somebody also has an idea how I can avoid all
this looping.
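One way to avoid the loops over questions and answer codes is to reshape the data to long format (one row per respondent and question) and count every combination at once with table(). A sketch with made-up data; V1..V3, week, region and industry are assumed column names, and week is assumed to be already computed.

```r
# Made-up data in the wide layout from the question; V1..V3 stand in for V1...VN
df <- data.frame(
  week     = c("1995-12", "1995-12", "1995-13"),
  region   = c(15, 9, 3),
  industry = c("A", "C", "B"),
  V1 = c(1, 3, 3), V2 = c(3, 3, 2), V3 = c(2, 1, 1)
)
questions <- c("V1", "V2", "V3")

# Stack the question columns into one 'answer' column, with a 'question'
# column recording which question each answer came from
long <- reshape(df, direction = "long",
                varying = questions, v.names = "answer",
                timevar = "question", times = questions,
                idvar = "id", ids = seq_len(nrow(df)))

# One call counts every week x industry x region x question x answer combination
counts <- as.data.frame(table(week = long$week, industry = long$industry,
                              region = long$region, question = long$question,
                              answer = long$answer))
# Keep only the combinations that actually occur
counts <- counts[counts$Freq > 0, ]
head(counts)
```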
I would appreciate it very much if any of you could give me a
hint!
Best regards,
Andreas
--
View this message in context:
http://r.789695.n4.nabble.com/Splitting-up-large-set-of-survey-data-into-categories-tp4323327p4323327.html
Sent from the R help mailing list archive at Nabble.com.
Tal Galili
2012-Jan-24 16:00 UTC
[R] Splitting up large set of survey data into categories
Hi Andreas,

Please give a sample of your data, and show how you want it to look after the manipulation. Consider using ?dput

Contact details:
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Jan 24, 2012 at 11:54 AM, ak13 <andreas.karpf@gmail.com> wrote:
> [original message quoted in full; identical to the post above]
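Tal's ?dput suggestion in practice: dput() prints R code that rebuilds an object, so pasting its output into a post gives others an exact copy of the data. A minimal sketch; the small data frame is made up and only mirrors the column layout from the question.

```r
# A small made-up data frame standing in for the real survey data
df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12"),
  V1       = c(1, 3),
  Region   = c(15, 9),
  Industry = c("A", "C")
)

# Print R code that recreates the first rows; this is what to paste into a post
dput(head(df))

# Evaluating that printed code gives back an identical object
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))
identical(df, df_copy)
```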