thr3ads.net - R help - [R] Splitting up large set of survey data into categories [Jan 2012]

ak13

2012-Jan-24 09:54 UTC

[R] Splitting up large set of survey data into categories

Hi,

I am a total newbie to R, so I apologize if the answer to my question is too
obvious. I have a data set of the following form:

Date                  V1  V...  VN  Region  Industry
22/03/1995 23:01:12   1   3     2   15      A
21/03/1995 21:01:12   3   3     1   9       C
1/04/1995 17:01:06    3   2     1   3       B

Now I would like to analyze the data in the data.frame by Region, Industry,
Date (I would like to collapse the whole thing to weekly data) and by the
three different answer options {1,2,3} in V1...VN. In Stata, which I used
before, I did this step by step with a loop over all questions (V1...VN):
egen pos_`X'=total(`X'==1), by(industry week_year); egen
pos_`X'=total(`X'==2), by(industry week_year). This step-by-step procedure
works because Stata, even if the dates are displayed as weeks, doesn't
aggregate the values immediately. Unfortunately there seems to be no command
in R that works in exactly the same manner as by() in Stata. My most
successful attempt so far to accomplish the task described above was:

as.data.frame(tapply(euwifo[,1]==1, list(df$date, df$region, df$industry),
mean))

(where date is formatted as ISO-weekly %U)
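
For reference, a minimal sketch of deriving week labels from timestamps like the ones in the sample rows, assuming they are stored as day/month/year character strings. In R's format(), %U is the week of the year with Sunday as the first day of the week, while the ISO 8601 week is %V. The vector name `stamps` and the UTC time zone are assumptions made for illustration.

```r
# Timestamps copied from the sample rows; `stamps` is a hypothetical name.
stamps <- c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06")

# Parse day/month/year strings; the UTC time zone is an assumption.
when <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# %U: week of the year with Sunday as the first day of the week (00-53).
week_u <- format(when, "%U")

# %V: ISO 8601 week of the year (01-53); it can differ from %U,
# most visibly around year boundaries.
week_iso <- format(when, "%V")

print(data.frame(stamps, week_u, week_iso))
```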
Of course I would have to loop this over all questions (20) and all
answer options (3), but at least it gives me output with the following
structure:

.         industry.region  industry.region  industry.region  industry.region
10-1995   32               45               10               9
15-1995   2                47               5                6

I could live with that because I could recombine the resulting data frames
afterwards. My problem, however, is that tapply doesn't preserve the
data frame's format as a time series (xts). This means R aggregates by time
(week) (and industry and region), but the weeks on the x-axis are not in the
right order. I also tried apply.weekly(), but this doesn't seem to do what
I want.
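
One possible cause of the ordering problem, sketched under the assumption that the week labels are character strings such as 10-1995: tapply() arranges its result by the sorted factor levels of the grouping variable, so week-first labels are ordered by week number before year. A zero-padded, year-first label sorts chronologically. The dates and values below are made up for illustration.

```r
# Made-up dates spanning two years (illustration only).
d <- as.Date(c("1995-03-08", "1995-04-12", "1996-01-10"))
x <- c(1, 2, 3)

# Week-first labels: the 1996 week sorts ahead of the 1995 weeks.
week_first <- format(d, "%U-%Y")
by_week_first <- tapply(x, week_first, sum)
print(names(by_week_first))

# Year-first labels: alphabetical order is also chronological order.
year_first <- format(d, "%Y-%U")
by_year_first <- tapply(x, year_first, sum)
print(names(by_year_first))
```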

Could anyone give me a hint on how I could do this? Maybe by formatting the
data frame as time series data beforehand and preserving that format during
the procedure. And maybe somebody also has an idea of how I can avoid all
this looping.
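
A rough sketch of one way to avoid the loop in base R, not necessarily the best fit for the real data: stack the question columns into long format and count every combination in a single table() call. The data frame name `survey` is hypothetical, and `V2` stands in for the elided "V..." column; the three rows are the sample rows from above.

```r
# Sample rows from the post; `survey` and `V2` are hypothetical names
# (V2 stands in for the elided "V..." column).
survey <- data.frame(
  Date = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1 = c(1, 3, 3),
  V2 = c(3, 3, 2),
  VN = c(2, 1, 1),
  Region = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Year-first week key so that labels sort chronologically (UTC is an assumption).
when <- as.POSIXct(survey$Date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
survey$week <- format(when, "%Y-%U")

# Stack the question columns: one row per respondent and question.
questions <- c("V1", "V2", "VN")
long <- data.frame(
  week = rep(survey$week, times = length(questions)),
  Region = rep(survey$Region, times = length(questions)),
  Industry = rep(survey$Industry, times = length(questions)),
  question = rep(questions, each = nrow(survey)),
  answer = unlist(survey[questions], use.names = FALSE)
)

# Count all week/region/industry/question/answer combinations at once,
# then drop the combinations that never occur.
counts <- as.data.frame(table(long), responseName = "n")
counts <- counts[counts$n > 0, ]
print(counts)
```

Because the week key is year-first, the levels of `week` in `counts` are in chronological order.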

I would appreciate it very much if one of you could give me a
hint!

Best regards,

Andreas 


 

--
View this message in context:
http://r.789695.n4.nabble.com/Splitting-up-large-set-of-survey-data-into-categories-tp4323327p4323327.html
Sent from the R help mailing list archive at Nabble.com.

Tal Galili

2012-Jan-24 16:00 UTC

[R] Splitting up large set of survey data into categories

Hi Andreas,
Please give a sample of your data, and show how you want it to look after
the manipulation.
Consider using
?dput
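
As a rough sketch of what that could look like, assuming the data frame is called `df` (a hypothetical name) and holds the three sample rows from the original post, with `V2` standing in for the elided "V..." column:

```r
# Hypothetical data frame built from the sample rows in the question.
df <- data.frame(
  Date = c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06"),
  V1 = c(1, 3, 3),
  V2 = c(3, 3, 2),
  VN = c(2, 1, 1),
  Region = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Print a plain-text representation that can be pasted into an email.
dput(head(df, 3))

# The same text can be written to a file and read back with dget().
tmp <- tempfile(fileext = ".R")
dput(df, file = tmp)
rebuilt <- dget(tmp)
print(all.equal(df, rebuilt))
```

Pasting the dput() output into a message lets readers recreate the object exactly, including column types, which a printed table does not guarantee.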

