dereksloan
2011-May-03 14:06 UTC
[R] Generating summary statistics and simple statistical analysis from my data-set: how can I automate the analysis?
I am fairly new to R and have a (for me) slightly complicated set of data to analyse. It contains several continuous and categorical variables for a group of individuals, e.g.:

  ID  Sex  Age  Familysize  Phone  Education
  1   M    23   3           Yes    Primary
  2   F    25   4           Yes    Secondary
  3   M    33   5           No     Tertiary
  4   F    45   1           Yes    Secondary
  5   F    67   10          Yes    Secondary

I want to summarise it in a table as follows (I want to put p-values in the last column):

               All individuals  Male            Female          Comparison between sexes
  Age          Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
  Family size  Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
  Phone        Number Yes (%)   Number Yes (%)  Number Yes (%)  Chi-squared test
  Education                                                     Chi-squared test
    Primary    Number (%)       Number (%)      Number (%)
    Secondary  Number (%)       Number (%)      Number (%)
    Tertiary   Number (%)       Number (%)      Number (%)

How can I use R to do this? For the continuous variables I know I can write code like:

  summary(Age)
  by(Age, data["Sex"], summary)
  wilcox.test(Age ~ Sex)
  summary(Familysize)
  by(Familysize, data["Sex"], summary)
  wilcox.test(Familysize ~ Sex)

but is there any way of automating/looping the analysis so that I get summaries and comparative statistical analysis of all of the continuous variables in a single command? I'm sure this could be done by some kind of "looping", given that the analysis is always the same. Presumably I then still have to copy the output of interest (medians, ranges, p-values) into the summary table manually?

For each categorical variable I have really cumbersome code from which I can extract the information I need for the summary table,
e.g.:

  # counts of Phone by Sex, plus the number of individuals (N) in each sex
  tphone <- xtabs(~ Phone + Sex, data = data)
  N <- margin.table(tphone, 2)
  tphone1 <- rbind(tphone, N)
  # add a Total column, then transpose so that each sex is a row
  Total <- margin.table(tphone1, 1)
  tphone1 <- cbind(tphone1, Total)
  tphone1 <- t(tphone1)
  tphone1 <- as.data.frame(tphone1)
  # percentages of No and Yes within each row
  tphone2 <- within(tphone1, {
    per.No <- 100 * (No / N)
    per.Yes <- 100 * (Yes / N)
  })
  tphone2 <- tphone2[, c(3, 2, 4, 1, 5)]
  tphone2
  chisq.test(tphone)

but there must be better ways of generating the counts, percentages, and simple statistical analysis which I need. Again, can I loop it to do all of my categorical variables at once?

Obviously my dataset has more continuous and categorical variables than those shown above, but I've abbreviated it for simplicity of explanation. I need to write simpler/looped code so that the whole thing is not crazily long-winded.

Sorry that my approach so far is so bad and long-winded! R is a steep learning curve to start with, so I'd be very grateful for any help I can get from anyone who won't laugh at me.

Derek
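
The looping asked about above could look roughly like the following. This is only a minimal sketch in base R, not a definitive solution: the data frame is the five-row example from the post, and the names `med_range`, `summarise_continuous`, `summarise_categorical`, `continuous_vars` and `categorical_vars` are hypothetical helpers introduced here for illustration. It writes one small function per variable type, applies it to every variable name with `lapply()`, and stacks the rows with `rbind()`, so the summary table is built directly rather than copied by hand.

```r
# Example data copied from the post; the real data set has more columns.
data <- data.frame(
  ID         = 1:5,
  Sex        = factor(c("M", "F", "M", "F", "F")),
  Age        = c(23, 25, 33, 45, 67),
  Familysize = c(3, 4, 5, 1, 10),
  Phone      = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education  = factor(c("Primary", "Secondary", "Tertiary",
                        "Secondary", "Secondary"))
)

# Hypothetical helper: format one numeric vector as "median (min-max)".
med_range <- function(x) {
  sprintf("%g (%g-%g)", median(x), min(x), max(x))
}

# Hypothetical helper: one table row for a continuous variable.
summarise_continuous <- function(var, df, group = "Sex") {
  x <- df[[var]]
  g <- df[[group]]
  p <- wilcox.test(x ~ g)$p.value          # Wilcoxon rank sum test
  c(Variable = var,
    All = med_range(x),
    sapply(split(x, g), med_range),        # one entry per level of group
    p.value = format.pval(p, digits = 3))
}

# Hypothetical helper: one table row per level of a categorical variable.
summarise_categorical <- function(var, df, group = "Sex") {
  tab    <- table(df[[var]], df[[group]])  # counts: level x group
  counts <- cbind(All = rowSums(tab), tab)
  pct    <- 100 * prop.table(counts, margin = 2)   # column percentages
  cells  <- matrix(sprintf("%d (%.0f%%)", counts, pct),
                   nrow = nrow(counts), dimnames = dimnames(counts))
  # chisq.test() warns when expected counts are small, as they are here
  p   <- chisq.test(tab)$p.value
  out <- cbind(Variable = paste(var, rownames(cells), sep = ": "),
               cells,
               p.value = c(format.pval(p, digits = 3),
                           rep("", nrow(cells) - 1)))
  rownames(out) <- NULL
  out
}

# Assumed lists of column names; extend these for the full data set.
continuous_vars  <- c("Age", "Familysize")
categorical_vars <- c("Phone", "Education")

cont_table <- do.call(rbind, lapply(continuous_vars,
                                    summarise_continuous, df = data))
cat_table  <- do.call(rbind, lapply(categorical_vars,
                                    summarise_categorical, df = data))

summary_table <- rbind(cont_table, cat_table)
print(summary_table, quote = FALSE)
```

With only five individuals the p-values are not meaningful; the point is the structure. The resulting character matrix can be passed to `write.csv()` if the table is needed outside R. Note that this sketch reports every level of `Phone`, not just "Yes"; dropping the "No" row would be a small change to `summarise_categorical`.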