Sacha Viquerat
2012-Aug-10 10:48 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hello there!

I am still struggling with a binomial response over all categorical variables (some of them with 3 levels, most with 2 levels). After initial struggles with GLMs (the struggle coming from the data, not the actual analysis), I have decided to use contingency tables instead. My data look like this:

response:
hunting.prev=c("success","fail","success","success","success","fail",...)

one of 21 surveyed variables:
groupsize=c("small","large","small","small","small","large"...)
...

Now, it seems intuitive to me that I will have to split up each variable by its level(s), thus creating 2 new variables for groupsize (as an example) holding the counts of small hunting parties when hunting.prev was a success, and so on. I could write a function to do that for me, but I would rather not reinvent the wheel. I would like my data to look like this:

hunting prev  groupsize-small  groupsize-large  dogs-yes  dogs-no  guns-yes  guns-no ...
success       12               2                4         14       23        12 ...
failure       1                6                34        0        12        3 ...

Of course, hunting.prev would only be needed to create the index via hunting.prev=="success" and is used here to indicate what each row means. My questions are:

a) How do I count and split each categorical variable by a response variable, how do I create a 2x20-something (contingency) table, and would a prop.test() approach or a chi² test be more appropriate to actually analyze the data?

b) How do you guys create R output so that it's formatted in nice columns and rows?

Hope to see help,
Thanks!
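A minimal sketch of the wide layout asked for above, built without hand-counting. The data are simulated stand-ins (the real survey data are not shown in the thread), and the names `dat`, `predictors`, `per_var` and `wide` are made up for illustration. It cross-tabulates the response against each predictor with `table()` and binds the pieces side by side with `cbind()`.

```r
# Simulated stand-in data; the real survey data are not shown in the thread.
set.seed(1)
n <- 100
dat <- data.frame(
  hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
  groupsize    = sample(c("small", "large"), n, replace = TRUE),
  dogs         = sample(c("yes", "no"), n, replace = TRUE),
  guns         = sample(c("yes", "no"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# Every column except the response is treated as a predictor
predictors <- setdiff(names(dat), "hunting.prev")

# One response-by-level table per predictor, columns named "variable-level"
per_var <- lapply(predictors, function(v) {
  tab <- table(dat$hunting.prev, dat[[v]])
  colnames(tab) <- paste(v, colnames(tab), sep = "-")
  tab
})

# Bind the per-variable tables side by side into one 2 x k matrix of counts
wide <- do.call(cbind, per_var)
print(wide)
```

This layout is convenient for display, but each observation is counted once per predictor, so the combined matrix is probably not something to pass to a single chi-squared test as a whole; the individual tables in `per_var` are the natural units for testing.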
Rui Barradas
2012-Aug-10 11:04 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hello,

Try the following to format nicely.

n <- 1e2
hunting.prev <- sample(c("success","fail"), n, TRUE)
groupsize <- sample(c("small", "medium", "large"), n, TRUE)
dog <- sample(c("yes", "no"), n, TRUE)
guns <- sample(c("yes", "no"), n, TRUE)

ftable(hunting.prev, groupsize, dog, guns)

As for the tests, that depends on your needs; see the respective help pages. An example would be

chisq.test( table(hunting.prev, dog) )

Hope this helps,

Rui Barradas

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
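With 21 surveyed variables, the single `chisq.test()` call suggested above could be repeated for each predictor in a loop. The following is only a sketch on simulated data: `dat`, `predictors`, `p_values` and `p_adjusted` are illustrative names, and the Holm adjustment via `p.adjust()` is an optional addition that the thread itself does not mention.

```r
# Simulated stand-in data, mirroring the example in the reply above
set.seed(42)
n <- 100
dat <- data.frame(
  hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
  groupsize    = sample(c("small", "medium", "large"), n, replace = TRUE),
  dog          = sample(c("yes", "no"), n, replace = TRUE),
  guns         = sample(c("yes", "no"), n, replace = TRUE)
)

# Every column except the response is treated as a predictor
predictors <- setdiff(names(dat), "hunting.prev")

# One chi-squared test of independence per predictor against the response
p_values <- sapply(predictors, function(v) {
  chisq.test(table(dat$hunting.prev, dat[[v]]))$p.value
})

# Optional: adjust for running several tests (Holm's method is one choice)
p_adjusted <- p.adjust(p_values, method = "holm")

print(data.frame(p = p_values, p.holm = p_adjusted))
```

Testing each predictor separately ignores any association among the predictors themselves, so this is a screening step rather than a replacement for a joint model.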
arun
2012-Aug-10 13:20 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hi,

Try this:

n<-100
dat1<-data.frame(hunting.prev=sample(c("success","fail"),n,replace=TRUE),
                 groupsize=sample(c("small","large"),n,replace=TRUE),
                 dogs=sample(c("yes","no"),n,replace=TRUE),
                 guns=sample(c("yes","no"),n,replace=TRUE))
mytable<-xtabs(~hunting.prev+groupsize+dogs+guns,data=dat1)
ftable(mytable)
                            guns no yes
hunting.prev groupsize dogs
fail         large     no         5  10
                       yes        3   9
             small     no         8   7
                       yes        6   2
success      large     no        10   3
                       yes        7  10
             small     no         7   6
                       yes        6   1

summary(mytable)
#Call: xtabs(formula = ~hunting.prev + groupsize + dogs + guns, data = dat1)
#Number of cases in table: 100
#Number of factors: 4
#Test for independence of all factors:
#    Chisq = 16.749, df = 11, p-value = 0.1155
#    Chi-squared approximation may be incorrect

A.K.
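The summary() output above carries the note that the chi-squared approximation may be incorrect, which typically appears when some expected cell counts are small. One possible way around it, sketched here on freshly simulated data (so the counts will differ from those shown), is to collapse the four-way table to a two-way margin with `margin.table()` and then use Fisher's exact test or a simulated p-value. The names `two_way`, `ft` and `mc` are illustrative only.

```r
# Simulated stand-in data with the same structure as dat1 above
set.seed(123)
n <- 100
dat1 <- data.frame(
  hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
  groupsize    = sample(c("small", "large"), n, replace = TRUE),
  dogs         = sample(c("yes", "no"), n, replace = TRUE),
  guns         = sample(c("yes", "no"), n, replace = TRUE)
)
mytable <- xtabs(~ hunting.prev + groupsize + dogs + guns, data = dat1)

# Collapse to the two-way margin: response (dimension 1) by dogs (dimension 3)
two_way <- margin.table(mytable, c(1, 3))
print(two_way)

# Fisher's exact test does not rely on the large-sample approximation
ft <- fisher.test(two_way)
print(ft$p.value)

# Alternatively, a Monte Carlo p-value for the chi-squared statistic
mc <- chisq.test(two_way, simulate.p.value = TRUE, B = 2000)
print(mc$p.value)
```

Which of the two is preferable depends on the table size and the question being asked; both are offered as options, not as what the posters recommended.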