Sacha Viquerat
2012-Aug-10 10:48 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hello there!
I am still struggling with a binomial response where all predictors are
categorical variables (some with 3 levels, most with 2). After initial
struggles with GLMs (the trouble came from the data, not from the analysis
itself), I have decided to use contingency tables instead. My data look
like this:
response:
hunting.prev=c("success","fail","success","success","success","fail",...)
one of 21 surveyed variables:
groupsize=c("small","large","small","small","small","large"...)
...
Now...
It seems intuitive to me that I will have to split up each variable by its
level(s), thus creating 2 new variables for groupsize (as an example)
holding the counts of small hunting parties when hunting.prev was a
success, and so on. I could write a function to do that for me, but I
never intend to reinvent the wheel. I would like my data to look like this:
hunting prev  groupsize-small  groupsize-large  dogs-yes  dogs-no  guns-yes  guns-no  ...
success       12               2                4         14       23        12       ...
failure       1                6                34        0        12        3        ...
Of course, hunting.prev would only be needed to create the index via
hunting.prev=="success"; here it is only used to indicate what each row
means. My questions are:
a) how to count and split each categorical variable by the response
variable, how to create a 2 x 20-something contingency table, and to what
extent a prop.test() approach or a chi-squared test may be more appropriate
for actually analyzing the data.
b) how do you guys create R output so that it's formatted in nice
columns and rows?
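One way to get the layout sketched above without counting by hand is to tabulate the response against each predictor with table() and bind the pieces side by side. This is a minimal sketch, assuming the data sit in a single data.frame; the simulated columns (and the name dat) are hypothetical stand-ins for the real 21 survey variables, and hunting.prev is recoded as a factor so that every sub-table has the same two rows in the same order.

```r
# Hypothetical simulated data standing in for the real survey
set.seed(1)
n <- 100
dat <- data.frame(
  hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
  groupsize    = sample(c("small", "medium", "large"), n, replace = TRUE),
  dogs         = sample(c("yes", "no"), n, replace = TRUE),
  guns         = sample(c("yes", "no"), n, replace = TRUE)
)

# Fix the row order (success first) and keep both rows even if a level is empty
dat$hunting.prev <- factor(dat$hunting.prev, levels = c("success", "fail"))

# Every column except the response is treated as a categorical predictor
predictors <- setdiff(names(dat), "hunting.prev")

# One response-by-levels table per predictor, columns named "variable-level"
tabs <- lapply(predictors, function(v) {
  tab <- table(dat$hunting.prev, dat[[v]])
  colnames(tab) <- paste(v, colnames(tab), sep = "-")
  tab
})

# Glue the pieces side by side: 2 rows x (total number of predictor levels)
wide <- do.call(cbind, tabs)
print(wide)
```

prop.table(wide, 2) then turns the counts into the share of successes and failures within each level, which is often easier to read than raw counts.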
Hoping for some help,
Thanks!
Rui Barradas
2012-Aug-10 11:04 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hello,
Try the following to format nicely.
# simulated data, since the real data were not posted
n <- 1e2
hunting.prev <- sample(c("success", "fail"), n, TRUE)
groupsize <- sample(c("small", "medium", "large"), n, TRUE)
dog <- sample(c("yes", "no"), n, TRUE)
guns <- sample(c("yes", "no"), n, TRUE)
# flat (two-dimensional) printout of the 4-way contingency table
ftable(hunting.prev, groupsize, dog, guns)
As for the tests, that depends on your needs; see the respective help
pages. An example would be
chisq.test( table(hunting.prev, dog) )
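On the prop.test() versus chi-squared question: for a 2 x 2 table the two are the same test. prop.test() compares the proportion of successes between the two groups and additionally reports those proportions and a confidence interval for their difference, while chisq.test() reports the same X-squared statistic and p-value (both apply Yates' continuity correction by default). A minimal sketch on simulated, hypothetical data:

```r
# Hypothetical simulated data; success is made the first level on purpose
set.seed(42)
n <- 100
hunting.prev <- factor(sample(c("success", "fail"), n, TRUE),
                       levels = c("success", "fail"))
dog <- sample(c("yes", "no"), n, TRUE)

# Rows = groups (dog no / yes), columns = success / fail counts:
# the two-column layout prop.test() expects (column 1 = successes)
tab <- table(dog, hunting.prev)

pt <- prop.test(tab)   # compares success proportions between the two groups
ct <- chisq.test(tab)  # Pearson chi-squared test of independence

pt$estimate            # proportion of successes without / with dogs
c(prop = unname(pt$statistic), chisq = unname(ct$statistic))
```

So the choice between them is mostly about which output is more useful: prop.test() gives the proportions directly, chisq.test() also handles tables whose predictor has more than two levels in the same call.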
Hope this helps,
Rui Barradas
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
arun
2012-Aug-10 13:20 UTC
[R] creating a contingency table from a data.frame automatically (NOT BY HAND)
Hi,
Try this:
n <- 100
dat1 <- data.frame(hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
                   groupsize = sample(c("small", "large"), n, replace = TRUE),
                   dogs = sample(c("yes", "no"), n, replace = TRUE),
                   guns = sample(c("yes", "no"), n, replace = TRUE))
# 4-way contingency table of counts
mytable <- xtabs(~ hunting.prev + groupsize + dogs + guns, data = dat1)
# flat layout; sample() is random, so your counts will differ from these
ftable(mytable)
                            guns no yes
hunting.prev groupsize dogs
fail         large     no         5  10
                       yes        3   9
             small     no         8   7
                       yes        6   2
success      large     no        10   3
                       yes        7  10
             small     no         7   6
                       yes        6   1
summary(mytable)
#Call: xtabs(formula = ~hunting.prev + groupsize + dogs + guns, data = dat1)
#Number of cases in table: 100
#Number of factors: 4
#Test for independence of all factors:
#  Chisq = 16.749, df = 11, p-value = 0.1155
#  Chi-squared approximation may be incorrect
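Note that summary() on an xtabs object tests whether all four factors are mutually independent, which is not the same as asking whether each predictor is associated with hunting success. A minimal sketch of the per-predictor version, one chi-squared test of hunting.prev against each variable, assuming the same kind of simulated (hypothetical) dat1 as above. With 21 predictors some small p-values are expected by chance alone, so a Holm adjustment via p.adjust() is added as one possible correction.

```r
# Hypothetical simulated data, same layout as dat1 above
set.seed(7)
n <- 100
dat1 <- data.frame(
  hunting.prev = sample(c("success", "fail"), n, replace = TRUE),
  groupsize    = sample(c("small", "large"), n, replace = TRUE),
  dogs         = sample(c("yes", "no"), n, replace = TRUE),
  guns         = sample(c("yes", "no"), n, replace = TRUE)
)

predictors <- setdiff(names(dat1), "hunting.prev")

# One 2 x k table and chi-squared test per predictor
tests <- lapply(predictors, function(v)
  chisq.test(table(dat1$hunting.prev, dat1[[v]])))

# Collect statistic, degrees of freedom and p-value in one small table
res <- data.frame(
  X.squared = sapply(tests, function(ct) unname(ct$statistic)),
  df        = sapply(tests, function(ct) unname(ct$parameter)),
  p.value   = sapply(tests, function(ct) ct$p.value),
  row.names = predictors
)

# Holm correction for running several tests on the same response
res$p.holm <- p.adjust(res$p.value, method = "holm")
print(round(res, 4))
```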
A.K.