Hi,

I have a dataset where the response for each person on one of two treatments is a proportion (the percentage of a certain number of markers that are positive). I also have the number of positive and negative markers for each person. What is the best way to analyze this kind of data?

I can think of analyzing this data using glm() with the attached dataset:

    test<-read.table('test.txt',sep='\t')
    fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
    summary(fit)
    anova(fit, test='Chisq')

First, is this still called logistic regression or something else? I thought that with logistic regression the response variable is a binary factor.

Second, summary(fit) and anova(fit, test='Chisq') gave me different p-values. Why is that? Which one should I use?

Third, is there an equivalent model where I can use the variable "percentage" instead of "positive" and "total"?

Finally, what is the best way to analyze this kind of dataset, where it's almost the same as ANOVA except that the response variable is a proportion (or success and failure)?

Thanks

John

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101220/b65c978f/attachment.txt>
array chip <arrayprofile <at> yahoo.com> writes:

[snip]

> I can think of analyzing this data using glm() with the attached dataset:
>
> test<-read.table('test.txt',sep='\t')
> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
> summary(fit)
> anova(fit, test='Chisq')

> First, is this still called logistic regression or something else? I thought
> with logistic regression, the response variable is a binary factor?

Sometimes I've seen it called "binomial regression", or just "a binomial generalized linear model".

> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p
> values, why is that? which one should I use?

summary(fit) gives you p-values from a Wald test. anova() gives you tests based on the Likelihood Ratio Test. In general the LRT is more accurate.

> Third, is there an equivalent model where I can use variable "percentage"
> instead of "positive" & "total"?

    glm(percentage~treatment,weights=total,data=test,family=binomial)

is equivalent to the model you fitted above (with percentage expressed as a proportion between 0 and 1).

> Finally, what is the best way to analyze this kind of dataset
> where it's almost the same as ANOVA except that the response variable
> is a proportion (or success and failure)?

Don't quite know what you mean here. How is the situation "almost the same as ANOVA" different from the situation you described above? Do you mean when there are multiple factors? or ???
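The two points in the reply above (the equivalent proportion-plus-weights form, and Wald versus likelihood ratio p-values) can be checked with a small sketch. The attached test.txt is not reproduced in this thread, so the data below are simulated and purely hypothetical; only the column names positive, total, treatment and percentage come from the original post.

```r
# Hypothetical data standing in for test.txt (not the poster's data):
# 10 people per treatment, each with `total` markers, `positive` of them positive.
set.seed(1)
test <- data.frame(
  treatment = factor(rep(c("A", "B"), each = 10)),
  total     = rep(20, 20)
)
test$positive   <- rbinom(20, size = test$total,
                          prob = ifelse(test$treatment == "A", 0.3, 0.5))
test$percentage <- test$positive / test$total  # proportion in [0, 1], not 0-100

# Form 1: two-column response (successes, failures)
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

# Form 2: proportion response, with the number of trials as prior weights
fit2 <- glm(percentage ~ treatment, weights = total,
            data = test, family = binomial)

# Both forms should give the same coefficients and the same deviance
all.equal(coef(fit1), coef(fit2))
all.equal(deviance(fit1), deviance(fit2))

# Wald test (reported by summary()) versus likelihood ratio test (anova())
wald_p <- coef(summary(fit1))["treatmentB", "Pr(>|z|)"]
lrt_p  <- anova(fit1, test = "Chisq")[["Pr(>Chi)"]][2]
c(wald = wald_p, lrt = lrt_p)
```

The two p-values will generally be close but not identical, because the Wald test relies on a quadratic approximation to the log-likelihood at the estimate while the LRT compares the fitted deviances directly.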
A possible caveat here.

Traditionally, logistic regression was performed on the logit-transformed proportions, with the standard errors based on the residuals from the resulting linear fit. This accommodates overdispersion naturally, but without telling you that you have any.

glm with a binomial family does not allow for overdispersion unless you use the quasibinomial family. If you have overdispersion, standard errors from glm will be unrealistically small. Make sure your model fits in glm before you believe the standard errors, or use the quasibinomial family.

Steve Ellison
LGC

>>> Ben Bolker <bbolker at gmail.com> 21/12/2010 13:08:34 >>>
[snip]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}
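One way to act on the overdispersion caveat is to estimate the dispersion from the binomial fit and compare standard errors with a quasibinomial fit. This is a minimal sketch on simulated, hypothetical data (each person is given their own underlying probability, drawn from a beta distribution chosen only for illustration); it is not the poster's dataset.

```r
# Hypothetical overdispersed data: person-level probabilities vary within
# each treatment, so the counts vary more than a binomial model expects.
set.seed(2)
n         <- 20
treatment <- factor(rep(c("A", "B"), each = n / 2))
total     <- rep(20, n)
p_person  <- rbeta(n, shape1 = ifelse(treatment == "A", 1.5, 2.5), shape2 = 3.5)
positive  <- rbinom(n, size = total, prob = p_person)
test      <- data.frame(treatment, total, positive)

# Plain binomial fit: dispersion is fixed at 1
fit_bin <- glm(cbind(positive, total - positive) ~ treatment,
               data = test, family = binomial)

# Pearson estimate of the dispersion; values well above 1 suggest
# overdispersion (or a model that does not fit)
dispersion <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
dispersion

# Quasibinomial fit: same coefficients, dispersion estimated from the data
fit_quasi <- glm(cbind(positive, total - positive) ~ treatment,
                 data = test, family = quasibinomial)

# Standard errors are inflated by sqrt(dispersion) relative to the binomial fit
se_bin   <- coef(summary(fit_bin))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
se_quasi / se_bin

# With an estimated dispersion, an F test is the usual choice in anova()
anova(fit_quasi, test = "F")
```

The point estimates are unchanged between the two fits; only the standard errors and the resulting tests differ.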
> ...and before you believe in overdispersion, make sure you have a
> credible explanation for it. All too often, what you really have
> is a model that doesn't fit your data properly.

Well put. A possible fortune?

S Ellison