Shuhua Zhan
2016-Sep-28 03:03 UTC
[R] How to test a difference in ratios of count data in R
Hello R-experts, I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments. I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS. data test; input replicate treatment$ n X; datalines; 1 A 32 4 1 B 33 18 2 A 20 6 2 B 21 18 3 A 7 0 3 B 8 4 ; proc glimmix data=test; class replicate treatment; model X/n = treatment / solution; random intercept / subject=replicate; run; ods select lsmeans; proc glimmix data=test; class replicate treatment; model X/n = treatment / solution; random intercept / subject=replicate; lsmeans treatment / cl ilink; run; I appreciate your help in advance! Joshua [[alternative HTML version deleted]]
There are multiple ways of doing this, but here are a couple. To just test the fixed effect of treatment you can use the glm function: test <- read.table(text=" replicate treatment n X 1 A 32 4 1 B 33 18 2 A 20 6 2 B 21 18 3 A 7 0 3 B 8 4 ", header=TRUE) fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial) summary(fit1) Note that the default baseline value may differ between R and SAS, which would result in a reversed sign on the slope coefficient (and different intercept). To include replicate as a random effect you need an additional package, here I use lme4 and the glmer function: library(lme4) fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test, family=binomial) summary(fit2) On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote:> Hello R-experts, > I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments. > > I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS. > > > data test; > input replicate treatment$ n X; > datalines; > 1 A 32 4 > 1 B 33 18 > 2 A 20 6 > 2 B 21 18 > 3 A 7 0 > 3 B 8 4 > ; > > proc glimmix data=test; > class replicate treatment; > model X/n = treatment / solution; > random intercept / subject=replicate; > run; > > ods select lsmeans; > proc glimmix data=test; > class replicate treatment; > model X/n = treatment / solution; > random intercept / subject=replicate; > lsmeans treatment / cl ilink; > run; > > I appreciate your help in advance! > Joshua > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
David Winsemius
2016-Sep-28 20:54 UTC
[R] How to test a difference in ratios of count data in R
> On Sep 28, 2016, at 9:49 AM, Greg Snow <538280 at gmail.com> wrote: > > There are multiple ways of doing this, but here are a couple. > > To just test the fixed effect of treatment you can use the glm function: > > test <- read.table(text=" > replicate treatment n X > 1 A 32 4 > 1 B 33 18 > 2 A 20 6 > 2 B 21 18 > 3 A 7 0 > 3 B 8 4 > ", header=TRUE) > > fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial) > summary(fit1) > > Note that the default baseline value may differ between R and SAS, > which would result in a reversed sign on the slope coefficient (and > different intercept). > > To include replicate as a random effect you need an additional > package, here I use lme4 and the glmer function: > > library(lme4) > fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test, > family=binomial) > summary(fit2) > > > > On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote: >> Hello R-experts, >> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments. >> >> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS. >> >> >> data test; >> input replicate treatment$ n X; >> datalines; >> 1 A 32 4 >> 1 B 33 18 >> 2 A 20 6 >> 2 B 21 18 >> 3 A 7 0 >> 3 B 8 4 >> ; >>Greg has already shown you how that is done in R and how to do logistic regression: # I usually think of Poisson regression when I hear a desire is to model ratios of counts that have a denominator. The log(sample_size) is supplied as an offset to correct for the variation in size of subsamples. fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson) summary(fit1) # And the lme4 analogue with replication: library(lme4) fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test, family=poisson) summary(fit2) #----output---- Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod] Family: poisson ( log ) Formula: X ~ treatment + offset(log(n)) + (1 | replicate) Data: test AIC BIC logLik deviance df.resid 31.9 31.3 -13.0 25.9 3 Scaled residuals: Min 1Q Median 3Q Max -1.0504 -0.4146 -0.3487 0.3956 1.0791 Random effects: Groups Name Variance Std.Dev. replicate (Intercept) 0.03159 0.1777 Number of obs: 6, groups: replicate, 3 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.7875 0.3372 -5.301 1.15e-07 *** treatmentB 1.3365 0.3529 3.787 0.000152 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Correlation of Fixed Effects: (Intr) treatmentB -0.838 Compare with the binomial model: #=========== fitBin <- glmer( cbind(X,n-X) ~ treatment + (1|replicate), data=test, family=binomial) coef(fitBin) #---- $replicate (Intercept) treatmentB 1 -2.0487694 2.364695 2 -0.9908556 2.364695 3 -2.1844435 2.364695 attr(,"class") [1] "coef.mer" #----- summary(fitBin) #--------- Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod] Family: binomial ( logit ) Formula: cbind(X, n - X) ~ treatment + (1 | replicate) Data: test AIC BIC logLik deviance df.resid 30.1 29.4 -12.0 24.1 3 Scaled residuals: Min 1Q Median 3Q Max -0.88757 -0.35065 -0.03137 0.26897 0.67505 Random effects: Groups Name Variance Std.Dev. replicate (Intercept) 0.4123 0.6421 Number of obs: 6, groups: replicate, 3 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.7442 0.5438 -3.208 0.00134 ** treatmentB 2.3647 0.4741 4.988 6.11e-07 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Correlation of Fixed Effects: (Intr) treatmentB -0.568 The binomial model has a logit link. Your glimmix procedure appears to have a gaussian/normal distributional assumption and an identity link by default. If we run this using those assumptions in lme4::glmer we get these results (with a warning that in this case we can overlook since the results with lmer turned out to be identical) #-------- fitNorm <- glmer( I(X/n) ~ treatment + (1|replicate), data=test, family=gaussian) #------- Warning message: In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) : calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly> coef(fitNorm); summary(fitNorm)$replicate (Intercept) treatmentB 1 0.091096925 0.4925325 2 0.324579602 0.4925325 3 0.009323473 0.4925325 attr(,"class") [1] "coef.mer" Linear mixed model fit by REML ['lmerMod'] Formula: I(X/n) ~ treatment + (1 | replicate) Data: test REML criterion at convergence: -4.2 Scaled residuals: Min 1Q Median 3Q Max -0.7864 -0.4278 -0.1152 0.5143 0.8246 Random effects: Groups Name Variance Std.Dev. replicate (Intercept) 0.027895 0.16702 Residual 0.002356 0.04854 Number of obs: 6, groups: replicate, 3 Fixed effects: Estimate Std. Error t value (Intercept) 0.14167 0.10042 1.411 treatmentB 0.49253 0.03963 12.427 Correlation of Fixed Effects: (Intr) treatmentB -0.197 That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct. -- David.>> proc glimmix data=test; >> class replicate treatment; >> model X/n = treatment / solution; >> random intercept / subject=replicate; >> run; >> >> ods select lsmeans; >> proc glimmix data=test; >> class replicate treatment; >> model X/n = treatment / solution; >> random intercept / subject=replicate; >> lsmeans treatment / cl ilink; >> run; >> >> I appreciate your help in advance! >> Joshua >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > Gregory (Greg) L. Snow Ph.D. > 538280 at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
It is usually best to keep these discussions on the list. Someone else may have a better answer than mine, or be able to respond quicker, and if I answer on R-help then it is community service/involvement. If I respond directly then it is consulting and then we need a contract and I have to charge you (not that I would see the money myself, but it would help my budget be a little less red). For your question, your fit4 and my fit2 are just 2 different ways of fitting the exact same model to the exact same data, so there is no surprise that the results match. Which one to use is personal preference. The line that starts with "treatmentB" is the coefficient (log-odds-ratio) for B compared to A, so that is the main line to look at for interpretation. The correlation of the fixed effects is mainly there for diagnostics, if it is too close to -1 or 1 then that indicates that assumptions may not hold, or computations may be in doubt. Your value is not of concern. On Wed, Sep 28, 2016 at 2:14 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote:> Hi Greg, > Thank you very much for your help! > I'd like to use glmer. From the output of summary(fit2) as below, Could I > draw a conclusion that the treatment B > significantly increases the counts of x group (p=6.11e-07)? I'm wondering if > I could know that the treatment B > significantly increases the ratio of x group (X/n) and how I could obtain > the mean ratios of treatment A and B. > To this end, should I fit the model using the ratio of X group (X/n)? I > tried it as > fit4 <- glmer( X/n ~ treatment + (1|replicate), data=test, family=binomial, > weights=n) > but summary(fit4) is the same as summary(fit2). > I also don't know how to interpret "Correlation of Fixed Effects: treatmentB > -0.568 in the output. >> summary(fit2) > Generalized linear mixed model fit by maximum likelihood (Laplace > Approximation) [glmerMod] > Family: binomial ( logit ) > Formula: cbind(X, n - X) ~ treatment + (1 | replicate) > Data: test > > AIC BIC logLik deviance df.resid > 30.1 29.4 -12.0 24.1 3 > > Scaled residuals: > Min 1Q Median 3Q Max > -0.88757 -0.35065 -0.03137 0.26897 0.67505 > > Random effects: > Groups Name Variance Std.Dev. > replicate (Intercept) 0.4123 0.6421 > Number of obs: 6, groups: replicate, 3 > > Fixed effects: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -1.7442 0.5438 -3.208 0.00134 ** > treatmentB 2.3647 0.4741 4.988 6.11e-07 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > Correlation of Fixed Effects: > (Intr) > treatmentB -0.568 > > Thanks again, > Joshua > > ________________________________ > From: Greg Snow <538280 at gmail.com> > Sent: Wednesday, September 28, 2016 12:49:49 PM > To: Shuhua Zhan > Cc: r-help at R-project.org > Subject: Re: [R] How to test a difference in ratios of count data in R > > There are multiple ways of doing this, but here are a couple. > > To just test the fixed effect of treatment you can use the glm function: > > test <- read.table(text=" > replicate treatment n X > 1 A 32 4 > 1 B 33 18 > 2 A 20 6 > 2 B 21 18 > 3 A 7 0 > 3 B 8 4 > ", header=TRUE) > > fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial) > summary(fit1) > > Note that the default baseline value may differ between R and SAS, > which would result in a reversed sign on the slope coefficient (and > different intercept). > > To include replicate as a random effect you need an additional > package, here I use lme4 and the glmer function: > > library(lme4) > fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test, > family=binomial) > summary(fit2) > > > > On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote: >> Hello R-experts, >> I am interested to determine if the ratio of counts from two groups differ >> across two distinct treatments. For example, we have three replicates of >> treatment A, and three replicates of treatment B. For each treatment, we >> have counts X from one group and counts Y from another group. My >> understanding is that that GLIMMIX procedure in SAS can calculate whether >> the ratio of counts in one group (X/X+Y) significantly differs between >> treatments. >> >> I think this is the way you do it in SAS. The replicate and treatment >> variables are self-explanatory. The first number (n) refers to the total >> counts X + Y; the second number (X) refers to the counts X. Is there a way >> to do this in R? Since we have 20,000 datasets to be tested, it may be >> easier to retrive the significant test as the given dataset below and its >> p>F value and mean ratios of treatments in R than SAS. >> >> >> data test; >> input replicate treatment$ n X; >> datalines; >> 1 A 32 4 >> 1 B 33 18 >> 2 A 20 6 >> 2 B 21 18 >> 3 A 7 0 >> 3 B 8 4 >> ; >> >> proc glimmix data=test; >> class replicate treatment; >> model X/n = treatment / solution; >> random intercept / subject=replicate; >> run; >> >> ods select lsmeans; >> proc glimmix data=test; >> class replicate treatment; >> model X/n = treatment / solution; >> random intercept / subject=replicate; >> lsmeans treatment / cl ilink; >> run; >> >> I appreciate your help in advance! >> Joshua >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > Gregory (Greg) L. Snow Ph.D. > 538280 at gmail.com-- Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com