thr3ads.net - R help - [R] How to test a difference in ratios of count data in R [Sep 2016]

If this information is useful, please help other people find it:
Share via:

Shuhua Zhan

2016-Sep-28 03:03 UTC

[R] How to test a difference in ratios of count data in R

Hello R-experts,
I am interested to determine if the ratio of counts from two groups differ
across two distinct treatments. For example, we have three replicates of
treatment A, and three replicates of treatment B. For each treatment, we have
counts X from one group and counts Y from another group. My understanding is
that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in
one group (X/X+Y) significantly differs between treatments.

I think this is the way you do it in SAS. The replicate and treatment variables
are self-explanatory. The first number (n) refers to the total counts X + Y; the
second number (X) refers to the counts X. Is there a way to do this in R? Since
we have 20,000 datasets to be tested, it may be easier to retrive the
significant test as the given dataset below and its p>F value and mean ratios
of treatments in R than SAS.


data test;
input replicate treatment$ n X;
datalines;
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
;

proc glimmix data=test;
class replicate treatment;
model X/n = treatment / solution;
random intercept / subject=replicate;
run;

ods select lsmeans;
proc glimmix data=test;
class replicate treatment;
model X/n = treatment / solution;
random intercept / subject=replicate;
lsmeans treatment / cl ilink;
run;

I appreciate your help in advance!
Joshua


	[[alternative HTML version deleted]]

Greg Snow

2016-Sep-28 16:49 UTC

head link

[R] How to test a difference in ratios of count data in R

There are multiple ways of doing this, but here are a couple.

To just test the fixed effect of treatment you can use the glm function:

test <- read.table(text="
replicate treatment n X
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
", header=TRUE)

fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
summary(fit1)

Note that the default baseline value may differ between R and SAS,
which would result in a reversed sign on the slope coefficient (and
different intercept).

To include replicate as a random effect you need an additional
package, here I use lme4 and the glmer function:

library(lme4)
fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
family=binomial)
summary(fit2)



On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca>
wrote:> Hello R-experts,
> I am interested to determine if the ratio of counts from two groups differ
across two distinct treatments. For example, we have three replicates of
treatment A, and three replicates of treatment B. For each treatment, we have
counts X from one group and counts Y from another group. My understanding is
that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in
one group (X/X+Y) significantly differs between treatments.
>
> I think this is the way you do it in SAS. The replicate and treatment
variables are self-explanatory. The first number (n) refers to the total counts
X + Y; the second number (X) refers to the counts X. Is there a way to do this
in R? Since we have 20,000 datasets to be tested, it may be easier to retrive
the significant test as the given dataset below and its p>F value and mean
ratios of treatments in R than SAS.
>
>
> data test;
> input replicate treatment$ n X;
> datalines;
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ;
>
> proc glimmix data=test;
> class replicate treatment;
> model X/n = treatment / solution;
> random intercept / subject=replicate;
> run;
>
> ods select lsmeans;
> proc glimmix data=test;
> class replicate treatment;
> model X/n = treatment / solution;
> random intercept / subject=replicate;
> lsmeans treatment / cl ilink;
> run;
>
> I appreciate your help in advance!
> Joshua
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com

David Winsemius

2016-Sep-28 20:54 UTC

head link

[R] How to test a difference in ratios of count data in R

> On Sep 28, 2016, at 9:49 AM, Greg Snow <538280 at gmail.com> wrote:
> 
> There are multiple ways of doing this, but here are a couple.
> 
> To just test the fixed effect of treatment you can use the glm function:
> 
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
> 
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
> 
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
> 
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
> 
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
> 
> 
> 
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca>
wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups
differ across two distinct treatments. For example, we have three replicates of
treatment A, and three replicates of treatment B. For each treatment, we have
counts X from one group and counts Y from another group. My understanding is
that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in
one group (X/X+Y) significantly differs between treatments.
>> 
>> I think this is the way you do it in SAS. The replicate and treatment
variables are self-explanatory. The first number (n) refers to the total counts
X + Y; the second number (X) refers to the counts X. Is there a way to do this
in R? Since we have 20,000 datasets to be tested, it may be easier to retrive
the significant test as the given dataset below and its p>F value and mean
ratios of treatments in R than SAS.
>> 
>> 
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>> 
Greg has already shown you how that is done in R and how to do logistic
regression:

#  I usually think of Poisson regression when I hear a desire is to model ratios
of counts that have a denominator. The log(sample_size) is supplied as an offset
to correct for the variation in size of subsamples.


fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)

#  And the lme4 analogue with replication:

library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation)
 [glmerMod]
 Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid 
    31.9     31.3    -13.0     25.9        3 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.0504 -0.4146 -0.3487  0.3956  1.0791 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.03159  0.1777  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.838

Compare with the binomial model:
#===========

 fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
 family=binomial)
 coef(fitBin)
#----
$replicate
  (Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
 summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation)
 [glmerMod]
 Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid 
    30.1     29.4    -12.0     24.1        3 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-0.88757 -0.35065 -0.03137  0.26897  0.67505 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.4123   0.6421  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7442     0.5438  -3.208  0.00134 ** 
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.568

The binomial model has a logit link. Your glimmix procedure appears to have a
gaussian/normal distributional assumption and an identity link by default. If we
run this using those assumptions in lme4::glmer we get these results (with a
warning that in this case we can overlook since the results with lmer turned out
to be identical)
#--------
 fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
                  family=gaussian)

#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
  calling glmer() with family=gaussian (identity link) as a shortcut to lmer()
is deprecated; please call lmer() directly> coef(fitNorm); summary(fitNorm)$replicate
  (Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
   Data: test

REML criterion at convergence: -4.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7864 -0.4278 -0.1152  0.5143  0.8246 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.027895 0.16702 
 Residual              0.002356 0.04854 
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.197

That's (probably) the model to compare to your SAS results if my reading of
the SAS Proc GLIMMIX manual page is correct.

-- 
David.
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>> 
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>> 
>> I appreciate your help in advance!
>> Joshua
>> 
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA

Greg Snow

2016-Sep-29 19:06 UTC

head link

[R] How to test a difference in ratios of count data in R

It is usually best to keep these discussions on the list.  Someone
else may have a better answer than mine, or be able to respond
quicker, and if I answer on R-help then it is community
service/involvement.  If I respond directly then it is consulting and
then we need a contract and I have to charge you (not that I would see
the money myself, but it would help my budget be a little less red).

For your question, your fit4 and my fit2 are just 2 different ways of
fitting the exact same model to the exact same data, so there is no
surprise that the results match.  Which one to use is personal
preference.

The line that starts with "treatmentB" is the coefficient
(log-odds-ratio) for B compared to A, so that is the main line to look
at for interpretation.

The correlation of the fixed effects is mainly there for diagnostics,
if it is too close to -1 or 1 then that indicates that assumptions may
not hold, or computations may be in doubt.  Your value is not of
concern.


On Wed, Sep 28, 2016 at 2:14 PM, Shuhua Zhan <szhan at uoguelph.ca>
wrote:> Hi Greg,
> Thank you very much for your help!
> I'd like to use glmer. From the output of summary(fit2) as below, Could
I
> draw a conclusion that the treatment B
> significantly increases the counts of x group (p=6.11e-07)? I'm
wondering if
> I could know that the treatment B
> significantly increases the ratio of x group (X/n) and how I could obtain
> the mean ratios of treatment A and B.
> To this end, should I fit the model using the ratio of X group (X/n)?  I
> tried it as
> fit4 <- glmer( X/n ~ treatment + (1|replicate), data=test,
family=binomial,
> weights=n)
> but summary(fit4) is the same as summary(fit2).
> I also don't know how to interpret "Correlation of Fixed Effects:
treatmentB
> -0.568 in the output.
>> summary(fit2)
> Generalized linear mixed model fit by maximum likelihood (Laplace
>   Approximation) [glmerMod]
>  Family: binomial  ( logit )
> Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
>    Data: test
>
>      AIC      BIC   logLik deviance df.resid
>     30.1     29.4    -12.0     24.1        3
>
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -0.88757 -0.35065 -0.03137  0.26897  0.67505
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  replicate (Intercept) 0.4123   0.6421
> Number of obs: 6, groups:  replicate, 3
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -1.7442     0.5438  -3.208  0.00134 **
> treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> Correlation of Fixed Effects:
>            (Intr)
> treatmentB -0.568
>
> Thanks again,
> Joshua
>
> ________________________________
> From: Greg Snow <538280 at gmail.com>
> Sent: Wednesday, September 28, 2016 12:49:49 PM
> To: Shuhua Zhan
> Cc: r-help at R-project.org
> Subject: Re: [R] How to test a difference in ratios of count data in R
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca>
wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups
differ
>> across two distinct treatments. For example, we have three replicates
of
>> treatment A, and three replicates of treatment B. For each treatment,
we
>> have counts X from one group and counts Y from another group. My
>> understanding is that that GLIMMIX procedure in SAS can calculate
whether
>> the ratio of counts in one group (X/X+Y) significantly differs between
>> treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment
>> variables are self-explanatory. The first number (n) refers to the
total
>> counts X + Y; the second number (X) refers to the counts X. Is there a
way
>> to do this in R? Since we have 20,000 datasets to be tested, it may be
>> easier to retrive the significant test as the given dataset below and
its
>> p>F value and mean ratios of treatments in R than SAS.
>>
>>
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> I appreciate your help in advance!
>> Joshua
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com


-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com

R help - Sep 2016 - How to test a difference in ratios of count data in R

[R] How to test a difference in ratios of count data in R

[R] How to test a difference in ratios of count data in R

[R] How to test a difference in ratios of count data in R

[R] How to test a difference in ratios of count data in R