thr3ads.net - R help - [R] comparing proportions [Feb 2011]


array chip

2011-Feb-09 23:10 UTC

[R] comparing proportions

Hi, I have a dataset with 2 groups of samples. For each sample, the
response measured is the number of successes (no.success) obtained out of the
number of trials (no.trials), so a proportion of successes (prop.success) can be
computed as no.success/no.trials. The objective is to test whether there is a
statistically significant difference in the proportion of successes between the 2
groups of samples (say n1=20, n2=30).

I can think of 3 ways to do the test:

1. regular t test based on the variable prop.success
2. Mann-Whitney test based on the variable prop.success
3. do a binomial regression as:
    fit <- glm(cbind(no.success, no.trials - no.success) ~ group, data=data,
               family=binomial)
    anova(fit, test='Chisq')

My questions are:
1. Is a t test appropriate for comparing 2 groups of proportions?
2. How about the Mann-Whitney non-parametric test?
3. Among the 3, which technique is more appropriate?
4. Is there any other technique you can suggest?

Thank you,

John
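
For readers following along, the three candidate tests listed in the question can be tried side by side on simulated data. This is only a sketch: the group means, the between-subject spread, and the 7 to 10 trials per subject are assumptions made up for illustration, not values from the poster's dataset.

```r
# Simulated data: every number below is an illustrative assumption.
set.seed(1)
n1 <- 20; n2 <- 30
dat <- data.frame(
  group     = factor(rep(c("g1", "g2"), times = c(n1, n2))),
  no.trials = sample(7:10, n1 + n2, replace = TRUE)
)
# Subject-level success probabilities vary around a group mean (logit scale).
p.subject <- plogis(ifelse(dat$group == "g1", -0.2, 0.4) +
                    rnorm(n1 + n2, sd = 0.5))
dat$no.success   <- rbinom(n1 + n2, size = dat$no.trials, prob = p.subject)
dat$prop.success <- dat$no.success / dat$no.trials

# 1. Welch t test on the per-subject proportions
t.res <- t.test(prop.success ~ group, data = dat)

# 2. Mann-Whitney (Wilcoxon rank-sum) test; ties rule out an exact p-value
w.res <- wilcox.test(prop.success ~ group, data = dat, exact = FALSE)

# 3. Binomial regression on (successes, failures), as in the question
fit   <- glm(cbind(no.success, no.trials - no.success) ~ group,
             data = dat, family = binomial)
a.res <- anova(fit, test = "Chisq")

# P-values for the group effect from the three approaches
print(c(t.test   = t.res$p.value,
        wilcoxon = w.res$p.value,
        glm      = a.res[["Pr(>Chi)"]][2]))
```

Note that the plain binomial glm() here ignores the subject-to-subject variation built into the simulation, which is the issue the replies in this thread go on to raise.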


      

array chip

2011-Feb-10 00:51 UTC


[R] comparing proportions

Hi Bert,

Thanks for your reply. If I understand correctly, prop.test() is not suitable for
my situation. The input to prop.test() is 2 numbers for each group (# of successes
and # of trials); for example, group 1 has 5 successes out of 10 trials, group 2
has 3 successes out of 7 trials, etc. prop.test() tests whether the probability of
success is the same across groups.

In my case, each group has several subjects and each subject has 2 numbers (# 
success and # trials). So 

for group 1:
subject 1: 5 success, 10 trials
subject 2: 3 success, 8 trials
:
:

for group 2:
subject a: 7 success, 9 trials
subject b: 6 success, 7 trials
:
:

I want to test whether the probability of success in group 1 is the same as in
group 2. It's like comparing 2 groups of samples using a t test; what I am
uncertain about is whether a regular t test (or a non-parametric test) is still
appropriate here when the response variable is actually a proportion.

I guess prop.test() cannot be used with my dataset, or am I wrong?

Thanks

John
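
To make the distinction concrete, here is a small sketch of what prop.test() would see if the per-subject counts were simply pooled within each group. The first two subjects in each group use the counts quoted above; the third subject in each group is invented to round out the example.

```r
# Per-subject counts; the third row in each group is a made-up value.
dat <- data.frame(
  group   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  success = c(5, 3, 4, 7, 6, 5),
  trials  = c(10, 8, 9, 9, 7, 8)
)

# Pooling collapses each group to a single (successes, trials) pair ...
pooled <- aggregate(cbind(success, trials) ~ group, data = dat, FUN = sum)
print(pooled)

# ... which is the only structure prop.test() knows about.
pt <- prop.test(x = pooled$success, n = pooled$trials)
print(pt$p.value)
```

After pooling, the subject identity is gone, so any subject-to-subject variation in the success probability is treated as if it were pure binomial noise.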
 
 
 




________________________________
From: Bert Gunter <gunter.berton@gene.com>

Sent: Wed, February 9, 2011 3:58:05 PM
Subject: Re: [R] comparing proportions

1. Is this a homework problem?

2. ?prop.test

3. If you haven't done so already, get and consult a basic statistical
methods book to help you with questions such as this.

-- Bert



-- 
Bert Gunter
Genentech Nonclinical Biostatistics



      

Robert A LaBudde

2011-Feb-10 20:54 UTC


[R] comparing proportions

1. If you use a random effects model, you should make Subject the 
random factor. I.e., a random intercepts model with 1|Subject. Group 
is a fixed effect: You have only 2 groups. Even if you had more than 
2 groups, treating Group as random would return a standard deviation, 
not a P-value as you wanted. Finally, I doubt you believe the groups 
used are meaningless, and only the population of groups is of 
interest. Instead you consider them special, so Group is a fixed effect.

2. The number of observations for each Subject is the number of 
trials, which you previously indicated were 7 to 10 in the cases listed.

3. If you have no interest in the Subject effect, you can treat Subject 
as a fixed factor and use glm() instead of glmer() or another mixed 
model function. This is a good idea so long as the number of subjects 
is, say, less than 10. Otherwise a mixed model would be a better idea.

I suggest you fit all three models to learn about what you're doing: 
1) glmer() or equivalent, with cbind(successes, failures) ~ (1|Subject) 
+ Group; 2) glm() with cbind(successes, failures) ~ Subject + Group; 
and 3) lm(p ~ Subject + Group), where p is the proportion of successes 
for a particular subject and group.

Then compare the results. All 3 will probably give the same 
conclusion to the hypothesis question about Group. I would guess the 
glmer() P-value will be largest, then the glm() and finally the lm(), 
but the last two may reverse. The lm() model may actually perform 
fairly well, as the Edgeworth series converges rapidly to normal for 
binomial distributions with p within 0.15 to 0.85 and 10+ replicates, 
as I stated before.

I'd be interested in seeing the results of these 3 fits myself just 
for curiosity.
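
A rough sketch of how such a comparison could be set up on simulated data follows. The data, effect sizes, and object names (dat, fit.glm, fit.lm, fit.glmer) are assumptions for illustration. One simplification to flag: the sketch assumes exactly one row per subject, in which case a fixed Subject factor would be completely aliased with Group, so the two fixed-effects fits below use Group only rather than Subject + Group as written in the post. The glmer() call needs the lme4 package, which is not part of base R, so it is guarded.

```r
# Simulated data: all values are illustrative assumptions.
set.seed(2)
n.subj <- 50
dat <- data.frame(
  Subject = factor(seq_len(n.subj)),
  Group   = factor(rep(c("g1", "g2"), times = c(20, 30))),
  trials  = sample(7:10, n.subj, replace = TRUE)
)
# Random subject effect on the logit scale plus a fixed Group shift
eta <- ifelse(dat$Group == "g1", -0.2, 0.4) + rnorm(n.subj, sd = 0.5)
dat$successes <- rbinom(n.subj, size = dat$trials, prob = plogis(eta))
dat$failures  <- dat$trials - dat$successes
dat$p         <- dat$successes / dat$trials

# Fixed-effects binomial fit and normal-theory fit (Group only, see note above)
fit.glm <- glm(cbind(successes, failures) ~ Group, data = dat,
               family = binomial)
fit.lm  <- lm(p ~ Group, data = dat)
print(coef(summary(fit.glm)))
print(coef(summary(fit.lm)))

# Random-intercept model with Subject as the random factor
if (requireNamespace("lme4", quietly = TRUE)) {
  fit.glmer <- lme4::glmer(cbind(successes, failures) ~ Group + (1 | Subject),
                           data = dat, family = binomial)
  print(coef(summary(fit.glmer)))
}
```

The Group row of each coefficient table carries the test of interest; comparing those rows across the fits is the exercise suggested above.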

At 01:21 PM 2/10/2011, array chip wrote:
>Robert and Bert, thank you both very much for the response, really 
>appreciated. I agree that using regular ANOVA (or a regular t test) 
>may not be wise due to the normality issue. So I am debating between a 
>generalized linear model using glm(..., family=binomial) and a 
>generalized linear mixed effects model using glmer(..., 
>family=binomial). I will forward to Robert an off-list email I 
>sent to Bert about whether to use (1|subject) or (1|group) in the 
>mixed model specification. If using (1|group), both models will give 
>me the same tests for the fixed effects, which is what I am mainly 
>interested in. So do I really need a mixed model here?
>
>Thanks again
>
>John
>
>
>From: Bert Gunter <gunter.berton at gene.com>
>To: Robert A LaBudde <ral at lcfltd.com>
>Cc: array chip <arrayprofile at yahoo.com>
>Sent: Thu, February 10, 2011 10:04:06 AM
>Subject: Re: [R] comparing proportions
>
>Robert:
>
>Yes, exactly. In an offlist email exchange, he clarified this for me,
>and I suggested exactly what you did, also with the caution that his
>initial ad hoc suggestions were unwise. His subsequent posts to the R-help
>and sig-mixed-models lists were the result, although he appears to
>have specified the model incorrectly in his glmer function (as
>(1|Group) instead of (1|subject)).
>
>Cheers,
>Bert
>
>On Thu, Feb 10, 2011 at 9:55 AM, Robert A LaBudde <ral at lcfltd.com> wrote:
> > prop.test() is applicable to a binomial experiment in each of two classes.
> >
> > Your experiment is binomial only at the subject level. You then have
> > multiple subjects in each of your groups.
> >
> > You have a random factor "Subjects" that must be accounted for.
> >
> > The best way to analyze is a generalized linear mixed model with a binomial
> > distribution family and a logit or probit link. You will probably have to
> > investigate overdispersion. If you have a small number of subjects, and
> > don't care about the among-subject effect, you can model them as fixed
> > effects and use glm() instead.
> >
> > Your original question, I believe, related to doing an ANOVA assuming
> > normality. In order for this to work with this kind of proportion problem,
> > you generally won't get good results unless the number of replicates per
> > subject is 12 or more, and the proportions involved are within 0.15 to 0.85.
> > Otherwise you will have biased confidence intervals and significance tests.
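
One common way to look into overdispersion, sketched here on simulated data, is to compare the Pearson chi-square of a plain binomial fit with its residual degrees of freedom and, if the ratio is well above 1, refit with the quasibinomial family. The data and the size of the subject effect below are assumptions for illustration; this is one possible check, not necessarily the one the poster had in mind.

```r
# Simulated data with deliberate between-subject variation (assumed values).
set.seed(3)
n.subj  <- 50
group   <- factor(rep(c("g1", "g2"), times = c(20, 30)))
trials  <- sample(7:10, n.subj, replace = TRUE)
eta     <- ifelse(group == "g1", -0.2, 0.4) + rnorm(n.subj, sd = 0.8)
success <- rbinom(n.subj, size = trials, prob = plogis(eta))

# Plain binomial fit that ignores the subject effect
fit <- glm(cbind(success, trials - success) ~ group, family = binomial)

# Pearson chi-square over residual df; values well above 1 suggest
# more variation than the binomial model allows for
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Quasi-binomial refit: same coefficients, standard errors scaled
# by sqrt(dispersion)
fit.q <- glm(cbind(success, trials - success) ~ group,
             family = quasibinomial)
print(coef(summary(fit.q)))
```

The quasibinomial fit keeps the point estimates of the binomial fit and only widens the standard errors, whereas the mixed model recommended above models the subject effect explicitly.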
===============================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
