I am appealing to the general collective wisdom of this list in respect of a statistics (rather than R) question. This question comes to me from a friend who is a veterinary oncologist. In a study that she is writing up there were 73 cats who were treated with a drug called piroxicam. None of the cats were observed to be subject to vomiting prior to treatment; 12 of the cats were subject to vomiting after treatment commenced. She wants to be able to say that the treatment had a ``significant'' impact with respect to this unwanted side-effect.

Initially she did a chi-squared test. (Presumably on the matrix matrix(c(73,0,61,12),2,2) --- she didn't give details and I didn't pursue this.) I pointed out to her that because of the dependence --- same 73 cats pre- and post-treatment --- the chi-squared test is inappropriate.

So what *is* appropriate? There is a dependence structure of some sort, but it seems to me to be impossible to estimate.

After mulling it over for a long while (I'm slow!) I decided that a non-parametric approach, along the following lines, makes sense:

We have 73 independent pairs of outcomes (a,b) where a or b is 0 if the cat didn't barf, and is 1 if it did barf.

We actually observe 61 (0,0) pairs and 12 (0,1) pairs.

If there is no effect from the piroxicam, then (0,1) and (1,0) are equally likely. So given that the outcome is in {(0,1),(1,0)}, the probability of each is 1/2.

Thus we have a sequence of 12 (0,1)-s where (under the null hypothesis) the probability of each entry is 1/2. Hence the probability of this sequence is (1/2)^12 = 0.00024, so the p-value of the (one-sided) test is 0.00024. Hence the result is ``significant'' at the usual levels, and my vet friend is happy.

I would very much appreciate comments on my reasoning. Have I made any goof-ups, missed any obvious pitfalls? Gone down a wrong garden path?

Is there a better approach?

Most importantly (!!!): Is there any literature in which this approach is spelled out? (The journal in which she wishes to publish will almost surely demand a citation. They *won't* want to see the reasoning spelled out in the paper.)

I would conjecture that this sort of scenario must arise reasonably often in medical statistics, and the suggested approach (if it is indeed valid and sensible) would be ``standard''. It might even have a name! But I have no idea where to start looking, so I thought I'd ask this wonderfully learned list.

Thanks for any input.

cheers,

Rolf Turner
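[Editorial note: Rolf's (1/2)^12 calculation is exactly an exact one-sided sign test on the 12 discordant pairs, and can be reproduced in R as a quick sanity check (a sketch using only the counts given above):]

> # 12 discordant pairs, all in the (0,1) direction; under H0 each
> # direction has probability 1/2, so this is an exact sign test
> binom.test(12, 12, p = 0.5, alternative = "greater")$p.value
[1] 0.0002441406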
In the biomedical arena, at least as I learned from Rosner's introductory text, the usual approach to analyzing paired 2 x 2 tables is McNemar's test.

?mcnemar.test

> mcnemar.test(matrix(c(73,0,61,12),2,2))

	McNemar's Chi-squared test with continuity correction

data:  matrix(c(73, 0, 61, 12), 2, 2)
McNemar's chi-squared = 59.0164, df = 1, p-value = 1.564e-14

The help page has a citation to Agresti.

-- 
David Winsemius

On Feb 10, 2009, at 4:33 PM, Rolf Turner wrote:

> [snip: original message quoted in full]
On 02/10/2009 03:33 PM Rolf Turner wrote:

> [snip: original message quoted in full]

Rolf,

I am a little confused, perhaps due to lack of sleep (sick dog with CHF).

Typically in this type of study, essentially looking at the efficacy/safety profile of a treatment, there are two options.

One does a two-arm randomized study, whereby "subjects" are randomized to one of two treatments. The two treatments may both be "active" or one may be a placebo. Then a typical two-sample comparison of the primary hypothesis is made. In this setting, you would have a second group of 73 cats who received a comparative treatment (or a placebo) to compare against the 16.4% observed in this treatment group.

For example, say that patients were undergoing cancer treatment, which has nausea and vomiting as a side effect. Due to the side effect, it is common to see a reduction in dosing, which of course reduces treatment effectiveness.
You might want to study a treatment that favorably reduces that side effect, to enable improved treatment dosing and patient tolerance.

The other option would be to perform a single-sample study, where there is an a priori hypothesis, based upon prior work, of the expected incidence of the adverse event, or perhaps a "clinically acceptable" incidence of the adverse event. This would seem to be the scenario indicated above. What is lacking is some a priori expectation of the incidence of the event in question, so that one can show that you have reduced the incidence from the expected. 50% would not make sense here, though if it did, a single-sample binomial test would be used, presuming a two-sided hypothesis:

> binom.test(12, 73, 0.5)$p.value
[1] 4.802197e-09

That none of them had vomiting prior to treatment seems to be of little interest here. You could just as easily argue that there was a significant increase in the incidence of vomiting from 0% to 16.4% due to the treatment.

What am I missing?

Regards,

Marc Schwartz
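[Editorial note: to make Marc's single-sample option concrete, here is a sketch of the comparison he describes against an a priori incidence. The 5% figure is purely hypothetical, standing in for the historical or "clinically acceptable" rate he says is missing:]

> # one-sided test of whether the observed 12/73 (16.4%) exceeds a
> # hypothetical a priori incidence of 5%; the p-value here is small
> binom.test(12, 73, p = 0.05, alternative = "greater")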
Hi Bert: can you do that? Because the null is that they are equal before and after, not that the proportion is zero. Thanks for any clarification to my lack of understanding.

On Tue, Feb 10, 2009 at 5:43 PM, Bert Gunter wrote:

> Ah, experimental units, again ... a subject little taught by statisticians
> that is often the crux of the matter. As here.
>
> The cat is the experimental unit. There are 73 of them. 12 of them
> experienced vomiting after treatment. What's a confidence interval for the
> true proportion based on our sample of 73? binom.test(12,73) gives us .088
> to .27 for an exact 2-sided interval (and a P value of 2.2e-16 for the
> null = 0).
>
> Seems rather convincing -- and simple -- to me!
>
> -- Bert Gunter
>
> [snip: earlier messages quoted in full]
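[Editorial note: the interval Bert quotes can be pulled out directly; a sketch:]

> # exact (Clopper-Pearson) 95% CI for the proportion 12/73;
> # this reproduces the .088 to .27 that Bert reports
> binom.test(12, 73)$conf.int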
Hi Bert: I think what you said about a prior guess for the NULL is also similar to what Chuck said about people looking with a blank stare. Thanks for the clarification.

On Tue, Feb 10, 2009 at 7:06 PM, Bert Gunter wrote:

> The only question at issue (i.e. capable of being addressed) is: is giving
> the drug to non-vomiting cats associated with vomiting? (I would strongly
> suspect that cats that were vomiting beforehand would have been excluded
> from the study, as the researcher would have felt that one couldn't then
> tell whether or not the drug caused vomiting problems for them. No?)
>
> There were 73 non-vomiting cats, 12 of which started vomiting after
> receiving the drug. All I can do is give a confidence interval for the
> estimated proportion of nonvomiting cats that vomit when given this drug,
> and perhaps ask whether it is consistent with their nonvomiting status
> before. Which is what I did. And it's pretty convincing that giving the
> pill is associated with vomiting, right?
>
> Whether the vomiting was associated with the giving of this **particular**
> drug is, of course, impossible to tell, because the researcher failed to
> include placebo controls. I chose 0 for a null as a representation of
> their non-vomiting status, but the scientific question of interest is
> probably to compare them to the proportion of cats that would vomit if
> given any pill at all. Without any placebo controls, who can tell?
> Substitute a prior guess if you like for a Null. Which is exactly the
> point that Marc Schwartz made -- that is, that the data are probably
> completely useless to answer the question of interest because the
> researcher messed up the design.
>
> -- Bert Gunter
>
> [snip: earlier messages quoted in full]
Meyners, Michael, LAUSANNE, AppliedMathematics
2009-Feb-11 10:27 UTC
[R] OT: A test with dependent samples.
Rolf,

as you explicitly asked for a comment on your proposal: It is generally equivalent to McNemar's test, and maybe even more appropriate because of the asymptotics involved in the chi-squared distribution, which might not be too good with n=12. In some more detail:

McNemar's test basically considers the difference between the off-diagonal elements, normalized by their sum. You do the same, ignoring all (0,0) and (1,1) pairs (the latter do not occur in your example anyway). binom.test(12,12,alternative="greater") gives you the one-sided p-value. If, however, you do not use the exact binomial distribution (with n=12) but the normal approximation, you find E(X)=6 and VAR(X)=3, with X the number of (0,1) pairs, and you observed x=12. This gives you a z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value:

> pnorm(6/sqrt(3), lower.tail=F)
[1] 0.0002660028

Further, McNemar's test WITHOUT continuity correction gives

> mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F)

	McNemar's Chi-squared test

data:  matrix(c(61, 0, 12, 0), 2, 2)
McNemar's chi-squared = 12, df = 1, p-value = 0.000532

It comes as no surprise that the reported (two-sided!) p-value is exactly twice the (one-sided!) p-value from the binomial test with normal approximation:

> pnorm(6/sqrt(3), lower.tail=F)*2
[1] 0.0005320055

I do not want to stress the pros and cons of continuity corrections, but if neglected, you see that you get the same results (except that McNemar is generally two-sided), due to the relation between the normal and chi-squared distributions. If you use the binomial test, you can forget about asymptotics and continuity correction; that's why I indeed consider your approach superior to the chi-squared approximation used in McNemar's test. But you might fail to find a reference for the exact approach...

You briefly asked in a later mail about testing for p=0. Indeed, _any_ incident will disprove this hypothesis, and the p-value reported by

> binom.test(1,73,p=0)

	Exact binomial test

data:  1 and 73
number of successes = 1, number of trials = 73, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0
95 percent confidence interval:
 0.0003467592 0.0739763232
sample estimates:
probability of success
            0.01369863

is not wrong (as R just tells us it is BELOW a certain value), but could be refined by saying "p-value = 0". BTW, I am not sure what "p-value=TRUE" tells me, as derived from binom.test(0,73,p=0). I personally don't care about either, as to test H0: p=0, I would not use any software but rather stress some basic statistical theory.

It remains questionable whether McNemar (or your proposal) is generally appropriate here. Like others, I have doubts, except maybe if observations were made on two subsequent days, on the second of which the drug was administered. How could you otherwise be sure that the increase in incidences is not due to the progression of cancer, say, which would be difficult to rule out. Also, from my experience and in contrast to your point of view, I'd rather assume it likely that a vet is much more reluctant to apply the new treatment to an already vomiting cat; your "baseline" with 0 out of 73 at least seems suspicious, and until proven wrong, I'd take the assumption that this is not by chance only. I understand that this is an observational study with no control/randomization, but concluding a side-effect from a statistically significant change in incidence rates from this study seems questionable to me.
What I'd propose (and have used in comparable situations) is to confine yourself to the confidence interval and let the investigator decide / discuss whether the lower bound of this is higher than what she would expect under a control or alternative treatment. She needs to have some experience, if not historical data; using a test or just this confidence interval, you will only get a hint of a possible side-effect, not a formal proof (which is always difficult from observational studies). Discussing the CI would seem fairer and even stronger to me then.

In other words: Mathematically, you'll get an appropriate test, while conceptually, I'm far from being convinced.

Hope this makes sense,
Michael

-----Original Message-----
From: r-help-bounces at r-project.org On Behalf Of Rolf Turner
Sent: Dienstag, 10. Februar 2009 22:33
To: R-help Forum
Subject: [R] OT: A test with dependent samples.

[snip: original message quoted in full]
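[Editorial note: a sketch pulling Michael's numbers together, consolidating results already shown in this thread; the 5% benchmark in the last step is hypothetical, standing in for the "expected" incidence he says the investigator must supply:]

> # the three p-values compared above, side by side:
> binom.test(12, 12, p = 0.5, alternative = "greater")$p.value  # exact sign test
[1] 0.0002441406
> pnorm((12 - 6)/sqrt(3), lower.tail = FALSE)            # normal approximation
[1] 0.0002660028
> mcnemar.test(matrix(c(61, 0, 12, 0), 2, 2), correct = FALSE)$p.value  # two-sided
[1] 0.0005320055
> # Michael's interval-based suggestion: compare the exact lower 95%
> # bound (about 0.088) with an incidence the investigator would
> # expect anyway (the 5% benchmark is purely hypothetical)
> binom.test(12, 73)$conf.int[1] > 0.05
[1] TRUE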
Dear List,

Catching up with my backlog, I stumbled upon this:

On Wed, 11 Feb 2009 10:33:13 +1300, Rolf Turner wrote:

> [snip: original message quoted in full]

I read with interest the answers given, but got frustrated by (among other points) seeing the main point unanswered: what in hell do you want to *test*? And to prove what?

Classical test theory (Neyman and Pearson's sin, pride and glory) gives you a (conventionally accepted) way to check if your data support your assertions. It starts by computing somehow the probability of getting your data by sheer chance under the hypothesis of your assertion being *false* (i.e. the dreaded "null hypothesis"); if this probability is "too low" (less than 1 in 20, according to R. A. Fisher's whim, now idolized as a standard), it proceeds by asserting that this "too low" probability means "impossible" and, by way of modus tollens (a-->b and not(b) ==> not(a), in propositional logic), rejects your null hypothesis.
Therefore, what your test "proves" is just the negation of your null hypothesis. The "null" hypothesis that "the drug does not cause cats to barf" implies that the *probability* of seeing a cat barfing is zero. Any barf is enough to disprove it, and your alternative ("some cat(s) may barf after having the drug") is therefore "supported" at all conventional (and unconventional) levels. (See below for reasons for which this reasoning is partially false.)

Did you really bother to treat 73 cats to learn this? In this case, you've way too much money to burn and time on your hands. You might have learned that much cheaper by treating cats one at a time and stopping at the first barf. You could even have obtained a (low precision) estimate of the post-treatment barf probability, by remembering that the distribution of the number of cats treated is negative binomial...

This (point- and interval-) estimation is probably much more interesting than a nonsensical "test". Getting some precision on this estimation might well be worth treating 73 cats. In this case, both classical ("Fisherian") and Bayesian points of view give "interesting" answers. You may note that the "classical" confidence interval and the Bayesian credible interval with a noninformative prior have the same (numerical) bounds, with very different significations (pick your poison, but be aware that the Bayesian point of view | gospel | madness is quite ill-accepted in most medical journals nowadays...). But the "test significance level" is still 0, meaning that this test is sheer pure, unadulterated, analytical-quality nonsense. Because your "null" has no possible realistic meaning.

Now, another "null" has been suggested: "the cats have the same probability of barfing before and after drug administration", leading you to a symmetry (McNemar) test. This is much more interesting, and might have some value ... unless, as has been suggested, your subjects are not "random cats" but "cats that do not barf before drug administration". In this case, your whole experiment is biased, and your null is effectively (almost) the same as before (i.e. "the drug does not cause non-previously-barfing cats to barf"), in which case the same grilling can be applied to it.

In both cases, the null hypothesis tested is so far away from any "reasonable" hypothesis that the test turns into a farce. A much better way to present these results to a referee would be to give a (well-motivated) point- and interval-estimation and plainly refuse to "test" it against nonsense (explaining why, of course). Medical journal editors and referees have spent way too much time complying with dead statisticians' fetishes, turning the whole methodological problem in (para-)clinical research into an exercise of rubbing blue mud in the same place and at the same time their ancestors did, with no place for statistical (and probabilistic) thinking...

Another, more interesting, problem would be to know whether taking the drug in question causes an "unacceptable" probability of barfs. This would entail 1) defining the lowest "unacceptable" amount of barfing, 2) defining the Type I error risk and power, 3) computing the number of subjects necessary for a non-superiority trial against a prespecified fixed hypothesis, and 4) effectively running and analysing such a trial. Such a test of a *realistic* hypothesis would indeed be worthy.
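[Editorial note: the two intervals Emmanuel contrasts can be computed side by side; a sketch. The Jeffreys Beta(1/2, 1/2) prior is one standard noninformative choice, and its equal-tailed credible interval comes out close to (though not exactly equal to) the exact interval here:]

> # exact (Clopper-Pearson) 95% confidence interval for 12/73
> binom.test(12, 73)$conf.int
> # equal-tailed 95% credible interval: under a Jeffreys Beta(1/2, 1/2)
> # prior, the posterior for the barf probability is Beta(12.5, 61.5)
> qbeta(c(0.025, 0.975), 12 + 0.5, 73 - 12 + 0.5)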
======== Cut here for a loosely related methodological rant =======

In any case, if I refereed this paper, I would not "buy" it: the causal link is very badly established. Because the intervention is not the *drug*, it is *receiving the drug*, which is *quite* different, even in animals (maybe especially in animals).

I happen to have worked on a very similar setup (testing an antiemetic on dogs receiving platinum salts, which is a bit more intricate than the present setup). I *know* for a fact that the combination of placebo "platinum salts" and placebo "antiemetic" *will* cause some dogs to barf (and other stress manifestations): it is even enough for one dog in the same room (or even the same (small) building) to start barfing to set off barfing in other, totally untouched (not even by placebos) animals. You see, stress can be communicated by sight, sound and smell, and it's bloody hard to isolate animals from each other on these three aspects... Furthermore, if you were able to obtain such an isolation, you'd get animals subject to a *major* stress: loneliness. Dogs are social beasts, y'know... And I don't suppose cats are less sensitive (O Bubastis, forgive them, they do not know what they do...).

Therefore, to study the possible emetic effect of the *drug*, the simple (maybe simplistic) way to test the (semi-realistic) null "the *drug* doesn't modify the emetic risk entailed by taking any drug" is this: I'd compare the "barfing rates" of a *random* sample of cats receiving the drug and another, *distinct*, *random* *control* sample of cats receiving a suitable placebo (same presentation, same colour, same smell, same "galenic", etc.). I'd blind any person in contact with any of the animals to the exact nature of the intervention (dogs and cats will somehow "feel" your expectations, don't ask me how, but they can be worse than human patients in this respect...). And I'd analyze this exactly as a clinical trial (this *is* a veterinary clinical trial, indeed).

This simplistic scheme does not account for individual sensitivities of animals. Various ways exist to "absorb" this, the simplest being of course a properly randomized cross-over trial. The null becomes, of course, triple: "the emetic properties of the administration do not depend on the product administered", "the emetic properties of an administration do not depend on a previous administration", and "the emetic properties of a product do not depend on the previous administration of another product". The interpretation of such an experiment may become ... delicate.

Other schemes are possible: e.g., repeated experiments on the same animals may allow one to *describe* individual sensitivities and the variability thereof (analysis by mixed models including a "subject" factor). However, I'd be very wary of the validity of such an experiment, given the possibility (almost certainty...) of inducing a stereotyped comportment in the subjects. And the importance of the ultimate goal would have to be bloody mighty in order to justify such a treatment being inflicted on cats (or dogs, for that matter)...

One might note that my previous proposal of a non-superiority trial does not involve a control. That's because this trial has a *pragmatic* goal: checking the acceptability of the administration of a drug against an a priori set of criteria. It does not allow inferences on the effect of the drug, and *postulates* that the non-administration of the drug will result in nothing of interest.
This allows us to pull a realistic, interesting null hypothesis out of our hats. On the other hand, the controlled designs, by virtue of having a control, allow us to be analytic and separate the effect of the administration from the effect of the drug itself: the latter might indeed be zero, the associated null hypothesis isn't nonsensical, and the test of this null isn't worthless.

======== End of the loosely related methodological rant =======

In any case, my point is: hypothesis testing is *not* the alpha and omega of biostatistics, and other methods of describing and analysing experimental results are often much more interesting, notwithstanding the fetishes of journal referees. Furthermore, testing impossible or worthless hypotheses leads to worthless conclusions. Corollary: do not test for the sake of testing, because "everybody does it" or because a referee threw a tantrum; test realistic hypotheses, whose rejection has at least some relation to your subject matter.

The two cents of someone tired of reading utter nonsense in prestigious journals...

Emmanuel Charpentier