I am appealing to the general collective wisdom of this list in respect of a statistics (rather than R) question. This question comes to me from a friend who is a veterinary oncologist. In a study that she is writing up there were 73 cats who were treated with a drug called piroxicam. None of the cats were observed to be subject to vomiting prior to treatment; 12 of the cats were subject to vomiting after treatment commenced. She wants to be able to say that the treatment had a ``significant'' impact with respect to this unwanted side-effect.

Initially she did a chi-squared test. (Presumably on the matrix matrix(c(73,0,61,12),2,2) --- she didn't give details and I didn't pursue this.) I pointed out to her that because of the dependence --- same 73 cats pre- and post-treatment --- the chi-squared test is inappropriate.

So what *is* appropriate? There is a dependence structure of some sort, but it seems to me to be impossible to estimate.

After mulling it over for a long while (I'm slow!) I decided that a non-parametric approach, along the following lines, makes sense:

We have 73 independent pairs of outcomes (a,b) where a or b is 0 if the cat didn't barf, and is 1 if it did barf.

We actually observe 61 (0,0) pairs and 12 (0,1) pairs.

If there is no effect from the piroxicam, then (0,1) and (1,0) are equally likely. So given that the outcome is in {(0,1),(1,0)}, the probability of each is 1/2.

Thus we have a sequence of 12 (0,1)-s where (under the null hypothesis) the probability of each entry is 1/2. Hence the probability of this sequence is (1/2)^12 = 0.00024, so the p-value of the (one-sided) test is 0.00024. Hence the result is ``significant'' at the usual levels, and my vet friend is happy.

I would very much appreciate comments on my reasoning. Have I made any goof-ups, missed any obvious pitfalls? Gone down a wrong garden path?

Is there a better approach?

Most importantly (!!!): Is there any literature in which this approach is spelled out? (The journal in which she wishes to publish will almost surely demand a citation. They *won't* want to see the reasoning spelled out in the paper.)

I would conjecture that this sort of scenario must arise reasonably often in medical statistics, and the suggested approach (if it is indeed valid and sensible) would be ``standard''. It might even have a name! But I have no idea where to start looking, so I thought I'd ask this wonderfully learned list.

Thanks for any input.

cheers,

Rolf Turner
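[Editorial note: Rolf's (1/2)^12 calculation is exactly an exact one-sided sign test on the 12 discordant pairs, and can be reproduced in R as a quick sanity check (a sketch using only the counts given above):]

> # 12 discordant pairs, all in the (0,1) direction; under H0 each
> # direction has probability 1/2, so this is an exact sign test
> binom.test(12, 12, p = 0.5, alternative = "greater")$p.value
[1] 0.0002441406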
In the biomedical arena, at least as I learned from Rosner's introductory text, the usual approach to analyzing paired 2 x 2 tables is McNemar's test.

?mcnemar.test

> mcnemar.test(matrix(c(73,0,61,12),2,2))

	McNemar's Chi-squared test with continuity correction

data:  matrix(c(73, 0, 61, 12), 2, 2)
McNemar's chi-squared = 59.0164, df = 1, p-value = 1.564e-14

The help page has a citation to Agresti.

-- 
David Winsemius

On Feb 10, 2009, at 4:33 PM, Rolf Turner wrote:

> [snip: original message quoted in full]
On 02/10/2009 03:33 PM Rolf Turner wrote:

> [snip: original message quoted in full]

Rolf,

I am a little confused, perhaps due to lack of sleep (sick dog with CHF).

Typically in this type of study, essentially looking at the efficacy/safety profile of a treatment, there are two options.

One does a two-arm randomized study, whereby "subjects" are randomized to one of two treatments. The two treatments may both be "active" or one may be a placebo. Then a typical two-sample comparison of the primary hypothesis is made. In this setting, you would have a second group of 73 cats who received a comparative treatment (or a placebo) to compare against the 16.4% observed in this treatment group.

For example, say that patients were undergoing cancer treatment, which has nausea and vomiting as a side effect. Due to the side effect, it is common to see a reduction in dosing, which of course reduces treatment effectiveness.
You might want to study a treatment that favorably reduces that side effect, to enable improved treatment dosing and patient tolerance.

The other option would be to perform a single-sample study, where there is an a priori hypothesis, based upon prior work, of the expected incidence of the adverse event, or perhaps a "clinically acceptable" incidence of the adverse event. This would seem to be the scenario indicated above. What is lacking is some a priori expectation of the incidence of the event in question, so that one can show that you have reduced the incidence from the expected. 50% would not make sense here, though if it did, a single-sample binomial test would be used, presuming a two-sided hypothesis:

> binom.test(12, 73, 0.5)$p.value
[1] 4.802197e-09

That none of them had vomiting prior to treatment seems to be of little interest here. You could just as easily argue that there was a significant increase in the incidence of vomiting from 0% to 16.4% due to the treatment.

What am I missing?

Regards,

Marc Schwartz
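[Editorial note: to make Marc's single-sample option concrete, here is a sketch of the comparison he describes against an a priori incidence. The 5% figure is purely hypothetical, standing in for the historical or "clinically acceptable" rate he says is missing:]

> # one-sided test of whether the observed 12/73 (16.4%) exceeds a
> # hypothetical a priori incidence of 5%; the p-value here is small
> binom.test(12, 73, p = 0.05, alternative = "greater")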
Hi Bert: can you do that? Because the null is that they are equal before and after, not that the proportion is zero. Thanks for any clarification to my lack of understanding.

On Tue, Feb 10, 2009 at 5:43 PM, Bert Gunter wrote:

> Ah, experimental units, again ... a subject little taught by statisticians
> that is often the crux of the matter. As here.
>
> The cat is the experimental unit. There are 73 of them. 12 of them
> experienced vomiting after treatment. What's a confidence interval for the
> true proportion based on our sample of 73? binom.test(12,73) gives us .088
> to .27 for an exact 2-sided interval (and a P value of 2.2e-16 for the
> null = 0).
>
> Seems rather convincing -- and simple -- to me!
>
> -- Bert Gunter
>
> [snip: earlier messages quoted in full]
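[Editorial note: the interval Bert quotes can be pulled out directly; a sketch:]

> # exact (Clopper-Pearson) 95% CI for the proportion 12/73;
> # this reproduces the .088 to .27 that Bert reports
> binom.test(12, 73)$conf.int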
Hi Bert: I think what you said about a prior guess for the NULL is also similar to what Chuck said about people looking with a blank stare. Thanks for the clarification.

On Tue, Feb 10, 2009 at 7:06 PM, Bert Gunter wrote:

> The only question at issue (i.e. capable of being addressed) is: is giving
> the drug to non-vomiting cats associated with vomiting? (I would strongly
> suspect that cats that were vomiting beforehand would have been excluded
> from the study, as the researcher would have felt that one couldn't then
> tell whether or not the drug caused vomiting problems for them. No?)
>
> There were 73 non-vomiting cats, 12 of which started vomiting after
> receiving the drug. All I can do is give a confidence interval for the
> estimated proportion of nonvomiting cats that vomit when given this drug,
> and perhaps ask whether it is consistent with their nonvomiting status
> before. Which is what I did. And it's pretty convincing that giving the
> pill is associated with vomiting, right?
>
> Whether the vomiting was associated with the giving of this **particular**
> drug is, of course, impossible to tell, because the researcher failed to
> include placebo controls. I chose 0 for a null as a representation of
> their non-vomiting status, but the scientific question of interest is
> probably to compare them to the proportion of cats that would vomit if
> given any pill at all. Without any placebo controls, who can tell?
> Substitute a prior guess if you like for a Null. Which is exactly the
> point that Marc Schwartz made -- that is, that the data are probably
> completely useless to answer the question of interest because the
> researcher messed up the design.
>
> -- Bert Gunter
>
> [snip: earlier messages quoted in full]
Meyners, Michael, LAUSANNE, AppliedMathematics
2009-Feb-11 10:27 UTC
[R] OT: A test with dependent samples.
Rolf,

as you explicitly asked for a comment on your proposal: It is generally equivalent to McNemar's test, and maybe even more appropriate because of the asymptotics involved in the chi-squared distribution, which might not be too good with n=12. In some more detail:

McNemar's test basically considers the difference between the off-diagonal elements, normalized by their sum. You do the same, ignoring all (0,0) and (1,1) pairs (the latter do not occur in your example anyway). binom.test(12,12,alternative="greater") gives you the one-sided p-value. If, however, you do not use the exact binomial distribution (with n=12) but the normal approximation, you find E(X)=6 and VAR(X)=3, with X the number of (0,1) pairs, and you observed x=12. This gives you a z-score of (12-6)/sqrt(3) and a corresponding one-sided p-value:

> pnorm(6/sqrt(3), lower.tail=F)
[1] 0.0002660028

Further, McNemar's test WITHOUT continuity correction gives

> mcnemar.test(matrix(c(61,0,12,0),2,2), correct=F)

	McNemar's Chi-squared test

data:  matrix(c(61, 0, 12, 0), 2, 2)
McNemar's chi-squared = 12, df = 1, p-value = 0.000532

It comes as no surprise that the reported (two-sided!) p-value is exactly twice the (one-sided!) p-value from the binomial test with normal approximation:

> pnorm(6/sqrt(3), lower.tail=F)*2
[1] 0.0005320055

I do not want to stress the pros and cons of continuity corrections, but if neglected, you see that you get the same results (except that McNemar is generally two-sided), due to the relation between the normal and chi-squared distributions. If you use the binomial test, you can forget about asymptotics and continuity correction; that's why I indeed consider your approach superior to the chi-squared approximation used in McNemar's test. But you might fail to find a reference for the exact approach...

You briefly asked in a later mail about testing for p=0. Indeed, _any_ incident will disprove this hypothesis, and the p-value reported by

> binom.test(1,73,p=0)

	Exact binomial test

data:  1 and 73
number of successes = 1, number of trials = 73, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0
95 percent confidence interval:
 0.0003467592 0.0739763232
sample estimates:
probability of success
            0.01369863

is not wrong (as R just tells us it is BELOW a certain value), but could be refined by saying "p-value = 0". BTW, I am not sure what "p-value=TRUE" tells me, as derived from binom.test(0,73,p=0). I personally don't care about either, as to test H0: p=0, I would not use any software but rather stress some basic statistical theory.

It remains questionable whether McNemar (or your proposal) is generally appropriate here. Like others, I have doubts, except maybe if observations were made on two subsequent days, on the second of which the drug was administered. How could you otherwise be sure that the increase in incidences is not due to the progression of cancer, say, which would be difficult to rule out. Also, from my experience and in contrast to your point of view, I'd rather assume it likely that a vet is much more reluctant to apply the new treatment to an already vomiting cat; your "baseline" with 0 out of 73 at least seems suspicious, and until proven wrong, I'd take the assumption that this is not by chance only. I understand that this is an observational study with no control/randomization, but concluding a side-effect from a statistically significant change in incidence rates from this study seems questionable to me.
What I'd propose (and have used in comparable situations) is to confine yourself to the confidence interval and let the investigator decide / discuss whether the lower bound of this is higher than what she would expect under a control or alternative treatment. She needs to have some experience, if not historical data; using a test or just this confidence interval, you will only get a hint of a possible side-effect, not a formal proof (which is always difficult from observational studies). Discussing the CI would seem fairer and even stronger to me then.

In other words: Mathematically, you'll get an appropriate test, while conceptually, I'm far from being convinced.

Hope this makes sense,
Michael

-----Original Message-----
From: r-help-bounces at r-project.org On Behalf Of Rolf Turner
Sent: Dienstag, 10. Februar 2009 22:33
To: R-help Forum
Subject: [R] OT: A test with dependent samples.

[snip: original message quoted in full]
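[Editorial note: a sketch pulling Michael's numbers together, consolidating results already shown in this thread; the 5% benchmark in the last step is hypothetical, standing in for the "expected" incidence he says the investigator must supply:]

> # the three p-values compared above, side by side:
> binom.test(12, 12, p = 0.5, alternative = "greater")$p.value  # exact sign test
[1] 0.0002441406
> pnorm((12 - 6)/sqrt(3), lower.tail = FALSE)            # normal approximation
[1] 0.0002660028
> mcnemar.test(matrix(c(61, 0, 12, 0), 2, 2), correct = FALSE)$p.value  # two-sided
[1] 0.0005320055
> # Michael's interval-based suggestion: compare the exact lower 95%
> # bound (about 0.088) with an incidence the investigator would
> # expect anyway (the 5% benchmark is purely hypothetical)
> binom.test(12, 73)$conf.int[1] > 0.05
[1] TRUE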
Dear List,

Catching up with my backlog, I stumbled upon this:

On Wed, 11 Feb 2009 10:33:13 +1300, Rolf Turner wrote:

> [snip: original message quoted in full]

I read with interest the answers given, but got frustrated by (among other points) seeing the main point unanswered: what in hell do you want to *test*? And to prove what?

Classical test theory (Neyman and Pearson's sin, pride and glory) gives you a (conventionally accepted) way to check if your data support your assertions. It starts by computing somehow the probability of getting your data by sheer chance under the hypothesis of your assertion being *false* (i.e. the dreaded "null hypothesis"); if this probability is "too low" (less than 1 in 20, according to R. A. Fisher's whim, now idolized as a standard), it proceeds by asserting that this "too low" probability means "impossible" and, by way of modus tollens (a-->b and not(b) ==> not(a), in propositional logic), rejects your null hypothesis.
Therefore, what your test "proves" is just the negation of your null hypothesis. The "null" hypothesis that "the drug does not cause cats to barf" implies that the *probability* of seeing a cat barfing is zero. Any barf is enough to disprove it, and your alternative ("some cat(s) may barf after having the drug") is therefore "supported" at all conventional (and unconventional) levels. (See below for reasons for which this reasoning is partially false.)

Did you really bother to treat 73 cats to learn this? In this case, you've way too much money to burn and time on your hands. You might have learned that much cheaper by treating cats one at a time and stopping at the first barf. You could even have obtained a (low precision) estimate of the post-treatment barf probability, by remembering that the distribution of the number of cats treated is negative binomial...

This (point- and interval-) estimation is probably much more interesting than a nonsensical "test". Getting some precision on this estimation might well be worth treating 73 cats. In this case, both classical ("Fisherian") and Bayesian points of view give "interesting" answers. You may note that the "classical" confidence interval and the Bayesian credible interval with a noninformative prior have the same (numerical) bounds, with very different significations (pick your poison, but be aware that the Bayesian point of view | gospel | madness is quite ill-accepted in most medical journals nowadays...). But the "test significance level" is still 0, meaning that this test is sheer pure, unadulterated, analytical-quality nonsense. Because your "null" has no possible realistic meaning.

Now, another "null" has been suggested: "the cats have the same probability of barfing before and after drug administration", leading you to a symmetry (McNemar) test. This is much more interesting, and might have some value ... unless, as has been suggested, your subjects are not "random cats" but "cats that do not barf before drug administration". In this case, your whole experiment is biased, and your null is effectively (almost) the same as before (i.e. "the drug does not cause non-previously-barfing cats to barf"), in which case the same grilling can be applied to it.

In both cases, the null hypothesis tested is so far away from any "reasonable" hypothesis that the test turns into a farce. A much better way to present these results to a referee would be to give a (well-motivated) point- and interval-estimation and plainly refuse to "test" it against nonsense (explaining why, of course). Medical journal editors and referees have spent way too much time complying with dead statisticians' fetishes, turning the whole methodological problem in (para-)clinical research into an exercise of rubbing blue mud in the same place and at the same time their ancestors did, with no place for statistical (and probabilistic) thinking...

Another, more interesting, problem would be to know whether taking the drug in question causes an "unacceptable" probability of barfs. This would entail 1) defining the lowest "unacceptable" amount of barfing, 2) defining the Type I error risk and power, 3) computing the number of subjects necessary for a non-superiority trial against a prespecified fixed hypothesis, and 4) effectively running and analysing such a trial. Such a test of a *realistic* hypothesis would indeed be worthy.
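[Editorial note: the two intervals Emmanuel contrasts can be computed side by side; a sketch. The Jeffreys Beta(1/2, 1/2) prior is one standard noninformative choice, and its equal-tailed credible interval comes out close to (though not exactly equal to) the exact interval here:]

> # exact (Clopper-Pearson) 95% confidence interval for 12/73
> binom.test(12, 73)$conf.int
> # equal-tailed 95% credible interval: under a Jeffreys Beta(1/2, 1/2)
> # prior, the posterior for the barf probability is Beta(12.5, 61.5)
> qbeta(c(0.025, 0.975), 12 + 0.5, 73 - 12 + 0.5)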
======== Cut here for a loosely related methodological rant =======

In any case, if I refereed this paper, I would not "buy" it: the causal link is very badly established. Because the intervention is not the *drug*, it is *receiving the drug*, which is *quite* different, even in animals (maybe especially in animals).

I happen to have worked on a very similar setup (testing an antiemetic on dogs receiving platinum salts, which is a bit more intricate than the present setup). I *know* for a fact that the combination of placebo "platinum salts" and placebo "antiemetic" *will* cause some dogs to barf (and other stress manifestations): it is even enough for one dog in the same room (or even the same (small) building) to start barfing to set off barfing in other, totally untouched (not even by placebos) animals. You see, stress can be communicated by sight, sound and smell, and it's bloody hard to isolate animals from each other on these three aspects... Furthermore, if you were able to obtain such an isolation, you'd get animals subject to a *major* stress: loneliness. Dogs are social beasts, y'know... And I don't suppose cats are less sensitive (O Bubastis, forgive them, they do not know what they do...).

Therefore, to study the possible emetic effect of the *drug*, the simple (maybe simplistic) way to test the (semi-realistic) null "the *drug* doesn't modify the emetic risk entailed by taking any drug" is this: I'd compare the "barfing rates" of a *random* sample of cats receiving the drug and another, *distinct*, *random* *control* sample of cats receiving a suitable placebo (same presentation, same colour, same smell, same "galenic", etc.). I'd blind any person in contact with any of the animals to the exact nature of the intervention (dogs and cats will somehow "feel" your expectations, don't ask me how, but they can be worse than human patients in this respect...). And I'd analyze this exactly as a clinical trial (this *is* a veterinary clinical trial, indeed).

This simplistic scheme does not account for individual sensitivities of animals. Various ways exist to "absorb" this, the simplest being of course a properly randomized cross-over trial. The null becomes, of course, triple: "the emetic properties of the administration do not depend on the product administered", "the emetic properties of an administration do not depend on a previous administration", and "the emetic properties of a product do not depend on the previous administration of another product". The interpretation of such an experiment may become ... delicate.

Other schemes are possible: e.g., repeated experiments on the same animals may allow one to *describe* individual sensitivities and the variability thereof (analysis by mixed models including a "subject" factor). However, I'd be very wary of the validity of such an experiment, given the possibility (almost certainty...) of inducing a stereotyped comportment in the subjects. And the importance of the ultimate goal would have to be bloody mighty in order to justify such a treatment being inflicted on cats (or dogs, for that matter)...

One might note that my previous proposal of a non-superiority trial does not involve a control. That's because this trial has a *pragmatic* goal: checking the acceptability of the administration of a drug against an a priori set of criteria. It does not allow inferences on the effect of the drug, and *postulates* that the non-administration of the drug will result in nothing of interest.
This allows us to pull a realistic, interesting null hypothesis out of our hats. On the other hand, the controlled designs, by virtue of having a control, allow us to be analytic and separate the effect of the administration from the effect of the drug itself: the latter might indeed be zero, the associated null hypothesis isn't nonsensical, and the test of this null isn't worthless.

======== End of the loosely related methodological rant =======

In any case, my point is: hypothesis testing is *not* the alpha and omega of biostatistics, and other methods of describing and analysing experimental results are often much more interesting, notwithstanding the fetishes of journal referees. Furthermore, testing impossible or worthless hypotheses leads to worthless conclusions. Corollary: do not test for the sake of testing, because "everybody does it" or because a referee threw a tantrum; test realistic hypotheses, whose rejection has at least some relation to your subject matter.

The two cents of someone tired of reading utter nonsense in prestigious journals...

Emmanuel Charpentier