Bert Gunter
2011-Feb-21 18:17 UTC
[R] OT: R Square Help (this debate again, i know!) and The Experimental Unit
Dieter (et al.),

I am weak and therefore yield to temptation... This is OT for R, so stop reading and discard now if you're looking for real R help. (See also one inline comment below.)

Mount soapbox; begin rant </

In addition to the points you made/alluded to, may I also suggest that confusion about the nature of The Experimental Unit results in a lot of garbage/non-replicable scientific results seemingly buttressed by "officially sanctioned" statistics. You can google the term, but many of the results are not definitions, but examples from which one is (presumably) supposed to inductively infer the definition. Here is the one that I think is germane:

Experimental Unit: The smallest division of the experimental material that can receive separate treatments.

There is a brief but (imo) pertinent discussion of what this means in the context of food science here:

http://ift.confex.com/ift/2005/techprogram/paper_27139.htm

The importance of this idea -- never, insofar as I can tell, discussed in statistics texts -- is that it is the unit over which replication must occur for statistical calculations to provide scientifically meaningful estimates of experimental variability. Confusion about, or ignorance of, the experimental unit means that statistically "significant" results (or, if you prefer, Bayesian versions thereof) are based on pseudo-replicates that underestimate the true experimental variability and therefore lead to non-replicable results.

A widespread example of this phenomenon with which I'm familiar occurs in animal experiments (think mice) in which several animals are housed together in a single cage, either for cost or animal care reasons (if the animals are social and need to live that way). So suppose one has, say, 6 mice per cage, and one treats 2 cages = 12 mice with some sort of drug (e.g., these might be genetically engineered mice that express a human disease like diabetes, say) and 2 cages = 12 mice that are negative controls (receive a placebo).
For logistical and perhaps scientific reasons all mice in a cage must receive the same treatment. So the experimental unit is the cage. The 6 mice in a cage are basically repeated measurements of the experimental unit. But have a look at the biological literature to see how such data are typically analyzed -- you'll most likely see 22 df for replicates. Of course mixed or other hierarchical models are out of the question with only 2 df for cage-to-cage variability.

Do cage-level effects really exist? Ask folks involved in such experiments about how a single overly aggressive animal (say) in a cage can skew results.

To be fair, many scientists I know understand these issues, at least intuitively. That is, they are well aware how cage effects can screw things up. Nevertheless, they still worship at the P-value altar without understanding the nature of the bias.

/> end rant; dismount soapbox.

Due to the nature of my comments, vigorous and even impolite disagreement is acceptable -- as are private, offlist fusillades. I will not try to defend myself on list, though I will publicly acknowledge any error.

Cheers,
Bert

> Surprisingly, it's mostly the not-so-top papers that cause problems here.
> When Lancet, New English or BMJ reject some statistical argument, they have
> good reasons.

Oh? Data to support this assertion, please.

>
> Dieter
>

--
Bert Gunter
Genentech Nonclinical Biostatistics
(But these are my own views, of course, not that of my organization).
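
For anyone who wants to see the size of the problem, the cage example can be sketched as a small simulation. This is only an illustration under assumptions that are not in the post: it is written in Python with the standard library, the cage-to-cage and mouse-to-mouse standard deviations are both set to 1 (arbitrary), there is no true drug effect at all, and a pooled two-sample t-test stands in for whatever analysis a given paper might actually use. The function names (`pooled_t`, `simulate_arm`, `false_positive_rates`) are made up for this sketch. The two critical values are the usual two-sided 5% table values for 22 and 2 df.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t (standard table values).
T_CRIT = {22: 2.074, 2: 4.303}


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def simulate_arm(rng, n_cages=2, mice_per_cage=6, cage_sd=1.0, mouse_sd=1.0):
    """One treatment arm with NO treatment effect: a list of cages of mouse responses.

    cage_sd and mouse_sd are assumed values chosen only for illustration.
    """
    cages = []
    for _ in range(n_cages):
        cage_effect = rng.gauss(0.0, cage_sd)  # shared by every mouse in the cage
        cages.append([cage_effect + rng.gauss(0.0, mouse_sd)
                      for _ in range(mice_per_cage)])
    return cages


def false_positive_rates(n_sim=2000, seed=1):
    """Fraction of null experiments declared 'significant' by each analysis."""
    rng = random.Random(seed)
    naive_hits = 0
    cage_hits = 0
    for _ in range(n_sim):
        drug = simulate_arm(rng)
        placebo = simulate_arm(rng)

        # Pseudo-replicated analysis: each mouse counted as a replicate (22 df).
        t, df = pooled_t([y for cage in drug for y in cage],
                         [y for cage in placebo for y in cage])
        naive_hits += int(abs(t) > T_CRIT[df])

        # Cage as experimental unit: one mean per cage (2 df).
        t, df = pooled_t([statistics.mean(cage) for cage in drug],
                         [statistics.mean(cage) for cage in placebo])
        cage_hits += int(abs(t) > T_CRIT[df])
    return naive_hits / n_sim, cage_hits / n_sim


if __name__ == "__main__":
    naive, cage = false_positive_rates()
    print(f"mice as replicates (22 df): false-positive rate = {naive:.3f}")
    print(f"cage as unit (2 df):        false-positive rate = {cage:.3f}")
```

Under these assumed variances the mouse-level analysis should reject a true null far more often than the nominal 5%, while the cage-level analysis should stay near 5%. The price is that with only 2 df the cage-level test has very little power, which is the design problem the post is complaining about.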