Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Aug-18 17:17 UTC
[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)
I'm far from an expert on stats, but what I think you are saying is that if you compare baseline with version 3, your p-value is not as good as for versions 1 and 2. I'm not 100% sure you are meant to compare p-values like that, but I'll let someone else comment on it.

            total  incorrect  correct  % correct
baseline      898        708      190      21.2%
version_1     898        688      210      23.4%
version_2     898        680      218      24.3%
version_3    1021        790      231      22.6%

> Here, the p value for version_3 (when compared with the baseline) seems
> to make no sense whatsoever. It shouldn't be larger than the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.

No, it's not the raw numbers, it's the proportion of correct answers that counts. I've added a % correct column to your data - does that make it clearer? Only 22.6% of version 3's answers were correct, so the improvement over baseline is smaller in percentage terms than the one version 1 and version 2 produced.

From my naive perspective I'd want to test for a difference between each version and baseline, and then v1 & v2, v1 & v3, v2 & v3 (you may tell me those are unsound things to test - in which case don't test them); there's a rough R sketch of this at the end of the message. You'd then need to decide a threshold for calling a difference significant (say p < 0.05). I'd contend that the test should be two-tailed - results could be better or worse.

You should also develop a hypothesis. Let me create some for you:

A. H1: version 1 of the software is better than baseline (H0: version 1 is no better than baseline)
B. H1: version 2 of the software is better than version 1 (H0: version 2 is no better than version 1)
C. H1: version 3 of the software is better than version 2 (H0: version 3 is no better than version 2)

Now look at your results and p-values and work out whether H1 or H0 applies in each case. You could develop further variants (D: version 3 is better than baseline).

Finally, remember to consider the 'clinical significance' as well as the statistical significance. I'd have hoped a software change might increase correct answers to, say, 40%. And remember also that a p-value threshold of 0.05 carries a false positive rate of about 1 in 20.

> Any idea what's going on here? I thought the sample size should have no
> impact on the results?

Erm.. sample size always has an influence on results. If you show a given difference in 100 samples, you would expect a larger p-value from virtually any statistical test you chose than if you showed the same difference in 1000 samples. You have a bigger sample but a smaller overall difference, so in effect you can be less sure that the change is not down to chance. (Purist statisticians will likely challenge that description.) The second snippet at the end of the message illustrates this.
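For what it's worth, here is a minimal R sketch of the baseline-versus-version comparisons I described above. The object names are my own invention, and I'm assuming prop.test's default two-sided test of two proportions (with continuity correction) is an acceptable stand-in for whatever chi-square call you actually ran:

correct <- c(baseline = 190, version_1 = 210, version_2 = 218, version_3 = 231)
total   <- c(baseline = 898, version_1 = 898, version_2 = 898, version_3 = 1021)

round(100 * correct / total, 1)    # reproduces the "% correct" column above

## each version against baseline, two-sided test of two proportions
for (v in c("version_1", "version_2", "version_3")) {
    p <- prop.test(correct[c("baseline", v)], total[c("baseline", v)])$p.value
    cat(v, "vs baseline: p =", round(p, 4), "\n")
}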
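And a quick illustration of the sample-size point: the proportions are identical in the two calls below, but with ten times the data (the second line's counts are made up purely for illustration) the same difference produces a much smaller p-value:

prop.test(c(190, 231),   c(898, 1021))      # your actual baseline vs version_3 counts
prop.test(c(1900, 2310), c(8980, 10210))    # same proportions, ten times the data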