Hi,

this is more of a general statistics question, I think.

I am working on a system which automatically answers user questions (such systems are commonly called "Question Answering systems"). I evaluated different versions of the same system on a publicly available test set. This set contains 500 questions. Naturally, for each question the answer can be wrong or right, which is coded as "0" (wrong) or "1" (correct). By adding up all values and dividing the sum by the number of questions in the test set (that's 500), one gets a measure of how well the system performs, commonly called accuracy.

As mentioned, I evaluated two different versions of the system and received two different accuracy values. Now I want to know whether the difference is statistically significant.

Can I use a t-test? I know it has certain requirements, for example a somewhat normal distribution. That's difficult of course when the values in question are only "0" and "1"...

Does anybody have any ideas?

Thanks a lot,
Mika

PS: The data I have looks something like this (of course I actually have 500 values, not only 10):

results1: 0,1,1,1,0,1,1,0,1,0   accuracy: 0.6
results2: 0,0,1,1,0,0,1,1,1,0   accuracy: 0.5

--
View this message in context: http://www.nabble.com/Help-with-significance.-T-test--tp24699690p24699690.html
Sent from the R help mailing list archive at Nabble.com.
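For reference, the accuracy described in the question is just the mean of the 0/1 vector. A minimal sketch in R, using only the ten example values from the PS (the real data would have 500 values per system):

```r
# Example values copied from the PS above; 1 = correct, 0 = wrong.
results1 <- c(0, 1, 1, 1, 0, 1, 1, 0, 1, 0)
results2 <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0)

# Accuracy = number of correct answers / number of questions,
# which for 0/1 coding is the same as the mean.
acc1 <- sum(results1) / length(results1)
acc2 <- mean(results2)

print(c(system1 = acc1, system2 = acc2))
```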
mik07 wrote:
> As mentioned I evaluated two different versions of the system, and
> received two different accuracy values. Now I want to know whether the
> difference is statistically significant.

?prop.test
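A rough sketch of what the prop.test suggestion could look like with the ten example values from the question. Note that prop.test compares two proportions as if they came from independent samples; here both systems answered the same questions, so the pairing is ignored and this is only an approximation. With so few values R will also likely warn that the chi-squared approximation may be incorrect.

```r
# Example values from the original post; 1 = correct, 0 = wrong.
results1 <- c(0, 1, 1, 1, 0, 1, 1, 0, 1, 0)
results2 <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0)

# Number of correct answers and number of questions per system.
correct <- c(sum(results1), sum(results2))
total   <- c(length(results1), length(results2))

# Two-sample test of equal proportions (treats the samples as independent).
pt <- prop.test(correct, total)
print(pt)
```

The estimated proportions are available as pt$estimate and the p-value as pt$p.value.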
Look up the McNemar test. That sounds right...

Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of mik07
Sent: Tuesday, July 28, 2009 10:49 AM
To: r-help at r-project.org
Subject: [R] Help with significance. T-test?
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
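A minimal sketch of the McNemar suggestion, again using only the ten example values from the question. McNemar's test fits this setup because both systems are scored on the same questions: it looks only at the questions where the two systems disagree. The variable names tab, b and c_ are illustrative choices, and the exact binomial variant at the end is an addition not mentioned in the thread, sometimes preferred when there are few disagreements.

```r
# Example values from the original post; 1 = correct, 0 = wrong.
results1 <- c(0, 1, 1, 1, 0, 1, 1, 0, 1, 0)
results2 <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0)

# Paired 2x2 table; fixing the levels keeps it 2x2 even if a value never occurs.
tab <- table(system1 = factor(results1, levels = c(0, 1)),
             system2 = factor(results2, levels = c(0, 1)))
print(tab)

# Discordant pairs: questions only one of the two systems got right.
b  <- tab["1", "0"]  # system 1 correct, system 2 wrong
c_ <- tab["0", "1"]  # system 1 wrong, system 2 correct

# McNemar's chi-squared test (with continuity correction by default).
mt <- mcnemar.test(tab)
print(mt)

# Exact alternative: binomial test on the discordant pairs only.
bt <- binom.test(b, b + c_, p = 0.5)
print(bt)
```

With the full 500-question data the same code applies unchanged; only the two input vectors differ.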
Mika,

Are you familiar with item response theory? You might consider functions in the ltm or MiscPsycho packages for dealing with binary response data.

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of mik07
> Sent: Tuesday, July 28, 2009 10:49 AM
> To: r-help at r-project.org
> Subject: [R] Help with significance. T-test?