The cells are interpreted as counts, so by scaling you're analyzing a different experiment (one with fewer observations). So the chi-squared value will change (the terms (O-E)^2/E in the statistic scale linearly, ignoring rounding and Yates' continuity correction).

The chisq.test on the original data is a test of association. Conventionally you decide ahead of time on a threshold for "false positives", say 5%, then use the reported p-value to determine whether to accept or reject the null hypothesis of no association. Had you chosen 5%, since the reported p-value is smaller than 5%, you would reject, i.e., decide that association is present.

chisq.test is a test, not really a measure of association. Your observation is a nice illustration of why. There are many measures of association (e.g., odds ratio); see for example Alan Agresti's "Categorical Data Analysis" for some discussion.

Reid Huntsinger

-----Original Message-----
From: juli g. pausas [mailto:juli at ceam.es]
Sent: Tuesday, July 30, 2002 12:12 PM
To: r-help
Subject: [R] chisq.test, basic question

Dear R-users,

I have a question, and I'm not sure whether it is related to my misunderstanding of basic statistics, or my misunderstanding of R, or both.

I've got the counts of a 2 x 2 contingency table, and I'd like to test the association:

  m <- matrix(c(15,28,32,135), 2, 2)
  colnames(m) <- c("R-", "R+"); rownames(m) <- c("P-", "P+")
  m
  #    R-  R+
  # P- 15  32
  # P+ 28 135
  chisq.test(m)
  # X-squared = 4.0027, df = 1, p-value = 0.04543

Is this the correct way to test association between P and R? (I haven't got the original data.)

My problem is that if I use percentages, then I get different results:

  m2 <- 100*m/sum(m)
  chisq.test(round(m2))
  # X-squared = 1.5318, df = 1, p-value = 0.2158

Should this give about the same (apart from the rounding)? Should the degree of association between P and R be the same? Or am I using chisq.test() wrongly?

Thanks in advance,
Juli

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it.
=============================================================================

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
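The scaling argument in the reply above can be checked numerically. What follows is only a sketch: the helper names pearson_x2 and odds_ratio are made up for illustration, and the statistic is computed by hand without Yates' continuity correction so that the linear scaling is exact (chisq.test applies the correction to 2 x 2 tables by default).

```r
# 2 x 2 table of counts from the question
m <- matrix(c(15, 28, 32, 135), 2, 2,
            dimnames = list(c("P-", "P+"), c("R-", "R+")))

# Pearson statistic sum((O - E)^2 / E), without Yates' correction
# (hypothetical helper, not part of base R)
pearson_x2 <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Sample odds ratio (a * d) / (b * c) for a 2 x 2 table
# (hypothetical helper, not part of base R)
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

m_pct <- 100 * m / sum(m)   # the same table rescaled to percentages

# The statistic shrinks by the same factor as the table, 100 / sum(m)
pearson_x2(m)
pearson_x2(m_pct)
pearson_x2(m) * 100 / sum(m)

# The odds ratio is unchanged by the rescaling
odds_ratio(m)
odds_ratio(m_pct)
```

The odds ratio depends only on the proportions in the table, which is why it can serve as a measure of association, while the test statistic also reflects how many observations there are.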
Dear R-users,

I have a question, and I'm not sure whether it is related to my misunderstanding of basic statistics, or my misunderstanding of R, or both.

I've got the counts of a 2 x 2 contingency table, and I'd like to test the association:

  m <- matrix(c(15,28,32,135), 2, 2)
  colnames(m) <- c("R-", "R+"); rownames(m) <- c("P-", "P+")
  m
  #    R-  R+
  # P- 15  32
  # P+ 28 135
  chisq.test(m)
  # X-squared = 4.0027, df = 1, p-value = 0.04543

Is this the correct way to test association between P and R? (I haven't got the original data.)

My problem is that if I use percentages, then I get different results:

  m2 <- 100*m/sum(m)
  chisq.test(round(m2))
  # X-squared = 1.5318, df = 1, p-value = 0.2158

Should this give about the same (apart from the rounding)? Should the degree of association between P and R be the same? Or am I using chisq.test() wrongly?

Thanks in advance,
Juli
From help(chisq.test):

  If `x' is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative "INTEGERS".
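The requirement quoted from the help page can be turned into a quick sanity check before calling chisq.test. This is a sketch only; is_count_table is a hypothetical helper written for this example, not something chisq.test provides.

```r
# Hypothetical helper: TRUE if every entry looks like a nonnegative integer count
is_count_table <- function(x, tol = 1e-8) {
  is.numeric(x) && all(is.finite(x)) && all(x >= 0) &&
    all(abs(x - round(x)) < tol)
}

m  <- matrix(c(15, 28, 32, 135), 2, 2)   # the original counts
m2 <- 100 * m / sum(m)                   # the same table as percentages

is_count_table(m)    # TRUE: these are counts
is_count_table(m2)   # FALSE: percentages are not integers
```

Note that round(m2) would pass this check even though its entries are still not the observed counts, so the check can only catch the obvious cases.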
My previous reply (below) uses "false positive" in a particularly misleading way. I intended this to mean "incorrect rejection of the null hypothesis of no association". I succumbed to the temptation to call a "rejection of the null hypothesis of no association" a "positive" (cancelling a double negative?), but as it is a rejection (of no matter what) I should have called it a "negative".

Reid Huntsinger

-----Original Message-----
From: Huntsinger, Reid [mailto:reid_huntsinger at merck.com]
Sent: Tuesday, July 30, 2002 12:07 PM
To: 'juli g. pausas'; r-help
Subject: RE: [R] chisq.test, basic question

[...]
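The "incorrect rejection of the null hypothesis of no association" described above can be illustrated by simulation: draw many 2 x 2 tables in which rows and columns really are independent, and count how often chisq.test rejects at the 5% threshold. Everything specific here is an assumption made for the sketch: the marginal probabilities, the table total of 210, the number of simulated tables, and the seed.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

n_sim  <- 2000                   # number of simulated tables (assumption)
n_obs  <- 210                    # observations per table, as in the question
p_row  <- c(0.25, 0.75)          # assumed row margins
p_col  <- c(0.20, 0.80)          # assumed column margins
cell_p <- outer(p_row, p_col)    # cell probabilities under independence

# p-value of chisq.test for each table simulated under the null hypothesis
p_values <- replicate(n_sim, {
  tab <- matrix(rmultinom(1, size = n_obs, prob = cell_p), 2, 2)
  suppressWarnings(chisq.test(tab)$p.value)
})

# Share of tables where "no association" is rejected although it is true
rejection_rate <- mean(p_values < 0.05, na.rm = TRUE)
rejection_rate
```

The share should come out at or somewhat below the chosen threshold; the default continuity correction tends to make the test conservative.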
Hi,

your first use of chisq.test is correct. But by multiplying by 100 and dividing by sum(m) (210), you analyze a different experiment (with fewer "observations") and, in general, this is a _gross_ mistake.

More generally, your example is a (very basic) instance of the well-known problem of statistical vs. practical "significance". Just try chisq.test(2*m), chisq.test(3*m), etc. With a sufficiently large sample it is almost certain (in the practical, not the mathematical sense) that you will get a statistically significant difference even when the practical, "real-life" difference is negligible.

A trivial example:

  m <- matrix(c(100,101,110,115),2,2)
  # rows and cols are "practically" independent
  chisq.test(m)
  # X-squared = 0.0065, df = 1, p-value = 0.9357
  chisq.test(10*m)
  # X-squared = 0.2823, df = 1, p-value = 0.5952
  chisq.test(100*m)
  # X-squared = 3.1241, df = 1, p-value = 0.07714
  chisq.test(1000*m)
  # X-squared = 31.551, df = 1, p-value = 1.943e-08

Therefore, your question about m2 stems from a misunderstanding of the statistical principles behind chisq.test.

HTH,
Jan

-------------------------------------------------
designed for _monospaced_ font
-------------------------------------------------
/- Jan Svatos, PhD           Sokolovska 855/225 -/
/- Data Analyst              Prague 9           -/
/- Eurotel Praha             190 00             -/
/- jan_svatos at eurotel.cz  Czechia            -/
-------------------------------------------------

- - - Original message: - - -
From: owner-r-help at stat.math.ethz.ch
Send: 30.7.2002 18:47:51
To: r-help <r-help at stat.math.ethz.ch>
Subject: [R] chisq.test, basic question

[...]
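Jan's point about statistical vs. practical significance can be made concrete by putting an effect-size measure next to the p-value. The sketch below uses the phi coefficient, sqrt(X-squared / n); that choice is an assumption of this example and is not mentioned in the thread. The test is run with correct = FALSE so that phi is exactly invariant under scaling, which means the statistics differ slightly from the Yates-corrected values quoted in the message.

```r
# The "practically independent" table from the example above
m <- matrix(c(100, 101, 110, 115), 2, 2)

# Hypothetical helper: test statistic, p-value and phi for the table k * m
summarise_scaled <- function(k) {
  tab <- k * m
  res <- chisq.test(tab, correct = FALSE)   # no continuity correction
  x2  <- unname(res$statistic)
  c(k   = k,
    n   = sum(tab),
    x2  = x2,
    p   = unname(res$p.value),
    phi = sqrt(x2 / sum(tab)))              # effect size, independent of n
}

out <- t(sapply(c(1, 10, 100, 1000), summarise_scaled))
print(signif(out, 4))
```

Reading down the columns, the p-value shrinks as the table is multiplied up while phi stays the same, so a small p-value for a huge sample says little about how strong the association is.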