Hi list,

I've got a question about the chisq.test function. When using the "given probabilities" method (p = ...), one would normally supply probabilities in the range 0 to 1 that sum to 1.0 (according to the R help). But it is possible to use probabilities greater than 1, or a sum less than 1, without any warning message.

OK, now the question: what does R calculate in these cases? This doesn't make sense in my (poor statistical) view.

I thought it might treat the values as ratios, but the results differ (for example, when typing p=c(1,1,1,1), p=c(6,6,6,6), p=c(0.25,0.25,0.25,0.25)).

Can someone tell me about this chisq method, and perhaps show me an explanatory example?

Thank you,
Alex Keller

Alexander Keller wrote:

> I've got a question about the chisq.test function. When using the
> "given probabilities" method (p= ...), normally there should be
> typed in probabilities in the range of 0 to 1 with the absolute sum
> of 1.0 (r-help). But it is possible to use probabilities > than 1, or
> the sum < 1, without any warning message.
. . .
> OK, now the question: what does R calculate in these cases?
> This doesn't make sense in my (poor statistical) view.
>
> Can someone tell me about this chisq method, and perhaps show me an
> explanatory example?

Take a look at the code; it just sets E <- n*p, and then calculates sum((x-E)^2/E). This sum is calculable even if p is not a vector of probabilities.

And you are quite right; it ***doesn't*** make any sense in those circumstances.

This might be termed a bug; of course only a silly user would supply a p argument that wasn't a vector of probabilities .... but there are a lot of silly users out there/here! :-) (A p could arise from some other calculations where things could go wrong in an unforeseen way ....) At the very least the code fails to practice "safe statistical computing".

IMHO there ought to be a check in the code to make sure that the vector p makes sense, perhaps renormalizing it to sum to 1 (if all entries are positive) along the lines of sample(). This would be very easy to write. I'd volunteer, except that I'm sure that the R core team would disdain my assistance.

cheers,

Rolf Turner
rolf at math.unb.ca
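
A minimal sketch of the kind of check described above, written as a standalone function rather than as a patch to chisq.test. The name safe_chisq_stat and its rescale argument are hypothetical and chosen only for illustration; the statistic itself is the E <- n*p and sum((x-E)^2/E) computation quoted above. (Current versions of chisq.test offer a rescale.p argument for the same purpose; the helper below does not reproduce that implementation.)

```r
# Hypothetical helper, not part of R: computes the chi-squared statistic
# for given probabilities, but refuses a p that is not a probability vector.
safe_chisq_stat <- function(x, p, rescale = FALSE) {
  if (length(x) != length(p)) stop("'x' and 'p' must have the same length")
  if (any(p <= 0)) stop("all entries of 'p' must be positive")
  if (rescale) {
    # Renormalize so that p sums to 1, in the spirit of sample(prob = ...)
    p <- p / sum(p)
  } else if (abs(sum(p) - 1) > sqrt(.Machine$double.eps)) {
    stop("'p' must sum to 1 (or use rescale = TRUE)")
  }
  n <- sum(x)          # total count
  E <- n * p           # expected counts under p
  sum((x - E)^2 / E)   # Pearson chi-squared statistic
}

x <- c(10, 20, 30, 40)  # made-up counts for illustration
safe_chisq_stat(x, c(6, 6, 6, 6), rescale = TRUE)  # same as using p = 0.25 each
```

With rescale = FALSE, the call with p = c(6, 6, 6, 6) stops with an error instead of silently returning a meaningless statistic.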

On 01-Dec-04 Alexander Keller wrote:

> Hi list,
>
> I've got a question about the chisq.test function.
> When using the "given probabilities" method (p= ...), normally
> there should be typed in probabilities in the range of 0 to 1 with the
> absolute sum of 1.0 (r-help).
> But it is possible to use probabilities > than 1, or the sum < 1,
> without any warning message.
>
> OK, now the question: what does R calculate in these cases?
> This doesn't make sense in my (poor statistical) view.
>
> I thought it might calculate relations, but it differs in the results
> (example: of typing p=c(1,1,1,1), p=c(6,6,6,6),
> p=c(0.25,0.25,0.25,0.25))
>
> Can someone tell me about this chisq method, and perhaps show me an
> explanatory example?

Experiment shows:

    x<-c(6,6,6,6,6,6)
    P<-c(6,6,6,6,6,6)
    chisq.test(x,p=P)

            Chi-squared test for given probabilities

    data:  x
    X-squared = 1225, df = 5, p-value = < 2.2e-16

    sum(((x-36*P)^2)/(36*P))
    [1] 1225

that chisq.test(x,p=P) appears to calculate

    sum( ((x - n*P)^2)/(n*P) )

whether the P sum to 1 or not (where n = sum(x); here n = 36, so each "expected count" n*P is 216).

So you'd better make sure that they do!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Date: 01-Dec-04 Time: 21:42:11
--------------------------------------------------------------------
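
To illustrate why the three p vectors in the original question give different results, here is a small sketch that evaluates the formula above by hand. The counts in x are made up for illustration, and stat is a hypothetical helper name; the code does not call chisq.test itself.

```r
x <- c(10, 20, 30, 40)   # made-up observed counts
n <- sum(x)              # n = 100

# Hypothetical helper: the statistic chisq.test appears to compute,
# sum((x - n*p)^2 / (n*p)), with no check that p sums to 1.
stat <- function(p) sum((x - n * p)^2 / (n * p))

results <- c(
  p_quarter = stat(rep(0.25, 4)),  # valid probabilities, E = 25 per cell
  p_one     = stat(rep(1, 4)),     # "E" = 100 per cell, sums to 400
  p_six     = stat(rep(6, 4))      # "E" = 600 per cell, sums to 2400
)
print(results)
```

Because the "expected counts" n*p scale directly with p, the three vectors are not treated as the same set of ratios: only p = 0.25 gives expected counts that add up to n, and the other two statistics are not meaningful as chi-squared values.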