allwrigh at maths.ox.ac.uk
2009-Dec-15 17:05 UTC
[Rd] apparently incorrect p-values from 2-sided Kolmogorov-Smirnov test (PR#14145)
This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --1132542651-1468968864-1260896436=:8788 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII I am using R version 2.6.2 (2008-02-08) on a Ubuntu Linux system. I seemed to be finding occasional errors in the p-values produced by ks.test(a,b) when a and b were vectors of 10 to 20 values. So I tried to find simpler examples, and one that illustrates what I mean is included below and as the attached typescript file. It is just x<-1:5 y<-c(2.5,4.5) ks.test(x,y) The value of the D_2,5 statistic is calculated as 0.4 correctly, but the p-value is stated by R as 1, though in fact it should be 20/21=0.9524 As I say this is just one example of quite a lot of errors of this kind that I am finding. If you want more examples I can try to be more systematic about tracking them down. Thank you, David Allwright. ----------------------------------------------------------------------------- Dr DJ Allwright E-mail : allwrigh at maths.ox.ac.uk OCIAM(Oxford Centre for Industrial and Applied Mathematics) Mathematical Institute Telephone : [+44](0)1865-280604 24-29 St.Giles' FAX -270515 Oxford OX1 3LB, UK. ----------------------------------------------------------------------------- Script started on Tue 15 Dec 2009 16:49:46 GMT ]0;swamp-thing: ~/smithinst/be2swamp-thing:be2% R R version 2.6.2 (2008-02-08) Copyright (C) 2008 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.> x<-1:5 > y<-c(2.5,4.5) > ks.test(x,y)Two-sample Kolmogorov-Smirnov test data: x and y D = 0.4, p-value = 1 alternative hypothesis: two-sided> q()Save workspace image? [y/n/c]: n ]0;swamp-thing: ~/smithinst/be2swamp-thing:be2% exit exit Script done on Tue 15 Dec 2009 16:50:13 GMT ------------------------------------------------------------------------------ --1132542651-1468968864-1260896436=:8788 Content-Type: TEXT/PLAIN; charset=US-ASCII; name=typescript Content-Transfer-Encoding: BASE64 Content-ID: <alpine.DEB.1.00.0912151700360.8788 at swamp-thing.maths.ox.ac.uk> Content-Description: Content-Disposition: attachment; filename=typescript U2NyaXB0IHN0YXJ0ZWQgb24gVHVlIDE1IERlYyAyMDA5IDE2OjQ5OjQ2IEdN VA0KG10wO3N3YW1wLXRoaW5nOiB+L3NtaXRoaW5zdC9iZTIHc3dhbXAtdGhp bmc6YmUyJSBSDQ0NCg0NClIgdmVyc2lvbiAyLjYuMiAoMjAwOC0wMi0wOCkN DQpDb3B5cmlnaHQgKEMpIDIwMDggVGhlIFIgRm91bmRhdGlvbiBmb3IgU3Rh dGlzdGljYWwgQ29tcHV0aW5nDQ0KSVNCTiAzLTkwMDA1MS0wNy0wDQ0KDQ0K UiBpcyBmcmVlIHNvZnR3YXJlIGFuZCBjb21lcyB3aXRoIEFCU09MVVRFTFkg Tk8gV0FSUkFOVFkuDQ0KWW91IGFyZSB3ZWxjb21lIHRvIHJlZGlzdHJpYnV0 ZSBpdCB1bmRlciBjZXJ0YWluIGNvbmRpdGlvbnMuDQ0KVHlwZSAnbGljZW5z ZSgpJyBvciAnbGljZW5jZSgpJyBmb3IgZGlzdHJpYnV0aW9uIGRldGFpbHMu DQ0KDQ0KICBOYXR1cmFsIGxhbmd1YWdlIHN1cHBvcnQgYnV0IHJ1bm5pbmcg aW4gYW4gRW5nbGlzaCBsb2NhbGUNDQoNDQpSIGlzIGEgY29sbGFib3JhdGl2 ZSBwcm9qZWN0IHdpdGggbWFueSBjb250cmlidXRvcnMuDQ0KVHlwZSAnY29u dHJpYnV0b3JzKCknIGZvciBtb3JlIGluZm9ybWF0aW9uIGFuZA0NCidjaXRh dGlvbigpJyBvbiBob3cgdG8gY2l0ZSBSIG9yIFIgcGFja2FnZXMgaW4gcHVi bGljYXRpb25zLg0NCg0NClR5cGUgJ2RlbW8oKScgZm9yIHNvbWUgZGVtb3Ms ICdoZWxwKCknIGZvciBvbi1saW5lIGhlbHAsIG9yDQ0KJ2hlbHAuc3RhcnQo KScgZm9yIGFuIEhUTUwgYnJvd3NlciBpbnRlcmZhY2UgdG8gaGVscC4NDQpU eXBlICdxKCknIHRvIHF1aXQgUi4NDQoNDQo+IHg8LTE6NQ0NCj4geTwtYygy LjUsNC41KQ0NCj4ga3MudGVzdCh4LHkpDQ0KDQ0KCVR3by1zYW1wbGUgS29s bW9nb3Jvdi1TbWlybm92IHRlc3QNDQoNDQpkYXRhOiAgeCBhbmQgeSANDQpE ID0gMC40LCBwLXZhbHVlID0gMQ0NCmFsdGVybmF0aXZlIGh5cG90aGVzaXM6 IHR3by1zaWRlZCANDQoNDQo+IHEoKQ0NClNhdmUgd29ya3NwYWNlIGltYWdl PyBbeS9uL2NdOiBuDQ0KG10wO3N3YW1wLXRoaW5nOiB+L3NtaXRoaW5zdC9i ZTIHc3dhbXAtdGhpbmc6YmUyJSBleGl0DQ0NCmV4aXQNDQoNClNjcmlwdCBk b25lIG9uIFR1ZSAxNSBEZWMgMjAwOSAxNjo1MDoxMyBHTVQNCg= --1132542651-1468968864-1260896436=:8788--
tlumley at u.washington.edu
2009-Dec-16 17:32 UTC
[Rd] apparently incorrect p-values from 2-sided Kolmogorov-Smirnov test (PR#14145)
On Tue, 15 Dec 2009, allwrigh at maths.ox.ac.uk wrote; (in part)> > x<-1:5 > y<-c(2.5,4.5) > ks.test(x,y) > > The value of the D_2,5 statistic is calculated as 0.4 correctly, but the > p-value is stated by R as 1, though in fact it should be 20/21=0.9524What we seem to have here is a rounding error problem. In ks.c:psmirnov2x, there is a double loop including if(fabs(i / md - j / nd) > q) u[j] = 0; where md=2, nd=5, and q=3/10. Now, to full precision abs(1/2 - 4/5) > 3/10 is false, but at least on my MacBook it is true in C double precision. I'm not sure why the loop is working with doubles, since multiplying by m*n should make everything an integer. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
tlumley at u.washington.edu
2009-Dec-18 16:26 UTC
[Rd] apparently incorrect p-values from 2-sided Kolmogorov-Smirnov test (PR#14145)
I've fixed this by adding 0.5/mn to q. The problem (at least in principle) with multiplying them all up is integer overflow. By the time 0.5/mn underflows to zero, missing one value in the distribution won't matter. -thomas On Fri, 18 Dec 2009, David John Allwright wrote:> Dear Thomas, Right, thank you. Yes, I haven't looked at the source code > (because I don't know C) but something like what you mention could well cause > the kind of problems I am seeing: a loop being exectued one too few or one > too many times. And yes, I think those quantities should be multiplied up by > m*n to all become integers so we escape rounding error problems. David. > ------------------------------------------------------------------------------ > On Wed, 16 Dec 2009, tlumley at u.washington.edu wrote: > >> On Tue, 15 Dec 2009, allwrigh at maths.ox.ac.uk wrote; (in part) >> >>> >>> x<-1:5 >>> y<-c(2.5,4.5) >>> ks.test(x,y) >>> >>> The value of the D_2,5 statistic is calculated as 0.4 correctly, but the >>> p-value is stated by R as 1, though in fact it should be 20/21=0.9524 >> >> >> What we seem to have here is a rounding error problem. >> >> In ks.c:psmirnov2x, there is a double loop including >> if(fabs(i / md - j / nd) > q) >> u[j] = 0; >> >> where md=2, nd=5, and q=3/10. >> >> Now, to full precision abs(1/2 - 4/5) > 3/10 is false, but at least on my >> MacBook it is true in C double precision. >> >> I'm not sure why the loop is working with doubles, since multiplying by m*n >> should make everything an integer. >> >> -thomas >> >> Thomas Lumley Assoc. Professor, Biostatistics >> tlumley at u.washington.edu University of Washington, Seattle >> >> >> >Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
Reasonably Related Threads
- apparently incorrect p-values from 2-sided Kolmogorov-Smirnov (PR#14158)
- apparently incorrect p-values from 2-sided Kolmogorov-Smirnov (PR#14178)
- apparently incorrect p-values from 2-sided Kolmogorov-Smirnov (PR#14157)
- Kolmogorov-Smirnov: calculate p value given as input the test statistic
- ks.test - The two-sample two-sided Kolmogorov-Smirnov test with ties (PR#13848)