Paul Johnson
2009-Aug-06 19:17 UTC
[Rd] About numerical accuracy: should there be a rational number class for fractions?
Hello, useRs.

I was monitoring the recent thread about the numerical (in)accuracy of the C pow function and the apparent mismatch between R and Matlab. That particular problem has been addressed, but it seems many others await. I'm wondering if there is advice for package writers & prospective R contributors about avoiding the most common sorts of numerical mistakes. Has this been discussed here? I've googled, finding naught, but perhaps I do not know the magic words.

I have seen two approaches to the numerical accuracy problem associated with fractions. I'm not a computer scientist, so perhaps I over-simplify in interpreting the results. Both of these approaches seem to recommend that we put off doing division until the last possible moment.

One is associated with computer simulation articles:

Luis R. Izquierdo and J. Gary Polhill (2006) "Is Your Model Susceptible to Floating-Point Errors?" Journal of Artificial Societies and Social Simulation, vol. 9, no. 4. <http://jasss.soc.surrey.ac.uk/9/4/4.html>

J. Gary Polhill, Luis R. Izquierdo and Nicholas M. Gotts (2005) "The Ghost in the Model (and Other Effects of Floating Point Arithmetic)" Journal of Artificial Societies and Social Simulation, vol. 8, no. 1. <http://jasss.soc.surrey.ac.uk/8/1/5.html>

The other approach can be found in the Gambit program, a tool for computing Nash (and other types of) equilibria in game theory (http://gambit.sourceforge.net/). Gambit was an NSF-funded project led by Dr. Richard McKelvey, who was one of the leading mathematical political scientists in the US before his untimely death. It has been continued by his students, primarily Ted Turocy. In Gambit, great care is taken to avoid any floating point calculations involving fractions until the latest possible step. There is a rational number class, and fractions are added and subtracted in a more exact way; I believe it keeps them over a lowest common denominator, or some similar concept. (I remember that detail mainly because the switch from 32 bit to 64 bit integers caused memory allocation troubles.)

(Come to think of it, I believe the famous problem of poor variance calculation in Microsoft Excel falls into the same category:

A. Talha Yalta, "The accuracy of statistical distributions in Microsoft Excel 2007", Computational Statistics & Data Analysis, Volume 52, Issue 10, 15 June 2008, Pages 4579-4586.

Neil Cox, "Use of Excel for Statistical Analysis", Statistician, AgResearch Ruakura. http://www.agresearch.co.nz/Science/Statistics/exceluse1.htm

Avoid division until the last possible moment!)

In R, we could protect ourselves somewhat if there were a class for handling rational numbers. Instead of taking a value like 2/3 and letting the floating point calculation occur, a function could create a "rational number object":

myfrac <- rational(2, 3)

And if I'm understanding the inheritance logic of S4 classes, then the rational class could implement +, -, and /, so that the arithmetic stays exact and the division happens only when a numeric value is finally needed. A rough sketch of what I have in mind is appended below the signature.

Well, I'm just wondering what you think about this question. I don't think I could write this class, but I might be able to help somebody refine and test such a thing if it existed.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
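P.S. Here is a rough, untested sketch of what such a class might look like in S4. The names (rational, gcd, the num/den slots) are just my own inventions for illustration, not an existing implementation; a real version would need to handle zero denominators, integer overflow in the products, mixed rational/numeric arithmetic, and so on.

## A rational number is stored as an integer numerator and denominator.
setClass("rational", representation(num = "integer", den = "integer"))

## Greatest common divisor, used to keep fractions in lowest terms.
gcd <- function(a, b) if (b == 0L) abs(a) else gcd(b, a %% b)

## Constructor: reduce to lowest terms and keep the sign on the numerator.
## (No check yet for a zero denominator.)
rational <- function(num, den) {
  num <- as.integer(num); den <- as.integer(den)
  if (den < 0L) { num <- -num; den <- -den }
  g <- gcd(num, den)
  new("rational", num = num %/% g, den = den %/% g)
}

## Exact addition: put both fractions over a common denominator first.
setMethod("+", signature("rational", "rational"), function(e1, e2) {
  rational(e1@num * e2@den + e2@num * e1@den, e1@den * e2@den)
})

## Exact multiplication.
setMethod("*", signature("rational", "rational"), function(e1, e2) {
  rational(e1@num * e2@num, e1@den * e2@den)
})

## The floating point division is delayed until the user asks for it.
setAs("rational", "numeric", function(from) from@num / from@den)

setMethod("show", "rational", function(object) {
  cat(object@num, "/", object@den, "\n")
})

## Usage: everything stays exact until the final coercion.
## myfrac <- rational(2, 3)
## myfrac + rational(1, 6)        # prints 5 / 6
## as(myfrac + rational(1, 6), "numeric")   # 0.8333333, computed at the end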