This is basically a question about where to start looking for a problem.

I have a program that gives slightly different results on two Windows computers. It is a reasonably complicated numerical optimisation, with iterative calls to optim().

The two computers both run Windows 2000. On each computer I get the same results in two different versions of R (1.5.1 and 1.6.0 on one, 1.5.1 and 1.6.1 on the other, the standard binaries), and the results are stable from run to run on each machine. There's nothing lurking in the workspace.

One computer has a 2GHz Pentium 4 cpu, the other has a 0.75GHz Pentium 3. I think the problem is with the Pentium 4 machine, since it's giving occasional errors due to NaNs in internal parts of optim that I don't understand, but the fault could quite possibly be in my understanding. A good-quality dual Pentium 4 Linux system doesn't give these internal errors in optim and seems to give the same results as the Pentium 3 machine (I haven't checked that they are all identical).

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
Thomas Lumley <tlumley@u.washington.edu> writes:

> This is basically a question about where to start looking for a problem.
>
> I have a program that gives slightly different results on two Windows
> computers. It is a reasonably complicated numerical optimisation, with
> iterative calls to optim().
>
> The two computers both run Windows 2000. On each computer I get the same
> results in two different versions of R (1.5.1 and 1.6.0 on one, 1.5.1 and
> 1.6.1 on the other, the standard binaries), and the results are stable
> from run to run on each machine. There's nothing lurking in the workspace.
>
> One computer has a 2GHz Pentium 4 cpu, the other has a 0.75GHz Pentium 3.
> I think the problem is with the Pentium 4 machine, since it's giving
> occasional errors due to NaNs in internal parts of optim that I don't
> understand, but the fault could quite possibly be in my understanding. A
> good-quality dual Pentium 4 Linux system doesn't give these internal
> errors in optim and seems to give the same results as the Pentium 3
> machine (I haven't checked that they are all identical).

I believe that there's a lot of FP activity inside msvcrt.dll (if I remember the name correctly), so if that isn't the same between the machines, it might explain things.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
Thomas Lumley wrote:

>
> This is basically a question about where to start looking for a problem.
>
> I have a program that gives slightly different results on two Windows
> computers. It is a reasonably complicated numerical optimisation, with
> iterative calls to optim().
...

I don't test much in Windows, but I've had a fair amount of trouble like this with Linux. Not so much with optim(), but with some numerically ill-conditioned problems I get results that differ in the fourth or fifth significant digit, whereas I typically expect my tests to be better than nine significant digits, and they are often good to fourteen. In Solaris my test values have much tighter tolerances and were very stable for years, but changed a bit recently when I switched from svd and eigen to La.svd and La.eigen.

The obvious potential culprit is nonBLAS/BLAS/ATLAS, but the Linux problem does not seem to be related to that. It is a bit like problems that used to occur when the lower order bits of doubles did not get zeroed, but the values from run to run on the same machine are too consistent for a random problem like that.

If you figure out how to track this down, I would like to know. I was going to try to keep track of the values I get more automatically, but I'm not sure what information needs to be recorded. OS and R version are obvious, but I suspect the issue has more to do with math library versions.

Paul Gilbert
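As a rough illustration of the bookkeeping Paul describes, the sketch below (in Python, purely as an example; nothing like it appears in the thread) measures how many significant digits two results share and collects a few platform details to store next to each test value. The names `agreeing_digits` and `environment_record` are hypothetical, and the record does not capture math library versions, which would have to be noted separately.

```python
import math
import platform
import sys


def agreeing_digits(a: float, b: float) -> float:
    """Approximate number of significant decimal digits on which a and b agree."""
    if math.isnan(a) or math.isnan(b):
        return 0.0  # a NaN agrees with nothing
    if a == b:
        return math.inf
    # a != b here, so at least one of them is nonzero and scale > 0
    scale = max(abs(a), abs(b))
    return -math.log10(abs(a - b) / scale)


def environment_record() -> dict:
    """Platform details worth storing alongside each recorded test value."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "interpreter": sys.version,
        "float_mantissa_bits": sys.float_info.mant_dig,
        "float_epsilon": sys.float_info.epsilon,
    }


if __name__ == "__main__":
    # Two hypothetical results from different machines
    print(agreeing_digits(1.23456, 1.23457))
    print(environment_record())
```

Comparing on a relative scale keeps the digit count meaningful for both very large and very small test values; a result that agrees to only four or five digits then stands out against the nine or more that are expected.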
On Wed, 4 Dec 2002 08:59:17 -0800 (PST), you wrote in message <Pine.A41.4.44.0212040841570.81760-100000@homer37.u.washington.edu>:

>
>This is basically a question about where to start looking for a problem.
>
>I have a program that gives slightly different results on two Windows
>computers. It is a reasonably complicated numerical optimisation, with
>iterative calls to optim().
>
>The two computers both run Windows 2000. On each computer I get the same
>results in two different versions of R (1.5.1 and 1.6.0 on one, 1.5.1 and
>1.6.1 on the other, the standard binaries), and the results are stable
>from run to run on each machine. There's nothing lurking in the workspace.

I've recently been trying to track down problems with a couple of DLLs, and have turned up Windows bugs where common dialogs (file open, etc.) reduce the floating point precision. The current development version has code to fix these (everywhere I could think to put it), but that's not in 1.6.1. I'll be putting these changes into 1.6.2 as well, but it's not in r-patched yet (since I didn't know there was going to be a 1.6.2).

So if you're set up to do a Windows build, you could try compiling r-devel, and should get consistent results (hopefully matching at least one of the results you've seen!)

Duncan
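To see why a silent drop in floating point precision would change the output of an iterative computation, here is a minimal Python sketch. It is a simulation only: it does not touch the processor's precision settings, and the thread does not say which precision the affected Windows dialogs leave behind. Single precision is assumed here just to make the effect easy to see, and `round_to_single` and `accumulate` are hypothetical helper names.

```python
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, reduce_precision=False):
    """Sum values, optionally rounding every intermediate total to single precision."""
    total = 0.0
    for v in values:
        total += v
        if reduce_precision:
            # Assumption: stands in for arithmetic done at reduced precision
            total = round_to_single(total)
    return total


values = [0.1] * 10_000
full = accumulate(values)
reduced = accumulate(values, reduce_precision=True)
print("full precision:   ", full)
print("reduced precision:", reduced)
print("difference:       ", abs(full - reduced))
```

Both runs are perfectly repeatable, yet they disagree with each other, which is the same pattern reported in the original question: stable results on each machine that differ between machines.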