Renaud Gaujoux
2010-Sep-10 10:46 UTC
[Rd] Non identical numerical results from R code vs C/C++ code?
Hi, suppose you have two versions of the same algorithm: one in pure R, the other one in C/C++ called via .Call(). Assuming there is no bug in the implementations (i.e. they both do the same thing), is there any well known reason why the C/C++ implementation could return numerical results non identical to the one obtained from the pure R code? (e.g. could it be rounding errors? please explain.) Has anybody had a similar experience? By not identical, I mean very small differences (< 2.4 e-14), but enough to have identical() returning FALSE. Maybe I should not bother, but I want to be sure where the differences come from, at least by mere curiosity. Briefly the R code perform multiple matrix product; the C code is an optimization of those specific products via custom for loops, where entries are not computed in the same order, etc... which improves both memory usage and speed. The result is theoretically the same. Thank you, Renaud -- Renaud Gaujoux Computational Biology - University of Cape Town South Africa ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. ###
Duncan Murdoch
2010-Sep-10 11:00 UTC
[Rd] Non identical numerical results from R code vs C/C++ code?
On 10/09/2010 6:46 AM, Renaud Gaujoux wrote:> Hi, > > suppose you have two versions of the same algorithm: one in pure R, the > other one in C/C++ called via .Call(). > Assuming there is no bug in the implementations (i.e. they both do the > same thing), is there any well known reason why the C/C++ implementation > could return numerical results non identical to the one obtained from > the pure R code? (e.g. could it be rounding errors? please explain.) > Has anybody had a similar experience?R often uses extended reals (80 bit floating point values on Intel chips) for intermediate values. C compilers may or may not do that.> > By not identical, I mean very small differences (< 2.4 e-14), but enough > to have identical() returning FALSE. Maybe I should not bother, but I > want to be sure where the differences come from, at least by mere curiosity. > > Briefly the R code perform multiple matrix product; the C code is an > optimization of those specific products via custom for loops, where > entries are not computed in the same order, etc... which improves both > memory usage and speed. The result is theoretically the same.Changing the order of operations will often affect rounding. For example, suppose epsilon is the smallest number such that 1 + epsilon is not equal to 1. Then 1 + (epsilon/2) + (epsilon/2) will evaluate to either 1 or 1 + epsilon, depending on the order of computing the additions. Duncan Murdoch> > Thank you, > Renaud >
Barry Rowlingson
2010-Sep-10 11:24 UTC
[Rd] Non identical numerical results from R code vs C/C++ code?
On Fri, Sep 10, 2010 at 11:46 AM, Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za> wrote:> Hi, > > suppose you have two versions of the same algorithm: one in pure R, the > other one in C/C++ called via .Call(). > Assuming there is no bug in the implementations (i.e. they both do the same > thing), is there any well known reason why the C/C++ implementation could > return numerical results non identical to the one obtained from the pure R > code? (e.g. could it be rounding errors? please explain.) > Has anybody had a similar experience? > > By not identical, I mean very small differences (< 2.4 e-14), but enough to > have identical() returning FALSE. Maybe I should not bother, but I want to > be sure where the differences come from, at least by mere curiosity. > > Briefly the R code perform multiple matrix product; the C code is an > optimization of those specific products via custom for loops, where entries > are not computed in the same order, etc... which improves both memory usage > and speed. The result is theoretically the same.I've had almost exactly this situation recently with an algorithm I first implemented in R and then in C. Guess what the problem was? Yes, a bug in the C code. At first I thought everything was okay because the returned values were close-ish, and I thought 'oh, rounding error', but I wasn't happy about that. So I dug a bit deeper. This is worth doing even if you are sure there no bugs in the C code (remember: there's always one more bug). First, construct a minimal dataset so you can demonstrate the problem with a manageable size of matrix. I went down to 7 data points. Then get the C to print out its inputs. Identical, and as expected? Good. Now debug intermediate calculations, printing or saving from C and checking they are the same as the intermediate calculations from R. If possible, try returning intermediate calculations in C and checking identical() with R intermediates. Eventually you will find out where the first diverge - and this is the important bit. It's no use just knowing the final answers come out different, it's likely your answer has a sensitive dependence on initial conditions. You need to track down when the first bits are different. Or it could be rounding error... Barry
Paul Gilbert
2010-Sep-10 14:07 UTC
[Rd] Non identical numerical results from R code vs C/C++ code?
With fortran I have always managed to be able to get identical results on the same computer with the same OS. You will have trouble if you switch OS or hardware, or try the same hardware and OS with different math libraries. All the real calculations need to be double, even intermediate variables. Also, I've had trouble with arrays not being initialized to double 0.0. If you initialize to single 0.0 the straggling bits can cause differences. You also need to be careful about conversion from integer to real, to do double conversion. I'm not sure about C, but I would guess there are some of the same problems. Paul>-----Original Message----- >From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r- >project.org] On Behalf Of Renaud Gaujoux >Sent: September 10, 2010 6:47 AM >To: r-devel at r-project.org >Subject: [Rd] Non identical numerical results from R code vs C/C++code?> >Hi, > >suppose you have two versions of the same algorithm: one in pure R, the >other one in C/C++ called via .Call(). >Assuming there is no bug in the implementations (i.e. they both do the >same thing), is there any well known reason why the C/C++implementation>could return numerical results non identical to the one obtained from >the pure R code? (e.g. could it be rounding errors? please explain.) >Has anybody had a similar experience? > >By not identical, I mean very small differences (< 2.4 e-14), butenough>to have identical() returning FALSE. Maybe I should not bother, but I >want to be sure where the differences come from, at least by mere >curiosity. > >Briefly the R code perform multiple matrix product; the C code is an >optimization of those specific products via custom for loops, where >entries are not computed in the same order, etc... which improves both >memory usage and speed. The result is theoretically the same. > >Thank you, >Renaud > >-- >Renaud Gaujoux >Computational Biology - University of Cape Town >South Africa > > > > >### >UNIVERSITY OF CAPE TOWN > >This e-mail is subject to the UCT ICT policies and e-mail disclaimer >published on our website at >http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from >+27 21 650 4500. This e-mail is intended only for the person(s) to whom >it is addressed. If the e-mail has reached you in error, please notify >the author. If you are not the intended recipient of the e-mail you may >not use, disclose, copy, redirect or print the content. If this e-mail >is not related to the business of UCT it is sent by the sender in the >sender's individual capacity. > >### > >______________________________________________ >R-devel at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel=================================================================================== La version fran?aise suit le texte anglais. ------------------------------------------------------------------------------------ This email may contain privileged and/or confidential information, and the Bank of Canada does not waive any related rights. Any distribution, use, or copying of this email or the information it contains by other than the intended recipient is unauthorized. If you received this email in error please delete it immediately from your system and notify the sender promptly by email that you have done so. ------------------------------------------------------------------------------------ Le pr?sent courriel peut contenir de l'information privil?gi?e ou confidentielle. La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion, utilisation ou copie de ce courriel ou des renseignements qu'il contient par une personne autre que le ou les destinataires d?sign?s est interdite. Si vous recevez ce courriel par erreur, veuillez le supprimer imm?diatement et envoyer sans d?lai ? l'exp?diteur un message ?lectronique pour l'aviser que vous avez ?limin? de votre ordinateur toute copie du courriel re?u.
Apparently Analagous Threads
- doMC - compiler - concatenate an expression vector into a single expression?
- Defining a method that behaves like '$'?
- R cmd check and multicore foreach loop
- R CMD check and error in an \Sexpr in an Rd file
- getNativeSymbolInfo("user_unif_rand") returns different results on windows and linux