Hello list,

I have used the scan function to import data into R. After some analysis I found strange results, and I have traced the problem: importing data with scan can slightly modify the values:

> write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> scan("essai.txt")
Read 4 items
[1]  0.251  3.399 -0.481  0.266
> print(scan("essai.txt"),17)
Read 4 items
[1]  0.25100000000000000  3.39900000000000000 -0.48099999999999998  0.26600000000000001

Is this normal? Is it a bug?

Thanks in advance,
Sincerely.

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    8.1
year     2003
month    11
day      21
language R

Stéphane DRAY
--------------------------------------------------------------------------------------------------
Département des Sciences Biologiques
Université de Montréal, C.P. 6128, succursale centre-ville
Montréal, Québec H3C 3J7, Canada

Tel : 514 343 6111 poste 1233
E-mail : stephane.dray at umontreal.ca
--------------------------------------------------------------------------------------------------
Web http://www.steph280.freesurf.fr/
On Wed, 31 Mar 2004 12:24:38 -0500, Stephane DRAY <stephane.dray at umontreal.ca> wrote :

>Hello list,
>I have used scan function to import data into R. I have done some analysis
>and find strange results. I have found my problem : when importing data
>with scan, this can slightly modify the data :
>
> > write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> > scan("essai.txt")
>Read 4 items
>[1]  0.251  3.399 -0.481  0.266
> > print(scan("essai.txt"),17)
>Read 4 items
>[1]  0.25100000000000000  3.39900000000000000
>-0.48099999999999998  0.26600000000000001
>
>Is it normal ? Is it a bug ?

I think it's normal. Floating point formats aren't exact except for fractions with only powers of 2 in the denominator. There is no way to represent any of your values in the formats that R uses without slight errors.

I do notice one oddity in the print routines in R:

> x<-scan()
1: 0.266
2: 0.251
3:
Read 2 items
> print(x,17)
[1] 0.26600000000000001 0.25100000000000000
> x<-scan()
1: 0.266
2:
Read 1 items
> print(x,17)
[1] 0.266

I don't know why the second print() prints 0.266 differently from the first one. (This is in the 1.9.0 beta in Windows).

Duncan Murdoch
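To illustrate the point about powers of 2, here is a minimal sketch in R. It assumes IEEE 754 double precision, which R uses on all common platforms; the variable names are illustrative only and do not come from the thread.

```r
# Fractions whose denominator is a power of 2 are stored exactly.
exact <- c(0.5, 0.25, 0.375)            # 1/2, 1/4, 3/8
print(exact, digits = 17)

# The values from the original post are not, so asking for 17 digits
# exposes the nearest representable double, not the decimal that was typed.
inexact <- c(0.251, 3.399, -0.481, 0.266)
print(inexact, digits = 17)

# sprintf() shows the stored value with a fixed number of decimals,
# independently of how print() decides how many digits are needed.
sprintf("%.20f", 0.266)

# Exact comparison of decimal fractions can therefore fail ...
sum_is_equal <- (0.1 + 0.2 == 0.3)
# ... while a comparison with a tolerance succeeds.
sum_is_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

For comparisons in analysis code, all.equal() (or an explicit tolerance) is usually the safer choice than ==.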
Stephane DRAY wrote:
>
> Hello list,
> I have used scan function to import data into R. I have done some analysis
> and find strange results. I have found my problem : when importing data
> with scan, this can slightly modify the data :
>
> > write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> > scan("essai.txt")
> Read 4 items
> [1]  0.251  3.399 -0.481  0.266
> > print(scan("essai.txt"),17)
> Read 4 items
> [1]  0.25100000000000000  3.39900000000000000
> -0.48099999999999998  0.26600000000000001
>
> Is it normal ? Is it a bug ?

It is normal: most decimal numbers have no exact floating point representation in a computer (you have only a limited number of bits to represent them!). In this case the cause is not scan(), but the representation itself. Try typing

  print(c(0.251,3.399,-0.481,0.266), 17)

Uwe Ligges

> thanks in advance,
> Sincerely.
>
> > version
>          _
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    1
> minor    8.1
> year     2003
> month    11
> day      21
> language R
>
> Stéphane DRAY
> --------------------------------------------------------------------------------------------------
>
> Département des Sciences Biologiques
> Université de Montréal, C.P. 6128, succursale centre-ville
> Montréal, Québec H3C 3J7, Canada
>
> Tel : 514 343 6111 poste 1233
> E-mail : stephane.dray at umontreal.ca
> --------------------------------------------------------------------------------------------------
>
> Web http://www.steph280.freesurf.fr/
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
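One way to check that scan() is not the culprit is to compare the values read from a file with the same values typed at the prompt. The sketch below assumes a writable temporary directory; tempfile() stands in for the "essai.txt" of the original post, and the variable names are illustrative.

```r
typed <- c(0.251, 3.399, -0.481, 0.266)

# Round trip through a file, as in the original post.
path <- tempfile(fileext = ".txt")
write(typed, path)
from_file <- scan(path, quiet = TRUE)
unlink(path)

# Both vectors show the same trailing digits when 17 are requested,
# so the digits come from the binary representation, not from scan().
print(typed, digits = 17)
print(from_file, digits = 17)

# Largest absolute difference between the two versions.
max_abs_diff <- max(abs(from_file - typed))
```

If scan() were altering the data, max_abs_diff would be visibly larger than the rounding error of a double at this magnitude.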
Stéphane,

in the example below which you are concerned about, the large correlation you see is not a result of the small variance. Rather, the 3 random numbers you generated just happened to have the same rank ordering as the magnitudes of the three coefficients you were correlating them with. Just try your example again, but repeat your correlation command

  cor(x[1,],runif(3))

several times in succession. You will probably see correlations ranging from large and positive to large (in absolute value) and negative, and everywhere in between.

Dan Nordlund

------------------Original message----------------------
In a message dated 3/31/2004 10:53:03 AM Pacific Standard Time, dray at biomserv.univ-lyon1.fr writes:

>At 13:34 31/03/2004, Prof Brian Ripley wrote:
>
>>Take a look at formatReal. scientific thinks 0.251 has 17 digits and
>>0.255 has 3. It really doesn't make any sense to ask for more precision
>>than you have (.Machine$double.eps) and you do often get spurious
>>errors if you attempt to do so. So 15 digits is normally safe, but no
>>more.
>>
>>Note that there are decimal -> binary -> decimal conversions and you
>>can't say which one introduced the small changes.
>
>I completely agree with you. My problem arises when I try to compute a
>correlation. One of the variables seems to have equal values but it does
>not. Hence, it has a very low variance, and so when I try to compute the
>correlation with another variable, this correlation is very high. I wonder
>if it would not be good to introduce a tolerance threshold. Is it
>meaningful to produce a correlation when a variance is very low ?
>See the example below :
>
>> essai=matrix(c(0.266,.234,.005,.481,.1,.009,.4,.155,.255,.2,.34,.43),4,3)
>> essai2=sweep(essai,2,apply(essai,2,sum),"/")
>> x=coef(lm(essai2~scale(runif(4))))
>> x
>                      [,1]      [,2]       [,3]
>(Intercept)     0.25000000 0.2500000 0.25000000
>scale(runif(4)) 0.05307906 0.1330111 0.06936634
>> cor(x[1,],runif(3))
>[1] 0.932772
>> var(x)
>           [,1]        [,2]       [,3]
>[1,] 0.01938893 0.011518783 0.01778528
>[2,] 0.01151878 0.006843202 0.01056607
>[3,] 0.01778528 0.010566067 0.01631426
>> var(x[1,])
>[1] 1.92593e-33
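The two points raised in this exchange can be sketched in R. Both helper functions below (is_effectively_constant and guarded_cor) are hypothetical and not part of base R, the default tolerance is an arbitrary assumption, and the test vectors are made up to resemble the numbers in the post. The first part shows how widely a correlation based on only 3 points varies from draw to draw. The second shows one possible form of the tolerance threshold asked about, which returns NA when a variable is constant up to rounding error.

```r
set.seed(1)  # only so that the sketch is reproducible

# Part 1: with 3 observations, correlations with random numbers vary widely.
slopes <- c(0.05307906, 0.1330111, 0.06936634)   # slope row from the post
sims <- replicate(1000, cor(slopes, runif(3)))
range(sims)

# Part 2: a hypothetical tolerance check before computing a correlation.
# A vector counts as constant when its standard deviation is negligible
# relative to its largest absolute value.
is_effectively_constant <- function(v, tol = sqrt(.Machine$double.eps)) {
  magnitude <- max(abs(v))
  if (magnitude == 0) return(TRUE)
  sd(v) / magnitude < tol
}

# Hypothetical wrapper: NA instead of a correlation driven by rounding noise.
guarded_cor <- function(a, b, tol = sqrt(.Machine$double.eps)) {
  if (is_effectively_constant(a, tol) || is_effectively_constant(b, tol)) {
    return(NA_real_)
  }
  cor(a, b)
}

# Intercepts equal to 0.25 up to the last bit, similar to x[1,] in the post.
intercepts <- 0.25 + c(0, 1, 2) * 2^-54
var(intercepts)                                   # tiny but not zero
guarded <- guarded_cor(intercepts, c(0.1, 0.5, 0.9))
ordinary <- guarded_cor(c(1, 2, 3), c(0.1, 0.5, 0.9))
```

A relative tolerance is used so that the check does not depend on the units of the variable. Whether sqrt(.Machine$double.eps) is a sensible cutoff depends on how the data were computed.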