Hello list,
I used the scan function to import data into R. After some analysis I found
strange results, and I have tracked down the problem: importing data with
scan can slightly modify the values:
> write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> scan("essai.txt")
Read 4 items
[1] 0.251 3.399 -0.481 0.266
> print(scan("essai.txt"),17)
Read 4 items
[1] 0.25100000000000000 3.39900000000000000
-0.48099999999999998 0.26600000000000001
Is this normal? Is it a bug?
Thanks in advance,
Sincerely.
> version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 1
minor 8.1
year 2003
month 11
day 21
language R
Stéphane DRAY
--------------------------------------------------------------------------------------------------
Département des Sciences Biologiques
Université de Montréal, C.P. 6128, succursale centre-ville
Montréal, Québec H3C 3J7, Canada
Tel : 514 343 6111 poste 1233
E-mail : stephane.dray at umontreal.ca
--------------------------------------------------------------------------------------------------
Web http://www.steph280.freesurf.fr/
On Wed, 31 Mar 2004 12:24:38 -0500, Stephane DRAY <stephane.dray at umontreal.ca> wrote:

>Hello list,
>I have used scan function to import data into R. I have done some analysis
>and find strange results. I have found my problem : when importing data
>with scan, this can slightly modify the data :
>
> > write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> > scan("essai.txt")
>Read 4 items
>[1] 0.251 3.399 -0.481 0.266
> > print(scan("essai.txt"),17)
>Read 4 items
>[1] 0.25100000000000000 3.39900000000000000
>-0.48099999999999998 0.26600000000000001
>
>Is it normal ? Is it a bug ?

I think it's normal. Floating point formats aren't exact except for
fractions with only powers of 2 in the denominator. There is no way to
represent any of your values in the formats that R uses without slight
errors.

I do notice one oddity in the print routines in R:

> x<-scan()
1: 0.266
2: 0.251
3:
Read 2 items
> print(x,17)
[1] 0.26600000000000001 0.25100000000000000
> x<-scan()
1: 0.266
2:
Read 1 items
> print(x,17)
[1] 0.266

I don't know why the second print() prints 0.266 differently from the
first one. (This is in the 1.9.0 beta in Windows).

Duncan Murdoch
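
The point about powers of 2 in the denominator can be checked directly. The following is only a small sketch in current R; the variable names exact and inexact are made up for the illustration, and the 17-digit strings depend on the platform's sprintf following the usual IEEE 754 double conversion.

```r
# Fractions whose denominator is a power of 2 are stored exactly
exact   <- c(0.5, 0.25, 0.375)               # 1/2, 1/4, 3/8
# The values from the original post are not of that form
inexact <- c(0.251, 3.399, -0.481, 0.266)

# 17 significant digits expose the stored binary value
sprintf("%.17g", exact)     # "0.5" "0.25" "0.375"
sprintf("%.17g", inexact)   # the small errors become visible here

# Arithmetic on exactly representable values is exact ...
0.5 + 0.25 == 0.75                   # TRUE
# ... but not on values that had to be rounded when stored
0.1 + 0.2 == 0.3                     # FALSE
# all.equal() compares with a tolerance instead
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE
```

Comparing doubles with all.equal() rather than == is the usual way to avoid being surprised by these rounding errors.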
Stephane DRAY wrote:
>
> Hello list,
> I have used scan function to import data into R. I have done some analysis
> and find strange results. I have found my problem : when importing data
> with scan, this can slightly modify the data :
>
> > write(c(0.251,3.399,-0.481,0.266),"essai.txt")
> > scan("essai.txt")
> Read 4 items
> [1] 0.251 3.399 -0.481 0.266
> > print(scan("essai.txt"),17)
> Read 4 items
> [1] 0.25100000000000000 3.39900000000000000
> -0.48099999999999998 0.26600000000000001
>
> Is it normal ? Is it a bug ?

It is normal: most decimal numbers have no exact floating point
representation in a computer (you have only a limited number of bits to
represent them!). In this case it is not scan() but just the
representation. Try typing:

print(c(0.251,3.399,-0.481,0.266), 17)

Uwe Ligges

> thanks in advance,
> Sincerely.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
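
A quick way to confirm that scan() is not the culprit is to compare what it reads back with the same numbers typed at the prompt. This is a sketch for current R, not the 1.8.1 session above; the names vals, tmp and read_back are illustrative, and a temporary file stands in for essai.txt.

```r
vals <- c(0.251, 3.399, -0.481, 0.266)

# Write the values to a temporary file and read them back
tmp <- tempfile(fileext = ".txt")
write(vals, tmp)
read_back <- scan(tmp, quiet = TRUE)
unlink(tmp)

# The values read from the file are bit-for-bit the typed ones
identical(read_back, vals)

# Asking for 17 significant digits shows the binary rounding in both
sprintf("%.17g", vals)
sprintf("%.17g", read_back)

# 15 significant digits hide it again
sprintf("%.15g", vals)   # "0.251" "3.399" "-0.481" "0.266"
```

The same digits appear whether the numbers come from the file or from the keyboard, so the small differences belong to the double representation, not to the import step.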
Stéphane,

in the example below which you are concerned about, the large correlation
you see is not a result of the small variance. Rather, the 3 random numbers
you generated just happened to have the same rank ordering as the magnitudes
of the three coefficients you were correlating them with. Try your example
again, but repeat your correlation command

cor(x[1,],runif(3))

several times in succession. You will probably see correlations ranging
from large and positive to large (in absolute value) and negative, and
everywhere in between.

Dan Nordlund

------------------Original message----------------------
In a message dated 3/31/2004 10:53:03 AM Pacific Standard Time, dray at biomserv.univ-lyon1.fr writes:

>At 13:34 31/03/2004, Prof Brian Ripley wrote:
>
>>Take a look at formatReal. scientific thinks 0.251 has 17 digits and
>>0.255 has 3. It really doesn't make any sense to ask for more precision
>>than you have (.Machine$double.eps) and you do often get spurious
>>errors if you attempt to do so. So 15 digits is normally safe, but no
>>more.
>>
>>Note that there are decimal -> binary -> decimal conversions and you
>>can't say which one introduced the small changes.
>
>I completely agree with you. My problem arises when I try to compute a
>correlation. One of the variables seems to have equal values but it does
>not. Hence, it has a very low variance, and so when I try to compute the
>correlation with another variable, this correlation is very high. I wonder
>if it would not be good to introduce a tolerance threshold. Is it
>meaningful to produce a correlation when a variance is very low?
>
>See the example below:
>
>> essai=matrix(c(0.266,.234,.005,.481,.1,.009,.4,.155,.255,.2,.34,.43),4,3)
>> essai2=sweep(essai,2,apply(essai,2,sum),"/")
>> x=coef(lm(essai2~scale(runif(4))))
>> x
>                      [,1]      [,2]       [,3]
>(Intercept)     0.25000000 0.2500000 0.25000000
>scale(runif(4)) 0.05307906 0.1330111 0.06936634
>> cor(x[1,],runif(3))
>[1] 0.932772
>> var(x)
>           [,1]        [,2]       [,3]
>[1,] 0.01938893 0.011518783 0.01778528
>[2,] 0.01151878 0.006843202 0.01056607
>[3,] 0.01778528 0.010566067 0.01631426
>> var(x[1,])
>[1] 1.92593e-33
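
Both the repeated-correlation experiment and the tolerance idea raised in the quoted message can be sketched in a few lines. Everything named here is an assumption made for the illustration: cor_tol is a hypothetical helper (it is not part of R), its default tol is an arbitrary choice, and b is a hand-built stand-in for the three intercepts that differ only by rounding error.

```r
# Three values equal to 0.25 up to rounding error (stand-in for x[1,])
eps <- .Machine$double.eps
b <- 0.25 + c(0, eps / 4, -eps / 8)
var(b)   # tiny but not exactly zero

# Repeating the correlation with fresh random numbers: because cor() is
# scale invariant, the result depends only on how the noise in b happens
# to line up with each random draw
set.seed(1)   # arbitrary seed, only for reproducibility
r <- replicate(1000, cor(b, runif(3)))
range(r)

# Hypothetical helper: return NA instead of a correlation when either
# variable is constant up to floating-point noise
cor_tol <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # Compare the spread with the typical magnitude of the data
  near_constant <- function(v) sd(v) <= tol * max(abs(v), 1)
  if (near_constant(x) || near_constant(y)) return(NA_real_)
  cor(x, y)
}

cor_tol(b, runif(3))               # NA: b is numerically constant
cor_tol(c(1, 2, 3), c(2, 4, 6))    # ordinary data are unaffected
```

Whether NA is the right answer for a numerically constant variable is a design choice; the sketch only shows where such a threshold could sit.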