Dear all,
I think that every number x in R can be represented in floating point
arithmetic as:
x = (-1)^s (1+f) 2^(e-1023)
where s is coded on 1 bit, e (positive integer) is coded on 11 bits, and
f (real in [0,1)) is coded on 52 bits.
Am I right?
We have f=\sum_{i=1}^{52} k_i 2^{-i} for some values k_i in {0,1}.
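One way to check this layout is to read the 64 stored bits back out of a double. The sketch below is only an illustration: the name double_fields is made up here, and it assumes a normalized double (not zero, subnormal, Inf or NaN).

```r
# Hypothetical helper (not from the post): split a double into the
# s, e and f fields of x = (-1)^s (1+f) 2^(e-1023).
double_fields <- function(x) {
  # Little-endian bytes followed by rawToBits() give the bits from least
  # to most significant; rev() puts the sign bit first.
  raw_bytes <- writeBin(as.double(x), raw(), endian = "little")
  bits <- rev(as.integer(rawToBits(raw_bytes)))
  list(s = bits[1],                          # 1 sign bit
       e = sum(bits[2:12] * 2^(10:0)),       # 11 exponent bits
       f = sum(bits[13:64] * 2^(-(1:52))))   # 52 fraction bits
}

p <- double_fields(0.1)
# Rebuild the value from its three fields and compare with the original.
rebuilt <- (-1)^p$s * (1 + p$f) * 2^(p$e - 1023)
print(c(s = p$s, e = p$e))
print(rebuilt == 0.1)
```

For 0.1 the exponent field is 1019, i.e. 0.1 is stored as (1+f) 2^(-4) with f close to 0.6.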
If this is the case (for the 52 bits), we should have:
The number next to 2^150 should be (-1)^0 2^150 (1+2^(-52)) = 2^150 + 2^98
I can check this:
> a <- 2^150; b <- a + 2^97; b == a
[1] TRUE
> a <- 2^150; b <- a + 2^98; b == a
[1] FALSE
So it seems that the mantissa is really coded on 52 bits.
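The same check can be phrased with .Machine$double.eps, which is 2^(-52) on IEEE 754 platforms. This is a sketch that assumes the default round-to-nearest, ties-to-even mode; the variable name ulp is mine.

```r
a   <- 2^150
ulp <- a * .Machine$double.eps     # spacing of doubles just above a

print(ulp == 2^98)
# a + 2^97 lies exactly halfway between a and a + ulp; the tie is
# resolved towards a, whose last mantissa bit is 0.
print(a + ulp / 2 == a)
# Anything past the midpoint rounds up to the next double.
print(a + 0.75 * ulp == a + ulp)
# Number of significant bits R reports for a double.
print(.Machine$double.digits)
```

So the a + 2^97 case in the transcript is a rounding tie rather than a value that is simply too small to register.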
But now, if I issue the following commands (using some functions
provided below to translate from decimal to binary):
> dec2bin(0.1,52)
[1] "0.0001100110011001100110011001100110011001100110011001"
> formatC(sum(as.numeric(strsplit(dec2bin(0.1,52),"")[[1]][-(1:2)])*2^(-(1:52))),50)
[1] "0.099999999999999866773237044981215149164199829101562"
> formatC(0.1,50)
[1] "0.1000000000000000055511151231257827021181583404541"
> formatC(sum(as.numeric(strsplit(dec2bin(0.1,55),"")[[1]][-(1:2)])*2^(-(1:55))),50)
[1] "0.1000000000000000055511151231257827021181583404541"
> formatC(0.1,50)
[1] "0.1000000000000000055511151231257827021181583404541"
So now, using formatC() it seems that f is coded on 55 bits!
Do you have an explanation for this fact?
Many thanks!
Pierre
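# Integer part of x as a binary string (x must fit in a 32-bit integer).
dec2bin.ent <- function(x) {
  bits <- paste(rev(as.integer(intToBits(x))), collapse = "")
  # Strip leading zeros as text (keeping one "0" when x is 0);
  # as.integer() on the digit string overflows for x >= 1024.
  sub("^0+(?=.)", "", bits, perl = TRUE)
}

# First 'prec' binary digits of the fractional part x, with 0 <= x < 1.
dec2bin.frac <- function(x, prec = 52) {
  res <- integer(prec)
  for (i in seq_len(prec)) {
    res[i] <- as.integer(x * 2)
    x <- (x * 2) %% 1
  }
  paste(res, collapse = "")
}

# Binary expansion of a non-negative decimal number x.
# Note: as.character() keeps at most 15 significant digits of x.
dec2bin <- function(x, prec = 52) {
  res <- strsplit(as.character(x), ".", fixed = TRUE)[[1]]
  if (length(res) < 2) res[2] <- "0"   # x has no fractional part
  paste(dec2bin.ent(as.numeric(res[1])),
        dec2bin.frac(as.numeric(paste("0.", res[2], sep = "")), prec),
        sep = ".")
}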
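As a cross-check on the helpers above, sprintf() with the "%a" format prints a double in hexadecimal notation, which shows the stored mantissa without any decimal rounding. This is a sketch only; the exact text printed can vary by platform (typically something like 0x1.999999999999ap-4 for 0.1).

```r
h <- sprintf("%a", 0.1)        # hexadecimal significand and binary exponent
print(h)
# The hexadecimal string should convert back to the same double.
print(as.numeric(h) == 0.1)
```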
--
Pierre Lafaye de Micheaux
Mailing address:
Département de Mathématiques et Statistique
Université de Montréal
CP 6128, succ. Centre-ville
Montréal, Québec H3C 3J7
CANADA
Street address:
Département de Mathématiques et Statistique
Bureau 4249, Pavillon André-Aisenstadt
2920, chemin de la Tour
Montréal, Québec H3T 1J4
CANADA
Tel.: (00-1) 514-343-6607 / Fax: (00-1) 514-343-5700
lafaye at dms.umontreal.ca
http://www.biostatisticien.eu
Nordlund, Dan (DSHS/RDA)
2011-Nov-08 20:29 UTC
[R] Question about R mantissa and number of bits
I am not going through all of your code to understand what you are
trying to demonstrate. R uses the IEEE Standard 754 for Floating Point
Numbers. There is a sign bit, 11 bits for the exponent, and 52 bits for
the mantissa. Because the standard normalizes the mantissa, you get an
extra bit of precision; i.e., you get 53 bits of precision stored in
52 bits. You might want to read the following:
http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Lafaye de Micheaux
> Sent: Tuesday, November 08, 2011 10:49 AM
> To: r-help at r-project.org
> Subject: [R] Question about R mantissa and number of bits
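For what it is worth, the 53-bit count also seems to account for the "55 bits" in the original question. The binary expansion of 0.1 begins with three zeros, and those are handled by the exponent rather than stored in the mantissa. A small sketch (frac_bits is a made-up name, not one of the functions from the post):

```r
# Binary digits of x in [0, 1) by repeated doubling; each step is exact
# in double arithmetic, so these are the digits of the stored value.
frac_bits <- function(x, n) {
  out <- integer(n)
  for (i in seq_len(n)) {
    x <- x * 2
    out[i] <- as.integer(x >= 1)
    x <- x - out[i]
  }
  out
}

b <- frac_bits(0.1, 60)
first_one <- min(which(b == 1))   # position of the implicit leading 1
last_one  <- max(which(b == 1))   # last non-zero binary digit
print(c(first = first_one, last = last_one))

# Significant bits actually needed, counted from the leading 1.
print(last_one - first_one + 1)
# Cutting the expansion after 52 digits drops part of the mantissa ...
print(sum(b[1:52] * 2^(-(1:52))) < 0.1)
# ... while 55 digits already contain every non-zero bit.
print(sum(b[1:55] * 2^(-(1:55))) == 0.1)
```

The leading 1 sits at position 4 and the last non-zero digit at position 55, so the stored value spans 52 significant bits, which fits within the 53 available. Truncating dec2bin() at 52 binary places therefore loses the last three significant bits, while 55 places recover the double exactly; f itself is still coded on 52 bits.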