Dear all,

I think that every number x in R can be represented in floating point arithmetic as:

    x = (-1)^s (1+f) 2^(e-1023)

where s is coded on 1 bit, e (positive integer) is coded on 11 bits, and f (real in [0,1)) is coded on 52 bits. Am I right?

We have f = \sum_{i=1}^{52} k_i 2^{-i} for some values k_i in {0,1}.

If this is the case (for the 52 bits), the number next to 2^150 should be

    (-1)^0 2^150 (1+2^(-52)) = 2^150 + 2^98

I can check this:

> a <- 2^150; b <- a + 2^97; b == a
[1] TRUE
> a <- 2^150; b <- a + 2^98; b == a
[1] FALSE

So it seems that the mantissa is really coded on 52 bits.

But now, if I issue the following commands (using some functions provided below to translate from decimal to binary):

> dec2bin(0.1,52)
[1] "0.0001100110011001100110011001100110011001100110011001"
> formatC(sum(as.numeric(strsplit(dec2bin(0.1,52),"")[[1]][-(1:2)])*2^(-(1:52))),50)
[1] "0.099999999999999866773237044981215149164199829101562"
> formatC(0.1,50)
[1] "0.1000000000000000055511151231257827021181583404541"
> formatC(sum(as.numeric(strsplit(dec2bin(0.1,55),"")[[1]][-(1:2)])*2^(-(1:55))),50)
[1] "0.1000000000000000055511151231257827021181583404541"
> formatC(0.1,50)
[1] "0.1000000000000000055511151231257827021181583404541"

So now, using formatC(), it seems that f is coded on 55 bits!

Do you have an explanation for this fact?

Many thanks!

Pierre

dec2bin.ent <- function(x) {
  as.integer(paste(rev(as.integer(intToBits(x))), collapse=""))
}

dec2bin.frac <- function(x,prec=52) {
  res <- rep(NA,prec)
  for (i in 1:prec) {
    res[i] <- as.integer(x*2)
    x <- (x*2) %% 1
  }
  return(paste(res,collapse=""))
}

dec2bin <- function(x,prec=52) {
  x <- as.character(x)
  res <- strsplit(x,".",fixed=TRUE)[[1]]
  return(paste(dec2bin.ent(as.numeric(res[1])),dec2bin.frac(as.numeric(paste("0.",res[2],sep="")),prec),sep="."))
}

--
Pierre Lafaye de Micheaux

Mailing address:
Département de Mathématiques et Statistique
Université de Montréal
CP 6128, succ. Centre-ville
Montréal, Québec H3C 3J7
CANADA

Physical address:
Département de Mathématiques et Statistique
Bureau 4249, Pavillon André-Aisenstadt
2920, chemin de la Tour
Montréal, Québec H3T 1J4
CANADA

Tel.: (00-1) 514-343-6607 / Fax: (00-1) 514-343-5700
lafaye at dms.umontreal.ca
http://www.biostatisticien.eu
Nordlund, Dan (DSHS/RDA)
2011-Nov-08 20:29 UTC
[R] Question about R mantissa and number of bits
I am not going through all of your code to understand what you are trying to demonstrate. R uses the IEEE Standard 754 for Floating Point Numbers. There is a sign bit, 11 bits for the exponent, and 52 bits for the mantissa. Because the standard normalizes the mantissa, the leading 1 bit is implicit and you get an extra bit of precision; i.e. you get 53 bits of precision stored in 52 bits.

You might want to read the following:

http://www.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
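
A possible way to reconcile the "52 bits" and "55 bits" observations, offered as a sketch rather than a definitive account: dec2bin(0.1, n) counts binary places after the point, not significant bits. The double nearest to 0.1 has its leading 1 at binary place 4 (0.1 is about 1.6 * 2^-4), so its 53 significant bits occupy places 4 through 56, and the bit at place 56 happens to be 0. Places 1 to 55 therefore already reproduce the stored value exactly, while places 1 to 52 cut off nonzero bits. The helper names frac_bits and from_bits below are made up for this illustration and assume an IEEE 754 double platform.

```r
# Illustration only: frac_bits() and from_bits() are hypothetical helpers,
# not part of R or of the code posted above.

# Significant bits in a double as reported by R (52 stored + 1 implicit).
.Machine$double.digits          # 53 on IEEE 754 platforms

# Exact binary expansion of a double in [0, 1), to `prec` binary places.
# Doubling and subtracting the integer part are exact operations here.
frac_bits <- function(x, prec) {
  bits <- integer(prec)
  for (i in seq_len(prec)) {
    x <- x * 2
    bits[i] <- as.integer(x >= 1)
    x <- x - bits[i]
  }
  bits
}

# Rebuild a number from its bits after the binary point.
from_bits <- function(bits) sum(bits * 2^(-seq_along(bits)))

b <- frac_bits(0.1, 60)
first_one <- min(which(b == 1))  # place of the leading (implicit) 1 bit
last_one  <- max(which(b == 1))  # place of the last nonzero bit

first_one                        # 4
last_one                         # 55
last_one - first_one + 1         # 52 significant bits used, within the 53 available

from_bits(b[1:52]) == 0.1        # FALSE: places 53 and 55 are cut off
from_bits(b[1:55]) == 0.1        # TRUE: nothing nonzero lies beyond place 55

# Same picture at 2^150: the implicit bit sits at 2^150 and the 52 stored
# bits reach down to 2^98, so 2^98 is the spacing and 2^97 is half of it.
(2^150 + 2^98) == 2^150          # FALSE
(2^150 + 2^97) == 2^150          # TRUE (exact tie, rounded to even)
```

Under this reading, both experiments in the original post are consistent with 53 significant bits; the 55 reflects the three leading zero places of 0.1 plus a trailing zero bit, not a wider mantissa.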