Hi,

I have a text (.dat) file in which each row contains several long numeric strings. One of the strings is 38 digits long, for example:

03200801200801172008011720092904008901

When I read in the data file, this string shows up as 3.200801e+36. To get rid of the scientific notation, I used "options(scipen=999)". The scientific notation went away, but the numeric string was incorrect. It showed as:

3200801200801172223262666846080882062

Why would the number be incorrect? All of the other strings within this row are correct.

Thanks,

Andrew

--
View this message in context: http://www.nabble.com/R-numeric-string-problem-tp24940459p24940459.html
Sent from the R help mailing list archive at Nabble.com.
Use the colClasses argument of read.table to set the class of each column. For a file with two columns, where the first is a string and the second is numeric:

    read.table('your_file.dat', colClasses = c('character', 'numeric'))

On Wed, Aug 12, 2009 at 1:43 PM, Andrew C wrote:
> When I read in the data file, this string shows up as 3.200801e+36.
> [...]
> Why would the number be incorrect? All of the other strings within this
> row are correct.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
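To try the colClasses suggestion without touching the real data, here is a minimal sketch. It assumes a whitespace-separated file with two columns; the temporary file, its contents beyond the 38-digit string, and the variable names are made up for illustration.

```r
# Hypothetical two-column file: a 38-digit ID and an ordinary number.
tmp <- tempfile(fileext = ".dat")
writeLines("03200801200801172008011720092904008901 12.5", tmp)

# Default behaviour: read.table guesses the column types itself,
# so the long ID is converted to a double.
guessed <- read.table(tmp)
print(class(guessed$V1))

# With colClasses the first column is kept as text, digit for digit.
as_text <- read.table(tmp, colClasses = c("character", "numeric"))
print(as_text$V1)
print(nchar(as_text$V1))

unlink(tmp)  # remove the temporary file
```

If the real file has more columns, colClasses needs one entry per column (or a single value that is recycled for all of them).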
You need to read up on finite-precision arithmetic and floating-point representation.

In brief: 2^64 already needs 20 decimal digits, and a double gives up some of its 64 bits to the sign and exponent, leaving 53 bits of precision, or about 16 decimal digits. This is exactly the number of digits in the numeric representation that "match" your string. The digits after that carry no information from your file; they are simply what the nearest representable double prints as. (The leading zero is lost as well, since a number has no leading zeros.)

If you just need to keep the string as a string and not manipulate it as a numeric, then read it in as a character variable, not a numeric.

If you need to do exact arithmetic on it, check out Ryacas or some other package that is capable of arbitrary-precision arithmetic.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andrew C
Sent: Wednesday, August 12, 2009 9:44 AM
To: r-help at r-project.org
Subject: [R] R numeric string problem

> Why would the number be incorrect? All of the other strings within this
> row are correct.
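The precision argument above can be checked in base R. This is a rough sketch and the variable names are only illustrative. The trailing digits that sprintf prints depend on the platform's C library, so no expected output is given for them.

```r
id <- "03200801200801172008011720092904008901"
x  <- as.numeric(id)   # leading zero dropped, value rounded to a double

# Number of bits in the significand of a double (53 on IEEE 754 platforms).
print(.Machine$double.digits)

# Above 2^53, neighbouring integers can no longer be told apart.
print(2^53 == 2^53 + 1)

# Full decimal expansion of the stored double.
shown <- sprintf("%.0f", x)
print(shown)

# Compare the original digits (without the leading zero) with what was stored:
# the leading digits agree, the complete strings do not.
wanted <- sub("^0+", "", id)
print(substr(wanted, 1, 15) == substr(shown, 1, 15))
print(wanted == shown)
```

The first comparison only looks at 15 digits to stay on the safe side of the "about 16 digits" figure; the exact point where the two strings diverge may vary slightly between R versions and platforms.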