Hello.

I have a dataset with about 500,000 observations, most of which are
not unique. The first 10 observations look like

901000000000100000010100101011002
901101101110100000010100101011002
901000000000100000010100000001002
901000000000100000010101001011002
901000000000100000010101010011002
901000000000100000010100110101002
901000000000100000010100101011002
900000000000100000010010101011002
901000000000100000010100101101002
901000000000100000010100101011002

Each digit reflects a separate field, but above all spaces are
removed.

I read in the data with scan(), and then use unique() to get the
unique observations. But, when I print these elements to a file I
lose precision. For instance, let x be a vector of the first 10
observations from the dataset:

> write (x,file="output",ncol=1)

more output

9.01e+32
9.011011e+32
9.01e+32
9.01e+32
9.01e+32
9.01e+32
9.01e+32
9e+32
9.01e+32
9.01e+32

Is there a way to get all the digits back?

> write (format(x,digits=22),file="output",ncol=1)

does not do it, and I cannot seem to increase digits >22.

thanks,
michael

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

On Mon, 14 May 2001, Michael Herron wrote:

> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique. The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
> 901000000000100000010100000001002
> 901000000000100000010101001011002
> 901000000000100000010101010011002
> 901000000000100000010100110101002
> 901000000000100000010100101011002
> 900000000000100000010010101011002
> 901000000000100000010100101101002
> 901000000000100000010100101011002
>
> Each digit reflects a separate field, but above all spaces are
> removed.
>
> I read in the data with scan(), and then use unique() to get the

How did you read them with scan? You seem to have doubles, despite your
title. Reading them as integers overflows:

> foo <- scan("foo.dat", integer(0))
Read 10 items
> foo
[1] 2147483647 2147483647 2147483647 2147483647 2147483647 2147483647
[7] 2147483647 2147483647 2147483647 2147483647

> unique observations. But, when I print these elements to a file I
> lose precision. For instance, let x be a vector of the first 10

Nope, you lost it reading them into the doubles.

Why not just do this with character objects?

foo <- scan("foo.dat", "")
unique(foo)

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
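
A minimal sketch of the character-based approach suggested above, using the ten sample values from the original post. The vector name obs and the use of textConnection() in place of the file foo.dat are assumptions made only so the example is self-contained.

```r
# The ten sample observations from the original post, kept as text.
obs <- c("901000000000100000010100101011002",
         "901101101110100000010100101011002",
         "901000000000100000010100000001002",
         "901000000000100000010101001011002",
         "901000000000100000010101010011002",
         "901000000000100000010100110101002",
         "901000000000100000010100101011002",
         "900000000000100000010010101011002",
         "901000000000100000010100101101002",
         "901000000000100000010100101011002")

# Read as character (what = "") so no value is ever converted to a double.
# textConnection() stands in for the file "foo.dat" (assumption for this sketch).
con <- textConnection(obs)
foo <- scan(con, what = "", quiet = TRUE)
close(con)

# unique() on character vectors compares every digit exactly.
u <- unique(foo)

# file = "" writes to the console; all 33 digits survive.
write(u, file = "", ncolumns = 1)
```

With a real file, the call would presumably be scan("foo.dat", what = "") as in the message above.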

On Mon, 14 May 2001, Michael Herron wrote:

> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique. The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
> 901000000000100000010100000001002
> 901000000000100000010101001011002
> 901000000000100000010101010011002
> 901000000000100000010100110101002
> 901000000000100000010100101011002
> 900000000000100000010010101011002
> 901000000000100000010100101101002
> 901000000000100000010100101011002
>
> Each digit reflects a separate field, but above all spaces are
> removed.
>
> I read in the data with scan(), and then use unique() to get the
> unique observations. But, when I print these elements to a file I
> lose precision. For instance, let x be a vector of the first 10
> observations from the dataset:
>
> > write (x,file="output",ncol=1)
>
> more output
>
> 9.01e+32
> 9.011011e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9e+32
> 9.01e+32
> 9.01e+32
>
> Is there a way to get all the digits back?
>
> > write (format(x,digits=22),file="output",ncol=1)
>
> does not do it, and I cannot seem to increase digits >22.
>

You can't store numbers to more than the precision provided by your
compiler/hardware, so there's probably only 16 accurate digits no matter
how many R prints.

In order to unique() them you can read them as strings, which have
essentially unlimited precision.

-thomas

Thomas Lumley
Asst. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
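
The precision limit described above can be checked directly. This is only a sketch: the value b is a hypothetical neighbour of the first observation (last digit changed) and does not occur in the posted data.

```r
# Doubles carry a 53-bit significand on IEEE 754 hardware,
# which is roughly 15 to 17 significant decimal digits.
sig_bits <- .Machine$double.digits

a <- "901000000000100000010100101011002"   # first observation from the post
b <- "901000000000100000010100101011003"   # hypothetical: last digit differs

# As doubles the two 33-digit values round to the same number ...
same_as_double <- as.numeric(a) == as.numeric(b)

# ... while as strings they remain distinct.
same_as_string <- a == b

print(c(double = same_as_double, string = same_as_string))
```

This is why unique() on the numeric vector can merge observations that differ only in their trailing digits.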

Michael Herron wrote:

> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique. The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
> 901000000000100000010100000001002
> 901000000000100000010101001011002
> 901000000000100000010101010011002
> 901000000000100000010100110101002
> 901000000000100000010100101011002
> 900000000000100000010010101011002
> 901000000000100000010100101101002
> 901000000000100000010100101011002
>
> Each digit reflects a separate field, but above all spaces are
> removed.
>
> I read in the data with scan(), and then use unique() to get the
> unique observations. But, when I print these elements to a file I
> lose precision. For instance, let x be a vector of the first 10
> observations from the dataset:
>
> > write (x,file="output",ncol=1)
>
> more output
>
> 9.01e+32
> 9.011011e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9.01e+32
> 9e+32
> 9.01e+32
> 9.01e+32
>
> Is there a way to get all the digits back?
>
> > write (format(x,digits=22),file="output",ncol=1)
>
> does not do it, and I cannot seem to increase digits >22.

Converting into characters should do the trick.

Uwe Ligges

BTW: You don't expect to get exact results calculating with such
numbers, do you? If so, you have to think about some precision problems.
Example:

> 90000000000000000000 == 90000000000000000001
[1] TRUE

Hi, Michael

> From: owner-r-help at stat.math.ethz.ch
> [mailto:owner-r-help at stat.math.ethz.ch] On behalf of Michael Herron
> Sent: Monday, 14 May 2001 17:09
> To: r-help at stat.math.ethz.ch
> Subject: [R] unique and precision of long integers
>
> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique. The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
[...]
> I read in the data with scan(), and then use unique() to get the
> unique observations. But, when I print these elements to a file I
> lose precision.

If you have long strings of digits, you can 'scan' them as character
(actually, they are not exactly numbers) and manipulate them afterwards:

> x<-scan("foo.dat", what="")
> x
[1] "901000000000100000010100101011002" "901101101110100000010100101011002"
[3] "901000000000100000010100000001002" "901000000000100000010101001011002"
[5] "901000000000100000010101010011002" "901000000000100000010100110101002"
[7] "901000000000100000010100101011002" "900000000000100000010010101011002"
[9] "901000000000100000010100101101002" "901000000000100000010100101011002"
> write(x,"",ncol=1)
901000000000100000010100101011002
901101101110100000010100101011002
901000000000100000010100000001002
901000000000100000010101001011002
901000000000100000010101010011002
901000000000100000010100110101002
901000000000100000010100101011002
900000000000100000010010101011002
901000000000100000010100101101002
901000000000100000010100101011002

I hope it helps.

Christophe
--
Christophe DECLERCQ, MD
Observatoire Régional de la Santé Nord-Pas-de-Calais
13, rue Faidherbe 59046 LILLE Cedex FRANCE
Phone +33 3 20 15 49 24
Fax +33 3 20 55 92 30
E-mail c.declercq at orsnpdc.org
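
One possible way to "manipulate them afterwards", given that each digit is a separate field, is to split the strings into a character matrix. This is a sketch only: it uses three of the posted values, and the names fields, first_field and third_field are made up for illustration.

```r
# Three of the observations from the original post, read as character.
x <- c("901000000000100000010100101011002",
       "901101101110100000010100101011002",
       "900000000000100000010010101011002")

# Split each string into single characters; one row per observation,
# one column per one-digit field.
fields <- do.call(rbind, strsplit(x, ""))

# Individual fields are small values and convert to integer safely.
first_field <- as.integer(fields[, 1])
third_field <- as.integer(fields[, 3])

print(dim(fields))
```

Each column can then be tabulated or compared on its own without ever forming a 33-digit number.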

Michael Herron <mherron at latte.harvard.edu> writes:

>I have a dataset with about 500,000 observations, most of which are
>not unique. The first 10 observations look like

[stuff deleted]

I can think of two methods that might work ...

  1. Scan as strings rather than numbers

  2. Scan in as separate fields and then do some sort of checksum
     function on all variables and unique() on the checksum.

Hope that helps.

Mark

--
Mark Myatt
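
A rough sketch of the second method, under stated assumptions: the small matrix fields is hypothetical data standing in for the separately scanned fields, and instead of a numeric checksum (which could collide, or run into the same precision limit) it pastes the fields of each row into an exact text key.

```r
# Hypothetical data: four observations with five one-digit fields each,
# as they might look after being scanned in as separate columns.
fields <- matrix(c(9, 0, 1, 0, 2,
                   9, 0, 1, 1, 2,
                   9, 0, 1, 0, 2,
                   9, 0, 0, 0, 2),
                 ncol = 5, byrow = TRUE)

# Build one text key per row; unlike a numeric checksum it cannot collide.
key <- apply(fields, 1, paste, collapse = "")

# Keep the first occurrence of each distinct key.
uniq <- fields[!duplicated(key), , drop = FALSE]

print(nrow(uniq))
```

Calling unique() on the matrix itself also removes duplicate rows, so the explicit key is mainly useful when the key is needed for something else, such as counting duplicates with table(key).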