thr3ads.net - R help - [R] unique and precision of long integers [May 2001]

Michael Herron

2001-May-14 15:09 UTC

[R] unique and precision of long integers

Hello.

I have a dataset with about 500,000 observations, most of which are
not unique.  The first 10 observations look like

901000000000100000010100101011002
901101101110100000010100101011002
901000000000100000010100000001002
901000000000100000010101001011002
901000000000100000010101010011002
901000000000100000010100110101002
901000000000100000010100101011002
900000000000100000010010101011002
901000000000100000010100101101002
901000000000100000010100101011002

Each digit reflects a separate field, but above all spaces are
removed.

I read in the data with scan(), and then use unique() to get the
unique observations.  But, when I print these elements to a file I
lose precision.  For instance, let x be a vector of the first 10
observations from the dataset:

> write (x,file="output",ncol=1)

more output

9.01e+32
9.011011e+32
9.01e+32
9.01e+32
9.01e+32
9.01e+32
9.01e+32
9e+32
9.01e+32
9.01e+32

Is there a way to get all the digits back?

> write (format(x,digits=22),file="output",ncol=1)

does not do it, and I cannot seem to increase digits >22.

thanks, 

michael
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian Ripley

2001-May-14 15:26 UTC

[R] unique and precision of long integers

On Mon, 14 May 2001, Michael Herron wrote:
> [...]
> I read in the data with scan(), and then use unique() to get the

How did you read them with scan?  You seem to have doubles, despite your
title.  Reading them as integers overflows:

> foo <- scan("foo.dat", integer(0))
Read 10 items
> foo
 [1] 2147483647 2147483647 2147483647 2147483647 2147483647 2147483647
 [7] 2147483647 2147483647 2147483647 2147483647

> unique observations.  But, when I print these elements to a file I
> lose precision.  For instance, let x be a vector of the first 10

Nope, you lost it when reading them in as doubles.

Why not just do this with character objects?

foo <- scan("foo.dat", "")
unique(foo)
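
The difference between the two ways of reading can be sketched as follows. This is a minimal illustration, not part of the original reply: it assumes a reasonably recent R in which scan() accepts a text= argument, and the names lines, as_double, as_char and out_file are made up for the example. The ten records are the ones quoted in the original post.

```r
# The ten sample records from the original post, one per line.
lines <- c(
  "901000000000100000010100101011002",
  "901101101110100000010100101011002",
  "901000000000100000010100000001002",
  "901000000000100000010101001011002",
  "901000000000100000010101010011002",
  "901000000000100000010100110101002",
  "901000000000100000010100101011002",
  "900000000000100000010010101011002",
  "901000000000100000010100101101002",
  "901000000000100000010100101011002"
)

# Read the same text twice: once as doubles, once as character strings.
as_double <- scan(text = lines, what = numeric(), quiet = TRUE)
as_char   <- scan(text = lines, what = "", quiet = TRUE)

# Records that differ only in their trailing digits collapse to the same
# double, so the numeric version reports fewer distinct values.
n_unique_double <- length(unique(as_double))
n_unique_char   <- length(unique(as_char))   # 8 distinct records

# Write the unique strings out again; every digit survives.
# (tempfile() is used here only to avoid overwriting a real file.)
out_file <- tempfile()
writeLines(unique(as_char), out_file)
```

With the data kept as character strings, writeLines() (or write(), as used elsewhere in this thread) reproduces each record exactly as it was read.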



-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Thomas Lumley

2001-May-14 15:41 UTC

[R] unique and precision of long integers

On Mon, 14 May 2001, Michael Herron wrote:
> [...]
> Is there a way to get all the digits back?
>
> > write (format(x,digits=22),file="output",ncol=1)
>
> does not do it, and I cannot seem to increase digits >22.
>
You can't store numbers to more than the precision provided by your
compiler/hardware, so there are probably only about 16 accurate digits,
no matter how many R prints.

In order to unique() them you can read them as strings, which have
essentially unlimited precision.
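
The precision limit can be checked directly in R. The sketch below is an illustration added for this point, assuming the usual IEEE 754 double format; the variable names are hypothetical. The 20-digit pair is the one used in Uwe Ligges' reply in this thread, and the 33-digit value is the first record from the original post.

```r
# Number of bits in the significand of a double (53 on IEEE 754 hardware,
# which corresponds to roughly 15-16 decimal digits).
sig_bits <- .Machine$double.digits

# Above 2^53 not every integer is representable, so adding 1 is lost.
collapses <- (2^53 + 1) == 2^53

# Two 20-digit values that differ in the last digit are equal as doubles ...
equal_as_double <- as.numeric("90000000000000000000") ==
                   as.numeric("90000000000000000001")

# ... but remain distinct as character strings.
equal_as_string <- "90000000000000000000" == "90000000000000000001"

# A 33-digit record does not survive a round trip through a double:
# printing the stored value with all its digits gives a different string.
original   <- "901000000000100000010100101011002"
round_trip <- sprintf("%.0f", as.numeric(original))
survives   <- identical(round_trip, original)
```

Here collapses and equal_as_double come out TRUE, while equal_as_string and survives are FALSE, which is why no setting of digits can recover the lost information.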

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

Uwe Ligges

2001-May-14 15:42 UTC

[R] unique and precision of long integers

Michael Herron wrote:
>
> [...]
> 
> Is there a way to get all the digits back?
> 
> > write (format(x,digits=22),file="output",ncol=1)
> 
> does not do it, and I cannot seem to increase digits >22.

Converting into characters should do the trick.


Uwe Ligges


BTW: You don't expect to get exact results calculating with such
numbers, do you?
If so, you have to think about some precision problems. Example:
> 90000000000000000000 == 90000000000000000001
[1] TRUE

Christophe Declercq

2001-May-14 15:49 UTC

[R] unique and precision of long integers

Hi, Michael
> From: owner-r-help at stat.math.ethz.ch
> [mailto:owner-r-help at stat.math.ethz.ch] On behalf of Michael Herron
> Sent: Monday, 14 May 2001 17:09
> To: r-help at stat.math.ethz.ch
> Subject: [R] unique and precision of long integers
>
>
>
> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique.  The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
> [...]
> I read in the data with scan(), and then use unique() to get the
> unique observations.  But, when I print these elements to a file I
> lose precision.
If you have long strings of digits, you can 'scan' them as character
(actually, they are not exactly numbers) and manipulate them afterwards:
> x<-scan("foo.dat", what="")
> x
 [1] "901000000000100000010100101011002" "901101101110100000010100101011002"
 [3] "901000000000100000010100000001002" "901000000000100000010101001011002"
 [5] "901000000000100000010101010011002" "901000000000100000010100110101002"
 [7] "901000000000100000010100101011002" "900000000000100000010010101011002"
 [9] "901000000000100000010100101101002" "901000000000100000010100101011002"
> write(x,"",ncol=1)
901000000000100000010100101011002
901101101110100000010100101011002
901000000000100000010100000001002
901000000000100000010101001011002
901000000000100000010101010011002
901000000000100000010100110101002
901000000000100000010100101011002
900000000000100000010010101011002
901000000000100000010100101101002
901000000000100000010100101011002

I hope it helps.

Christophe

--
Christophe DECLERCQ, MD
Observatoire Régional de la Santé Nord-Pas-de-Calais
13, rue Faidherbe 59046 LILLE Cedex FRANCE
Phone +33 3 20 15 49 24
Fax   +33 3 20 55 92 30
E-mail c.declercq at orsnpdc.org


Mark Myatt

2001-May-16 08:38 UTC

[R] unique and precision of long integers

Michael Herron <mherron at latte.harvard.edu> writes:
>I have a dataset with about 500,000 observations, most of which are
>not unique.  The first 10 observations look like
[stuff deleted]

I can think of two methods that might work ...

        1. Scan as strings rather than numbers

        2. Scan in as separate fields and then do some sort of checksum
        function on all variables and unique() on the checksum.
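
A possible sketch of the second method follows. It assumes, as the original post says, that each digit is a separate field, and it swaps the checksum for a row-wise unique() on the matrix of fields, since a numeric checksum could map two different records to the same value. The names records, fields, unique_fields and unique_records are hypothetical; the five records are taken from the original post.

```r
# Five of the sample records; the fourth repeats the first.
records <- c(
  "901000000000100000010100101011002",
  "901101101110100000010100101011002",
  "901000000000100000010100000001002",
  "901000000000100000010100101011002",
  "900000000000100000010010101011002"
)

# Split every record into its one-digit fields: one row per record,
# one column per field (33 columns for these records).
fields <- do.call(rbind, lapply(strsplit(records, ""), as.integer))

# unique() on a matrix drops duplicated rows (MARGIN = 1 is the default),
# so no checksum is needed and no two distinct records can collide.
unique_fields <- unique(fields)

# Rebuild the digit strings from the unique rows, e.g. for writing to a file.
unique_records <- apply(unique_fields, 1, paste, collapse = "")
```

Each field is a small integer, so nothing here runs into the precision limit, and the separate columns stay available for further analysis.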

Hope that helps.

Mark

--
Mark Myatt

