thr3ads.net - R devel - [Rd] How to handle INT8 data [Jan 2017]

Nicolas Paris

2017-Jan-20 17:47 UTC

[Rd] How to handle INT8 data

Well, I definitely cannot use them as numeric, because joining is the main
purpose of those identifiers.

As for the int64 and bit64 packages, they are not a solution, because I am
releasing a dataset for external users. I cannot ask them to install a
package in order to use it.

I have to be very careful when releasing the data. If a user just uses the
read.csv functions, they cast the identifiers to numeric by default.

$ more res.csv
"col1";"col2"
"-1311071933951566764";"toto"
"-1311071933951566764";"tata"

> read.table("res.csv",sep=";",header=T)
           col1 col2
1 -1.311072e+18 toto
2 -1.311072e+18 tata
> sapply(read.table("res.csv",sep=";",header=T),class)
     col1      col2
"numeric"  "factor"
> read.table("res.csv",sep=";",header=T,colClasses="character")
                  col1 col2
1 -1311071933951566764 toto
2 -1311071933951566764 tata

Am I condemned to provide an R script with the data so that users can
exploit the dataset?
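
A minimal base R sketch of the situation described here, assuming a semicolon-separated file shaped like res.csv. The temporary file, the second identifier ending in ...765, and the `labels` table are made up for illustration. It shows why doubles are unsafe as join keys and how forcing character columns keeps the identifiers intact without any extra package:

```r
# Two distinct 64-bit identifiers that differ only in the last digit
# (the second one is hypothetical, added for illustration).
ids <- c("-1311071933951566764", "-1311071933951566765")

# As doubles they become indistinguishable: a double has a 53-bit significand.
collide <- as.numeric(ids[1]) == as.numeric(ids[2])
print(collide)

# Write a small file in the same layout as res.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"col1";"col2"',
             '"-1311071933951566764";"toto"',
             '"-1311071933951566765";"tata"'), tmp)

# Force every column to character so the identifiers are kept verbatim.
left <- read.table(tmp, sep = ";", header = TRUE,
                   colClasses = "character")

# A second table keyed by the same identifiers (hypothetical data).
labels <- data.frame(col1 = ids, label = c("first", "second"),
                     stringsAsFactors = FALSE)

# Joining on character keys matches each identifier exactly.
joined <- merge(left, labels, by = "col1")
print(joined)
```

Quoting the identifiers in the file is not enough on its own, as the read.table output shows: the reader still has to pass colClasses.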

On 20 Jan 2017 at 18:29, Murray Stokely wrote:
> 2^53 == 2^53+1
> TRUE
> 
> Which makes joining or grouping data sets with 64 bit identifiers
> problematic.
> 
> Murray (mobile)
> 
> On Jan 20, 2017 9:15 AM, "Nicolas Paris" <nicolas.paris at aphp.fr> wrote:
> 
>     On 20 Jan 2017 at 18:09, Murray Stokely wrote:
>     > The lack of 64 bit integer support causes lots of problems when
>     > dealing with certain types of data where the loss of precision from
>     > coercing to 53 bits with double is unacceptable.
> 
>     Hello Murray,
>     Do you mean that, e.g., -1311071933951566764 loses precision during
>     as.numeric(-1311071933951566764)?
>     Thanks,
>     >
>     > Two packages were developed to deal with this:  int64 and bit64.
>     >
>     > You may need to find archival versions of these packages if they've
>     > fallen off CRAN.
>     >
>     > Murray (mobile phone)
>     >
>     > On Jan 20, 2017 7:20 AM, "Gabriel Becker" <gmbecker at ucdavis.edu> wrote:
>     >
>     >     I am not on R-core, so cannot speak to future plans to internally
>     >     support int8 (though my impression is that there aren't any, at
>     >     least none that are close to fruition).
>     >
>     >     The standard way of dealing with whole numbers too big to fit in an
>     >     integer is to put them in a numeric (double down in C land). This
>     >     can represent integers up to 2^53 without loss of precision (see
>     >     http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
>     >     This is how long vector indices are (currently) implemented in R.
>     >     If it's good enough for indices it's probably good enough for
>     >     whatever you need them for.
>     >
>     >     Hope that helps.
>     >
>     >     ~G
>     >
>     >
>     >     On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris
>     >     <nicolas.paris at aphp.fr> wrote:
>     >
>     >     > Hello r users,
>     >     >
>     >     > I have to deal with int8 data with R. AFAIK R only handles int4
>     >     > with the `as.integer` function [1]. I wonder:
>     >     > 1. What is the best approach to handle int8? `as.character`?
>     >     > `as.numeric`?
>     >     > 2. Is there any plan to handle int8 in the future? As you might
>     >     > know, int4 is too small to deal with the earth's population
>     >     > right now.
>     >     >
>     >     > Thanks for your ideas,
>     >     >
>     >     > int8 eg:
>     >     >
>     >     >      human_id
>     >     > ----------------------
>     >     >  -1311071933951566764
>     >     >  -4708675461424073238
>     >     >  -6865005668390999818
>     >     >   5578000650960353108
>     >     >  -3219674686933841021
>     >     >  -6469229889308771589
>     >     >   -606871692563545028
>     >     >  -8199987422425699249
>     >     >   -463287495999648233
>     >     >   7675955260644241951
>     >     >
>     >     > reference:
>     >     > 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>     >     >
>     >     > --
>     >     > Nicolas PARIS
>     >     >
>     >     > ______________________________________________
>     >     > R-devel at r-project.org mailing list
>     >     > https://stat.ethz.ch/mailman/listinfo/r-devel
>     >     >
>     >
>     >
>     >
>     >     --
>     >     Gabriel Becker, PhD
>     >     Associate Scientist (Bioinformatics)
>     >     Genentech Research
>     >
> 
>     --
>     Nicolas PARIS
> 
> 
-- 
Nicolas PARIS

Gabriel Becker

2017-Jan-20 17:57 UTC

How many unique identifiers do you have?

If they are large (in terms of bytes) but you don't have that many of them
(e.g. the total possible number you'll ever have is < INT_MAX), you could
store them as factors. You get the speed of integers but the labeling of
full "precision" strings.  Factors are fast for joins.

~G
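
The factor idea can be sketched in base R as follows. This is only an illustration under assumptions: the two small tables (`people`, `visits`) and their contents are hypothetical, and the identifiers are taken from the example list in the original post. Each factor stores integer codes plus one copy of the full-precision strings as levels:

```r
# Identifiers kept as strings (taken from the example list in the thread).
all_ids <- c("-1311071933951566764",
             "-4708675461424073238",
             "-6865005668390999818")

# Hypothetical tables sharing the same set of levels.
people <- data.frame(
  human_id = factor(all_ids, levels = all_ids),
  name     = c("a", "b", "c"),
  stringsAsFactors = FALSE)

visits <- data.frame(
  human_id = factor(all_ids[c(1, 1, 3)], levels = all_ids),
  visit    = 1:3)

# Under the hood the column is an integer vector of codes.
storage <- typeof(people$human_id)
print(storage)

# The labels survive unchanged, so nothing is lost to rounding.
round_trip <- identical(as.character(people$human_id), all_ids)
print(round_trip)

# Base merge() joins on the factor column.
joined <- merge(people, visits, by = "human_id")
print(joined)
```

The number of distinct levels must stay below `.Machine$integer.max`, which is the INT_MAX condition mentioned in the message.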

-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research


Peter Haverty

2017-Jan-20 17:59 UTC

For what it is worth, I would be extremely pleased to see R's integer type
go to 64 bits.  A signed 32-bit integer is just a bit too small to index
into the ~3 billion position human genome.  The "workarounds" that have
arisen for this specific issue are surprisingly complex.

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
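
A short base R illustration of the limit being described, assuming a round figure of 3e9 positions (the variable names are hypothetical). It shows that such a position does not fit in R's 32-bit integer type, while a double still represents it exactly:

```r
# Largest value of R's native (signed 32-bit) integer type.
int_max <- .Machine$integer.max
print(int_max)

# Approximate number of positions in the human genome, per the message.
genome_length <- 3e9

# The position count exceeds the integer range...
exceeds <- genome_length > int_max

# ...so coercion to integer yields NA (with a warning, suppressed here).
as_int <- suppressWarnings(as.integer(genome_length))

# A double is still exact at this magnitude (well below 2^53).
exact <- (genome_length + 1) - genome_length == 1
print(c(exceeds = exceeds, is_na = is.na(as_int), exact = exact))
```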

Nicolas Paris

2017-Jan-20 18:05 UTC

Hi, 

I do have fewer than INT_MAX.
This looks attractive, but since they are unique identifiers, storing
them as factors is likely to be counter-productive (a string
version plus an int32 for each).

I was looking at https://cran.r-project.org/web/packages/csvread/index.html
This looks like a good fit for my needs.
Is there any chance such an external package for int64 could be integrated
into core?
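
The storage argument can be checked with a rough base R sketch. The identifiers below are synthetic 20-character strings generated only for illustration. When every value is unique, a factor carries all the strings as levels plus one integer code per element, so it ends up larger than the plain character vector:

```r
# Synthetic, all-distinct identifiers (hypothetical values).
n   <- 1000L
ids <- paste0("-13110719339515", sprintf("%05d", seq_len(n)))

# The same values stored as a factor: one level per identifier.
f <- factor(ids)

# Compare the memory footprint of both representations, in bytes.
size_chr <- as.numeric(object.size(ids))
size_fct <- as.numeric(object.size(f))
print(c(character = size_chr, factor = size_fct))
```

The factor only pays off when identifiers repeat many times, which is not the case in a table where each identifier appears once.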


-- 
Nicolas PARIS
Head of R & D
WIND - PACTE, Hôpital Rothschild ( RTH )
Email: nicolas.paris at aphp.fr
Tel: 01 48 04 21 07

Willem Ligtenberg

2017-Jan-20 19:28 UTC

You might want to use a data.table then.
It will automatically detect that it is a 64 bit int.
Although also in that case the user will have to install the data.table
package.
(which is a good idea anyway in my opinion :) )

It will then obviously allow you to join tables.

Willem
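
A hedged sketch of the data.table route, assuming the package is installed (the code does nothing otherwise). The file contents, the second identifier and the `labels` table are hypothetical. As far as I know, `fread` reads such columns as `bit64::integer64` by default, which needs the bit64 package as well. The sketch instead forces the identifier column to character through `colClasses`, so only data.table itself is required:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # A small semicolon-separated file with 64-bit identifiers.
  tmp <- tempfile(fileext = ".csv")
  writeLines(c("col1;col2",
               "-1311071933951566764;toto",
               "-1311071933951566765;tata"), tmp)

  # Read the identifier column as character to keep every digit.
  dt <- data.table::fread(tmp, sep = ";",
                          colClasses = list(character = "col1"))

  # A second table keyed by the same identifiers (hypothetical data).
  labels <- data.table::data.table(
    col1  = c("-1311071933951566764", "-1311071933951566765"),
    label = c("first", "second"))

  # Join the two tables on the identifier column.
  joined <- merge(dt, labels, by = "col1")
  print(joined)
}
```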

