thr3ads.net - R devel - [Rd] Converting non-32-bit integers from python to R to use bit64: reticulate [Jun 2019]

Juan Telleria Ruiz de Aguirre

2019-Jun-03 04:50 UTC

[Rd] Converting non-32-bit integers from python to R to use bit64: reticulate

Thank you, Martin, for pointing out and developing the 'Rmpfr' library for
unlimited-size integers (GNU C GMP) and arbitrary-precision floats (GNU C
MPFR):

https://cran.r-project.org/package=Rmpfr

My question is: in the long term (for R 3.7.0 or R 3.8.0):

Would it make sense for GMP to replace the INTSXP code, and MPFR the
REALSXP code? With this we would ensure that an integer is always an
integer, and a double-precision numeric always a double-precision
numeric, without occasional casting underneath.

And would the R Community / R Ordinary Members be willing to help R
Core with such an implementation (if it makes sense, and is wanted)?

Thank you all! :)

> If you are interested in using unlimited size integers, you
> could use the CRAN R package 'gmp' which builds on the GMP = GNU
> MP = GNU Multi Precision C library.
>
>    https://cran.r-project.org/package=gmp
>
> (and for arbitrary precision "floats", see CRAN pkg 'Rmpfr'
>  built on package gmp, and both the GNU C libraries GMP and
>  MPFR:
>            https://cran.r-project.org/package=Rmpfr
> )

Martin Maechler

2019-Jun-04 16:08 UTC

>>>>> Juan Telleria Ruiz de Aguirre 
>>>>>     on Mon, 3 Jun 2019 06:50:17 +0200 writes:
    > Thank you, Martin, for pointing out and developing the 'Rmpfr' library for
    > unlimited-size integers (GNU C GMP) and arbitrary-precision floats (GNU C
    > MPFR):

    > https://cran.r-project.org/package=Rmpfr

    > My question is: in the long term (for R 3.7.0 or R 3.8.0):

    > Would it make sense for GMP to replace the INTSXP code, and MPFR the
    > REALSXP code? With this we would ensure that an integer is always an
    > integer, and a double-precision numeric always a double-precision
    > numeric, without occasional casting underneath.

    > And would the R Community / R Ordinary Members be willing to help R
    > Core with such an implementation (if it makes sense, and is wanted)?

No, such a change has "no sense" and hence won't be adopted (in
this form):

- INTSXP and REALSXP are part of the C API of R, and are well defined.
  Changing them will almost surely break 100s and, via
  dependencies, probably 1000s of existing R packages.

- I'm sure Python and other systems do have fixed-size "double
  precision" vectors, because that's how you interface with all
  pre-existing computational libraries,
  and I am almost sure that support for arbitrarily long integers
  (or doubles) is via another class/type.

- I know that Julia has adopted these (GMP and MPFR I think)
  types and nicely interfaces them on a relatively "base" level.
  With their nice class hierarchy (and very nice "S4 like" multi-argument
  method dispatch for *all* functions) it can look quite
  seamless for the user to work with these extended classes, but
  they are not all identical to the basic "real"/"double" or
  "integer" classes.

- I'm not the expert here (but there are not so many experts
  ..), but I'm pretty sure that adding new "basic types" at the
  underlying C level would not be at all easy for R.  It would mean a big
  break in all backward compatibility -- which is conceivable --
  and *may* also need a big rewrite of much of the R code base,
  which seems less conceivable in the mid term (2-3 years; long
  term: > 5 years).
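
As an editorial illustration of the second point above (fixed-size numeric vectors, with arbitrarily long integers living in a separate type), here is a minimal Python sketch using only the standard library. The variable names are made up for the example, and the 8-byte double is what common platforms provide rather than a guarantee of the language.

```python
from array import array

# The built-in int is arbitrary precision: nothing overflows at 64 bits.
big = 2**64 + 1
difference = big - 2**64  # exact integer arithmetic

# array('d') is a separate, fixed-size container of C doubles.
doubles = array("d", [1.0, 2.0, 3.0])
double_size = doubles.itemsize  # bytes per element

# array('i') holds fixed-size C ints and refuses values that do not fit.
ints = array("i", [1, 2, 3])
try:
    ints.append(big)
    rejected = False
except OverflowError:
    rejected = True

print(difference, double_size, rejected)
```

The arbitrary-precision type and the fixed-size vectors coexist here as different types, which is the arrangement described above, rather than one replacing the other.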


    > Thank you all! :)

You are welcome.

I think we should close this thread here,  unless some real
experts join.

Martin

Kevin Ushey

2019-Jun-04 17:14 UTC

I think a more productive conversation could be: what additions to R
would allow for user-defined types / classes that behave just like the
built-in vector types? As a motivating example, one cannot currently
use the 64bit integer objects from bit64 to subset data frames:

   > library(bit64); mtcars[as.integer64(1:3), ]
    [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
   <0 rows> (or 0-length row.names)

I think ALTREP presents a possibility here, in that we could have a
64bit integer ALTREP object that behaves either like an INTSXP or
REALSXP as necessary. But I'm not sure how we would handle large 64bit
integer values which won't fit in either an INTSXP or REALSXP (in the
REALSXP case, precision could be lost for values > 2^53).
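
To make the two limits mentioned above concrete, the following Python sketch checks whether an integer on the Python side would fit a 32-bit signed integer or survive a round trip through a double. The helper names `fits_int32` and `survives_double` are hypothetical, and the sketch says nothing about how reticulate or bit64 actually convert values. The one R-specific assumption is that R reserves the most negative 32-bit value for NA_integer_.

```python
INT32_MAX = 2**31 - 1  # largest 32-bit signed integer


def fits_int32(x: int) -> bool:
    """True if x fits an R-style 32-bit integer (-2^31 assumed reserved for NA)."""
    return -INT32_MAX <= x <= INT32_MAX


def survives_double(x: int) -> bool:
    """True if converting x to a double and back returns x unchanged."""
    try:
        return int(float(x)) == x
    except OverflowError:
        # x is too large to be represented as a double at all
        return False


for value in (2**31 - 1, 2**31, 2**53, 2**53 + 1, 2**63 - 1):
    print(value, fits_int32(value), survives_double(value))
```

Values between 2^31 and 2^53 fit a double exactly but not a 32-bit integer. Above 2^53 only some integers are representable as doubles, so a 64-bit key can silently change when stored as a double.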

One possibility would be to allow ALTREP objects to have a chance at
managing dispatch in some methods, so that, for example, in
data[<ALTREP>], the ALTREP object has the opportunity to choose how
the data object should be subset. Of course, this implies wiring
yet another dispatch mechanism through a category of primitive
/ internal functions, which could be expensive in terms of
implementation / maintenance... and I'm not sure if this could play
well with the existing S3 / S4 dispatch mechanisms.

FWIW, I think most commonly 64bit integers arise as e.g. database keys
/ IDs, and are typically just used for subsetting / reordering of data
as opposed to math. In these cases, converting the 64bit integers to a
character vector is typically a viable workaround, although it's much
slower.
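
The character-key workaround above can be sketched in Python, with invented IDs and row labels (the thread itself is about R data frames, so this only illustrates the idea). Keys kept as strings stay distinct and can drive lookup and reordering, whereas the same keys stored as doubles can collide.

```python
# Illustrative 64-bit IDs just above 2^53, paired with made-up row labels.
ids = [9007199254740993, 9007199254740992, 9007199254740994]
rows = ["a", "b", "c"]

# Index the rows by the decimal string form of each ID.
by_key = {str(i): row for i, row in zip(ids, rows)}

# Select and reorder rows using string keys.
wanted = ["9007199254740994", "9007199254740993"]
selected = [by_key[k] for k in wanted]

# Count how many distinct values remain after storing the IDs as doubles.
distinct_as_double = len({float(i) for i in ids})

print(selected, len(by_key), distinct_as_double)
```

String comparison and hashing cost more than integer operations, which matches the remark above that this workaround is slower.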

Still, at least to me, it seems like there is likely a path forward
with ALTREP for 64bit integer vectors that can behave (more or less)
just like builtin R vectors.

Best,
Kevin

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
