Michael Sannella
2018-Oct-08  20:01 UTC
[Rd] bug with OutDec option and deferred_string altrep object
While implementing R's new 'altrep' functionality in the TERR
engine,
I discovered a bug in R's 'deferred_string' altrep object: it is not
using the correct value of the 'OutDec' option when it expands a
deferred_string.  See the following example:
R 3.5.1: (same results in R 3.6.0 devel engine built 10/5)
    > options(scipen=0, OutDec=".")
    > as.character(123.456)
    [1] "123.456"
    > options(scipen=-5, OutDec=",")
    > as.character(123.456)
    [1] "1,23456e+02"
    > xx <- as.character(123.456)
    > options(scipen=0, OutDec=".")
    > xx
    [1] "1.23456e+02"
    >
In the example above, the variable 'xx' is set to a deferred_string
while OutDec is ','.  However, when the string is actually formatted
(when xx is printed), it uses the current option value OutDec='.' to
format the string.  I think that deferred_string should use the value
OutDec=',' from when as.character was called.
Note that the behavior is different with the 'scipen' option: The
deferred_string object records the scipen=-5 value when as.character
is called, and uses this value when xx is printed.  Looking at the
deferred_string object, it appears that CDR(R_altrep_data1(<obj>)) is
set to a scalar integer containing the scipen value at the time the
deferred_string was created.
Ideally, the deferred_string object would save both the scipen and
OutDec option values.  I'd suggest saving these values as regular
pairlist values, say by setting the data1 field to pairlist(<source>,
scipen=-5L, OutDec=',') for the value of xx above.  To save space, you
could avoid saving these values in the common case where scipen=0L,
OutDec='.'.  It would also be better if the data1 field was a
well-formed pairlist; the current value of the data1 field causes
R_inspect to segfault.
I understand that you probably wouldn't want to change the
deferred_string structure.  An alternative fix would be to avoid this
case by:
  1. Never create a deferred_string if OutDec is not '.'.
  2. When expanding an element of a deferred_string, temporarily set
OutDec to '.'.
  ~~ Michael Sannella
	[[alternative HTML version deleted]]
Tierney, Luke
2018-Oct-09  02:33 UTC
[Rd] bug with OutDec option and deferred_string altrep object
Thanks for the report. The approach you outlines below should work -- I'll look into it. Best, luke On Mon, 8 Oct 2018, Michael Sannella wrote:> While implementing R's new 'altrep' functionality in the TERR engine, > I discovered a bug in R's 'deferred_string' altrep object: it is not > using the correct value of the 'OutDec' option when it expands a > deferred_string.? See the following example: > > R 3.5.1: (same results in R 3.6.0 devel engine built 10/5) > ? ? > options(scipen=0, OutDec=".") > ? ? > as.character(123.456) > ? ? [1] "123.456" > ? ? > options(scipen=-5, OutDec=",") > ? ? > as.character(123.456) > ? ? [1] "1,23456e+02" > ? ? > xx <- as.character(123.456) > ? ? > options(scipen=0, OutDec=".") > ? ? > xx > ? ? [1] "1.23456e+02" > ? ? > > > In the example above, the variable 'xx' is set to a deferred_string > while OutDec is ','.? However, when the string is actually formatted > (when xx is printed), it uses the current option value OutDec='.' to > format the string.? I think that deferred_string should use the value > OutDec=',' from when as.character was called. > > Note that the behavior is different with the 'scipen' option: The > deferred_string object records the scipen=-5 value when as.character > is called, and uses this value when xx is printed.? Looking at the > deferred_string object, it appears that CDR(R_altrep_data1(<obj>)) is > set to a scalar integer containing the scipen value at the time the > deferred_string was created. > > Ideally, the deferred_string object would save both the scipen and > OutDec option values.? I'd suggest saving these values as regular > pairlist values, say by setting the data1 field to pairlist(<source>, > scipen=-5L, OutDec=',') for the value of xx above.? To save space, you > could avoid saving these values in the common case where scipen=0L, > OutDec='.'.? It would also be better if the data1 field was a > well-formed pairlist; the current value of the data1 field causes > R_inspect to segfault. > > I understand that you probably wouldn't want to change the > deferred_string structure.? An alternative fix would be to avoid this > case by: > ? 1. Never create a deferred_string if OutDec is not '.'. > ? 2. When expanding an element of a deferred_string, temporarily set > OutDec to '.'. > > ? ~~ Michael Sannella > > >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Tierney, Luke
2018-Oct-09  22:04 UTC
[Rd] bug with OutDec option and deferred_string altrep object
This is now fixed in R-devel. Will port to R_patched in due course. R_inspect also now handles pairlists ending with dotted pairs. Best, luke On Tue, 9 Oct 2018, Tierney, Luke wrote:> Thanks for the report. The approach you outlines below should work -- > I'll look into it. > > Best, > > luke > > On Mon, 8 Oct 2018, Michael Sannella wrote: > >> While implementing R's new 'altrep' functionality in the TERR engine, >> I discovered a bug in R's 'deferred_string' altrep object: it is not >> using the correct value of the 'OutDec' option when it expands a >> deferred_string.? See the following example: >> >> R 3.5.1: (same results in R 3.6.0 devel engine built 10/5) >> ? ? > options(scipen=0, OutDec=".") >> ? ? > as.character(123.456) >> ? ? [1] "123.456" >> ? ? > options(scipen=-5, OutDec=",") >> ? ? > as.character(123.456) >> ? ? [1] "1,23456e+02" >> ? ? > xx <- as.character(123.456) >> ? ? > options(scipen=0, OutDec=".") >> ? ? > xx >> ? ? [1] "1.23456e+02" >> ? ? > >> >> In the example above, the variable 'xx' is set to a deferred_string >> while OutDec is ','.? However, when the string is actually formatted >> (when xx is printed), it uses the current option value OutDec='.' to >> format the string.? I think that deferred_string should use the value >> OutDec=',' from when as.character was called. >> >> Note that the behavior is different with the 'scipen' option: The >> deferred_string object records the scipen=-5 value when as.character >> is called, and uses this value when xx is printed.? Looking at the >> deferred_string object, it appears that CDR(R_altrep_data1(<obj>)) is >> set to a scalar integer containing the scipen value at the time the >> deferred_string was created. >> >> Ideally, the deferred_string object would save both the scipen and >> OutDec option values.? I'd suggest saving these values as regular >> pairlist values, say by setting the data1 field to pairlist(<source>, >> scipen=-5L, OutDec=',') for the value of xx above.? To save space, you >> could avoid saving these values in the common case where scipen=0L, >> OutDec='.'.? It would also be better if the data1 field was a >> well-formed pairlist; the current value of the data1 field causes >> R_inspect to segfault. >> >> I understand that you probably wouldn't want to change the >> deferred_string structure.? An alternative fix would be to avoid this >> case by: >> ? 1. Never create a deferred_string if OutDec is not '.'. >> ? 2. When expanding an element of a deferred_string, temporarily set >> OutDec to '.'. >> >> ? ~~ Michael Sannella >> >> >> > >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Apparently Analagous Threads
- bug with OutDec option and deferred_string altrep object
- STRING_IS_SORTED claims as.character(1:100) is sorted
- R_ext/Altrep.h should be more C++-friendly
- R_ext/Altrep.h should be more C++-friendly
- v3 serialization of compact_intseq altrep should write modified data