Michael Sannella
2018-Oct-08 20:01 UTC
[Rd] bug with OutDec option and deferred_string altrep object
While implementing R's new 'altrep' functionality in the TERR engine, I discovered a bug in R's 'deferred_string' altrep object: it is not using the correct value of the 'OutDec' option when it expands a deferred_string. See the following example:

R 3.5.1: (same results in R 3.6.0 devel engine built 10/5)

    > options(scipen=0, OutDec=".")
    > as.character(123.456)
    [1] "123.456"
    > options(scipen=-5, OutDec=",")
    > as.character(123.456)
    [1] "1,23456e+02"
    > xx <- as.character(123.456)
    > options(scipen=0, OutDec=".")
    > xx
    [1] "1.23456e+02"
    >

In the example above, the variable 'xx' is set to a deferred_string while OutDec is ','. However, when the string is actually formatted (when xx is printed), it uses the current option value OutDec='.' to format the string. I think that deferred_string should use the value OutDec=',' from when as.character was called.

Note that the behavior is different with the 'scipen' option: The deferred_string object records the scipen=-5 value when as.character is called, and uses this value when xx is printed. Looking at the deferred_string object, it appears that CDR(R_altrep_data1(<obj>)) is set to a scalar integer containing the scipen value at the time the deferred_string was created.

Ideally, the deferred_string object would save both the scipen and OutDec option values. I'd suggest saving these values as regular pairlist values, say by setting the data1 field to pairlist(<source>, scipen=-5L, OutDec=',') for the value of xx above. To save space, you could avoid saving these values in the common case where scipen=0L, OutDec='.'. It would also be better if the data1 field was a well-formed pairlist; the current value of the data1 field causes R_inspect to segfault.

I understand that you probably wouldn't want to change the deferred_string structure. An alternative fix would be to avoid this case by:

  1. Never create a deferred_string if OutDec is not '.'.
  2. When expanding an element of a deferred_string, temporarily set OutDec to '.'.
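The idea of recording both option values when the object is created can be sketched in plain C. This is only an illustrative model, not R's actual implementation: the globals opt_scipen and opt_outdec, the deferred_string struct, and the functions make_deferred, format_number and expand are all hypothetical names, and the formatter is a toy limited to 6 significant digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for R's global options (not R's real C API). */
int  opt_scipen = 0;
char opt_outdec = '.';

/* Toy deferred string: the source number, the option values in effect
   when it was created, and a cache for the expanded text. */
typedef struct {
    double source;
    int    scipen;
    char   outdec;
    int    expanded;
    char   text[32];
} deferred_string;

/* Capture the current option values at creation time. */
deferred_string make_deferred(double x)
{
    deferred_string d;
    d.source = x;
    d.scipen = opt_scipen;
    d.outdec = opt_outdec;
    d.expanded = 0;
    d.text[0] = '\0';
    return d;
}

/* Toy formatter: 6 significant digits, fixed notation unless it is more
   than scipen characters wider than scientific notation; the decimal
   point is then replaced by the requested decimal mark. */
void format_number(double x, int scipen, char outdec, char *buf, size_t n)
{
    char fixed[32], sci[32];
    assert(n > 0);
    snprintf(fixed, sizeof fixed, "%.6g", x);
    snprintf(sci, sizeof sci, "%.5e", x);
    const char *chosen =
        ((int) strlen(fixed) <= (int) strlen(sci) + scipen) ? fixed : sci;
    snprintf(buf, n, "%s", chosen);
    for (char *p = buf; *p != '\0'; p++)
        if (*p == '.')
            *p = outdec;
}

/* Expand lazily, using the captured options rather than the current ones. */
const char *expand(deferred_string *d)
{
    if (!d->expanded) {
        format_number(d->source, d->scipen, d->outdec,
                      d->text, sizeof d->text);
        d->expanded = 1;
    }
    return d->text;
}
```

In this model, an object created while opt_scipen is -5 and opt_outdec is ',' still expands with a comma after both options are reset, because expand() reads only the values stored in the object.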
~~ Michael Sannella
Tierney, Luke
2018-Oct-09 02:33 UTC
[Rd] bug with OutDec option and deferred_string altrep object
Thanks for the report. The approach you outline should work; I'll look into it.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall
Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
email: luke-tierney at uiowa.edu
WWW: http://www.stat.uiowa.edu
Tierney, Luke
2018-Oct-09 22:04 UTC
[Rd] bug with OutDec option and deferred_string altrep object
This is now fixed in R-devel. Will port to R_patched in due course. R_inspect also now handles pairlists ending with dotted pairs.

Best,

luke
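For readers unfamiliar with the dotted-pair issue mentioned in this thread, here is a minimal sketch in plain C of walking a cons list without assuming that it ends in nil. It is not R's inspect code: the node type, the node_kind tags, and list_length are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged node, loosely modelled on a Lisp-style cons cell. */
typedef enum { KIND_NIL, KIND_CONS, KIND_INT } node_kind;

typedef struct node {
    node_kind    kind;
    int          value;  /* used when kind == KIND_INT */
    struct node *car;    /* used when kind == KIND_CONS */
    struct node *cdr;    /* used when kind == KIND_CONS */
} node;

/* Count the cons cells of a list, stopping at the first tail that is not
   a cons cell. *dotted is set to 1 when that tail is something other
   than nil, i.e. when the list ends in a dotted pair. */
int list_length(const node *x, int *dotted)
{
    int n = 0;
    assert(dotted != NULL);
    while (x != NULL && x->kind == KIND_CONS) {
        n++;
        x = x->cdr;
    }
    *dotted = (x != NULL && x->kind != KIND_NIL);
    return n;
}
```

A walker that instead loops until it sees nil, and reads the car of every tail along the way, would treat the trailing integer of a pair such as (source . scipen) as a cons cell, which is the kind of access that can crash.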