Michael Chirico
2025-Jan-30  21:07 UTC
[Rd] Suggestion to emphasize Rboolean is unrelated to LGLSXP in R-exts
Hello all, The recent change (r87656) to make Rboolean map to type 'bool', not 'int', broke some tests & made me realize I've had totally the wrong impression about what Rboolean actually is, and I suspect I'm not alone. Till now, I've assumed that like Rbyte --> RAWSXP, Rboolean is the _correct_ storage type for LGLSXP, while idioms like int *ip LOGICAL(...) only "happened to work" because Rboolean masks int, which could change at any time. Actually, it turns out Rboolean only "happened to be int" because of the use of 'enum', which is changing in now-and-future C standards! We made that mistake in 9 places [1], though only one happened to break tests. There are at least dozens of other cases on CRAN [2],[3]. Here's the current exposition on Rboolean in R-exts [4]:> Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using ?logical? variables in C consistently. This can conflict with other software: for example it conflicts with the headers in IJG?s jpeg-9 (but not earlier versions).I suggest embellishing this, perhaps like so:> Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using ?logical? variables in C consistently. This has no concept of "missingness", and so is \emph{not} related to the R logical type LGLSXP. This can conflict with other software: for example it conflicts with the headers in IJG?s jpeg-9 (but not earlier versions).It would also be nice if this mistake could be caught by the compiler, but that's another issue. Michael Chirico [1] https://github.com/Rdatatable/data.table/pull/6782/files [2] https://github.com/search?q=org%3Acran+%2FRboolean.*%5CbLOGICAL%2F&type=code [3] https://github.com/search?q=org%3Acran+%2FLOGICAL.*sizeof%5B%28%5DRboolean%2F&type=code [4] https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Mathematical-constants
Ivan Krylov
2025-Feb-01  09:39 UTC
[Rd] [SPAM Warning!] Suggestion to emphasize Rboolean is unrelated to LGLSXP in R-exts
On Thu, 30 Jan 2025 13:07:31 -0800 Michael Chirico <michaelchirico4 at gmail.com> wrote:> There are at least dozens of other cases on CRAN [2],[3].Some of these involve casting an int to Rboolean. Best case, the int is compared against NA_LOGICAL beforehand, avoiding any mistake (there's at least one like that). Worst case, NA_LOGICAL is not considered before the cast, so NA will now be interpreted as TRUE. This is hard to check without actually reading the code. Some packages compare an Rboolean expression against NA_LOGICAL [1]. This implies having stored an int in an Rboolean value as in the previous paragraph. I think that it wasn't disallowed according to the C standard to store NA_LOGICAL in an enumeration type wide enough to fit it (and it evidently worked in practice). With typedef bool Rboolean, storing NA_LOGICAL in an Rboolean converts it to 'true', so the comparison will definitely fail: DPQ src/pnchisq-it.c:530,532 Rmpfr src/convert.c:535 checkmate src/helper.c:102 chron src/unpaste.c:21 collapse src/data.table_rbindlist.c:208,258,383,384,408,431 data.table (many; fixed in Git) ff src/ordermerge.c:5074 (one declaration, many comparisons) networkDynamic src/Rinit.c:209 src/is.active.c:75,76,96-98 slam src/util.c:258 this.path src/get_file_from_closure.h:13,43 src/thispath.c:14,17,19,39 src/ext.c:25 src/setsyspath.c:8 src/get_file_from_closure.h:13,43 Four packages cast int* pointers returned by LOGICAL() to Rboolean* or use sizeof(Rboolean) to calculate buffer sizes in calls to memcpy() with LOGICAL() buffers [2]. With typedef bool Rboolean, this is a serious mistake, because the memory layout of the types is no longer compatible: bit64 src/integer64.c:576,603,914,929,942,955,968,981,994 collapse src/data.table_rbindlist.c:19,67,105 data.table (many; fixed in Git) kit src/utils.c:390 I don't know Coccinelle that well and there may be additional cases I failed to consider. At which point is it appropriate to start notifying maintainers of the bugs not caught by their test suites? -- Best regards, Ivan [1] Coccinelle script: @@ typedef Rboolean; Rboolean E; @@ * E == NA_LOGICAL [2] Coccinelle scripts: @@ typedef Rboolean; int* E; @@ * (Rboolean*)E This one will offer a diff to fix the bug: @@ int *E1; int *E2; typedef Rboolean; @@ ( memcpy | memmove ) (E1, E2, <+... -sizeof(Rboolean) +sizeof(int) ...+> )
Tomas Kalibera
2025-Feb-05  19:54 UTC
[Rd] Suggestion to emphasize Rboolean is unrelated to LGLSXP in R-exts
On 1/30/25 22:07, Michael Chirico wrote:> Hello all, > > The recent change (r87656) to make Rboolean map to type 'bool', not > 'int', broke some tests & made me realize I've had totally the wrong > impression about what Rboolean actually is, and I suspect I'm not > alone. > > Till now, I've assumed that like Rbyte --> RAWSXP, Rboolean is the > _correct_ storage type for LGLSXP, while idioms like int *ip > LOGICAL(...) only "happened to work" because Rboolean masks int, which > could change at any time. > > Actually, it turns out Rboolean only "happened to be int" because of > the use of 'enum', which is changing in now-and-future C standards! > > We made that mistake in 9 places [1], though only one happened to > break tests. There are at least dozens of other cases on CRAN [2],[3]. > > Here's the current exposition on Rboolean in R-exts [4]: > >> Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using ?logical? variables in C consistently. This can conflict with other software: for example it conflicts with the headers in IJG?s jpeg-9 (but not earlier versions). > I suggest embellishing this, perhaps like so: > >> Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using ?logical? variables in C consistently. This has no concept of "missingness", and so is \emph{not} related to the R logical type LGLSXP. This can conflict with other software: for example it conflicts with the headers in IJG?s jpeg-9 (but not earlier versions).?Probably worth an extra sentence making this more explicit. I've added one similar to your suggestion. Tomas> It would also be nice if this mistake could be caught by the compiler, > but that's another issue. > > Michael Chirico > > [1] https://github.com/Rdatatable/data.table/pull/6782/files > [2] https://github.com/search?q=org%3Acran+%2FRboolean.*%5CbLOGICAL%2F&type=code > [3] https://github.com/search?q=org%3Acran+%2FLOGICAL.*sizeof%5B%28%5DRboolean%2F&type=code > [4] https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Mathematical-constants > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel