Jan van der Laan
2017-Oct-06 10:08 UTC
[Rd] Using response variable in interaction as explanatory variable in glm crashes R
The following code crashes R (I know I shouldn't try to estimate such a model; this was a bug in some code of mine). I also tried with R-devel; same result.

tab <- structure(list(
    dob_day = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
    dob_mon = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
    dob_year = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
    n = c(1489634L, 17491L, 134985L, 1639L, 47892L, 611L, 4365L, 750L),
    pred1 = c(1488301, 18187, 135605, 1657, 48547, 593, 4423, 54)),
  .Names = c("dob_day", "dob_mon", "dob_year", "n", "pred1"),
  row.names = c(NA, -8L), class = "data.frame")

m <- glm(dob_mon ~ dob_day*dob_mon, data = tab, family = binomial())

The crash doesn't occur when the variables are added just as main effects (dob_day+dob_mon): this results in a warning and the removal of dob_mon from the formula.

--
Jan

> R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle
Jan van der Laan
2017-Oct-06 10:13 UTC
[Rd] Using response variable in interaction as explanatory variable in glm crashes R
It is actually model.matrix that crashes, not glm. Same crash occurs with e.g. lm.

model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

also crashes R.

Jan
Martin Maechler
2017-Oct-09 15:52 UTC
[Rd] Using response variable in interaction as explanatory variable in glm crashes R
>>>>> Jan van der Laan <rhelp at eoos.dds.nl>
>>>>>     on Fri, 6 Oct 2017 12:13:39 +0200 writes:

    > It is actually model.matrix that crashes, not glm. Same
    > crash occurs with e.g. lm.

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

    > also crashes R.

Yes, segmentation fault. It only happens when these are *logical* variables, not, e.g., when transformed to integer.

The C code in src/library/stats/src/model.c tries to eliminate occurrences of the LHS of the formula from the RHS when building the model matrix, and it does work fine in the integer case.

Part of the culprit code may be this (from line 717), with the isLogical(.) which, in our case, shifts the pointer by 1 in the call to firstfactor():

    int adj = isLogical(var_i)?1:0;
    // avoid overflow of jstart * nn PR#15578
    firstfactor(&rx[jstart * nn], n, jnext - jstart,
                REAL(contrast), nrows(contrast),
                ncols(contrast), INTEGER(var_i)+adj);

Then in firstfactor(), we see the segfault (when running R with '-d gdb'):

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

    Program received signal SIGSEGV, Segmentation fault.
    0x00007fffeafa76b5 in firstfactor (ncx=0, v=0x5c3b37c, ncc=1, nrc=2, c=0x5c90008, nrx=8, x=0x5cbf150)
        at ../../../../../R/src/library/stats/src/model.c:252
    252             else xj[i] = cj[v[i]-1];
    Missing separate debuginfos, .................
    (gdb) list
    247         for (int j = 0; j < ncc; j++) {
    248             xj = &x[j * (R_xlen_t)nrx];
    249             cj = &c[j * (R_xlen_t)nrc];
    250             for (int i = 0; i < nrx; i++)
    251                 if(v[i] == NA_INTEGER) xj[i] = NA_REAL;
    252                 else xj[i] = cj[v[i]-1];
    253         }
    254     }
    255

and indeed in the debugger, i=7 and v[i] is "outside", v[] being of length 7, hence indexed 0:6.
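The index arithmetic behind the out-of-bounds read described in the message above can be checked in isolation. What follows is only a minimal sketch, not code from R: the function first_out_of_bounds() and its parameter names len and adj are hypothetical, and it assumes nothing more than what the message states, namely that the base pointer is shifted by adj elements while the loop still runs nrx times.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bounds behaviour of the loop shown in the
 * gdb listing. It fills no model matrix; it only reports which read would
 * fall outside the buffer.
 *
 * len : number of elements in the variable's buffer (8 rows in the report)
 * adj : offset added to the base pointer before the call (1 for logicals)
 * nrx : number of loop iterations (8 in the report)
 *
 * Returns the first loop index i for which v[i], i.e. base[adj + i], lies
 * past the end of the buffer, or -1 if every read stays in bounds.
 */
long first_out_of_bounds(size_t len, size_t adj, size_t nrx)
{
    assert(adj <= len);          /* shifted pointer must not start past the end */
    size_t valid = len - adj;    /* elements reachable through the shifted pointer */
    for (size_t i = 0; i < nrx; i++)
        if (i >= valid)
            return (long) i;     /* this read is outside the buffer */
    return -1;
}
```

With the values from the report, first_out_of_bounds(8, 1, 8) returns 7, which matches the debugger observation that i=7 is outside a v[] indexed 0:6. Without the shift, first_out_of_bounds(8, 0, 8) returns -1, consistent with the integer case working fine.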