Leonidas Lundell
2021-Aug-26 07:46 UTC
[R] Potential bug/unexpected behaviour in model matrix
Dear R-project,

Apologies if I am sending this to the wrong list, and thank you for your enormous contribution.

I discovered a subtle interaction between the data.table package and the model.matrix function that influences the output to the point that you will get completely erroneous results:

df <- data.frame(basespaceID = 8:1,
                 group = paste0(rep(c("a", "b"), 4), "_", sort(rep(c("1", "2"), 4))))
designDF <- model.matrix(~0 + group, data = df)

dt <- data.table::as.data.table(df)
designDT <- model.matrix(~0 + group, data = dt)

all(designDF == designDT)
#TRUE

data.table::setkey(dt, "basespaceID")
designDTkeyed <- model.matrix(~0 + group, data = dt)

all(designDF == designDTkeyed)
#FALSE

# It seems that a keyed data.table reorders the rows of the design matrix
# by alphabetical order:

designDFreordered <- model.matrix(~0 + group, data = df[8:1,])
all(designDFreordered == designDTkeyed)
#TRUE

And my sessionInfo, if that's of any help:

sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.14.0

loaded via a namespace (and not attached):
 [1] umap_0.2.7.0      Rcpp_1.0.7        knitr_1.33        magrittr_2.0.1
 [5] maps_3.3.0        lattice_0.20-44   rlang_0.4.11      stringr_1.4.0
 [9] tools_4.1.0       grid_4.1.0        xfun_0.25         png_0.1-7
[13] audio_0.1-7       RSpectra_0.16-0   htmltools_0.5.1.1 shapefiles_0.7
[17] askpass_1.1       openssl_1.4.4     yaml_2.2.1        digest_0.6.27
[21] zip_2.2.0         Matrix_1.3-4      beepr_1.3         evaluate_0.14
[25] rmarkdown_2.10    openxlsx_4.2.4    sp_1.4-5          stringi_1.7.3
[29] compiler_4.1.0    fossil_0.4.0      jsonlite_1.7.2    reticulate_1.20
[33] foreign_0.8-81

Best regards

Leonidas Lundell
Postdoc
Barres & Zierath group

University of Copenhagen
Novo Nordisk Foundation
Center for Basic Metabolic Research

mailto:leo.lundell at sund.ku.dk
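
The row-order effect in the report can probably be reproduced without data.table at all. The following is only a minimal base-R sketch, assuming that setting a key on basespaceID amounts to sorting the rows by that column; `dfSorted`, `designSorted`, `sameOrder` and `sameRows` are illustrative names that do not appear in the original post.

```r
# Same data as in the report.
df <- data.frame(
  basespaceID = 8:1,
  group = paste0(rep(c("a", "b"), 4), "_", sort(rep(c("1", "2"), 4)))
)
designDF <- model.matrix(~0 + group, data = df)

# Assumption: sorting by basespaceID mimics what a key on that column does.
dfSorted <- df[order(df$basespaceID), ]
designSorted <- model.matrix(~0 + group, data = dfSorted)

# model.matrix() emits one row per row of `data`, in the order it is given,
# so the two matrices hold the same rows in opposite order.
sameOrder <- all(designDF == designSorted)
sameRows <- all(designDF[8:1, ] == designSorted)
print(c(sameOrder = sameOrder, sameRows = sameRows))
```

If this assumption holds, the mismatch comes from the order of the rows handed to model.matrix(), not from model.matrix() itself.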
Andrew Simmons
2021-Aug-26 19:26 UTC
[R] Potential bug/unexpected behaviour in model matrix
Hello,

I'm not so sure this is a bug; it appears to be behaving as documented, since 'setkey' sorts the rows of the data.table by the key. I would suggest using argument 'physical' from 'setkey' to avoid reordering the rows. Something like:

x <- data.table::data.table(V1 = 9:0)
y <- data.table::copy(x)
data.table::setkey(x, V1, physical = TRUE)
data.table::setkey(y, V1, physical = FALSE)
print(x)
print(y)
attr(x, "index")
attr(y, "index")

'x' does not have an "index" attribute because its rows were physically reordered. 'y' does have one because its rows were left in place.

I hope this helps!
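
Applied to the data from the original report, the suggestion might look like the sketch below. It is a minimal illustration, not a tested recommendation: it assumes data.table is installed, and `designDTindexed` is a made-up name for this example.

```r
df <- data.frame(
  basespaceID = 8:1,
  group = paste0(rep(c("a", "b"), 4), "_", sort(rep(c("1", "2"), 4)))
)
designDF <- model.matrix(~0 + group, data = df)

dt <- data.table::as.data.table(df)

# physical = FALSE records the sort order as an index and leaves the
# rows of dt where they are.
data.table::setkey(dt, basespaceID, physical = FALSE)
designDTindexed <- model.matrix(~0 + group, data = dt)

# The rows of dt are still in their original order, so the design
# matrix lines up with the one built from the plain data.frame.
stopifnot(identical(dt$basespaceID, df$basespaceID))
stopifnot(all(designDF == designDTindexed))
```

Whichever variant is used, anything paired with the design matrix (responses, sample annotations) has to be in the same row order the table had when model.matrix() was called.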