Gerhard Burger
2018-Nov-13 17:24 UTC
[R] Using lm on data.frame with categorical data as character column results in error in plot.lm
Hi all,

Not sure if the following could be considered a bug or just a user error, but here goes:

We're teaching our students to use the tidyverse for most of their R stuff, and the following fails in `plot.lm` (code adapted/shortened to pinpoint the problem):

```
# reshape with tidyr: "variable" becomes a character column
iris_long = tidyr::gather(iris, key = "variable", value = "value", -Species)
iris_lm = lm(value ~ Species + variable, data = iris_long)
stats:::plot.lm(iris_lm, which = 5)
```

whereas, if we use reshape2::melt instead of tidyr::gather, it works fine:

```
# reshape with reshape2: "variable" becomes a factor column
iris_long = reshape2::melt(iris)
iris_lm = lm(value ~ Species + variable, data = iris_long)
stats:::plot.lm(iris_lm, which = 5)
```

Now the only difference between the output from melt and gather is that the resulting "variable" column is a factor column in melt, but a character column in gather:

```
# the comparison fails on the type of the "variable" column
testthat::expect_identical(reshape2::melt(iris),
                           tidyr::gather(iris, key = "variable", value = "value", -Species))
```

This can be fixed by adding `factor_key = TRUE` to the gather call, after which everything works fine.

Are categorical variables required to be in a factor column? `lm` seems to handle a character column fine, but `plot.lm` gives problems... Is this something that might need a fix in plot.lm?

Any insight appreciated!
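For readers who cannot change the reshaping call, the same workaround can be applied after the fact by converting the character column to a factor before fitting. The following is only a minimal sketch of that idea: it builds the long data with base R as a stand-in for `tidyr::gather` (an assumption made so the example has no package dependencies), and the name `measure_cols` is introduced here for illustration only.

```r
# Base-R stand-in for tidyr::gather(): "variable" ends up as a character column.
measure_cols <- setdiff(names(iris), "Species")  # hypothetical helper name
iris_long <- data.frame(
  Species  = rep(iris$Species, times = length(measure_cols)),
  variable = rep(measure_cols, each = nrow(iris)),
  value    = unlist(iris[measure_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

is.character(iris_long$variable)  # TRUE

# Workaround: make the categorical column an explicit factor before fitting,
# keeping the original column order as the level order.
iris_long$variable <- factor(iris_long$variable, levels = measure_cols)

iris_lm <- lm(value ~ Species + variable, data = iris_long)
plot(iris_lm, which = 5)  # residuals-vs-leverage plot on the factor-based fit
```

Setting `levels = measure_cols` keeps the levels in the original column order rather than alphabetical order, which appears to be what `factor_key = TRUE` is meant to achieve in `gather`.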
Kind regards,
Gerhard

For completeness, my sessionInfo:

```
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=nl_NL.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     R6_2.2.2         plyr_1.8.4       magrittr_1.5     pillar_1.3.0     rlang_0.2.2
 [9] stringi_1.2.4    reshape2_1.4.3   rstudioapi_0.7   testthat_2.0.0   tools_3.5.1      stringr_1.3.1    glue_1.3.0       purrr_0.2.5
[17] compiler_3.5.1   tidyselect_0.2.4 tibble_1.4.2
```