Chris Evans
2021-Sep-18 19:26 UTC
[R] Cacheing of functions from libraries other than the base in Rmarkdown
This question may belong somewhere else; if so, please signpost me and accept my apologies.

I have a large (for me, > 3k lines) Rmarkdown file with many R code blocks (no other code or engine is used) working on some large datasets. I have some inline R like

    There are `r n_distinct(tibDat$ID)` participants and `r nrow(tibDat)` rows of data.

What I am finding is that even if one knit has worked fine and I change something somewhere and knit again, the second knit often fails with an error like

    n_distinct(tibDat$ID) : could not find function "n_distinct"

This is not happening for functions like nrow() from base R, and it mostly seems to happen to functions from the tidyverse.

I think what is happening is some sort of cache corruption, presumably caused by the memory demands. I am pretty sure I've seen this before, but a long time ago, and dealt with it by deleting the files and cache folders created by the knit. That works now too, but as knitting the whole file now takes over 20 minutes, I really don't want to have to do that.

I have found that replacing things with base functions fixes the problem every time, e.g. replacing `r n_distinct(tibDat$ID)` with `r length(unique(tibDat$ID))` works fine. The other workaround is to compute what you need for the inline computation at the end of the preceding code block, a trivial example being:

    n_distinct(tibDat$ID) -> tmpN
    ```

and then

    `r tmpN`

That works fine, so I have my workarounds, but I guess I have three questions:

1) do others see this?
2) is there some setting that might, assuming my guess about the cause is correct, increase some storage somewhere and avert this?
3) if it is a bug, where should I report it (as I'm not sure what is causing it!)?
Thanks in advance,

Chris

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] boot_1.3-28 CECPfuns_0.0.0.9041 janitor_2.1.0 lubridate_1.7.10 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.0.1 tidyr_1.1.3 tibble_3.1.4
[12] ggplot2_3.3.5 tidyverse_1.3.1 english_1.2-6 pander_0.6.4

loaded via a namespace (and not attached):
 [1] fs_1.5.0 bit64_4.0.5 RColorBrewer_1.1-2 httr_1.4.2 tools_4.1.1 backports_1.2.1 utf8_1.2.2 R6_2.5.1 rpart_4.1-15 Hmisc_4.5-0 DBI_1.1.1
[12] colorspace_2.0-2 nnet_7.3-16 withr_2.4.2 tidyselect_1.1.1 gridExtra_2.3 bit_4.0.4 compiler_4.1.1 cli_3.0.1 rvest_1.0.1 htmlTable_2.2.1 xml2_1.3.2
[23] labeling_0.4.2 scales_1.1.1 checkmate_2.0.0 corrr_0.4.3 odbc_1.3.2 digest_0.6.27 readODS_1.7.0 foreign_0.8-81 rmarkdown_2.11 base64enc_0.1-3 jpeg_0.1-9
[34] pkgconfig_2.0.3 htmltools_0.5.2 dbplyr_2.1.1 fastmap_1.1.0 RJDBC_0.2-8 htmlwidgets_1.5.4 rlang_0.4.11 readxl_1.3.1 rstudioapi_0.13 farver_2.1.0 generics_0.1.0
[45] jsonlite_1.7.2 magrittr_2.0.1 Formula_1.2-4 Matrix_1.3-4 Rcpp_1.0.7 munsell_0.5.0 fansi_0.5.0 lifecycle_1.0.0 stringi_1.7.4 yaml_2.2.1 snakecase_0.11.0
[56] grid_4.1.1 blob_1.2.2 crayon_1.4.1 lattice_0.20-44 haven_2.4.3 splines_4.1.1 hms_1.1.0 knitr_1.34 pillar_1.6.2 reprex_2.0.1 glue_1.4.2
[67] evaluate_0.14 latticeExtra_0.6-29 data.table_1.14.0 modelr_0.1.8 png_0.1-7 vctrs_0.3.8 tzdb_0.1.2 psy_1.1 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[78] xfun_0.26 broom_0.7.9 rsconnect_0.8.24 viridisLite_0.4.0 survival_3.2-13 rJava_1.0-4 cluster_2.1.2 ellipsis_0.3.2

--
Chris Evans (he/him) <chris at psyctc.org> Visiting Professor, University of Sheffield and UDLA, Quito, Ecuador
I do some consultation work for the University of Roehampton <chris.evans at roehampton.ac.uk> and other places but <chris at psyctc.org> remains my main Email address.
I have a work web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/
   https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/
If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.
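[Editorial sketch] The two workarounds described in the post above can be tried outside knitr with a few lines of R. This is only an illustration under stated assumptions: the data frame below is an invented stand-in for the poster's tibDat, and the names nBase and nRows are hypothetical; only n_distinct(), length(unique()), nrow() and tmpN come from the post. It assumes dplyr is installed.

```r
# Attach dplyr explicitly; in the post this presumably happens in an earlier chunk.
suppressPackageStartupMessages(library(dplyr))

# Invented stand-in for the poster's tibDat (values are made up for illustration).
tibDat <- data.frame(ID = c("a", "a", "b", "c", "c", "c"),
                     score = c(1, 2, 3, 4, 5, 6))

# Workaround 1 from the post: base R only, so nothing has to be attached.
nBase <- length(unique(tibDat$ID))

# Workaround 2 from the post: compute the value at the end of a code chunk,
# then print the stored object inline with `r tmpN`.
n_distinct(tibDat$ID) -> tmpN

# Base R function used inline in the post; it never failed for the poster.
nRows <- nrow(tibDat)

# The two ways of counting participants should agree.
stopifnot(nBase == tmpN)
```

Both workarounds have the same effect: the inline expression no longer needs to look up a function from an attached package at the moment it is evaluated.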
Bert Gunter
2021-Sep-18 20:01 UTC
[R] Cacheing of functions from libraries other than the base in Rmarkdown
I think you should post on the RStudio help forums. They have specific areas to ask for help on their stuff, at least for some of it. You may wish to wait a bit before doing so, though, just to see if someone here responds.

Bert
Berry, Charles
2021-Sep-19 17:28 UTC
[R] Cacheing of functions from libraries other than the base in Rmarkdown
Chris,

> On Sep 18, 2021, at 12:26 PM, Chris Evans <chrishold at psyctc.org> wrote:
>
> [...]
>
> I think what is happening is some sort of cache corruption presumably caused by the memory demands. I am pretty sure I've
> seen this before but a long time ago and dealt with it by deleting the files and cache folders created by the knit.

Caching things that depend on libraries is known to be tricky.

Specifically, it is advised that "loading packages via library() in a cached chunk and these packages will be used by uncached chunks" is something you should not do. I suspect that this is the problem with your inline chunk.

I have to reread things like:

https://yihui.org/knitr/demo/cache/

and relevant parts of the manual to be sure I didn't mess something up and maybe you should look at that and the manual yet another time.

HTH,

Chuck
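[Editorial sketch] One way to follow the advice quoted above is to attach packages in a chunk that is never cached, so that they are attached on every knit before any inline expression runs. The code below shows what such a chunk might contain; the chunk header {r setup, cache=FALSE} and the vector ids are assumptions for illustration, not something taken from the thread. The namespace-qualified call at the end is a second option that does not depend on what is attached, provided dplyr is installed.

```r
# Contents of a hypothetical first chunk with header {r setup, cache=FALSE}.
# Because this chunk is not cached, it is re-run on every knit.

# Cache later chunks by default; this chunk's own cache=FALSE option still applies to it.
knitr::opts_chunk$set(cache = TRUE)

# Attach packages here rather than inside a cached chunk.
suppressPackageStartupMessages(library(dplyr))

# Alternative for inline code: qualify the call with its namespace,
# e.g. `r dplyr::n_distinct(tibDat$ID)`, so no attached package is needed.
ids <- c(1, 1, 2, 3)  # invented values for illustration
nIds <- dplyr::n_distinct(ids)
stopifnot(nIds == 3)
```

Whether this resolves the error reported in the thread was not confirmed there; it is only a direct reading of the advice on the linked knitr cache page.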