Gabriel Becker
2019-Dec-20 23:48 UTC
[Rd] list.files(., pattern=<>, recursive = TRUE, include.dirs = TRUE)
Hi all,

I ran into a weird corner case of list.files today, and I'm wondering what people think about it and about a potential wishlist enhancement related to it.

Consider the case where we call list.files with recursive and include.dirs both TRUE and supply a pattern. In this case, pattern is applied to directory names when deciding whether to include a directory in the return value, but NOT when deciding whether to recurse into it. This behavior is consistent, but I'd argue it's also counterintuitive: if a directory is excluded for not matching pattern, I wouldn't necessarily expect its children/contents to even be candidates for inclusion, at first blush.

If others agree this behavior is strange/suboptimal, I figure there are a few different things that can be done here (which I discuss below):

1. Modify the behavior of list.files(., include.dirs = TRUE, recursive = TRUE, pattern = <>) so that:
   1. pattern is applied when deciding where to recurse.
   2. all directories that (recursively) contain at least one file (or *possibly* an empty leaf subdirectory) matching pattern are themselves included in the return value.
2. Add a recurse.pattern argument to list.files (and probably list.dirs) that is used to filter the directories recursed into (ignored when recursive == FALSE).
3. Modify the documentation of list.files so that it mentions this inconsistency, so that at least the behavior is documented, even if it's (arguably) not ideal.

My thoughts:

Both *1.1* and *1.2* are breaking changes, though I suspect that setting include.dirs and recursive both to TRUE (or, in fact, setting include.dirs to TRUE and supplying a pattern) is relatively rare. *1.1* is a more drastic change, but in my opinion ultimately more intuitive than *1.2*.

I think *2* could be useful, though only if, in practice, the same pattern would apply at every level of recursion often enough (sometimes but not always, I'd think). *2* would be fully backwards compatible (computing on formals lists notwithstanding...)
unless its default were set to pattern when include.dirs is TRUE, in which case it would be a disable-able implementation of *1.1*.

I think *3* would be good to do if there's no appetite for doing anything higher on the list.

I am happy to submit patches (as wishlist items, except for *3*) for any of the above if there is interest.

Thoughts?
~G

td = file.path(tempdir(), "listfilestst")
dns = c("good", "bad")
dpths = file.path(td, as.vector(outer(dns, dns, paste, sep = .Platform$file.sep)))
invisible(lapply(dpths, dir.create, recursive = TRUE))
fpths = as.vector(outer(dpths, c("goodfil", "badfil"), file.path))
invisible(sapply(fpths, function(pth) cat(" ", file = pth)))

## all files (and directories, with include.dirs = TRUE)
list.files(td, recursive = TRUE)
## [1] "bad/bad/badfil"    "bad/bad/goodfil"   "bad/good/badfil"
## [4] "bad/good/goodfil"  "good/bad/badfil"   "good/bad/goodfil"
## [7] "good/good/badfil"  "good/good/goodfil"
list.files(td, recursive = TRUE, include.dirs = TRUE)
##  [1] "bad"               "bad/bad"           "bad/bad/badfil"
##  [4] "bad/bad/goodfil"   "bad/good"          "bad/good/badfil"
##  [7] "bad/good/goodfil"  "good"              "good/bad"
## [10] "good/bad/badfil"   "good/bad/goodfil"  "good/good"
## [13] "good/good/badfil"  "good/good/goodfil"

## no names containing "b" (pattern is matched against basenames)
list.files(td, recursive = TRUE, pattern = "^[^b]+$")
## [1] "bad/bad/goodfil"   "bad/good/goodfil"  "good/bad/goodfil"
## [4] "good/good/goodfil"

## same pattern, with include.dirs = TRUE:
## "bad" is not included but "bad/good" is (both are directories),
## and "bad/bad/goodfil" is also included
list.files(td, recursive = TRUE, pattern = "^[^b]+$", include.dirs = TRUE)
## [1] "bad/bad/goodfil"   "bad/good"          "bad/good/goodfil"
## [4] "good"              "good/bad/goodfil"  "good/good"
## [7] "good/good/goodfil"
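Until something like this exists in base R, the behavior proposed in *1.1* and *2* can be approximated at user level. The following is a minimal sketch, not a proposed patch: list_files_pruned() and its recurse.pattern argument are hypothetical names, it only covers the recursive case with pattern and include.dirs, and it ignores other list.files() arguments such as all.files and full.names. Defaulting recurse.pattern to pattern gives the *1.1* behavior, and recurse.pattern = NULL recurses everywhere, as list.files() does today.

```r
## Hypothetical helper, NOT part of base R: a user-level sketch of options
## 1.1 and 2. `pattern` is matched against basenames, as in list.files();
## `recurse.pattern` decides which directories are descended into.
## Its default (= pattern) prunes non-matching directories (option 1.1);
## recurse.pattern = NULL recurses everywhere, like list.files() today.
list_files_pruned <- function(path, pattern = NULL, recurse.pattern = pattern,
                              include.dirs = TRUE) {
  ## NULL regex means "match everything"
  matches <- function(rx, name) is.null(rx) || grepl(rx, name)
  walk <- function(rel) {
    full <- if (nzchar(rel)) file.path(path, rel) else path
    out <- character()
    for (e in list.files(full)) {  # basenames of the direct children only
      relpath <- if (nzchar(rel)) file.path(rel, e) else e
      if (dir.exists(file.path(full, e))) {
        ## a directory is listed only if it matches `pattern` ...
        if (include.dirs && matches(pattern, e))
          out <- c(out, relpath)
        ## ... and descended into only if it matches `recurse.pattern`
        if (matches(recurse.pattern, e))
          out <- c(out, walk(relpath))
      } else if (matches(pattern, e)) {
        out <- c(out, relpath)
      }
    }
    out
  }
  ## radix sort orders by bytes (C locale), independent of the session locale
  sort(walk(""), method = "radix")
}
```

On the td tree from the example above, list_files_pruned(td, pattern = "^[^b]+$") should return only "good", "good/good" and "good/good/goodfil", whereas passing recurse.pattern = NULL should give the same seven paths as the last list.files() call.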