Hadley Wickham
2018-Aug-23 18:31 UTC
[Rd] conflicted: an alternative conflict resolution strategy
Hi all,

I'd love to get your feedback on the conflicted package, which provides an alternative strategy for resolving ambiguous function names (i.e. when multiple packages provide identically named functions). conflicted 0.1.0 is already on CRAN, but I'm currently preparing a revision (<https://github.com/r-lib/conflicted>), and looking for feedback.

As you are no doubt aware, R's default approach means that the most recently loaded package "wins" any conflicts. You do get a message about conflicts on load, but I see a lot of newer R users experiencing problems caused by function conflicts. I think there are three primary reasons:

- People don't read messages about conflicts. Even if you are conscientious and do read the messages, it's hard to notice a single new conflict caused by a package upgrade.

- The warning and the problem may be quite far apart. If you load all your packages at the top of the script, it may potentially be 100s of lines before you encounter a conflict.

- The error messages caused by conflicts are cryptic because you end up calling a function with utterly unexpected arguments.

For these reasons, conflicted takes an alternative approach, forcing the user to explicitly disambiguate any conflicts:

  library(conflicted)
  library(dplyr)
  library(MASS)

  select
  #> Error: [conflicted] `select` found in 2 packages.
  #> Either pick the one you want with `::`
  #> * MASS::select
  #> * dplyr::select
  #> Or declare a preference with `conflicted_prefer()`
  #> * conflict_prefer("select", "MASS")
  #> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new "conflicted" environment just after the global environment. This environment contains an active binding for each ambiguous name. The conflicted environment also contains bindings for `library()` and `require()` that rebuild the conflicted environment and suppress default reporting (but are otherwise thin wrappers around the base equivalents).
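To make the active-binding mechanism concrete, here is a minimal base-R sketch. It is not conflicted's actual implementation: the environment `conflict_env`, the helper `make_conflict_binding()` and the error text are hypothetical, and the environment is deliberately left unattached so the sketch has no side effects on the search path.

```r
# Minimal sketch (hypothetical names; not conflicted's real code).
# An active binding runs a function every time its name is looked up,
# so an ambiguous name can raise an informative error instead of
# silently resolving to whichever package was attached last.
conflict_env <- new.env()

make_conflict_binding <- function(name, pkgs, env) {
  force(pkgs)  # capture the package list now, not at lookup time
  makeActiveBinding(name, function() {
    stop(
      sprintf(
        "`%s` found in %d packages. Pick one with `::`: %s",
        name, length(pkgs), paste0(pkgs, "::", name, collapse = ", ")
      ),
      call. = FALSE
    )
  }, env)
}

make_conflict_binding("select", c("MASS", "dplyr"), conflict_env)

# Looking the name up triggers the binding's function, which signals an error.
msg <- tryCatch(get("select", envir = conflict_env), error = conditionMessage)
message(msg)
```

Using an error rather than a warning mirrors the behaviour described above: the ambiguous name cannot be used at all until the user disambiguates it.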
conflicted also provides a `conflict_scout()` helper which you can use to see what's going on:

  conflict_scout(c("dplyr", "MASS"))
  #> 1 conflict:
  #> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the cost of introducing a few false negatives). The overarching goal is to ensure that code behaves identically regardless of the order in which packages are attached.

- A number of packages provide a function that appears to conflict with a function in a base package, but they follow the superset principle (i.e. they only extend the API, as explained to me by Hervé Pages).

  conflicted assumes that packages adhere to the superset principle, which appears to be true in most of the cases that I've seen. For example, the lubridate package provides `as.difftime()` and `date()`, which extend the behaviour of base functions, and provides S4 generics for the set operators.

    conflict_scout(c("lubridate", "base"))
    #> 5 conflicts:
    #> * `as.difftime`: [lubridate]
    #> * `date`       : [lubridate]
    #> * `intersect`  : [lubridate]
    #> * `setdiff`    : [lubridate]
    #> * `union`      : [lubridate]

  There are two popular functions that don't adhere to this principle: `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these special cases so they correctly generate conflicts. (I sure wish I'd known about the superset principle when creating dplyr!)

    conflict_scout(c("dplyr", "stats"))
    #> 2 conflicts:
    #> * `filter`: dplyr, stats
    #> * `lag`   : dplyr, stats

- Deprecated functions should never win a conflict, so conflicted checks for use of `.Deprecated()`. This rule is very useful when moving functions from one package to another. For example, many devtools functions were moved to usethis, and conflicted ensures that you always get the non-deprecated version, regardless of package attach order:

    head(conflict_scout(c("devtools", "usethis")))
    #> 26 conflicts:
    #> * `use_appveyor`       : [usethis]
    #> * `use_build_ignore`   : [usethis]
    #> * `use_code_of_conduct`: [usethis]
    #> * `use_coverage`       : [usethis]
    #> * `use_cran_badge`     : [usethis]
    #> * `use_cran_comments`  : [usethis]
    #> ...

Finally, as mentioned above, the user can declare preferences:

  conflict_prefer("select", "MASS")
  #> [conflicted] Will prefer MASS::select over any other package
  conflict_scout(c("dplyr", "MASS"))
  #> 1 conflict:
  #> * `select`: [MASS]

I'd love to hear what people think about the general idea, and if there are any obviously missing pieces.

Thanks!

Hadley

--
http://hadley.nz
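The core idea behind `conflict_scout()` (find names exported by more than one package) and the effect of `conflict_prefer()` can be illustrated with a small base-R sketch. This is a simplification under stated assumptions, not conflicted's code: `find_conflicts()` and `resolve_name()` are hypothetical helpers, the export lists are toy data passed in directly rather than read from installed namespaces, and none of the superset or deprecation heuristics are applied.

```r
# Hypothetical helpers (not conflicted's API). `exports` is a named list
# mapping each package name to the character vector of names it exports.
find_conflicts <- function(exports) {
  pkg <- rep(names(exports), lengths(exports))
  fn_name <- unlist(exports, use.names = FALSE)
  by_name <- split(pkg, fn_name)     # name -> packages exporting it
  by_name[lengths(by_name) > 1]      # keep names exported by 2+ packages
}

# Return the package a name should come from (NA if it is not ambiguous),
# or signal an error when it is ambiguous and no preference is declared.
# `prefs` is a named character vector such as c(select = "MASS").
resolve_name <- function(name, conflicts, prefs = character()) {
  pkgs <- conflicts[[name]]
  if (is.null(pkgs)) return(NA_character_)   # not ambiguous: normal lookup
  if (name %in% names(prefs) && prefs[[name]] %in% pkgs) {
    return(prefs[[name]])
  }
  stop(sprintf("`%s` found in %d packages: %s", name, length(pkgs),
               paste(pkgs, collapse = ", ")), call. = FALSE)
}

# Toy export lists (illustrative only; the real lists are far longer).
exports <- list(
  dplyr = c("select", "filter", "lag", "mutate"),
  MASS  = c("select", "huber"),
  stats = c("filter", "lag")
)
conflicts <- find_conflicts(exports)
str(conflicts)
resolve_name("select", conflicts, prefs = c(select = "MASS"))
```

In a real session the export lists could presumably be built with `getNamespaceExports()`, assuming the packages are installed; that step is left out here so the sketch stays self-contained.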
Duncan Murdoch
2018-Aug-23 20:46 UTC
[Rd] conflicted: an alternative conflict resolution strategy
First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except when attaching a new package; have you checked that?

I am not so sure about your heuristics. Can they be disabled, so the user is always forced to make the choice? Even when a function is intended to adhere to the superset principle, they don't always get it right, so a really careful user should always do explicit disambiguation.

And of course, if users wrote most of their long scripts as packages instead of as long scripts, the ambiguity issue would arise far less often, because namespaces in packages are intended to solve the same problem as your package does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:
> [...]
>   select
>   #> Error: [conflicted] `select` found in 2 packages.
>   #> Either pick the one you want with `::`
>   #> * MASS::select
>   #> * dplyr::select
>   #> Or declare a preference with `conflicted_prefer()`
>   #> * conflict_prefer("select", "MASS")
>   #> * conflict_prefer("select", "dplyr")

I don't know if this is a typo in your r-devel message or a typo in the error message, but you say `conflicted_prefer()` in one place and conflict_prefer() in the other.
Jari Oksanen
2018-Aug-24 07:12 UTC
[Rd] conflicted: an alternative conflict resolution strategy
If you have to load two packages which both export the same name in their namespaces, namespaces do not help in resolving which synonymous function to use. Neither does it help to have a package instead of a script, as long as you end up loading two namespaces with name conflicts. The order of importing namespaces can also be difficult to control, because a namespace may already be loaded when you start R with a saved workspace. Moving a function to another package may be a transitional issue which disappears when both packages reach their final stages, but if you use the recommended deprecation stage, the same names can live together for a long time.

So this package is a good idea, and preferably base R should be able to handle the issue of choosing between exported synonymous functions. This has bitten me several times in package development, and as CRAN grows it is a growing problem. Package authors often have poor control of the issue, as they do not know what packages users use. Currently all we can do is keep a FAQ entry explaining that a certain error message does not come from a function in our package, but from some other package having a synonymous function that was used instead.

cheers,
Jari Oksanen

On 23 Aug 2018, at 23:46 pm, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> [...]
Joris Meys
2018-Aug-24 09:28 UTC
[Rd] conflicted: an alternative conflict resolution strategy
Dear Hadley,

There have been some emails from you lately about packages on R-devel. I would argue that the appropriate list for that is R-pkg-devel, as I was told myself not too long ago. People might get confused and think this is about a change to R itself, which it obviously is not.

Kind regards
Joris

On Thu, Aug 23, 2018 at 8:32 PM Hadley Wickham <h.wickham at gmail.com> wrote:
> [...]
--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
Hadley Wickham
2018-Aug-24 12:18 UTC
[Rd] conflicted: an alternative conflict resolution strategy
On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> First, some general comments:
>
> This sounds like a useful package.
>
> I would guess it has very little impact on runtime efficiency except
> when attaching a new package; have you checked that?

It adds one extra element to the search path, so the impact on speed should be equivalent to loading one additional package (i.e. negligible).

I've also done some benchmarking to see the impact on calls to library(). These results are now a little outdated (I've since added more heuristics, so I should re-run them), but previously conflicted added about 100 ms of overhead to a library() call when I had ~170 packages loaded (the most I could load without running out of DLLs).

> I am not so sure about your heuristics. Can they be disabled, so the
> user is always forced to make the choice? Even when a function is
> intended to adhere to the superset principle, they don't always get it
> right, so a really careful user should always do explicit disambiguation.

That is a good question. My intuition is always to start with less user control, as it makes it easier to get the core ideas right, and it's easy to add more control later (whereas if you later take it away, people get unhappy). Maybe it's natural to have a function that does the opposite of conflict_prefer(), and declares that something that doesn't appear to be a conflict actually is one?

I don't think that an option to suppress the superset principle altogether will work - my sense is that it will generate too many false positives, to the point where you'll get frustrated and stop using conflicted.

> And of course, if users wrote most of their long scripts as packages
> instead of as long scripts, the ambiguity issue would arise far less
> often, because namespaces in packages are intended to solve the same
> problem as your package does.

Agreed.

> One more comment inline about a typo, possibly in an error message.

Thanks for spotting; fixed in devel now.
Hadley -- http://hadley.nz
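Duncan's question about disabling the heuristics, and Hadley's idea of an "opposite of conflict_prefer()", can be illustrated with a small sketch of how a superset exemption and a user override might interact. Everything here is hypothetical: `is_reported()`, its `forced` argument and the fixed list of base packages are assumptions for illustration, and the rule "one non-base package versus base is not reported" is a crude stand-in for conflicted's actual heuristics (which, for example, special-case `dplyr::filter()` and `dplyr::lag()`).

```r
# Hypothetical rule: a name shared only between ONE non-base package and
# base packages is assumed to follow the superset principle and is not
# reported, unless the user has explicitly forced it to count as a conflict.
base_pkgs <- c("base", "stats", "utils", "methods", "graphics", "grDevices")

is_reported <- function(name, pkgs, forced = character()) {
  if (name %in% forced) return(TRUE)      # user override always wins
  length(setdiff(pkgs, base_pkgs)) > 1    # 2+ non-base packages: conflict
}

is_reported("date", c("lubridate", "base"))                    # FALSE
is_reported("select", c("dplyr", "MASS"))                      # TRUE
is_reported("filter", c("dplyr", "stats"), forced = "filter")  # TRUE
```

Keeping the override as an explicit per-name list, rather than a global switch that disables the exemption, matches Hadley's concern that turning the superset assumption off entirely would produce too many false positives.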
Hadley Wickham
2018-Aug-24 12:27 UTC
[Rd] conflicted: an alternative conflict resolution strategy
On Fri, Aug 24, 2018 at 4:28 AM Joris Meys <jorismeys at gmail.com> wrote:
>
> Dear Hadley,
>
> There have been some emails from you lately about packages on R-devel. I would argue that the appropriate list for that is R-pkg-devel, as I was told myself not too long ago. People might get confused and think this is about a change to R itself, which it obviously is not.

The description for R-pkg-devel states:

> This list is to get help about package development in R. The goal of the list is to provide a forum for learning about the package development process. We hope to build a community of R package developers who can help each other solve problems, and reduce some of the burden on the CRAN maintainers. If you are having problems developing a package or passing R CMD check, this is the place to ask!

The description for R-devel states:

> This list is intended for questions and discussion about code development in R. Questions likely to prompt discussion unintelligible to non-programmers or topics that are too technical for R-help's audience should go to R-devel, unless they are specifically about problems in R package development where the R-package-devel list is rather appropriate, see the posting guide section. The main R mailing list is R-help.

My questions are not about how to develop a package, R CMD check, or how to get it on CRAN, but instead about the semantics of the packages I am working on. My opinion is supported by the fact that a number of members of the R core team have responded (both on list and off) and have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if there is widespread concern that my emails are inappropriate.

Hadley

--
http://hadley.nz
Gabe Becker
2018-Aug-24 19:37 UTC
[Rd] conflicted: an alternative conflict resolution strategy
Hadley,

Overall this seems like a cool and potentially really useful idea. I do have some thoughts/feedback, which I've put inline below.

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
> <snip>
>
> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> - A number of packages provide a function that appears to conflict
>   with a function in a base package, but they follow the superset
>   principle (i.e. they only extend the API, as explained to me by
>   Hervé Pages).
>
>   conflicted assumes that packages adhere to the superset principle,
>   which appears to be true in most of the cases that I've seen.

It seems that you may be able to strengthen this heuristic from a blanket assumption to something more narrowly targeted by looking for one or more of the following to confirm likely superset adherence:

  1. matching or purely extending formals (i.e. all the named arguments of base::fun match, including order, and there are new arguments in pkg::fun only if base::fun takes ...)
  2. an explicit call to base::fun in the body of pkg::fun
  3. UseMethod(funname), with at least one provided S3 method calling base::fun
  4. creation of an S4 generic using fun or base::fun as the seeding/default method body, or base::fun being called from at least one method

> [...]
>
> - Deprecated functions should never win a conflict, so conflicted
>   checks for use of `.Deprecated()`. This rule is very useful when
>   moving functions from one package to another. For example, many
>   devtools functions were moved to usethis, and conflicted ensures
>   that you always get the non-deprecated version, regardless of package
>   attach order:

I would completely believe this rule is useful for refactoring as you describe, but that is the "same function" case. For an end user in the "different function, same symbol" case, it's not at all clear to me that the non-deprecated function should always win.

People sometimes use deprecated functions. It's not great, and eventually they'll need to fix that for any given case, but imagine if you deprecated the filter verb in dplyr (I know this will never happen, but I think it's illustrative nonetheless).

Consider a piece of code someone wrote before this hypothetical deprecation of filter. The fact that it's now deprecated certainly doesn't mean that they secretly wanted stats::filter all along, right? conflicted acting as if it does will lead to them getting the exact kind of error you're looking to protect them from, and with even less ability to understand why, because they are already doing "the right thing" to protect themselves by using conflicted in the first place...

> Finally, as mentioned above, the user can declare preferences:
>
>   conflict_prefer("select", "MASS")
>   #> [conflicted] Will prefer MASS::select over any other package
>   conflict_scout(c("dplyr", "MASS"))
>   #> 1 conflict:
>   #> * `select`: [MASS]

I deeply worry about people putting this kind of thing, or even just library(conflicted), in their .Rprofile and thus making their scripts *substantially* less reproducible. Is that a consequence of this kind of functionality that you have thought about?

Best,
~G

--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
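Gabe's first proposed check (matching or purely extending formals) can be sketched with `formals()` and `args()`. This is a hypothetical helper, not part of conflicted, and it is deliberately simplified: it compares argument names and order only, ignores default values, and does not attempt the other three checks (explicit `base::` calls, S3 methods, S4 generics).

```r
# Hypothetical helper: does pkg_fun's signature match or purely extend
# base_fun's? args() is used so that primitives such as sum(), which have
# no formals() of their own, are handled too.
extends_formals <- function(base_fun, pkg_fun) {
  base_args <- names(formals(args(base_fun)))
  pkg_args  <- names(formals(args(pkg_fun)))
  named_base <- setdiff(base_args, "...")

  # Every named argument of base_fun must appear in pkg_fun, in the same order.
  pos <- match(named_base, pkg_args)
  if (anyNA(pos) || is.unsorted(pos)) return(FALSE)

  # New arguments are only acceptable if base_fun takes `...`.
  extra <- setdiff(pkg_args, base_args)
  length(extra) == 0 || "..." %in% base_args
}

base_version <- function(x, y, ...) NULL
extends_formals(base_version, function(x, y, ..., z = 1) NULL)  # TRUE
extends_formals(base_version, function(y, x, ...) NULL)         # FALSE: order changed
extends_formals(function(x) NULL, function(x, extra) NULL)      # FALSE: no ... to absorb `extra`
```

A signature check like this can only suggest superset adherence; as Duncan noted, a function can have a compatible signature and still behave differently, so it would presumably be combined with the other checks rather than used alone.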
Hadley Wickham
2018-Aug-29 21:41 UTC
[Rd] conflicted: an alternative conflict resolution strategy
>> conflicted applies a few heuristics to minimise false positives (at the >> cost of introducing a few false negatives). The overarching goal is to >> ensure that code behaves identically regardless of the order in which >> packages are attached. >> >> - A number of packages provide a function that appears to conflict >> with a function in a base package, but they follow the superset >> principle (i.e. they only extend the API, as explained to me by >> Herv? Pages). >> >> conflicted assumes that packages adhere to the superset principle, >> which appears to be true in most of the cases that I?ve seen. > > > It seems that you may be able to strengthen this heuristic from a blanket assumption to something more narrowly targeted by looking for one or more of the following to confirm likely-superset adherence > > matching or purely extending formals (ie all the named arguments of base::fun match including order, and there are new arguments in pkg::fun only if base::fun takes ...) > explicit call to base::fun in the body of pkg::fun > UseMethod(funname) and at least one provided S3 method calls base::fun > S4 generic creation using fun or base::fun as the seeding/default method body or called from at least one methodOooh nice, idea I'll definitely try it out.>> For >> example, the lubridate package provides `as.difftime()` and `date()` >> which extend the behaviour of base functions, and provides S4 >> generics for the set operators. >> >> conflict_scout(c("lubridate", "base")) >> #> 5 conflicts: >> #> * `as.difftime`: [lubridate] >> #> * `date` : [lubridate] >> #> * `intersect` : [lubridate] >> #> * `setdiff` : [lubridate] >> #> * `union` : [lubridate] >> >> There are two popular functions that don?t adhere to this principle: >> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these >> special cases so they correctly generate conflicts. (I sure wish I?d >> know about the subset principle when creating dplyr!) 
>>
>>     conflict_scout(c("dplyr", "stats"))
>>     #> 2 conflicts:
>>     #> * `filter`: dplyr, stats
>>     #> * `lag`   : dplyr, stats
>>
>> - Deprecated functions should never win a conflict, so conflicted
>>   checks for use of `.Deprecated()`. This rule is very useful when
>>   moving functions from one package to another. For example, many
>>   devtools functions were moved to usethis, and conflicted ensures
>>   that you always get the non-deprecated version, regardless of
>>   package attach order:
>
> I would completely believe this rule is useful for refactoring as you
> describe, but that is the "same function" case. For an end user in the
> "different function, same symbol" case, it's not at all clear to me that
> the non-deprecated function should always win.
>
> People sometimes use deprecated functions. It's not great, and eventually
> they'll need to fix that for any given case, but imagine if you deprecated
> the filter verb in dplyr (I know this will never happen, but I think it's
> illustrative nonetheless).
>
> Consider a piece of code someone wrote before this hypothetical
> deprecation of filter. The fact that it's now deprecated certainly doesn't
> mean that they secretly wanted stats::filter all along, right? conflicted
> acting as if it does will lead to them getting exactly the kind of error
> you're looking to protect them from, and with even less ability to
> understand why, because they are already doing the right thing to protect
> themselves by using conflicted in the first place...

Ah yes, good point.
I'll add a heuristic to check that the function name appears in the first
argument of the .Deprecated() call (assuming that the call looks something
like `.Deprecated("pkg::foo")`).

>> Finally, as mentioned above, the user can declare preferences:
>>
>>     conflict_prefer("select", "MASS")
>>     #> [conflicted] Will prefer MASS::select over any other package
>>     conflict_scout(c("dplyr", "MASS"))
>>     #> 1 conflict:
>>     #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just
> library(conflicted), in their .Rprofile and thus making their scripts
> substantially less reproducible. Is that a consequence of this kind of
> functionality that you have thought about?

Yes, and I've already recommended against it in two places :) I'm not sure
if there's any more I can do; people already put (e.g.) `library(ggplot2)`
in their .Rprofile, which is just as bad from a reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

-- 
http://hadley.nz
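A minimal R sketch of two of the checks discussed above may make them more concrete: the first suggested superset test (the base function's formals form a prefix of the package function's formals, with extra arguments allowed only when the base function takes `...`) and the proposed `.Deprecated()` check. The helper names `extends_formals()`, `find_deprecated_calls()` and `is_moved_function()` are hypothetical and not part of conflicted. Comparing the name exactly after stripping a `pkg::` prefix is an assumption about how "the function name appears in the first argument" might be tested.

```r
# Hypothetical helpers sketching two heuristics from this thread;
# none of these functions are part of the conflicted package.

# Superset check on formals alone: every formal of `base_fun` must appear,
# in order, at the start of `pkg_fun`'s formals, and `pkg_fun` may add
# further arguments only if `base_fun` takes `...`.
extends_formals <- function(pkg_fun, base_fun) {
  # as.character() turns NULL (a function with no formals) into character(0)
  base_args <- as.character(names(formals(base_fun)))
  pkg_args <- as.character(names(formals(pkg_fun)))
  n <- length(base_args)
  if (length(pkg_args) < n || !identical(pkg_args[seq_len(n)], base_args)) {
    return(FALSE)
  }
  length(pkg_args) == n || "..." %in% base_args
}

# Collect every call to `.Deprecated()` found anywhere in an expression.
find_deprecated_calls <- function(expr) {
  found <- list()
  walk <- function(e) {
    if (is.call(e)) {
      if (identical(e[[1]], as.name(".Deprecated"))) {
        found <<- c(found, list(e))
      }
      # `missing(ee)` skips empty arguments, such as the one in x[, 1]
      for (ee in as.list(e)) if (!missing(ee)) walk(ee)
    }
  }
  walk(expr)
  found
}

# A deprecated function counts as "moved" (and so should lose the conflict)
# only if the first argument to .Deprecated() names a function with the
# same name, e.g. a hypothetical .Deprecated("newpkg::foo") inside oldpkg::foo.
is_moved_function <- function(fun, name) {
  for (cl in find_deprecated_calls(body(fun))) {
    if (length(cl) < 2) next
    new <- cl[[2]]  # first argument, assumed to be a string literal
    if (is.character(new) && any(sub("^.*::", "", new) == name)) {
      return(TRUE)
    }
  }
  FALSE
}

# Usage
base_fun <- function(x, y, ...) NULL
extends_formals(function(x, y, ..., z = 1) NULL, base_fun)  # TRUE
extends_formals(function(y, x, ...) NULL, base_fun)         # FALSE

old_foo <- function(x) {
  .Deprecated("newpkg::foo")
  x
}
is_moved_function(old_foo, "foo")  # TRUE
is_moved_function(old_foo, "bar")  # FALSE
```

Comparing the stripped name exactly, rather than searching for it as a substring, avoids treating `foobar` as a match for `foo`. Both checks only inspect syntax, so they are cheap, but they can be fooled by code that builds the `.Deprecated()` argument dynamically.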