This arose in testing [.terms and has me confused.

data(esoph)  # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1  <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

> all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

> all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"

> sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0
On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> This arose in testing [.terms and has me confused.
>
> data(esoph)  # use a standard data set
>
> t0x <- terms(model.frame( ~ tobgp, data=esoph))
> t1  <- terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
> t1x <- (delete.response(t1))[-1]
>
>  > all.equal(t0x, t1x)
> [1] TRUE
>
> # the above is wrong, because they actually are not the same
>
>  > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
> [1] "Names: 1 string mismatch"
> [2] "Lengths (1, 2) differ (string compare on first 1)"

As documented, all.equal() is generic, with methods for different classes. The classes of both t0x and t1x are

  c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called. That method isn't specifically documented, but you can see its definition as

function (target, current, ...)
{
    if (length(target) != length(current))
        return(paste0("target, current differ in having response: ",
            length(target) == 3L, ", ", length(current) == 3L))
    if (!identical(deparse(target), deparse(current)))
        "formulas differ in contents"
    else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no attributes shown, even though "showAttributes" is set by default. I haven't traced through the C code to see where things are going wrong.

Duncan Murdoch

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
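The deparse behaviour described in this message can be reproduced without the esoph data. The following is a minimal sketch, assuming a made-up attribute called "note" (an illustrative name, not something from the thread) attached to an ordinary formula:

```r
# Two formulas that differ only in a hypothetical "note" attribute.
f1 <- y ~ x
f2 <- f1
attr(f2, "note") <- "extra attribute"   # illustrative name, not part of any R API

# deparse() renders only the formula expression; attributes of language
# objects are not written out, so the two strings are the same.
d1 <- deparse(f1)
d2 <- deparse(f2)
same_text <- identical(d1, d2)

# identical() does look at attributes, so it tells the two objects apart.
same_object <- identical(f1, f2)

print(c(same_text = same_text, same_object = same_object))
```

Since the all.equal.formula definition quoted above compares only lengths and deparsed strings, a comparison of f1 and f2 through that method cannot see the extra attribute.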
Duncan,
   I should have included it in my original note, but

     all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well. I had tried that as well. But a further look at all.equal.default shows the following line right near the top:

    if (is.language(target) || is.function(target))
        return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.

I'll change my original title to "all.equal was not a good tool for testing certain code issues".

Thanks for the pointer,

Terry
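One way to test terms objects without relying on all.equal() on the objects themselves is to compare the attributes of interest one at a time, as the original post did for 'dataClasses'. A rough sketch follows; compare_terms_attrs is a hypothetical helper name, and the default attribute list is only an assumption about what a given test cares about:

```r
# Hypothetical helper: compare selected attributes of two terms objects
# and return only the comparisons that did not come back TRUE.
compare_terms_attrs <- function(a, b,
                                which = c("term.labels", "order", "response",
                                          "intercept", "dataClasses")) {
  res <- lapply(which, function(nm) all.equal(attr(a, nm), attr(b, nm)))
  names(res) <- which
  res[!vapply(res, isTRUE, logical(1))]
}

# Small self-contained example: copy a terms object and deliberately
# change one attribute so the two objects no longer agree.
tt1 <- terms(y ~ a + b)
tt2 <- tt1
attr(tt2, "term.labels") <- "a"

diffs <- compare_terms_attrs(tt1, tt2)
print(names(diffs))
```

The helper returns an empty list when every listed attribute matches, so a test can assert on length(diffs) == 0.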
On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:
> Duncan,
>    I should have included it in my original note, but
>
>      all.equal(unclass(t0x), unclass(t1x))
>
> returns TRUE as well. I had tried that as well. But a further look at
> all.equal.default shows the following line right near the top:
>     if (is.language(target) || is.function(target))
>         return(all.equal.language(target, current, ...))
>
> and that path explicitly ignores attributes.

Which R version are you using? I see deparse(target) and deparse(current) in all.equal.language(), and those should not be ignoring attributes according to the documentation.

Duncan Murdoch
On 4/5/19 9:39 AM, Duncan Murdoch wrote:
> Which R version are you using? I see deparse(target) and deparse(current) in
> all.equal.language(), and those should not be ignoring attributes according to the
> documentation.

I'm using today's version of R-devel on Ubuntu (svn up this AM). But I agree, both target and current appear.
On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:
> I'm using today's version of R-devel on Ubuntu (svn up this AM).
> But I agree, both target and current appear.

That's not what I said. I said that the attributes should not be ignored in that function. I don't see anything in the R-devel version of it that ignores attributes:

> all.equal.language
function (target, current, ...)
{
    mt <- mode(target)
    mc <- mode(current)
    if (mt == "expression" && mc == "expression")
        return(all.equal.list(target, current, ...))
    ttxt <- paste(deparse(target), collapse = "\n")
    ctxt <- paste(deparse(current), collapse = "\n")
    msg <- c(if (mt != mc) paste0("Modes of target, current: ",
        mt, ", ", mc), if (ttxt != ctxt) {
        if (pmatch(ttxt, ctxt, 0L))
            "target is a subset of current"
        else if (pmatch(ctxt, ttxt, 0L))
            "current is a subset of target"
        else "target, current do not match when deparsed"
    })
    if (is.null(msg))
        TRUE
    else msg
}
<bytecode: 0x7fd9e792f1e0>
<environment: namespace:base>

Duncan Murdoch
>>>>> Duncan Murdoch
>>>>>     on Fri, 5 Apr 2019 11:12:48 -0400 writes:

    > On 05/04/2019 10:46 a.m., Therneau, Terry M., Ph.D. wrote:
    >> On 4/5/19 9:39 AM, Duncan Murdoch wrote:
    >>> On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:
    >>>> Duncan,
    >>>>    I should have included it in my original note, but
    >>>>
    >>>>      all.equal(unclass(t0x), unclass(t1x))
    >>>>
    >>>> returns TRUE as well. I had tried that as well. But a further look at
    >>>> all.equal.default shows the following line right near the top:
    >>>>     if (is.language(target) || is.function(target))
    >>>>         return(all.equal.language(target, current, ...))
    >>>>
    >>>> and that path explicitly ignores attributes.
    >>>
    >>> Which R version are you using? I see deparse(target) and deparse(current) in
    >>> all.equal.language(), and those should not be ignoring attributes according to the
    >>> documentation.

But the problem is that indeed "of course" all.equal.formula() and not all.equal.language() is called for the terms since, as you yourself remarked, their class is c("terms", "formula"), and so what Terry reported is indeed correct *and* a bug, and in "all versions" of R (I did not look far back, but these things haven't changed much).

The cleanest would probably be to define an all.equal.terms() method, as I think there may be more code relying on the behavior of all.equal.formula() to only look at the formulas themselves and not their attributes...

but you (Duncan) and others may have a different opinion.

Martin Maechler
ETH Zurich and R Core Team
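To make the suggestion concrete, here is one possible shape for such a method. This is only a sketch under stated assumptions, not R Core's implementation: the choice to drop the ".Environment" attribute, to sort attributes by name, and to prefix messages with "attributes:" are all illustrative decisions.

```r
# Hypothetical all.equal method for class "terms" (a sketch, not base R code).
all.equal.terms <- function(target, current, ...) {
  if (!inherits(current, "terms"))
    return("current is not a terms object")

  # Formula part: reuse the existing formula method, which compares
  # the deparsed text only.
  f_msg <- all.equal.formula(target, current, ...)

  # Attribute part: drop the environment, sort by name, compare as lists.
  strip <- function(x) {
    a <- attributes(x)
    a[[".Environment"]] <- NULL
    a[order(names(a))]
  }
  a_msg <- all.equal(strip(target), strip(current), ...)

  msg <- c(if (!isTRUE(f_msg)) f_msg,
           if (!isTRUE(a_msg)) paste("attributes:", a_msg))
  if (is.null(msg)) TRUE else msg
}

# Self-contained check: a terms object against itself, and against a copy
# with one attribute deliberately changed.
tt1 <- terms(y ~ a + b)
tt2 <- tt1
attr(tt2, "term.labels") <- "a"

same   <- all.equal.terms(tt1, tt1)
differ <- all.equal.terms(tt1, tt2)
print(same)
print(differ)
```

Because "terms" comes before "formula" in the class vector, a method with this name would be picked up for terms objects while plain formulas would keep going through all.equal.formula, which is the separation the message above argues for.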