You are right of course, Peter, but I can see where some will get confused. In a formula some symbols and functions are special operators, and others are simple functions. That is the reason one needs I(events/time) to put a rate in as a variable. Someone who types 'offset' at the command line will see that there actually IS a function behind the scenes.

Does anyone see a downside to Bill Dunlap's suggestion where the first step of my formula processing would be to "clean off" any survival:: modifiers? That is, something that will break? After all, the code already has a lot of "if (....)" lines for other common user errors. I could view it as just saving me the time to deal with the 'we found an error' emails. I would output the corrected version as the "call" component.

Terry

On 8/27/24 03:38, peter dalgaard wrote:
> In my view, that's just plain wrong, because strata() is not a function but a special operator in a model formula. Wouldn't it also blow up on stats::offset()?
>
> Oh, yes it would:
>
> > lm(y~x+offset(z))
> Call:
> lm(formula = y ~ x + offset(z))
>
> Coefficients:
> (Intercept)            x
>      0.7350       0.0719
>
> > lm(y~x+stats::offset(z))
> Call:
> lm(formula = y ~ x + stats::offset(z))
>
> Coefficients:
>      (Intercept)                 x  stats::offset(z)
>           0.6457            0.1078            0.8521
>
> Or, to be facetious:
>
> > lm(y~base::"+"(x,z))
> Call:
> lm(formula = y ~ base::"+"(x, z))
>
> Coefficients:
>     (Intercept)  base::"+"(x, z)
>          0.4516           0.4383
>
> -pd
>
>> On 26 Aug 2024, at 16:42 , Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org> wrote:
>>
>> The survival package makes significant use of the "specials" argument of terms(), before
>> calling model.frame; it is part of nearly every modeling function. The reason is that
>> strata arguments simply have to be handled differently than other things on the right hand
>> side. Likewise for tt() and cluster(), though those are much less frequent.
>>
>> I now get "bug reports" from the growing segment that believes one should put
>> packagename:: in front of every single instance. For instance
>>    fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
>>               survival::strata(inst), data= survival::lung)
>>
>> This fails to give the correct answer because it fools terms(formula, specials =
>> "strata"). I've stood firm in my response of "that's your bug, not mine", but I begin
>> to believe I am swimming uphill. One person responded that it was company policy to
>> qualify everything.
>>
>> I don't see an easy way to fix survival, and even if I did it would be a tremendous amount
>> of work. What are others' thoughts?
>>
>> Terry
>>
>> --
>> Terry M Therneau, PhD
>> Department of Quantitative Health Sciences
>> Mayo Clinic
>> therneau at mayo.edu
>>
>> "TERR-ree THUR-noh"
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
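The matching problem described in the original post can be reproduced with base R alone. The sketch below is only an illustration: the variable names y, x and g are made up, and it relies on the documented behaviour that the "specials" attribute of a terms object holds the position of each special among the model variables, or NULL when the special is not found. In the R versions I am aware of, the qualified call is not recognised as the special "strata".

```r
# terms() only inspects the formula; strata() is never called here,
# so the survival package does not need to be installed.
f_plain <- y ~ x + strata(g)
f_qual  <- y ~ x + survival::strata(g)

# Position of the special among the model variables (NULL when not found)
sp_plain <- attr(terms(f_plain, specials = "strata"), "specials")$strata
sp_qual  <- attr(terms(f_qual,  specials = "strata"), "specials")$strata

print(sp_plain)  # index of strata(g) in the variable list
print(sp_qual)   # differs from sp_plain when the qualified form is not matched
```

When the special goes unrecognised, the term is treated as an ordinary covariate, which is the same effect Peter's stats::offset() example shows for lm().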
On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> Does anyone see a downside to Bill Dunlap's suggestion where the first step of my formula
> processing would be to "clean off" any survival:: modifiers? That is, something that
> will break? After all, the code already has a lot of "if (....)" lines for other common
> user errors. I could view it as just saving me the time to deal with the 'we found an
> error' emails. I would output the corrected version as the "call" component.

I don't know if you have any data vectors that someone might use in a fit, but conceivably

   survdiff( Surv(time, status) ~ survival::datavector + strata(inst), data=lung)

would mean something different than

   survdiff( Surv(time, status) ~ datavector + strata(inst), data=lung)

if a user had a vector named datavector.

Duncan Murdoch
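One way to reconcile the "clean off" idea with the concern about qualified data objects is to strip the prefix only where it heads a call to a known special, and to leave a bare survival::name reference untouched. The following is a rough sketch under that assumption. The helper strip_special_prefix() and its arguments are hypothetical and are not part of the survival package, and formulas containing empty arguments (such as x[, 1]) are not handled.

```r
# Hypothetical helper, not part of survival: remove `pkg::` only from
# calls to the named specials, e.g. survival::strata(inst) -> strata(inst).
strip_special_prefix <- function(expr, pkg = "survival",
                                 specials = c("strata", "cluster", "tt")) {
  if (!is.call(expr)) return(expr)      # symbols and constants are unchanged
  head <- expr[[1]]
  # In survival::strata(inst) the function position is itself a call to `::`
  if (is.call(head) && identical(head[[1]], as.name("::")) &&
      identical(as.character(head[[2]]), pkg) &&
      as.character(head[[3]]) %in% specials) {
    expr[[1]] <- as.name(as.character(head[[3]]))
  }
  # Recurse into the arguments; a bare survival::datavector is a call whose
  # head is the symbol `::`, so it passes through without modification.
  for (i in seq_along(expr)[-1]) {
    if (!is.null(expr[[i]])) {
      expr[[i]] <- strip_special_prefix(expr[[i]], pkg, specials)
    }
  }
  expr
}

f <- Surv(time, status) ~ survival::datavector + survival::strata(inst)
g <- strip_special_prefix(f)
print(g)
```

Because the elements are replaced in place, the result keeps the formula class and environment of the input, so it can be handed to terms(g, specials = "strata") as usual.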
I don't see a big downside, but I will say that there's always a bit of a tradeoff between "train the users to do it right" (by writing clear documentation and informative error messages) and "make things easy for the user" (by making the code more complicated to handle things for them automatically).

For example, part of me wishes that (1) there were only one way to provide a response variable for a binomial variable with N>1 (preferably by specifying proportions and a weights argument) and (2) grouping variables in lme4/nlme/et al always had to be specified as factors (rather than automatically being coerced to factors). Making those decisions would avoid so much code complexity ... (and eliminate one class of errors, i.e. people including a continuous covariate as a random-effect grouping variable because they think of 'random effect' and 'nuisance variable' as synonyms ...)

But taking the "train the users to do it right" path does also involve more discussion with users ("if your software knows what I should be doing why can't it just do it for me?")

cheers
Ben Bolker

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of working hours.