Terry Therneau
2008-Nov-11 15:28 UTC
[R] R design (was "Variable passed to function not used in function in select)
I've read the back and forth this morning, and I have to side with Vince.

1. Functions that re-interpret their arguments are very dangerous. The original question involved a well formed call to a function, which returned the wrong answer. Bug, design flaw, whatever -- it's a mistake and the best choice would be to fix it.
   I only consider such behavior in 2 cases:
   a. when the function is almost never, ever, called from anything but the top level. help() is the only example I can think of.
   b. to create a label from an argument, as in plot, but the argument itself is left alone to work as it should.
   One possible fix for subset: first treat the argument formally, and only if that simple interpretation fails try the more 'clever' interpretations. Whether this is doable or not I can't say.

2. The documentation of subset is not in any way clear. I would never have been able to diagnose or work around this bug. The issues are very subtle.
   I quite often see "it's in the manual so we bear no blame" as an argument on this list. We all need to remember that our view of what we are particularly close to is a distorted one -- I for instance think that everything about the survival package is crystal clear --- and be particularly open to concerns that something is opaque or subtle.

3. I've heavily used perhaps 20 computing languages in my life. I found S to be a refreshing revelation (referring to S of the 1988 Blue manual) precisely because it was completely functional. Once I got used to it, this feature made it so much more useful, extensible, and understandable than other things I'd used.
   R is becoming less and less a functional language (hidden functions and dependencies with environments for one); I quite often cannot figure out either exactly what a function calls or how to get it to stop doing it. I am not sure we have gained with each choice of "convenience" or sophistication over functional purity. I want "scan(file=myfile)" to continue to return "variable myfile not found" when I forget the quotes.

I am stumped by the R results I get too often, and I'm not a novice. That said, good design is hard. I spend a lot of time on that aspect in the survival package and there are still bits where the 'right' way is only clear after several years' experience. I do occasionally make non-backwards-compatible changes. The R core team has done an amazing job on the whole.

And let's not shoot the bearers of bad news.

Terry T
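To make the hazard in point 1 concrete, here is a minimal sketch of how subset() can re-interpret a well formed argument. It is not the code from the original question: the data frame `df` and the helpers `pick_nse()` and `pick_std()` are hypothetical names invented for this illustration. It relies on the behavior that subset() evaluates its `select` argument with the data frame's column names in scope before looking in the caller's environment.

```r
# Hypothetical data; the column name `a` is chosen to collide with an
# argument name in the wrapper functions below.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# The caller passes a column name in the argument `a`, but subset()
# resolves the symbol `a` against the column names first, so it is taken
# to mean column a rather than the caller's variable.
pick_nse <- function(data, a) subset(data, select = a)

# Standard evaluation with `[`: the argument is used exactly as passed.
pick_std <- function(data, a) data[, a, drop = FALSE]

names(pick_nse(df, "b"))
names(pick_std(df, "b"))
```

The first call returns column a even though "b" was requested; the second returns column b. The call is well formed in both cases, which is why the result of the first is so hard to diagnose.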
Duncan Murdoch
2008-Nov-11 16:12 UTC
[R] R design (was "Variable passed to function not used in function in select)
On 11/11/2008 10:28 AM, Terry Therneau wrote:
> I've read the back and forth this morning, and I have to side with Vince.
>
> 1. Functions that re-interpret their arguments are very dangerous. The original question involved a well formed call to a function, which returned the wrong answer. Bug, design flaw, whatever -- it's a mistake and the best choice would be to fix it.
> I only consider such behavior in 2 cases:
> a. when the function is almost never, ever, called from anything but the top level. help() is the only example I can think of.
> b. to create a label from an argument, as in plot, but the argument itself is left alone to work as it should.

There's another major use for this: model formulas. I like to be able to write lm(y ~ ., data=df), and I'd really hate to have to evaluate all the terms in a model formula explicitly.

> One possible fix for subset: first treat the argument formally, and only if that simple interpretation fails try the more 'clever' interpretations. Whether this is doable or not I can't say.
>
> 2. The documentation of subset is not in any way clear. I would never have been able to diagnose or work around this bug. The issues are very subtle.
> I quite often see "it's in the manual so we bear no blame" as an argument on this list. We all need to remember that our view of what we are particularly close to is a distorted one -- I for instance think that everything about the survival package is crystal clear --- and be particularly open to concerns that something is opaque or subtle.
>
> 3. I've heavily used perhaps 20 computing languages in my life. I found S to be a refreshing revelation (referring to S of the 1988 Blue manual) precisely because it was completely functional. Once I got used to it, this feature made it so much more useful, extensible, understandable than other things I'd used.

I don't know your definition of "completely functional", but I don't think S and R have ever been. It has always been possible to refer to non-local variables within a function (and their meaning is different between S and R, but I think R tends to be a bit more functional in this), to make super-assignments, to do lots of things that have side effects.

> R is becoming less and less a functional language (hidden functions and dependencies with environments for one), I quite often cannot figure out either exactly what a function calls or how to get it to stop doing it.

I don't know what you mean here. Are you talking about recent changes? (Which ones?) Or are you talking about older things, like namespaces? Or closures, which have been in R from the beginning (and which are part of why I'd call it more functional than S)?

> I am not sure we have gained with each choice of "convenience" or sophistication over functional purity. I want "scan(file=myfile)" to continue to return "variable myfile not found" when I forget the quotes.

R allows a lot of flexibility in how arguments are handled, and there's been some experimentation with different variations. Remember that R is partly a laboratory in which people are trying to invent new ways of doing statistical computing, and also remember that R (including its contributed packages) has hundreds of authors, not all of whom agree on the best way to do things. The benefit of this is that more stuff gets done: I'm not forced to adopt your ideas of The Right Way to Do Things, so I can get down to coding in the way I like. The disadvantage is that things can be inconsistent, so people are forced to read the documentation, and the documentation is always imperfect.

> I am stumped by the R results I get too often, and I'm not a novice. That said, good design is hard. I spend a lot of time on that aspect in the survival package and there are still bits where the 'right' way is only clear after several years' experience. I do occasionally make non-backwards-compatible changes. The R core team has done an amazing job on the whole.

If I'm not mistaken, you are still an S user as well as an R user, and this is a bit of a disadvantage: at a fundamental level, they are different languages, though they look superficially similar. I haven't used S in quite a few years, so I expect I'd be stumped by the results I got there in a lot of cases. I think that in the main R is a simpler, easier language to understand, but there are certainly bits and pieces of it where it is not easy.

> And let's not shoot the bearers of bad news.

I think we can discuss what's good and what's bad about the language without bringing out the guns or insults.

Duncan Murdoch
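The points above about non-local variables, super-assignment, and closures can be illustrated with a short sketch. The names `make_counter`, `counter`, and `second` are hypothetical, chosen only for this example; it shows a closure whose state is changed with `<<-`, which is a side effect and therefore not purely functional.

```r
# A closure: the returned function keeps access to `count`, a variable
# local to the particular call of make_counter() that created it.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # super-assignment: modifies the enclosing `count`
    count
  }
}

counter <- make_counter()
counter()
second <- counter()   # state persists between calls, so the result differs
```

Calling the same function twice with the same (empty) argument list gives different results, which is the sense in which R has never been completely functional, even though closures themselves are a functional-language feature.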
Rolf Turner
2008-Nov-11 20:18 UTC
[R] R design (was "Variable passed to function not used in function in select)
On 12/11/2008, at 4:28 AM, Terry Therneau wrote:
> I've read the back and forth this morning, and I have to side with Vince.

Well, I've read back and forth this morning and I have to side with Berwin Turlach --- whose postings were, I thought, extremely well expressed.

I'm getting heartily sick of Wacek Kusnierczyk's truculent whinging. (One might also add the adjectives pedantic, dogmatic, and arrogant.) His ranting, in another thread, about the distinction between `equal' and `identical' would have been risible were it not so boring.

The whinging boils down to ``R is not perfect, so trash the whole thing and start again''. This is rubbish.

*Mostly* R is very easy to use and does exactly what the user would expect. Sometimes it does not do what the user would (naively) expect; sometimes there are good reasons for this, sometimes not. Sometimes, though very rarely, it might be possible to change the language so as to meet naive expectations. Mostly it would be better for the user to become less naive.

By and large the difficulties arise only in obscure contexts, when the user is trying to do something sophisticated. If a naive user tries to do something sophisticated, then he or she should be on the lookout for problems and should check results carefully. In elementary usage there are hardly ever any problems.

> 1. Functions that re-interpret their arguments are very dangerous. The original question involved a well formed call to a function, which returned the wrong answer.

Fair enough comment ....

> Bug, design flaw, whatever -- it's a mistake and the best choice would be to fix it.

.... but it might not be fixable in practical terms, i.e. fixing it might have worse consequences than leaving it alone.

<snip>

> 2. The documentation of subset is not in any way clear. I would never have been able to diagnose or work around this bug. The issues are very subtle.

Well, that's true for some value of ``clear''.

> I quite often see "it's in the manual so we bear no blame" as an argument on this list. We all need to remember that our view of what we are particularly close to is a distorted one -- I for instance think that everything about the survival package is crystal clear --- and be particularly open to concerns that something is opaque or subtle.

I agree with this, for sure. But I think that the best approach would be to include a warning in the documentation of subset, to the effect: ``There are subtle and difficult issues involved in the use of this function. If you don't understand them, don't mess with it.''

Others have pointed out in this thread that one does not *have* to use subset() --- anything it can do can be done in other ways. Like those who pointed this out, I myself have never used subset(), never felt I had to, and never felt any the worse for not having done so.

<snip>

> And let's not shoot the bearers of bad news.

Bearing bad news is not the same thing as bad-tempered and just plain rude criticism.

cheers,

Rolf
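As a sketch of the claim that subset() can be replaced by other constructs, the following compares it with plain `[` indexing. The data frame `df` and the result names are hypothetical, made up for the illustration; the one difference worth noting is how missing values in the condition are handled.

```r
# Hypothetical data with a missing value in the column used for filtering.
df <- data.frame(a = c(1, NA, 3, 4), b = c("w", "x", "y", "z"))

# Non-standard evaluation: column names are used as bare symbols.
via_subset <- subset(df, a > 1, select = c(a, b))

# Standard evaluation: which() drops rows where the comparison is NA,
# matching what subset() does with them.
via_index <- df[which(df$a > 1), c("a", "b")]

# Without which(), the NA comparison yields an extra row filled with NA.
with_na_row <- df[df$a > 1, c("a", "b")]
```

Here `via_subset` and `via_index` hold the same two rows, while `with_na_row` has three. The indexing form is more verbose, but every name in it is evaluated by the ordinary rules, so it behaves the same inside a function as at top level.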
Terry Therneau
2008-Nov-11 20:52 UTC
[R] R design (was "Variable passed to function not used in function in select)
Rolf,

Fair comments, mostly.

> By and large the difficulties arise only in obscure contexts, when the user is trying to do something sophisticated.

But in the case at hand, the user was doing something simple, and got caught when the function tried to be overly clever. That's rather unfair to him. The ongoing chain of examples and counterarguments got quite obscure though.

> *Mostly* R is very easy to use and does exactly what the user would expect.

Less true than you think. We've gotten so close and intimate that we forget how complicated the package actually is. We lose track of how much we know. English spelling poses few conundrums to a literature major; the exceptions and special cases go below the level of conscious thought. (No, I'm not hinting that R is as inconsistent as English - no computer system is that bad. Though come to think of it, CMS came close.)

I have had this argument most often with SAS wizards who think they should use it in a beginners' course, because it's "simple to use". But nothing whose printed documentation takes up 6 feet of shelf is particularly simple. R is not a small system either.

Terry