Gabriel Becker
2022-Apr-11 23:28 UTC
[Rd] I've written a big review of R. Can I get some feedback?
Hi Reece,

I'm not really sure what kind of review you're looking for (and I'm not certain this is the right place for it, but hopefully it's OK enough). Also, to channel Pascal, forgive me: I would have written a shorter response but I didn't have the time.

Firstly, it is fairly ... partisan, I suppose, for lack of a better term. More importantly, from a usefulness perspective, you often don't present the knowledge you gained at the end of the various frustrations you had. As one example that jumped out at me, you say "One day, you'll be tripped up by R's hierarchy of how it likes to simplify mixed types outside of lists." but you don't present your readers with the (well-defined) coercion hierarchy so that they would, you know, not be tripped up by it as badly. This is probably my largest issue with your document overall. It can give the reader talking points about how R is bad (not all of which are even incorrect, per se, as many expert R users will be happy to tell you), but in many cases it won't really help people become better R users.

Your article also, I suspect, fails to understand who the typical "Novice R Users" are and what they want to do. By and large they want to analyze data and create plots. They are analysts, NOT programmers (writing analysis scripts is not programming in the typical sense, and I'm not the only one who thinks that). So the point you make early on, in explaining why you do not strongly recommend R For Data Science (which I had no part in writing and have not read myself), that "It deliberately avoids the fundamentals of programming - e.g. making functions, loops, and if statements - until the second half. I therefore suspect that any non-novice would be better off finding an introduction to the relevant packages with their favourite search engine." misses the point of R itself for what I'd claim is the "typical novice R user".
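For readers following the thread, the coercion hierarchy mentioned above is the one documented in ?c. A minimal sketch of how it plays out when c() mixes atomic types (the variable names x1 to x4 are illustrative and not from the original message):

```r
# Coercion hierarchy used by c() for atomic vectors (see ?c):
#   logical < integer < double < complex < character
# The result takes the highest type present among the inputs.
x1 <- c(TRUE, 2L)    # logical + integer   -> integer
x2 <- c(2L, 3.5)     # integer + double    -> double
x3 <- c(3.5, "a")    # double + character  -> character
x4 <- c(TRUE, "a")   # logical + character -> character

types <- c(typeof(x1), typeof(x2), typeof(x3), typeof(x4))
print(types)
print(x4)            # TRUE has been converted to the string "TRUE"
```

Knowing the ordering makes the "simplification" predictable: a single string anywhere in the inputs turns the whole result into a character vector.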
Having read through your review, I'm confused about why you were using R to do some of the things I infer you felt you needed it to do. If you picked up R wanting a general-purpose language equally applicable to all programming problem domains, you're going to have a bad time, mostly because that is not what R is.

Finally, a (very) incomplete response to a few of the more specific points raised in your review:

*Lists:* The linked Stack Overflow question (https://stackoverflow.com/questions/2050790/how-to-correctly-use-lists-in-r) shows a pretty fundamental misunderstanding of what lists and atomic vectors are/do in R. There is nothing wrong with this; asking questions we don't know the answer to is how we learn. But I'm not sure the question serves as well as a primer for R lists as you claim. The top answer at time of writing discusses the C-level structure of R objects, which can, I suppose, inform your knowledge of how lists work at the R level, but is neither necessary nor the most pedagogically useful way to present it.

*Strings:* Strings are not, idiomatically, arrays of characters at the R level; they are *atomic observed values within a (character) vector of data*. Yes, deep down in the C code they are arrays of characters, but not at the R level. As such, splitting the elements of a character vector into their component individual characters is not (at all, in my experience) a common operation. charvec[1] within typical R usage (where charvec is *a vector of data*) is much more likely to be intended to select the *first observation in the data vector*, which it does. Given what R is for, frankly I think it'd be fairly insane for charvec[1] to do what substr does.

*Variable Manipulation:* Novice users shouldn't be calling eval. This is not to gatekeep it from them, like we have some special "eval-callers" club that they're not invited to.
Rather, it is me saying that metaprogramming is not a novice-difficulty task in R (or, I would expect, anywhere else really). You also say "variable names" in this section where you mean "argument names", and that distinction is both meaningful and important. *Variable names* are not partially matched:

    > xyz <- 5
    > x
    Error: object 'x' not found

*Subsetting:* I'm fairly certain arrays (including 2d matrices) are stored in column order rather than row order because that has been the standard for linear algebra on computers since before I knew what either of those things were...

tail(x, 1) *is* the idiomatic way of getting the last element of a vector. The people on Stack Overflow who told you this was "very slow" were misguided at best. It takes ~6000 *nano*seconds on my laptop, compared to ~200 nanoseconds for x[length(x)]. Yes, that is a 30x speedup; no, it doesn't matter in practice.

I'm going to stop now because this is already too long, but this type of response continues to be possible throughout.

Lastly, with regard to your mapply challenge, I quote directly from the documentation (emphasis mine):

    ...: *arguments to vectorize over* (vectors or lists of strictly positive length, or all of zero length). See also 'Details'.

    MoreArgs: a list of *other arguments* to 'FUN'.

... is the set of arguments you vectorize over, so FUN gets one element of each thing in ... for each call. MoreArgs, then, is the set of arguments to FUN *which you don't vectorize over*, i.e. where each call to FUN gets the whole thing. That's it, that's the whole thing. I don't disagree that this could be clearer (as Ben pointed out, a documentation patch would be the way to address this), but it's not correct to say the information isn't in there at all.

Best,
~G

On Mon, Apr 11, 2022 at 1:52 PM Toby Hocking <tdhock5 at gmail.com> wrote:

> You could take some of your observations and turn them into patches that
> would help improve R. (discussion of such patches is one function of this
> email list)
>
> On Sun, Apr 10, 2022 at 9:05 AM Stephen H. Dawson, DSL via R-devel <r-devel at r-project.org> wrote:
>
> > Hi Reece,
> >
> > Thanks for the article. What specific feedback do you seek for your
> > writing?
> >
> > Kindest Regards,
> > *Stephen Dawson, DSL*
> > /Executive Strategy Consultant/
> > Business & Technology
> > +1 (865) 804-3454
> > http://www.shdawson.com
> >
> > On 4/9/22 15:52, Reece Goding wrote:
> > > Hello,
> > >
> > > For a while, I've been working on writing a very big review of R. I've
> > > finally finished my final proofread of it. Can I get some feedback? This
> > > seems the most appropriate place to ask. It's linked below.
> > >
> > > https://github.com/ReeceGoding/Frustration-One-Year-With-R
> > >
> > > If you think you've seen it before, that will be because it found some
> > > popularity on Hacker News before I was done proofreading it. The reception
> > > seems largely positive so far.
> > >
> > > Thanks,
> > > Reece Goding
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
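Several of the specific points in the reply above are easy to check at the console. The following is a rough sketch only: double_it, xyz_total, and the sample vectors are hypothetical names invented for illustration, and it assumes a default R session with the utils package attached (for tail).

```r
# 1. Strings: indexing a character vector selects observations, not letters.
charvec <- c("apple", "banana")
first_obs  <- charvec[1]                  # first element of the data vector
first_char <- substr(charvec[1], 1, 1)    # first character of that element

# 2. Argument names are partially matched; variable names are not.
double_it <- function(value) value * 2    # hypothetical helper
partial_arg <- double_it(val = 5)         # 'val' matches the argument 'value'
xyz_total <- 5
partial_var <- tryCatch(xyz_tot,          # no variable is named 'xyz_tot'
                        error = function(e) "not found")

# 3. Matrices are stored column by column (column-major order).
m <- matrix(1:6, nrow = 2)
second_stored <- m[2]                     # same element as m[2, 1]

# 4. tail(x, 1) and x[length(x)] return the same last element.
x <- c(10, 20, 30)
same_last <- identical(tail(x, 1), x[length(x)])

print(list(first_obs, first_char, partial_arg, partial_var,
           second_stored, same_last))
```

Timings for the two ways of getting the last element will vary by machine, so none are shown here.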
Reece Goding
2022-Apr-12 21:31 UTC
[Rd] I've written a big review of R. Can I get some feedback?
Hi Gabriel,

Thanks for the feedback. Much of what you've said agrees with a common trend that I've seen in other feedback. Namely, you seem to agree with the many who have told me that using R as anything other than a tool for data analysis was a grave mistake. I'm increasingly starting to suspect that you're all right. I therefore have little to no counter to your points.

As for what you've said in reply to my "mapply challenge", I admit that your response is logical and may even be the best possible answer. However, I find it disturbing that the solution to my puzzle appears to rest on having a very careful and very specific understanding of what the words "vectorize over" mean in the documentation. You could well be right, but it doesn't sit well with me.

I'll further consider what you've said about the rest. I'm already making some changes.

Thanks again,
Reece
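To make the "vectorize over" distinction discussed in this exchange concrete, here is a small sketch of mapply with both kinds of argument. The function scale_and_shift and its inputs are made up for illustration; only mapply and its MoreArgs parameter come from the documentation quoted in the thread.

```r
# FUN receives ONE element of each `...` argument per call,
# and the WHOLE of each MoreArgs element on every call.
scale_and_shift <- function(x, y, offsets) {
  x * y + sum(offsets)   # 'offsets' is used as a complete vector each time
}

res <- mapply(scale_and_shift,
              x = 1:3,                              # vectorized over
              y = c(10, 20, 30),                    # vectorized over
              MoreArgs = list(offsets = c(1, 2)))   # passed whole to each call

# Equivalent to three separate calls:
#   scale_and_shift(1, 10, c(1, 2))
#   scale_and_shift(2, 20, c(1, 2))
#   scale_and_shift(3, 30, c(1, 2))
print(res)
```

Had offsets been passed through ... instead, mapply would have tried to hand FUN one element of it per call, which is a different computation.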