Paul Johnson
2011-Feb-15 18:04 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
Hello,

I am looking for CRAN packages that don't teach bad habits. Can I have suggestions?

I don't mean the recommended packages that come with R; I mean the contributed ones. I've been sampling a lot of examples and am surprised that many ignore seemingly agreed-upon principles of R coding. In r-devel, almost everyone seems to support the "functional programming" theme in Chambers's book Software for Data Analysis, but when I look at randomly selected packages, programmers don't follow that advice.

In particular:

1. Functions must avoid "mystery variables from nowhere."

   When reading a function's code, it should not be necessary to ask "what's variable X?" and go hunting in the commands that lead up to the function call. If X is used in the function, it should be a named argument, or extracted from one of the named arguments. People who rely on variables floating around in the user's environment are creating hard-to-find bugs.

2. We don't want functions with indirect effects (no <<- ), almost always.

3. Code should be vectorized where possible; C-style for loops over vector members should be avoided.

4. We don't want gratuitous use of "return" at the end of functions. Why do people still do that?

5. Neatness counts. Code should look nice! Check out how beautiful the functions in MASS look! I want code with spaces and " <- " rather than everything jammed together with "=".

I don't mean to criticize any particular person's code in raising this point. For teaching examples, where to focus?

Here's one candidate I've found: MNP. As far as I can tell, it meets the first 4 requirements. And it has some very clear C code with it as well. I'm only hesitant there because I'm not entirely sure that a package's C code should introduce its own functions for handling vectors and matrices, when some general purpose library might be more desirable. But that's a small point, and clarity and completeness count a great deal in my opinion.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
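For readers who want points 1 to 3 in concrete form, here is a minimal sketch in R. All function and variable names (grow_bad, grow_good, add_bad, add_good, squares_loop, squares_vec, rate, total) are hypothetical and invented for illustration; none of them comes from MNP, MASS, or any other package mentioned in this thread.

```r
# Hypothetical names throughout; illustration only.

# Point 1: 'rate' is a "mystery variable". It is not an argument, so R
# looks it up in whatever environment happens to enclose the function.
grow_bad <- function(x) {
  x * (1 + rate)
}

# Preferred: everything the function needs arrives as a named argument.
grow_good <- function(x, rate) {
  x * (1 + rate)
}

# Point 2: '<<-' modifies a variable outside the function as a side effect.
total <- 0
add_bad <- function(x) {
  total <<- total + x
  invisible(NULL)
}

# Preferred: return the new value and let the caller decide where it goes.
add_good <- function(total, x) {
  total + x
}

# Point 3: a C-style loop over vector members ...
squares_loop <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    out[i] <- v[i]^2
  }
  out
}

# ... versus the vectorized equivalent, which gives the same result.
squares_vec <- function(v) {
  v^2
}
```

Calling grow_bad(100) fails unless a variable named rate happens to exist in the workspace, which is exactly the hard-to-find dependency that point 1 warns about; grow_good(100, rate = 0.05) carries everything it needs in the call itself.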
Hadley Wickham
2011-Feb-15 18:39 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
I think my recent packages are pretty good. In particular, I'd recommend stringr, plyr and testthat as being well written, well documented and (somewhat) well tested.

I've also been trying to write up the process of writing good packages. See https://github.com/hadley/devtools/wiki for my thoughts so far.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Jeffrey Ryan
2011-Feb-15 19:19 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
I think for teaching, you need to use R itself. Everything else is going to be a derivative from that, and if you are looking for 'correctness' or 'consistency' with the spirit of R, you can only be disappointed - as everyone will take liberties or bring personal style into the equation.

In addition, your points are debatable in terms of priority/value. e.g. what is wrong with 'return'? Certainly provides clarity and consistency if you have if-else constructs.

We've all learned from reading R sources, and it seems to have worked out well for many of us.

Jeff

-- 
Jeffrey Ryan
jeffrey.ryan at lemnica.com
www.lemnica.com
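To make the if-else point concrete, here is a minimal sketch of the two styles under discussion. The functions describe_a and describe_b are hypothetical and written only for this comparison; they behave identically, so the choice between them is a matter of style rather than correctness.

```r
# Hypothetical helper, written two ways; illustration only.

# Style A: an explicit return() in every branch.
describe_a <- function(x) {
  if (length(x) == 0) {
    return("empty")
  }
  if (is.numeric(x)) {
    return("numeric")
  } else {
    return("other")
  }
}

# Style B: no return(). In R, if/else is an expression, and the value of
# the last evaluated expression is the value of the function.
describe_b <- function(x) {
  if (length(x) == 0) {
    "empty"
  } else if (is.numeric(x)) {
    "numeric"
  } else {
    "other"
  }
}
```

Style A makes every exit point visible at a glance, which is the clarity argument; Style B leans on R's expression semantics, which is the argument that return() is redundant at the end of a function.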
Gabor Grothendieck
2011-Feb-15 19:38 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
There was some discussion of this on stats stackexchange:

http://stats.stackexchange.com/questions/5418/first-r-packages-source-code-to-study-in-preparation-for-writing-own-package

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Sebastian P. Luque
2011-Feb-15 20:26 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
Hi Paul,

You might want to post this to the teaching list (R-sig-teaching). I'd look at packages written by old-timers and R Core. I've also found that most Bioconductor packages follow the guidelines you mention and many other excellent habits very well. I agree with you that these are very important things to teach.

-- 
Seb
David Scott
2011-Feb-15 21:48 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
On 16/02/2011 7:04 a.m., Paul Johnson wrote:
> 4. We don't want gratuitous use of "return" at the end of functions.
> Why do people still do that?

Well, I for one (and Jeff as well, it seems) think it is good programming practice. It makes explicit what is being returned, eliminating the possibility of mistakes, and provides clarity for anyone reading the code.

David Scott

-- 
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics
Steven McKinney
2011-Feb-15 23:47 UTC
[Rd] Request: Suggestions for "good teaching" packages, esp. with C code
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
> Sent: February-15-11 3:10 PM
> To: Kevin Wright
> Cc: R Devel List
> Subject: Re: [Rd] Request: Suggestions for "good teaching" packages, esp. with C code
>
> On 15/02/2011 5:22 PM, Kevin Wright wrote:
> > For those of you "familiar with R", here's a little quiz. What's the
> > difference between:
> >
> > f1 <- function(){
> >   a=5
> > }
>
> This returns 5, invisibly. It's also bad style, according to those of
> us who prefer <- to = for assignment.

For maximum clarity:

    f0 <- function() {
      b <- 5
      return( list( a = b ) )
    }

    > f0()$a
    [1] 5

Steven McKinney

> > f2 <- function(){
> >   return(a=5)
> > }
>
> This is a mistake: return() doesn't take named arguments. It is
> lenient and lets you get away with this error (treating it the same as
> return(5)), and returns the 5, visibly.
>
> Duncan Murdoch
>
> > f2()
> >
> > Kevin Wright
> >
> > On Tue, Feb 15, 2011 at 3:55 PM, Geoff Jentry <geoffjentry at hexdump.org> wrote:
> >
> >> On Wed, 16 Feb 2011, David Scott wrote:
> >>
> >>>> 4. We don't want gratuitous use of "return" at the end of functions.
> >>>> Why do people still do that?
> >>>
> >>> Well I for one (and Jeff as well it seems) think it is good programming
> >>> practice. It makes explicit what is being returned eliminating the
> >>> possibility of mistakes and provides clarity for anyone reading the code.
> >>
> >> You're unnecessarily adding the overhead of a function call by explicitly
> >> calling return().
> >>
> >> Sure it seems odd for someone coming from the C/C++/Java/etc world, but
> >> anyone familiar with R should find code that doesn't have an explicit
> >> return() call to be fully readable & clear.
> >>
> >> -J
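The difference between the quiz functions can be checked directly with base R's withVisible(), which reports both the value of a call and whether that value would be auto-printed at the prompt. The sketch below reuses f1 and f2 from Kevin Wright's quiz; f3 and the res1 to res3 variables are hypothetical additions for comparison and were not part of the original messages.

```r
# f1 and f2 are the quiz functions from the thread.
f1 <- function() {
  a = 5          # the last expression is an assignment
}

f2 <- function() {
  return(a = 5)  # the name 'a' is ignored by return()
}

# f3 is a hypothetical third variant: assign with <-, then name the value.
f3 <- function() {
  a <- 5
  a
}

# withVisible() returns a list with components 'value' and 'visible'.
res1 <- withVisible(f1())
res2 <- withVisible(f2())
res3 <- withVisible(f3())

print(c(f1 = res1$visible, f2 = res2$visible, f3 = res3$visible))
print(c(f1 = res1$value, f2 = res2$value, f3 = res3$value))
```

All three functions return 5. Consistent with Duncan Murdoch's explanation, only f1 returns it invisibly, because the value of an assignment is not auto-printed; typing f1() at the prompt therefore shows nothing, while f2() and f3() print [1] 5.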