Looking over the contents of various packages, including my own, it is clear that lots of things end up 'hidden away' in packages where they don't belong. My gregmisc package is a particularly egregious example, containing something from almost every functional category.

I propose that from time to time the R community go through the complete set of packages and 'refactor' the functions and data sets into packages that have clearly defined goals. This should make it easier to ensure that new functions get placed into a location where users can easily find them, reduce the amount of re-implementation/duplication of existing functionality, and assist in ensuring interoperability.

It would be worthwhile, for instance, to pull all of the functions related to contrasts for generalized linear models into a common location, instead of having them spread between base, Hmisc, MASS, gregmisc, etc. Similarly, it would be helpful to pull together all of the genetics computations into a single location.

I recognize that not all package maintainers would be willing to participate and that not all functions could be easily categorized, but I believe that this effort would yield significant benefit and is compatible with the goal of R-core to streamline the base packages.

To put my money where my mouth is, I'll volunteer to organize a group effort to do such a refactoring in conjunction with useR! 2004 or the next DSC, whichever folks agree is better for this purpose.

Gregory R. Warnes, Ph.D.
Senior Coordinator
Groton Non-Clinical Statistics
Pfizer Global Research and Development
This is a good idea, and it would be great to have these refactored meta packages. But it actually implies having a group similar to R core that does code review of existing packages. For example, what happens if a function seems to work but is programmed horribly inefficiently? What happens if something exists on both the R and C levels? What happens with packages that rely on private versions of BLAS? Suppose two packages provide the same functionality, how does one choose? And can this be done without coding conventions? Who is in charge?

On Nov 24, 2003, at 14:12, Warnes, Gregory R wrote:

> [...]
> I propose that from time to time the R community go through the
> complete set of packages and 'refactor' the functions and data sets
> into packages that have clearly defined goals.
> [...]
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel

==
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
Editor: Journal of Multivariate Analysis, Journal of Statistical Software
US mail: 8130 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: deleeuw@stat.ucla.edu
homepage: http://gifi.stat.ucla.edu
-------------------------------------------------------------------------
No matter where you go, there you are. --- Buckaroo Banzai
http://gifi.stat.ucla.edu/sounds/nomatter.au
How about we use the information already stored in \keyword{}? It should be straightforward to allow navigating among the \keyword hierarchy.

-G

-----Original Message-----
From: Paul Murrell
To: Jan de Leeuw
Cc: Warnes, Gregory R; 'R-devel@stat.math.ethz.ch'
Sent: 11/24/03 6:12 PM
Subject: Re: [Rd] Proposal: 'global' package refactoring

Hi

I have wanted to figure out a way to do something along these lines for the many, widely-scattered plotting functions.

Something that would be less invasive (and less useful, but a valid step in the right direction) is simply a "directory" for different functional groups. A list of function names, plus descriptions of what they do, plus a pointer to the package they are in would I think be really useful. A lot of work to create and maintain, but really useful.

For example, the web pages focused on "spatial projects" (http://sal.agecon.uiuc.edu/csiss/Rgeo/index.html) have summaries of all spatially related packages. The coordination of the DBMS stuff (http://developer.r-project.org/db/index.html) is another example of something similar. Then of course there are the R GUIs pages (http://www.sciviews.org/_rgui/).

Paul

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
paul@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
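To make the "directory" and \keyword{} suggestions in this message a little more concrete, here is a minimal sketch of what such a cross-package index might look like as a data structure. It is written in Python purely for illustration and is not an existing R facility. The function and package names are hypothetical placeholders; the keyword strings are only meant to resemble entries from R's keyword list, and a real index would presumably be harvested from the \keyword{} entries in each package's help files.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    function: str     # function name
    package: str      # package that provides it
    description: str  # one-line summary of what it does
    keywords: tuple   # keyword values taken from the help page


# Hypothetical entries, invented for this sketch.
DIRECTORY = [
    Entry("plot_map", "pkgA", "Draw a map from polygon data",
          ("hplot", "spatial")),
    Entry("fit_contrast", "pkgB", "Test a contrast in a fitted model",
          ("models", "regression")),
    Entry("make_contrasts", "pkgC", "Build a contrast matrix",
          ("models", "design")),
]


def build_keyword_index(entries):
    """Map each keyword to a sorted list of 'package::function' names."""
    index = defaultdict(list)
    for entry in entries:
        for keyword in entry.keywords:
            index[keyword].append(f"{entry.package}::{entry.function}")
    return {keyword: sorted(names) for keyword, names in index.items()}


INDEX = build_keyword_index(DIRECTORY)

# Look up every function filed under one keyword, whatever package it is in.
print(INDEX["models"])  # ['pkgB::fit_contrast', 'pkgC::make_contrasts']
```

The point of the sketch is that nothing has to move between packages: the index only records where each function already lives, which is the "less invasive" property described above.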
I have great faith in the R community's ability to come to consensus. One can (and should) avoid 'troublesome' issues / packages in a first pass. These can come later. (Further, once we get the ball rolling, we may be able to convince the community to change from 'I have a neat function, let's make a package!' to 'I have a neat function. It belongs in package XXXXX'.)

I would imagine that the first step of this process would be to draft a recommended package structure along with a list of what functions belong where. This would then be submitted *as a proposal* to R-core, who would have (of course) the final say on the structure. A package author will continue to be free to package and distribute his own work in any way that he wants (consistent with the license of code he's borrowed, of course), although most authors will be pleased that their code has been deemed 'worthy' of admission into the base set.

I further recommend that JStatSoft be used as a peer-recognition system for functions that get included in the 'standard R extension packages' that result. These will, of course, have been peer reviewed! This would meet a two-fold need. First, it would provide peer recognition for code that is not sufficiently substantial to merit its own package, making it easier for new contributors to participate. (Many of the components of gregmisc actually fall into this category.) Second, it will demonstrate the *volume* of contributions by individuals by the number of inclusions. Of course, there would need to be a reasonable minimum size/worth requirement to be recognized in this fashion. Smaller contributions will continue to be recognized by listing the names of all contributors in a master list.

Further on, it will become possible to de-centralize the management of the package system so that certain individuals would take up management of specific package areas (e.g. Paul Murrell for graphics, etc.).

-----Original Message-----
From: Jan de Leeuw
To: Warnes, Gregory R
Cc: 'R-devel@stat.math.ethz.ch'
Sent: 11/24/03 5:36 PM
Subject: Re: [Rd] Proposal: 'global' package refactoring

This is a good idea, and it would be great to have these refactored meta packages. But it actually implies having a group similar to R core that does code review of existing packages. [...]
I am explicitly not prepared to `refactor' MASS. Not only is it explicitly support software for a book (which references it and those references cannot be changed retrospectively), it also represents much work over many years. Not that we get much credit for it, but we do get some and these days that does matter.

Parts of MASS have been incorporated into both R and S-PLUS -- perhaps we have already gone too far. Indeed, I have floated the idea of migrating some functionality back, notably that of package lqs (which is part of MASS in the S version).

On Mon, 24 Nov 2003, Warnes, Gregory R wrote:

> [...]
> I propose that from time to time the R community go through the complete set
> of packages and 'refactor' the functions and data sets into packages that
> have clearly defined goals.
> [...]
> It would be worthwhile, for instance, to pull all of the functions related
> to contrasts for generalized linear models into a common location, instead
> of having them spread between base, Hmisc, MASS, gregmisc, etc.
> [...]

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On Mon, 24 Nov 2003 17:12:24 -0500, you wrote:

>I propose that from time to time the R community go through the complete set
>of packages and 'refactor' the functions and data sets into packages that
>have clearly defined goals.

Package 'foreign' is currently such a multi-author common purpose package. One of the problems is that there isn't a single maintainer: users who have problems with any of the functions write to all of the authors for help. All of the authors are still active, so this gets a response, but I can see problems in some other package where an author moves on and doesn't want to maintain the code.

If that happens to a package then the package will disappear from CRAN, once it stops passing tests in new releases. If it's just a function or two, what happens when it needs maintenance, or when it gets orphaned?

Duncan Murdoch
> From: John Fox
>
> Dear Gregory, Paul, and Jan,
>
> I recall proposing something like this (that is, a classification of
> available functions) some time ago, but it never got off the ground. The
> advantage of using keywords is that package authors would classify their
> own functions, but I don't think that the current set of keywords is
> adequate. It would be particularly useful to work out a hierarchical or
> perhaps hyper-linked classification (without restricting particular
> functions to just one terminal node).

I guess something similar to GAMS would help a lot: http://gams.nist.gov/

Personally I think "refactoring" of packages is too difficult to manage, but cross-indexing of functions in an easily searchable fashion would go a long way.

Best,
Andy

> Regards,
> John
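As a rough illustration of the hierarchical classification discussed in this message, where a function is not restricted to a single terminal node, the following Python sketch stores a small category tree and lets one function be filed under several leaves. Everything in it is an assumption made for the example: the category names, the package and function names, and the lookup helper are hypothetical and do not describe GAMS or any existing R tool.

```python
# Category tree stored as child -> parent. Node names are hypothetical.
PARENT = {
    "statistics": None,
    "graphics": None,
    "statistics/models": "statistics",
    "statistics/models/contrasts": "statistics/models",
    "statistics/models/diagnostics": "statistics/models",
    "statistics/genetics": "statistics",
    "graphics/diagnostics": "graphics",
}

# Each function lists every terminal node it is filed under.
# Function and package names are hypothetical.
CLASSIFICATION = {
    "pkgB::fit_contrast": ["statistics/models/contrasts"],
    "pkgD::plot_residuals": ["statistics/models/diagnostics",
                             "graphics/diagnostics"],
    "pkgE::hwe_test": ["statistics/genetics"],
}


def ancestors(node):
    """Yield the node itself and then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def functions_under(category):
    """Return, sorted, all functions filed at or below the given category."""
    return sorted(
        name
        for name, nodes in CLASSIFICATION.items()
        if any(category in ancestors(node) for node in nodes)
    )


# plot_residuals is reachable from both top-level branches.
print(functions_under("statistics"))
# ['pkgB::fit_contrast', 'pkgD::plot_residuals', 'pkgE::hwe_test']
print(functions_under("graphics"))
# ['pkgD::plot_residuals']
```

Because the classification is kept separate from the packages themselves, authors could still classify their own functions, which is the advantage of keywords noted above, while browsing happens over the tree.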