I am teaching a graduate course on Statistical Computing this
semester. A major part of the grade is determined by a project in
which a student or small group of students produce, test, and document
some software for statistics. I will encourage those students who are
developing in S to package their software as an R package.

I would welcome suggestions of possible projects, especially projects
that come under the heading of "Useful facilities to be added to R".
Please keep in mind that the project must be completed by mid-December
and that not all the students have extensive experience programming in
S and C.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
1) Design of Experiments :

   - S (even before "-plus" existed) has had functions like

         fac.design
         oa.design
         fractionate     etc ["White book", chapter 5].

     [also a set of pre-stored useful fractional designs
      for (> 2)-leveled factors.]

     Some of these would look like "student exercises" to do.
     If you could have them work in an environment where S-plus was *not*
     installed, this would end up as a "clean table" project w/o
     problematic code copying...

   - One could also think of code for SEQUENTIAL design
     [do a 2^{m-k} (= 8) initially;
      do a (fractionated) n1 x .. x n_N on the remaining N important
      factors, given the data for the first 8 experiments].

   - Or "Taguchi"
     [- using (at least two) different kinds of factors,
        some cheap, some expensive to change
      - multiple Y's; for some, the "local" variance should be minimized,
      etc etc ]

   This looks tedious and maybe can well be partitioned into different
   student projects.

   Maybe JMC, AEF, RMH (authors of ch.5) or other experts can say much
   more here.

------

2) For Computer Scientists :

   "Differentiation" / Symbolic derivatives, ...
   Improve the possibilities of D() and deriv(), and document them.
   Make these user-extensible.
   Think about the hessian in addition to the gradient.

--------

Martin Maechler <maechler@stat.math.ethz.ch>  http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10  Leonhardstr. 27
ETH (Federal Inst. Technology)       8092 Zurich     SWITZERLAND
phone: x-41-1-632-3408               fax: ...-1228                 <><
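
For the second project in the message above, a minimal sketch of what "user-extensible" symbolic differentiation with a Hessian could look like. It is written in Python rather than S and is not how D() or deriv() are implemented. The tuple encoding of expressions and the names register, d, evaluate, gradient and hessian are all assumptions made for this illustration.

```python
import math

# Hypothetical registry: operator name -> (evaluator, derivative rule).
# An expression is a number, a variable name (str), or a tuple (op, arg, ...).
OPS = {}

def register(op, value, deriv):
    """Add or replace an operator; this is the user-extension hook."""
    OPS[op] = (value, deriv)

def d(e, v):
    """Symbolic derivative of expression e with respect to variable v."""
    if isinstance(e, (int, float)):
        return 0
    if isinstance(e, str):
        return 1 if e == v else 0
    op, *args = e
    return OPS[op][1](args, v)

def evaluate(e, env):
    """Numeric value of expression e given variable values in env."""
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op, *args = e
    return OPS[op][0](*(evaluate(a, env) for a in args))

# Built-in rules: sum rule, product rule, chain rule for exp.
register("+", lambda a, b: a + b,
         lambda args, v: ("+", d(args[0], v), d(args[1], v)))
register("*", lambda a, b: a * b,
         lambda args, v: ("+", ("*", d(args[0], v), args[1]),
                               ("*", args[0], d(args[1], v))))
register("exp", math.exp,
         lambda args, v: ("*", ("exp", args[0]), d(args[0], v)))

def gradient(e, variables):
    return [d(e, v) for v in variables]

def hessian(e, variables):
    # Second derivatives obtained by differentiating twice (no simplification).
    return [[d(d(e, u), v) for v in variables] for u in variables]

# A user adds sin and cos without touching d() or evaluate().
register("sin", math.sin,
         lambda args, v: ("*", ("cos", args[0]), d(args[0], v)))
register("cos", math.cos,
         lambda args, v: ("*", ("*", -1, ("sin", args[0])), d(args[0], v)))
```

With f = ("*", ("*", "x", "x"), "y"), that is x^2 y, evaluating gradient(f, ["x", "y"]) and hessian(f, ["x", "y"]) at a point gives the first and second partial derivatives. The derivative expressions are left unsimplified, and simplifying them would be a sizeable part of a real project.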
Douglas Bates <bates@stat.wisc.edu> writes:

> I am teaching a graduate course on Statistical Computing this
> semester. A major part of the grade is determined by a project in
> which a student or small group of students produce, test, and document
> some software for statistics. I will encourage those students who are
> developing in S to package their software as an R package.
>
> I would welcome suggestions of possible projects, especially projects
> that come under the heading of "Useful facilities to be added to R".
> Please keep in mind that the project must be completed by mid-December
> and that not all the students have extensive experience programming in
> S and C.

These are probably too hard and too narrow, but now the topic is up:

- getting predictions to work on new data in cases where the model
  depends on the data set (notably regression splines with auto knot
  placement)

- in lme, we can predict at level K; it would be nice to get the SE of
  prediction (this also takes levels, extending the distinction between
  confidence and tolerance intervals)

- conditional tolerance in lme (much too hard I suspect)

- in model.tables.aov, SE's for type="means" are sorely missed.

This is not very hard, but maybe too small (although one will have to
study issues of contrasts and internals of an lm object rather
carefully):

- extend pairwise.t.test to take a linear model and a factor in the
  model as argument.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
At 18:46 14/09/00 +0200, Martin Maechler wrote:
>
>1) Design of Experiments :
>
>   - S (even before "-plus" existed) has had functions like
>
>         fac.design
>         oa.design
>         fractionate     etc ["White book", chapter 5].
>
>     [also a set of pre-stored useful fractional designs
>      for (> 2)-leveled factors.]

The drawback with this project, worthy as it is, is that if you are not
really on top of experimental design the computations can be
surprisingly tricky.

On the quiet I have just released a minuscule library for R,
conf.design, that might make a good first step, though. It generates
confounded symmetrical designs for the p^n case. You tell it which
contrasts you are prepared to sacrifice across blocks and it generates
the design that will do it. It also has things to generate
"pseudo-factors", that is, if you have a composite number of levels for
your real factor it will generate a number of factors each with a prime
number of levels which together are equivalent to yours. In this way
you can partially extend the thing to the non-prime case, but not very
far. There is a really interesting programming exercise waiting to be
done here, I have to say, but the trickiness should not be
under-estimated.

I released it because I kept getting requests to do so, but it was done
a long time ago and it is not exactly tidy. In particular the front end
is obscure and should be formula based rather than matrix based. This
would make a nice little programming project for a wet weekend or so.

(conf.design also has a function, primes(), that is way overkill for
the sorts of things I need it for in this library, but it generates
primes < N using the classical sieve of Eratosthenes in a pretty slick
way.)

Another approach to this whole problem is to look for approximate
designs rather than insist on exact balance, orthogonality, or even
absolutely highest efficiency. In the experimental design context (as
opposed to, say, coding theory) you never need anything but a
reasonably good design, anyway. There is some freely available software
around for doing that, such as "Dopt", a Fortran program by Alan Miller
and Nam Nguyen that was published in Applied Statistics algorithms some
time ago for constructing D-optimal (or nearly so) experiments for a
given simple block structure. I had a go at putting a front-end on this
some time ago and I think the result is still floating around StatLib,
but the S code is c*o*m*p*l*e*x and probably far too optimistic in what
it sets out to do. I have never returned to it, but again, this had a
fair following for some time (maybe even still) and would be a very
useful thing to have for R.

I also have some simulated annealing-type software that constructs near
optimal block and row-column designs, but I left that problem alone 9
years ago and have not thought much about it since. It could be
resurrected, though, if anyone were interested. This is written in C
and has never had an S front end.

Bill Venables.
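
The two small computations mentioned in the message above, a sieve of Eratosthenes for primes < N and the splitting of a factor with a composite number of levels into prime-level "pseudo-factors", can be sketched as follows. This is a Python illustration only, not the conf.design code. The function names and the mixed-radix coding of levels are assumptions, and conf.design may well label its pseudo-factor levels differently.

```python
def primes_below(n):
    """Primes < n by the classical sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * n          # sieve[i] == 1 means "i may be prime"
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Strike out the multiples of i, starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

def prime_factors(m):
    """Prime factors of m with multiplicity, in ascending order."""
    factors = []
    for p in primes_below(m + 1):
        while m % p == 0:
            factors.append(p)
            m //= p
    return factors

def pseudo_levels(level, radices):
    """Mixed-radix digits of a 0-based level, one digit per pseudo-factor.

    Assumed coding: the last pseudo-factor varies fastest.
    """
    digits = []
    for r in reversed(radices):
        level, digit = divmod(level, r)
        digits.append(digit)
    return tuple(reversed(digits))

# A 12-level factor becomes three pseudo-factors with 2, 2 and 3 levels.
radices = prime_factors(12)
table = [pseudo_levels(k, radices) for k in range(12)]
```

Each of the 12 original levels maps to a distinct combination of pseudo-factor levels, which is what makes the pseudo-factors jointly equivalent to the original factor.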
>>>>> Douglas Bates writes:

> I am teaching a graduate course on Statistical Computing this
> semester. A major part of the grade is determined by a project in
> which a student or small group of students produce, test, and document
> some software for statistics. I will encourage those students who are
> developing in S to package their software as an R package.

> I would welcome suggestions of possible projects, especially projects
> that come under the heading of "Useful facilities to be added to R".
> Please keep in mind that the project must be completed by mid-December
> and that not all the students have extensive experience programming in
> S and C.

Hopefully not too late for me to join the wishlist ...

In addition to the useful suggestions re experimental design and by
Peter, I have the following ctest-related projects which I think would
fit very nicely. Ordering is according to decreasing priority.

* Improved support for exact inference (p-values and, where
  appropriate, also confidence intervals) for some of the tests, in
  particular for Kruskal-Wallis and (2-sample) Smirnov.

  In addition, we currently don't have exact p-values in the rank-based
  tests in case of ties, and one could deal with this using the
  Streitberg-Roehmel path suggested by Torsten Hothorn (see add-on
  package ExactDistr).

  Also, permutation tests might be useful in some cases ...

  [I have a NEW implementation of the Mehta-Patel network algorithm for
  dealing with the common odds ratio in 2 x 2 x k tables ready, hence
  will take care of mantelhaen.test() myself.]

* Implement alternative definitions of the 2-sided p-value using

      p = 2[f P(X=x) + min{P(X<x),P(X>x)}]

  with 0 <= f <= 1 as definition.

* Improve the code for fisher.test(), maybe re-implement from scratch?
  The memory management definitely needs to be rewritten for 1.2.
-k
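
The 2-sided p-value definition p = 2[f P(X=x) + min{P(X<x),P(X>x)}] from the message above can be sketched for a binomial test statistic as below. This is a Python illustration, not ctest code. The function names are hypothetical, the binomial null distribution is chosen only as an example, and capping the result at 1 is an added assumption, since the formula as written can exceed 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_p(x, n, p0, f=1.0):
    """2 * [f * P(X = x) + min(P(X < x), P(X > x))] under Binomial(n, p0).

    f = 1 counts the observed point fully (doubling the smaller tail),
    f = 0.5 gives a mid-p style value, f = 0 ignores the observed point.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    point = binom_pmf(x, n, p0)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x))          # P(X < x)
    upper = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))  # P(X > x)
    # Assumption: cap at 1, which the formula in the text does not state.
    return min(1.0, 2.0 * (f * point + min(lower, upper)))
```

For x = 2 successes in n = 10 trials with p0 = 0.5, the three choices f = 1, 0.5 and 0 give 112/1024, 67/1024 and 22/1024 respectively, which shows how much the treatment of the observed point matters for a discrete statistic.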
Here are some further possibilities:

1. An equivalent of S-PLUS plot.gam(), which I would call plot.terms().
   [predict(.., type="terms") provides the information that is plotted.]

2. An option in print.lm(), summary.lm(), etc. that would relate the
   number of decimal places shown for coefficients etc. to the SE,
   e.g. a precision equivalent to the 2nd d.p. of the SE.
   (This may be hard to get right!)

3. Label output where factors appear so that it is clear what
   parameterisation has been used. (Peter Dalgaard suggested a scheme
   some time ago that seemed to me eminently sensible.)

4. As an interim measure, until lattice arrives, implement simplified
   versions of trellis bwplot, qqmath, etc. that allow two conditioning
   factors. This can be done as fairly easy adaptations of coplot().
   So that it is not repeated in each individual function, the coplot()
   code that decodes the graphics formula should come out into a
   separate function. It needs slight modification to allow formulae
   such as ~x|a+b.
   (I have a version of such a function. Adding the code for e.g. a
   primitive version of bwplot is straightforward. I will do it myself
   shortly if there are no other offers.)

5. I have been investigating a function that has one list of
   information for each panel in a trellis type layout, then allowing
   the user to provide a panel function that operates on the list
   elements. For example, one may want graphs with different numbers of
   x-values to be superimposed in the same panel. Or a theoretical
   curve or plot of simulated data may be specified by a small number
   of parameters that are specific to each panel.
   (I have a primitive version of such a function.)

John Maindonald.
John Maindonald               email : john.maindonald@anu.edu.au
Statistical Consulting Unit,  phone : (6249)3998
c/o CMA, SMS,                 fax   : (6249)5549
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia
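
Item 2 in the message above (tying the number of decimal places shown for a coefficient to its SE) can be sketched as follows. This is a Python illustration, not a patch to print.lm() or summary.lm(). The function names are hypothetical, and reading "the 2nd d.p. of the SE" as "the second significant digit of the SE" is an assumption.

```python
import math

def decimals_for_se(se, sig=2):
    """Decimal places needed to show `sig` significant digits of se."""
    if not (se > 0 and math.isfinite(se)):
        raise ValueError("se must be positive and finite")
    # floor(log10(se)) is the position of the leading digit of se.
    return max(0, sig - 1 - math.floor(math.log10(se)))

def format_coef(estimate, se, sig=2):
    """Format an estimate and its SE to the same number of decimals."""
    places = decimals_for_se(se, sig)
    return f"{estimate:.{places}f}", f"{se:.{places}f}"

# Estimate and SE pairs on very different scales.
rows = [(1.234567, 0.0234), (12.3456, 1.7), (1234.5678, 23.4)]
formatted = [format_coef(est, se) for est, se in rows]
```

One of the cases that makes this hard to get right is an SE such as 0.0996, which rounds up to a value with a different leading-digit position. The sketch does not treat that case specially.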