thr3ads.net - R devel - [Rd] Projects [Sep 2000]


Douglas Bates

2000-Sep-14 15:37 UTC

[Rd] Projects

I am teaching a graduate course on Statistical Computing this
semester.  A major part of the grade is determined by a project in
which a student or small group of students produce, test, and document
some software for statistics.  I will encourage those students who are
developing in S to package their software as an R package.

I would welcome suggestions of possible projects, especially projects
that come under the heading of "Useful facilities to be added to R".
Please keep in mind that the project must be completed by mid-December
and that not all the students have extensive experience programming in
S and C.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Martin Maechler

2000-Sep-14 16:46 UTC

[Rd] Projects

1) Design of Experiments :

 - S (even before "-plus" existed) has had functions like

   fac.design
   oa.design
   fractionate   etc  ["White book", chapter 5].

		 [also a set of pre-stored useful fractional designs
		  for (> 2)-leveled factors.]

  Some of these would make nice ``student exercises''.
  If you could have them work in an environment where S-plus was *not*
  installed, this would end up as a "clean-room" project
  without problematic code copying...

 - One could also think of  code for SEQUENTIAL design
  [do a  2^{m-k} (= 8) initially; 
   do a  (fractionated) n1 x .. x n_N on the remaining N important factors,
   given the data for the first 8 experiments].

 - Or "Taguchi" [- using (at least two) different kinds of factors,
		   some cheap, some expensive to change
		 - multiple Y's, for some of which the "local" variance
		   should be minimized, etc. etc.
		 ]
	      
  This looks tedious and could well be partitioned into several student
  projects.  Maybe JMC, AEF, RMH (the authors of ch. 5) or other experts can
  say much more here.
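As a rough illustration of the 2^{m-k} starting design in the sequential scheme above, here is a minimal Python sketch (not S's fac.design() or fractionate()) that builds a two-level fraction from a full factorial in the basic factors plus generator words. The function name and the generators D = AB, E = AC, F = BC, G = ABC for a 2^{7-4} = 8-run design are illustrative assumptions; choosing good generators is exactly the part that needs design expertise.

```python
from itertools import product


def fractional_factorial(base, generators):
    """Build a two-level fractional factorial design coded as -1/+1.

    base       -- names of the basic factors; they form a full 2^len(base) design
    generators -- dict mapping each added factor to the string of base factors
                  whose product defines it, e.g. {"D": "AB"} means D = A*B
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        for name, word in generators.items():
            value = 1
            for letter in word:
                value *= run[letter]  # generator column = product of base columns
            run[name] = value
        runs.append(run)
    return runs


# 2^(7-4) = 8 runs for 7 factors; this generator choice is only an example.
design = fractional_factorial("ABC", {"D": "AB", "E": "AC", "F": "BC", "G": "ABC"})
for run in design:
    print(" ".join(f"{run[f]:+d}" for f in "ABCDEFG"))
```

With these generators every pair of columns is orthogonal, so all seven main effects can be estimated from the 8 runs (at the price of aliasing them with interactions).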

------

2) For Computer Scientists :
    "Differentiation" / Symbolic derivatives, ...

   Improve the capabilities of  D() and deriv(), and document them.
   Make these user-extensible.

   Think about the Hessian in addition to the gradient.
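To make the D()/deriv() item concrete, the following Python sketch shows one way a symbolic differentiator can be made user-extensible: derivative rules for functions live in a table that users can add to, and the Hessian is obtained by differentiating twice. The tuple representation and the names d(), register(), evaluate() and hessian() are hypothetical and do not describe R's internals; no algebraic simplification is attempted, so the results are correct but verbose.

```python
import math

# Expressions are nested tuples (a hypothetical representation, not R's):
#   a number, a variable name (str), ("+", a, b), ("*", a, b),
#   ("^", a, k) with a numeric exponent k, or (fname, a) for a unary function.

# User-extensible table: fname -> (numeric evaluator, rule giving d f(u)/du in terms of u)
FUNCTIONS = {
    "exp": (math.exp, lambda u: ("exp", u)),
    "log": (math.log, lambda u: ("^", u, -1)),
    "sin": (math.sin, lambda u: ("cos", u)),
    "cos": (math.cos, lambda u: ("*", -1, ("sin", u))),
}


def register(name, evaluator, rule):
    """Add a new unary function together with its derivative rule."""
    FUNCTIONS[name] = (evaluator, rule)


def d(e, x):
    """Symbolic derivative of expression e with respect to variable x (unsimplified)."""
    if isinstance(e, (int, float)):
        return 0
    if isinstance(e, str):
        return 1 if e == x else 0
    op = e[0]
    if op == "+":
        return ("+", d(e[1], x), d(e[2], x))
    if op == "*":  # product rule
        return ("+", ("*", d(e[1], x), e[2]), ("*", e[1], d(e[2], x)))
    if op == "^":  # power rule with a constant exponent
        base, k = e[1], e[2]
        return ("*", ("*", k, ("^", base, k - 1)), d(base, x))
    _, rule = FUNCTIONS[op]  # unary function: chain rule
    return ("*", rule(e[1]), d(e[1], x))


def evaluate(e, env):
    """Numerically evaluate e, taking variable values from env."""
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op = e[0]
    if op == "+":
        return evaluate(e[1], env) + evaluate(e[2], env)
    if op == "*":
        return evaluate(e[1], env) * evaluate(e[2], env)
    if op == "^":
        return evaluate(e[1], env) ** e[2]
    evaluator, _ = FUNCTIONS[op]
    return evaluator(evaluate(e[1], env))


def hessian(e, names):
    """Matrix of second derivatives, obtained by differentiating twice."""
    return [[d(d(e, a), b) for b in names] for a in names]


f = ("+", ("*", ("^", "x", 2), "y"), ("exp", ("*", "x", "y")))  # x^2 y + exp(x y)
env = {"x": 1.0, "y": 2.0}
print(evaluate(d(f, "x"), env))  # gradient component 2xy + y exp(xy) at (1, 2)
H = hessian(f, ["x", "y"])
```

Registering a new function, e.g. `register("sqrt", math.sqrt, lambda u: ("*", 0.5, ("^", u, -0.5)))`, is enough for `d()` to apply the chain rule to it.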

  

--------

Martin Maechler <maechler@stat.math.ethz.ch>
http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><

Peter Dalgaard BSA

2000-Sep-14 17:16 UTC

[Rd] Projects

Douglas Bates <bates@stat.wisc.edu> writes:
> I am teaching a graduate course on Statistical Computing this
> semester.  A major part of the grade is determined by a project in
> which a student or small group of students produce, test, and document
> some software for statistics.  I will encourage those students who are
> developing in S to package their software as an R package.
> 
> I would welcome suggestions of possible projects, especially projects
> that come under the heading of "Useful facilities to be added to R".
> Please keep in mind that the project must be completed by mid-December
> and that not all the students have extensive experience programming in
> S and C.
These are probably too hard and too narrow, but now the topic is up:

- getting predictions to work on new data in cases where the model depends
  on the data set (notably regression splines with automatic knot placement)

- in lme, we can predict at level K; it would be nice to also get the SE of
  the prediction (this also involves levels, extending the distinction between
  confidence and tolerance intervals)

- conditional tolerance in lme (much too hard I suspect)

- in model.tables.aov, SE's for type="means" are sorely missed.
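The first item above comes down to remembering data-dependent quantities from the fit. A minimal Python sketch, assuming a linear truncated-power spline basis with knots at sample order statistics (the class LinearSplineBasis and its quantile rule are hypothetical, not R's bs() or ns()): the knots are computed once from the training data and stored, so new data are mapped onto the same basis columns.

```python
class LinearSplineBasis:
    """Hypothetical truncated-power basis x, (x - k1)+, ..., (x - kK)+ whose
    knots are placed at order statistics of the *training* data."""

    def __init__(self, n_knots=3):
        self.n_knots = n_knots
        self.knots = None

    def fit(self, x):
        xs = sorted(x)
        n = len(xs)
        # simple quantile rule: knots at roughly equally spaced order statistics
        self.knots = [xs[(i * n) // (self.n_knots + 1)]
                      for i in range(1, self.n_knots + 1)]
        return self

    def transform(self, x):
        # re-uses the stored knots, so new data get the same basis columns
        if self.knots is None:
            raise ValueError("call fit() on the training data first")
        return [[xi] + [max(xi - k, 0.0) for k in self.knots] for xi in x]


basis = LinearSplineBasis(3).fit(list(range(1, 13)))
print(basis.knots, basis.transform([5]))
```

Recomputing the basis from the new data instead would move the knots: fitting on the single value 5 puts all three knots at 5, so the coefficients would no longer match their columns.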

This is not very hard, but maybe too small (although one will have to
study issues of contrasts and internals of an lm object rather
carefully):

- extend pairwise.t.test to take a linear model and a factor in the
  model as arguments.
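In the simplest one-way case, the quantity such an extension would compute is the difference of two level means divided by a standard error built from the pooled residual standard deviation, with N - k residual degrees of freedom, which is what pairwise.t.test() does when the SD is pooled. The Python sketch below covers only that case; for a general linear model the residual variance, degrees of freedom and contrast SEs would come from the fitted lm object instead, and p-values and multiplicity adjustment are omitted because the standard library has no t distribution. The function name pairwise_t is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_t(groups):
    """groups: dict mapping factor level -> list of responses (one-way layout).
    Returns {(a, b): t statistic for mean_b - mean_a} and the residual df."""
    means = {g: sum(y) / len(y) for g, y in groups.items()}
    n_total = sum(len(y) for y in groups.values())
    df = n_total - len(groups)
    # pooled residual variance = within-group sum of squares / (N - k)
    ss = sum((v - means[g]) ** 2 for g, y in groups.items() for v in y)
    s = sqrt(ss / df)
    stats = {}
    for a, b in combinations(groups, 2):
        se = s * sqrt(1 / len(groups[a]) + 1 / len(groups[b]))
        stats[(a, b)] = (means[b] - means[a]) / se
    return stats, df


stats, df = pairwise_t({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
print(stats, df)
```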

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

Bill Venables

2000-Sep-15 02:18 UTC

[Rd] Projects

At 18:46 14/09/00 +0200, Martin Maechler wrote:
>1) Design of Experiments :
>
> - S (even before "-plus" existed) has had functions like
>
>   fac.design
>   oa.design
>   fractionate   etc  ["White book", chapter 5].
>
>		 [also a set of pre-stored useful fractional designs
>		  for (> 2)-leveled factors.]
The drawback with this project, worthy as it is, is that if you are not
really on top of experimental design the computations can be surprisingly
tricky.

On the quiet I have just released a minuscule library for R, conf.design,
that might make a good first step, though.  It generates confounded
symmetrical designs for the p^n case.  You tell it which contrasts you are
prepared to sacrifice across blocks and it generates the design that will
do it.  It also has things to generate "pseudo-factors", that is, if you
have a composite number of levels for your real factor it will generate a
number of factors each with a prime number of levels which together are
equivalent to yours.  In this way you can partially extend the thing to the
non-prime case, but not very far.  There is a really interesting
programming exercise waiting to be done here, I have to say, but
the trickiness should not be underestimated.
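One way to see the pseudo-factor idea: write the level index of a factor with n levels in the mixed radix given by the prime factorisation of n, and treat each digit as the level of a separate prime-level factor. The Python sketch below is an illustration under that assumption, not conf.design's code; the function names are hypothetical.

```python
def prime_factors(n):
    """Prime factorisation of n by trial division, smallest factor first."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def pseudo_factors(level, n_levels):
    """Write level (0 .. n_levels - 1) as digits in the mixed radix given by the
    prime factors of n_levels; each digit is the level of one pseudo-factor."""
    digits = []
    for p in prime_factors(n_levels):
        digits.append(level % p)
        level //= p
    return digits


# a 12-level factor becomes three pseudo-factors with 2, 2 and 3 levels
for level in range(12):
    print(level, pseudo_factors(level, 12))
```

Because the mapping is one-to-one, the 12 combinations of pseudo-factor levels are exactly equivalent to the original 12 levels.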

I released it because I kept getting requests to do so, but it was done a
long time ago and it is not exactly tidy.  In particular the front end is
obscure and should be formula based rather than matrix based.  This would
make a nice little programming project for a wet weekend or so.

(conf.design also has a function, primes(), that is way overkill for the
sorts of things I need it for in this library, but it generates primes < N
using the classical sieve of Eratosthenes in a pretty slick way.)
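For reference, the classical sieve of Eratosthenes for all primes below N, as a short Python sketch (not the S code of primes() in conf.design): each prime p crosses out its multiples starting at p*p, since smaller multiples were already removed by smaller primes.

```python
def primes_below(n):
    """All primes < n by the sieve of Eratosthenes."""
    if n < 3:
        return []
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p < n:
        if is_prime[p]:
            # cross out multiples, starting at p*p (smaller ones are already done)
            for multiple in range(p * p, n, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_below(30))
```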

Another approach to this whole problem is to look for approximate designs
rather than insist on exact balance, orthogonality, or even absolutely
highest efficiency.  In the experimental design context (as opposed to,
say, coding theory) you never need anything but a reasonably good design,
anyway.  

There is some freely available software around for doing that, such as
"Dopt", a Fortran program by Alan Miller and Nam Nguyen that was published
among the Applied Statistics algorithms some time ago, for constructing
D-optimal (or nearly so) experiments for a given simple block structure.
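The "reasonably good design" approach can be illustrated with a basic point-exchange search that maximises det(X'X) over a finite candidate set. This Python sketch is not the Miller-Nguyen Dopt algorithm and ignores block structure; the candidate grid {-1, 0, 1}^2, the model 1 + x1 + x2 + x1*x2 and all function names are assumptions for illustration. Like any exchange method it may stop at a local optimum.

```python
from itertools import product


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, result = len(a), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            result = -result
        result *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return result


def info_det(design, model):
    """det(X'X) for the model matrix whose rows are model(point)."""
    X = [model(pt) for pt in design]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    return det(xtx)


def exchange(candidates, n_runs, model, start=None):
    """Simple point exchange: swap single design points for candidates
    while det(X'X) keeps increasing."""
    design = list(start or candidates[:n_runs])
    best = info_det(design, model)
    improved = True
    while improved:
        improved = False
        for i in range(n_runs):
            for cand in candidates:
                trial = design[:i] + [cand] + design[i + 1:]
                value = info_det(trial, model)
                if value > best + 1e-9:
                    design, best, improved = trial, value, True
    return design, best


candidates = list(product((-1, 0, 1), repeat=2))


def model(p):
    return [1, p[0], p[1], p[0] * p[1]]


design, best = exchange(candidates, 4, model)
print(design, best)
```

For this model the four corner points give X'X = 4I and det(X'X) = 256, and Hadamard's inequality shows that no 4-run design on [-1, 1]^2 can do better.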

I had a go at putting a front-end on this some time ago and I think the
result is still floating around StatLib, but the S code is c*o*m*p*l*e*x
and probably far too optimistic in what it sets out to do.  I have never
returned to it but again, this had a fair following for some time (maybe
even still) and would be a very useful thing to have for R.  I also have
some simulated annealing-type software that constructs near optimal block
and row-column designs but I left that problem alone 9 years ago and have
not thought much about it since.  It could be resurrected, though, if
anyone were interested.  This is written in C and has never had an S front
end.

Bill Venables.


Kurt Hornik

2000-Sep-17 22:51 UTC

[Rd] Projects

>>>>> Douglas Bates writes:
> I am teaching a graduate course on Statistical Computing this
> semester.  A major part of the grade is determined by a project in
> which a student or small group of students produce, test, and document
> some software for statistics.  I will encourage those students who are
> developing in S to package their software as an R package.
> I would welcome suggestions of possible projects, especially projects
> that come under the heading of "Useful facilities to be added to R".
> Please keep in mind that the project must be completed by mid-December
> and that not all the students have extensive experience programming in
> S and C.
Hopefully not too late for me to join the wishlist ...

In addition to the useful suggestions re experimental design and by
Peter, I have the following ctest-related projects which I think would
fit very nicely.  Ordering is according to decreasing priority.

* Improved support for exact inference (p-values and, where appropriate,
also confidence intervals) for some of the tests, in particular for
Kruskal-Wallis and (2-sample) Smirnov.  In addition, we currently don't
have exact p-values in the rank-based tests in case of ties, and one
could deal with this using the Streitberg-Roehmel path suggested by
Torsten Hothorn (see add-on package ExactDistr).  Also, permutation
tests might be useful in some cases ...

[I have a NEW implementation of the Mehta-Patel network algorithm for
dealing with the common odds ratio in 2 x 2 x k tables ready, and hence
will take care of mantelhaen.test() myself.]

* Implement alternative definitions of the 2-sided p-value using

	p = 2[f P(X=x) + min{P(X<x),P(X>x)}]

as the definition, with 0 <= f <= 1.

* Improve the code for fisher.test(), maybe re-implement from scratch?
The memory management definitely needs to be rewritten for 1.2.
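The first two items fit together: given an exact permutation distribution of a statistic, the f-family of two-sided p-values above takes only a few lines. The Python sketch below enumerates all splits of the pooled sample to get the exact conditional distribution of the Wilcoxon rank sum with midranks for ties (brute force, feasible only for small samples, and not the Streitberg-Roehmel shift algorithm), then applies p = 2[f P(X=x) + min{P(X<x),P(X>x)}]. With f = 1 this is twice the smaller one-sided p-value including P(X=x); f = 1/2 gives the doubled mid-p value. Function names are hypothetical, and capping the result at 1 is an added assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [None] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = Fraction(i + j + 2, 2)  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def exact_rank_sum_dist(x, y):
    """Exact permutation distribution of the rank sum of x (ties allowed),
    enumerating all C(N, m) ways of choosing the x positions."""
    ranks = midranks(list(x) + list(y))
    m, n_total = len(x), len(ranks)
    counts = Counter(sum(ranks[i] for i in idx)
                     for idx in combinations(range(n_total), m))
    total = sum(counts.values())
    dist = {s: Fraction(c, total) for s, c in counts.items()}
    return dist, sum(ranks[:m])


def two_sided_p(dist, obs, f=Fraction(1, 2)):
    """p = 2[f P(X=x) + min{P(X<x), P(X>x)}], capped at 1."""
    eq = dist.get(obs, 0)
    lower = sum(p for s, p in dist.items() if s < obs)
    upper = sum(p for s, p in dist.items() if s > obs)
    return min(1, 2 * (f * eq + min(lower, upper)))


dist, obs = exact_rank_sum_dist([1, 2], [3, 4])
print(obs, two_sided_p(dist, obs, f=1), two_sided_p(dist, obs))
```

For x = (1, 2) and y = (3, 4) the observed rank sum is 3; f = 1 gives p = 1/3 and f = 1/2 gives p = 1/6.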

-k

John Maindonald

2000-Sep-18 06:05 UTC

[Rd] Projects

Here are some further possibilities:

1. An equivalent of S-PLUS plot.gam(), which I
would call plot.terms().
[predict(.., type="terms") provides the
information that is plotted.]

2. An option in print.lm(), summary.lm(), etc.
that would relate the number of decimal
places shown for coefficients etc. to the SE,
e.g. a precision equivalent to the 2nd d.p.
of the SE.
(This may be hard to get right!)

3. Label output where factors appear so that it
is clear what parameterisation has been used.
(Peter Dalgaard suggested a scheme some time
ago that seemed to me eminently sensible.)

4. As an interim measure, until lattice arrives,
implement simplified versions of trellis
bwplot, qqmath, etc that allow two conditioning
factors.  This can be done as fairly easy
adaptations of coplot().  So that it is not
repeated in each individual function, the
coplot() code that decodes the graphics formula
should be factored out into a separate function.
It needs slight modification to allow formulae
such as ~x|a+b.
(I have a version of such a function.  Adding
the code for e.g. a primitive version of 
bwplot is straightforward. I will do it myself 
shortly if there are no other offers.)

5. I have been investigating a function that takes
one list of information for each panel in a
trellis-type layout and then allows the user to
provide a panel function that operates on the list
elements. For example, one may want graphs with 
different numbers of x-values to be superimposed 
in the same panel.  Or a theoretical curve or plot
of simulated data may be specified by a small
number of parameters that are specific to each
panel.
(I have a primitive version of such a function.)
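For item 2, one possible rule, reading "the 2nd d.p. of the SE" as the SE's second significant digit (an assumption), is to print each coefficient to the number of decimal places at which that digit of its SE falls. A minimal Python sketch with hypothetical function names; the hard cases John mentions (SEs spanning many orders of magnitude in one table, very small or zero SEs) are not addressed.

```python
import math


def digits_from_se(se, sig=2):
    """Decimal places needed to show the SE to `sig` significant digits;
    the coefficient is then printed to the same number of places."""
    if se <= 0 or not math.isfinite(se):
        raise ValueError("SE must be positive and finite")
    first = math.floor(math.log10(se))  # decimal position of the SE's leading digit
    return max(0, sig - 1 - first)


def format_coef(est, se, sig=2):
    places = digits_from_se(se, sig)
    return f"{est:.{places}f} ({se:.{places}f})"


print(format_coef(1.23456, 0.0372))
```

Here the SE 0.0372 has its second significant digit in the third decimal place, so both numbers are shown to three places: `1.235 (0.037)`.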

John Maindonald.
John Maindonald               email : john.maindonald@anu.edu.au        
Statistical Consulting Unit,  phone : (6249)3998        
c/o CMA, SMS,                 fax   : (6249)5549  
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia

