Here are some of the ideas I have used in the past teaching a class like this:
Give them a paragraph of text describing data values and have them create a data
frame from the data (the prose is so that there is not an obvious table
structure to start with). Something like:
Patient number 1 (male) had blood pressure of 120/80 before the treatment and
110/70 after the treatment, patient number 2 lowered her systolic value from 130
to 120 with the treatment but her diastolic value stayed at 80, ...
I only used about 6 rows of final data, but this forces the students to think
about how they want to structure the data (should systolic before and after be
separate columns? Or 1 column with another column indicating before/after?)
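For the two patients in the example text, the two candidate layouts might look roughly like this. It is only a sketch: the column names (sys_before, dia_after, and so on) are made up for illustration, and reshape() is just one way to move between the layouts.

```r
# Wide layout: one row per patient, separate before/after columns.
# Column names here are illustrative, not prescribed by the exercise.
wide <- data.frame(
  id         = c(1, 2),
  sex        = c("M", "F"),
  sys_before = c(120, 130),
  dia_before = c(80, 80),
  sys_after  = c(110, 120),
  dia_after  = c(70, 80)
)

# Long layout: one row per patient per time point, with a 'time'
# column indicating before/after.
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("sys_before", "sys_after"),
                   c("dia_before", "dia_after")),
  v.names   = c("sys", "dia"),
  timevar   = "time",
  times     = c("before", "after"),
  idvar     = "id"
)
long
```

The wide form is convenient for paired comparisons (sys_after - sys_before), while the long form tends to suit plotting and modelling functions that expect a grouping column.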
Now have the students do some basic analyses on a sample dataset: t-tests,
summaries, basic regression, and diagnostic plots.
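A first pass at such an analysis could be as short as the following. The datasets (sleep and mtcars, both shipped with R) and the variables used are arbitrary choices for illustration.

```r
# 'sleep' and 'mtcars' ship with R; any small dataset would do.
summary(mtcars[, c("mpg", "wt", "hp")])

# Two-sample (Welch) t-test: extra hours of sleep by drug group.
tt <- t.test(extra ~ group, data = sleep)
tt

# Simple linear regression, then the standard four diagnostic plots.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```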
Have them compute regression coefficients the hard way (doing the matrix
multiplications and/or minimizing the sum of squared residuals using optim);
this may help them appreciate the lm function.
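One possible version of the "hard way", shown next to lm for comparison. The dataset and predictors are arbitrary, and the optim settings (BFGS with a tightened tolerance) are just one reasonable choice, not the only way to do it.

```r
# Design matrix and response from a built-in dataset (arbitrary choice).
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# 1. Normal equations: solve (X'X) b = X'y.
b_matrix <- solve(crossprod(X), crossprod(X, y))

# 2. Numerical minimization of the residual sum of squares.
rss <- function(b) sum((y - X %*% b)^2)
b_optim <- optim(c(0, 0, 0), rss, method = "BFGS",
                 control = list(reltol = 1e-12, maxit = 1000))$par

# 3. The easy way, for comparison.
b_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))

cbind(matrix = drop(b_matrix), optim = b_optim, lm = b_lm)
```

The matrix and lm columns should agree to many digits; the optim column is only as close as the optimizer's stopping rule allows, which is itself a useful discussion point.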
Generate a population of random data and compute its mean and standard
deviation, then take 100 or 1,000 samples from this population and compute the
mean of each sample. Compare the mean and standard deviation of the sample means
to the mean and standard deviation of the population. Create a histogram of the
means and show summaries of the means and the population as reference lines on
the plot (this helps cement the central limit theorem).
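A sketch of that simulation follows. The exponential population, the sample size of 30, and the seed are all assumptions picked for illustration; a skewed population makes the effect easier to see.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# A deliberately skewed "population"; the exponential is an arbitrary choice.
population <- rexp(100000, rate = 1)
pop_mean <- mean(population)
pop_sd   <- sd(population)

# 1,000 samples of size n = 30; keep the mean of each.
n <- 30
sample_means <- replicate(1000, mean(sample(population, n)))

# The mean of the means should be near pop_mean, and their
# standard deviation near pop_sd / sqrt(n).
c(pop_mean = pop_mean, mean_of_means = mean(sample_means))
c(pop_sd_over_sqrt_n = pop_sd / sqrt(n), sd_of_means = sd(sample_means))

# Histogram of the sample means with reference lines.
hist(sample_means, breaks = 30, main = "Sample means, n = 30")
abline(v = pop_mean, col = "red", lwd = 2)
abline(v = mean(sample_means), col = "blue", lty = 2, lwd = 2)
```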
Write a function to do the classic number guessing game where the function will
choose an integer between 1 and 100 then prompt the user for a guess, then tell
them if their guess is too high, too low, or correct (not interesting
statistically, but gives some good basic use of programming logic).
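One way the game might be written is below. The read_guess argument is a hypothetical hook that I added so the function can be driven by something other than the keyboard; with the default it uses readline, so it needs an interactive session.

```r
# 'read_guess' is a hypothetical hook (not part of the exercise as stated);
# the default reads one guess from the keyboard.
guess_game <- function(read_guess = function() {
                         as.integer(readline("Your guess (1-100): "))
                       }) {
  target <- sample(1:100, 1)
  tries <- 0
  repeat {
    guess <- suppressWarnings(read_guess())
    if (is.na(guess)) {
      cat("Please enter a whole number.\n")
      next
    }
    tries <- tries + 1
    if (guess > target) {
      cat("Too high.\n")
    } else if (guess < target) {
      cat("Too low.\n")
    } else {
      cat("Correct! You needed", tries, "guesses.\n")
      return(invisible(tries))
    }
  }
}
```

Calling guess_game() at the console starts a round; the number of guesses is returned invisibly.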
Write a function that will compute the arithmetic, geometric, harmonic, and
self-weighting means. The function needs the same optional arguments as mean.
Optionally have it plot a histogram of the data with reference lines at each of
the means.
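A possible solution is sketched here, under two assumptions that students may well resolve differently: the self-weighting mean is taken to be each value weighted by itself, sum(x^2)/sum(x), and trim is applied once to the sorted data before any of the means is computed. The function name and the plot argument are my own.

```r
# Assumptions: "self-weighting" = sum(x^2) / sum(x); trimming is done on
# the sorted data up front, mirroring what mean(x, trim = ) does.
all_means <- function(x, trim = 0, na.rm = FALSE, plot = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x)) {
    return(c(arithmetic = NA_real_, geometric = NA_real_,
             harmonic = NA_real_, self_weighted = NA_real_))
  }
  if (trim >= 0.5) {
    x <- median(x)
  } else if (trim > 0) {
    k <- floor(length(x) * trim)
    x <- sort(x)[(k + 1):(length(x) - k)]
  }
  m <- c(
    arithmetic    = mean(x),
    geometric     = exp(mean(log(x))),  # needs x > 0
    harmonic      = 1 / mean(1 / x),    # needs x != 0
    self_weighted = sum(x^2) / sum(x)
  )
  if (plot) {
    hist(x, main = "Data with four means")
    abline(v = m, col = 1:4, lty = 1:4, lwd = 2)
    legend("topright", legend = names(m), col = 1:4, lty = 1:4, lwd = 2)
  }
  m
}

all_means(c(1, 2, 4, 8))
```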
Use regexpr and related functions to extract information from date(), or from
the rownames of a dataset (I often get data with id values like M1, M2, F1, F2,
... and no column of sex info, so I need to extract that from the id).
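For example, with some made-up id values of that kind (the ids vector below is hypothetical), regexpr and regmatches can pull the pieces apart:

```r
# Hypothetical id values of the kind described above.
ids <- c("M1", "M2", "F1", "F2", "F10")

# regexpr() locates the first match in each string; regmatches() extracts it.
# Note that regmatches() silently drops elements with no match, so check
# lengths on real data.
sex <- regmatches(ids, regexpr("^[MF]", ids))
num <- as.integer(regmatches(ids, regexpr("[0-9]+", ids)))
data.frame(id = ids, sex = factor(sex), num = num)

# The same idea on date(), which returns a string such as
# "Wed Sep 24 11:39:00 2008": extract the four-digit year at the end.
today <- date()
year <- as.integer(regmatches(today, regexpr("[0-9]{4}$", today)))
year
```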
Generate data from the distribution f(x) = x/2 for 0<x<2. Generate
bivariate data from the joint distribution f(x,y) = 2x+2y-4xy, 0<x<1,
0<y<1. Plot the data to see if it looks like it comes from the
theoretical distribution.
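One approach among several: the univariate density has CDF F(x) = x^2/4, so the inverse-CDF method gives x = 2*sqrt(u); the bivariate density is bounded above by 2 on the unit square, so simple rejection sampling works. The sample sizes and seed below are arbitrary.

```r
set.seed(1)  # arbitrary seed
n <- 10000

# Univariate: f(x) = x/2 on (0, 2) has CDF F(x) = x^2 / 4,
# so the inverse-CDF transform is x = 2 * sqrt(u).
x <- 2 * sqrt(runif(n))
hist(x, freq = FALSE, breaks = 40)
curve(x / 2, from = 0, to = 2, add = TRUE, col = "red")

# Bivariate: f(x, y) = 2x + 2y - 4xy on the unit square is at most 2
# (reached at the corners (1,0) and (0,1)), so propose uniformly and
# accept with probability f / 2.
px <- runif(4 * n)
py <- runif(4 * n)
keep <- runif(4 * n) < (2 * px + 2 * py - 4 * px * py) / 2
xy <- cbind(x = px[keep], y = py[keep])
plot(xy, pch = ".", main = "Draws from f(x, y) = 2x + 2y - 4xy")
```

As a numerical check beyond the plot: the univariate mean should be near 4/3, about half of the proposals should be accepted, and the correlation between the two coordinates should be near -1/3.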
Various simulations:
Recreate the t-table (generate samples of normals, compute t, and find the
quantile beyond which 5% of tests would be more extreme).
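A small version of the t-table simulation might look like this. It assumes the two-sided 5% column is wanted; the degrees of freedom, number of simulations, and seed are arbitrary.

```r
set.seed(2008)  # arbitrary seed
n_sim <- 20000

# Simulated two-sided 5% critical value for a one-sample t statistic
# with the given degrees of freedom (sample size df + 1).
sim_crit <- function(df) {
  n <- df + 1
  t_stats <- replicate(n_sim, {
    z <- rnorm(n)
    mean(z) / (sd(z) / sqrt(n))
  })
  unname(quantile(abs(t_stats), 0.95))
}

dfs <- c(2, 5, 10, 30)
t_table <- data.frame(df = dfs,
                      simulated = sapply(dfs, sim_crit),
                      exact = qt(0.975, dfs))
t_table
```

The small-df rows are noisier because the quantile sits far out in a heavy tail; increasing n_sim tightens them.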
Generate data for a 2-sample t test, but decide whether to pool the variances
based on a test of the variances. Simulate under various conditions to see if
you get a different error rate than you should.
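The pretest-then-test procedure can be simulated along these lines. The function names, the 0.05 cutoff for the variance test, and the two scenarios shown are illustrative choices, not the only conditions worth trying.

```r
set.seed(123)  # arbitrary seed

# One simulated study: both groups have the same mean, so any rejection
# is a type I error. The sizes and sds are the "conditions" to vary.
one_p <- function(n1, n2, sd1, sd2) {
  x <- rnorm(n1, sd = sd1)
  y <- rnorm(n2, sd = sd2)
  pool <- var.test(x, y)$p.value > 0.05  # pool only if variances "look" equal
  t.test(x, y, var.equal = pool)$p.value
}

# Proportion of simulated studies that reject at the 5% level.
error_rate <- function(n1, n2, sd1, sd2, n_sim = 2000) {
  mean(replicate(n_sim, one_p(n1, n2, sd1, sd2)) < 0.05)
}

error_rate(10, 10, 1, 1)  # equal variances, equal sizes
error_rate(10, 30, 2, 1)  # larger variance in the smaller group
```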
Do simulations to calculate power for different scenarios.
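As a starting point, power for a two-sample t-test can be estimated by simulation and checked against power.t.test. The scenario (20 per group, a difference of 0.8 standard deviations) and the name sim_power are assumptions for illustration.

```r
set.seed(99)  # arbitrary seed

# Estimated power of a two-sample t-test (equal variances assumed)
# for a given per-group size and true mean difference.
sim_power <- function(n, delta, sd = 1, alpha = 0.05, n_sim = 5000) {
  mean(replicate(n_sim, {
    x <- rnorm(n, mean = 0, sd = sd)
    y <- rnorm(n, mean = delta, sd = sd)
    t.test(x, y, var.equal = TRUE)$p.value < alpha
  }))
}

# Compare with the analytic answer from power.t.test().
sim   <- sim_power(n = 20, delta = 0.8)
exact <- power.t.test(n = 20, delta = 0.8, sd = 1)$power
c(simulated = sim, exact = exact)
```

Once the two agree for the textbook case, the same simulation can be bent to scenarios with no closed-form answer (unequal variances, skewed data, and so on).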
As part of the final I would usually have them write a function to do
Hotelling's multivariate T-test.
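For reference, a one-sample version could be sketched as below (the two-sample version follows the same pattern with a pooled covariance matrix). The function name, return format, and the iris example with its null values are illustrative assumptions.

```r
# One-sample Hotelling T^2 test of H0: mean vector == mu0.
# The function name and return format are illustrative choices.
hotelling_one <- function(X, mu0 = rep(0, ncol(X))) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  d <- colMeans(X) - mu0
  T2 <- n * drop(t(d) %*% solve(cov(X), d))
  # Under H0 with multivariate normal data,
  # (n - p) / (p * (n - 1)) * T2 follows an F(p, n - p) distribution.
  F_stat <- (n - p) / (p * (n - 1)) * T2
  p_value <- pf(F_stat, p, n - p, lower.tail = FALSE)
  c(T2 = T2, F = F_stat, df1 = p, df2 = n - p, p.value = p_value)
}

# Example: test whether mean sepal length and width of setosa are (5, 3.5).
setosa <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]
hotelling_one(setosa, mu0 = c(5, 3.5))
```

A handy check for students: with a single column, T2 should equal the square of the ordinary one-sample t statistic and the p-values should match.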
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Erin Hodgess
> Sent: Wednesday, September 24, 2008 11:39 AM
> To: r-help at r-project.org
> Subject: [R] possible interesting R projects for undergrads
>
> Dear R People:
>
> I finally (Yay!) got R installed in a classroom!
>
> Anyhow, I have a respectful request, please: could anyone recommend
> some nice undergrad projects in R, please?
>
> This is in a statistical computation class; first time being run.
>
> Thanks,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess at gmail.com
>