This is not a "help" request, just an experience with R that I found
amusing.
I have a machine learning module that I teach which was originally all
symbolic, but has a slowly growing numeric/statistical component.
Today I taught a two-part lecture on instance-based methods for
learning, with the first half being simple KNN, then kernel methods for
regression. The second half was using case-based reasoning to learn
machine translation from human training.
Anyhow, here's the relevant bit. In the tutorial I got one of the
students to fire up R so that we could use a normal pdf as the kernel
function for a worked example. I ended that part of the tutorial by
showing them the board data (just three points) plotted on a graph, and
then plotting the predicted (according to board calculations) y value
for the "unseen" x value as per the board.
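
For anyone who wants to try the kernel step, here is a minimal sketch in
R. It assumes the board calculation was a kernel-weighted average of the
training y values, with dnorm() as the kernel. The three points, the
unseen x value, the bandwidth and the name kernel_predict are all made
up for illustration; the actual board data is not reproduced here.

```r
# Hypothetical three-point training set (NOT the real board data).
x <- c(1, 2, 4)
y <- c(1, 3, 2)

# Kernel-weighted average prediction with the normal pdf as the kernel.
# `bandwidth` (the sd passed to dnorm) is an assumed tuning parameter.
kernel_predict <- function(x0, x, y, bandwidth = 1) {
  w <- dnorm(x - x0, mean = 0, sd = bandwidth)  # weight of each training point
  sum(w * y) / sum(w)                           # weighted mean of the y values
}

x_new <- 3                             # the "unseen" x value (also made up)
y_hat <- kernel_predict(x_new, x, y)

# Plot the training points, then the prediction for the unseen x.
plot(x, y, pch = 19, xlim = c(0, 5), ylim = c(0, 4))
points(x_new, y_hat, pch = 4, col = "red")
```

Because the prediction is a weighted mean, it always falls between the
smallest and largest training y values.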
Then, since the tutorial had ended, I thought I'd get a little bit fancy.
I created some sample data where the x values were uniform random
between 0 and 8, and the y value was x^2 plus some Gaussian noise. I
then plotted the raw data, and a lowess() curve, explaining to the
students that the lowess() curve was a more sophisticated method than
we'd covered in class (I had mentioned locally weighted regression
during the lecture). Finally I plotted the true y=x^2 line so that we
could compare the lowess() curve to the true curve. The students were
all very impressed at the close fit.
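
A rough reconstruction of that demo, as a sketch rather than what was
actually typed in class: the sample size, the noise standard deviation
and the seed are assumptions, and lowess() is called with its default
settings.

```r
set.seed(1)                            # arbitrary seed, for repeatability
n <- 200                               # assumed sample size
x <- runif(n, min = 0, max = 8)        # x uniform on [0, 8]
y <- x^2 + rnorm(n, mean = 0, sd = 5)  # y = x^2 plus Gaussian noise (sd assumed)

plot(x, y)                             # raw data
fit <- lowess(x, y)                    # list with sorted x and smoothed y
lines(fit, col = "blue")               # lowess() curve
curve(x^2, from = 0, to = 8, add = TRUE, col = "red")  # true curve
```

Raising sd in the rnorm() call reproduces the "more noise" variants.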
I then decided to show what happens when the noise gets really large,
so I recreated my x^2 + noise sample with large amounts of noise. Plot
the lowess() curve and ... it's still pretty well bang on.
So, increase the noise even further, plot the lowess() curve and the
true curve and ... still very accurate. OK, quick explanation that the
random Gaussian noise is more or less evenly distributed either side of
the true line. So, I created some Gaussian noise with a mean of 0,
squared it, subtracted the first constant I thought of, and added it to
the x^2 values to create the fake "observed" y values. Plot the data,
and you can see the lower part of the curve clearly, but really
random-looking noise above it. Plot the lowess() curve, plot the true
curve and ... pretty well bang on.
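
The lopsided-noise version can be sketched the same way. The constant
subtracted in class is not recorded, so the 5 below, like the sd of 3
and the sample size, is purely an assumption.

```r
set.seed(2)                            # arbitrary seed
n <- 200                               # assumed sample size
x <- runif(n, min = 0, max = 8)

# Squared zero-mean Gaussian noise, shifted down by an assumed constant.
# The result is one-sided: it can never fall below -5.
noise <- rnorm(n, mean = 0, sd = 3)^2 - 5
y <- x^2 + noise

plot(x, y)
fit <- lowess(x, y)
lines(fit, col = "blue")
curve(x^2, from = 0, to = 8, add = TRUE, col = "red")

mean(noise)  # theoretical mean is sd^2 - constant = 9 - 5 = 4, not 0
```

With these assumed numbers the noise does not average to zero, so some
vertical offset between the two curves would be expected. On a y axis
running from 0 to about 64 it may simply be hard to see, and the
robustness iterations that lowess() runs by default may also discount
the large positive outliers.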
At this point I'd run out of time, and hence couldn't think of even
tougher tests. The students were all fairly impressed by this. I saw
several of them copying R (the whole Windows directory) off the hard drive
to take home. Others asked if I could write them some notes on lowess()
etc.
Once again I learn not to do things on the fly in lectures :-)
Cheers,
Ross-c