I have two data sets, each a vector of 1000 numbers, each vector representing a distribution (i.e. 1000 numbers, each representing a frequency at one point on a scale from 1 to 1000). For simplification, here is a short version with only 5 points:

a <- c(8,10,8,12,4)
b <- c(7,11,8,10,5)

Leaving the obvious discussion about causality aside for a moment, I would like to see how well I can predict b from a using a regression. Since I do not know anything about the distribution type and have already discovered non-normality, I cannot use parametric regression or anything GLM for that matter.

How should I proceed in using non-parametric regression to model vector a and see how well it predicts b? Perhaps you could extend the given lines into a short example script to give me an idea? Are there any other options?

Best,
Ralf
From reviewing the first Google results page for "Non-parametric regression R", I hope this link will prove useful:

http://socserv.mcmaster.ca/jfox/Courses/Oxford-2005/R-nonparametric-regression.html

Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
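As a rough illustration of the kind of non-parametric regression the linked page covers, here is a minimal sketch using local regression (loess) from base R. The two vectors are simulated stand-ins, since the real 1000-point data were not posted; the Poisson parameters, the span, and the use of squared correlation as a fit measure are all assumptions made for the example, not recommendations.

```r
# Hypothetical stand-in data: the real vectors from the question are not available,
# so two related count vectors of length 1000 are simulated here.
set.seed(1)
a <- rpois(1000, lambda = 10)
b <- rpois(1000, lambda = 2 + 0.8 * a)

# Local linear regression of b on a.
# span = 0.75 is just the default smoothing level; surface = "direct" computes
# the fit exactly at each point, which is cheap at this sample size.
fit <- loess(b ~ a, span = 0.75, degree = 1,
             control = loess.control(surface = "direct"))

# One rough measure of how well a predicts b:
# squared correlation between observed and fitted values.
pred <- fitted(fit)
r2 <- cor(b, pred)^2
print(r2)

# Predictions for a few values of a inside the observed range.
new_a <- data.frame(a = seq(min(a), max(a), length.out = 5))
print(predict(fit, newdata = new_a))
```

A rank-based alternative that avoids fitting a curve at all is cor(a, b, method = "spearman"), which measures monotone association without distributional assumptions.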
Just to be correct: gam is mentioned on the page Tal linked to, but it is a semi-parametric approach using maximum likelihood. It stays valid though.

Another thing: you detected non-normality. But can you use a Poisson distribution, for example? The framework of generalized linear models and generalized additive models allows you to deal with non-normality in your data.

In any case, I suggest you contact a statistician nearby for guidance.

Cheers
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
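To make the Poisson suggestion concrete, here is a minimal sketch that fits a Poisson GLM to the five-point example from the original post. Whether a Poisson model actually suits the full data is an assumption that would need checking, and five points are far too few for any real inference; the deviance-explained summary is just one convenient way to gauge fit.

```r
# The five-point example vectors from the original question.
a <- c(8, 10, 8, 12, 4)
b <- c(7, 11, 8, 10, 5)

# Poisson GLM with the default log link: log(E[b]) = beta0 + beta1 * a.
fit <- glm(b ~ a, family = poisson)
print(summary(fit))

# Proportion of the null deviance explained by a (a rough analogue of R^2).
dev_expl <- 1 - fit$deviance / fit$null.deviance
print(dev_expl)

# Predicted mean of b for new values of a, on the response (count) scale.
pred <- predict(fit, newdata = data.frame(a = c(6, 9)), type = "response")
print(pred)
```

On the full 1000-point vectors, the additive analogue would be something like mgcv::gam(b ~ s(a), family = poisson), which replaces the linear term with a smooth function of a.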
On Jul 9, 2010, at 4:01 AM, Ralf B wrote:

> [...] I would like to see how well I can predict b from a using a regression.

You can use density estimation. There was a recent thread that included worked examples using MASS::kde2d and locfit::locfit for graphical display of joint distributions.

David Winsemius, MD
West Hartford, CT
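The thread referred to above is not reproduced here, so the following is only a minimal sketch of what a joint density estimate with MASS::kde2d could look like. The data are simulated stand-ins (an assumption, as the real vectors were not posted), and the grid size of 50 is an arbitrary choice.

```r
library(MASS)  # recommended package, shipped with standard R installations

# Hypothetical stand-in data in place of the real 1000-point vectors.
set.seed(1)
a <- rpois(1000, lambda = 10)
b <- rpois(1000, lambda = 2 + 0.8 * a)

# Two-dimensional kernel density estimate of the joint distribution of (a, b),
# evaluated on a 50 x 50 grid with the default normal-reference bandwidths.
dens <- kde2d(a, b, n = 50)
str(dens)

# Contour plot of the estimated joint density (only drawn in interactive sessions).
if (interactive()) {
  contour(dens, xlab = "a", ylab = "b")
}
```

Ridges in the contour plot that run diagonally suggest that b tends to move with a, which is the graphical counterpart of asking how well a predicts b.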