Hi all,

I have a simple graphing question that is not really a graphing question, but a question about repeating a task.

I'm fiddling with some of McElreath's Statistical Rethinking, and there's a graph illustrating extreme overfitting (a number of polynomial terms in x equal to the number of observations), a subject I know well having taught it to grad students for many years.

The plot I want to reproduce has, in effect:

m1 <- lm( y ~ x)
m2 <- lm( y ~ x + x^2)

...etc., through lm( y ~ x + x^2 + x^3 + x^4 + x^5 + x^6 ), followed by some plot() or lines() or ggplot2() call to render the data and fitted curves.

Obviously I don't want to run such regressions for any real purpose, but I think it might be useful to learn how to do such a thing in R without writing down each lm() call individually. It's not obvious where I'd want to apply this, but I like learning how to repeat things in a compact way.

So, something like:

data( mtcars )
d <- mtcars
v <- c( 1 , 2 , 3 , 4 , 5 , 6 )
m1 <- lm( mpg ~ hp , data = d )

and then somehow use for() with an index or some flavor of apply() with the vector v to repeat this process yielding

m2 <- lm( mpg ~ hp + I( hp ^2 ) , data=d)
m3 <- lm( mpg ~ hp + I( hp^2 ) + I(hp^3) , data=d )

... and the rest through

m6 <- lm( mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5) + I(hp^6) , data=d )

But finding a way to index these values including not just each value but each value+1 , then value+1 and value+2, and so on escapes me. Obviously I don't want to include index values below zero.

==
Richard Sherman
rss.pdx at gmail.com
Hi Richard,
This may be what you want:
data(mtcars)
m<-list()
for(i in 1:6) {
rhterms<-paste(paste0("I(hp^",1:i,")"),sep="+")
lmexp<-paste0("lm(mpg~",rhterms,",mtcars)")
cat(lmexp,"\n")
m[[i]]<-eval(parse(text=lmexp))
}
plot(mpg~hp,mtcars,type="n")
for(i in 1:6) abline(m[[i]],col=i)
Jim
On Thu, Aug 23, 2018 at 9:07 AM, Richard Sherman <rss.pdx at gmail.com> wrote:
> Hi all,
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
I do not think this does what the OP wants -- it does not produce
polynomials of the form desired.
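
A short sketch of why the loop quoted in this thread does not build cumulative polynomials, assuming the intent was to join the terms with "+": in paste(), sep= only goes between separate arguments, so a single vector passes through unchanged, whereas collapse= joins the elements of a vector into one string. With sep=, eval(parse(text=...)) receives one lm() call per term and returns only the last, i.e. a model in hp^i alone. Separately, abline() draws only a straight line from the first two coefficients, so curves need lines() with predict(). The names terms_sep, terms_collapse and fit3 are illustrative and not from the original posts.

```r
# Illustrative names; not part of the original posts.
i <- 3
terms_sep      <- paste(paste0("I(hp^", 1:i, ")"), sep = "+")
terms_collapse <- paste(paste0("I(hp^", 1:i, ")"), collapse = "+")

length(terms_sep)   # still one string per term: sep= did not join them
terms_collapse      # a single right-hand side joined by "+"

# With collapse=, the cumulative model can be fitted without eval(parse()):
fit3 <- lm(as.formula(paste("mpg ~", terms_collapse)), data = mtcars)
names(coef(fit3))
```

Building a formula with as.formula() rather than parsing a whole lm() call keeps the string handling limited to the formula itself.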
John Fox's solution using poly() seems to me to be the right approach, but
I will show what I think is a considerably simpler way to build up the
polynomial expressions just as an example of one way to do this sort of
thing in more general circumstances:
fm <- vector("character", 6)
fm[1] <- "mpg ~ hp"
for (i in 2:6) fm[i] <- paste0(fm[i-1], " + I(hp^", i, ")")
## yielding:
> fm
[1] "mpg ~ hp"
[2] "mpg ~ hp + I(hp^2)"
[3] "mpg ~ hp + I(hp^2) + I(hp^3)"
[4] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4)"
[5] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5)"
[6] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5) + I(hp^6)"
Although fm is a character vector, the character strings will be
automatically coerced by lm to formulas (see ?lm), so, e.g.

results <- lapply(fm, lm, data = mtcars)

would yield a list of regressions which could then be summarized, plotted
or whatever (again using lapply), e.g.

> results[[3]]

Call:
FUN(formula = X[[i]], data = ..1)

Coefficients:
(Intercept)           hp      I(hp^2)      I(hp^3)
  4.422e+01   -2.945e-01    9.115e-04   -8.701e-07
One could also choose to do the plotting or whatever within the lapply
call, but I prefer to keep things simple if possible.
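
As a rough sketch of that plotting step, assuming a list of fits like results, the fitted curves can be drawn by predicting each model over a fine grid of hp values. The grid size of 200 and the names hp_grid and curves are arbitrary choices for illustration; the higher-degree raw-power fits may trigger rank-deficiency warnings from predict().

```r
# Rebuild the formulas and fits so the block runs on its own.
fm <- character(6)
fm[1] <- "mpg ~ hp"
for (i in 2:6) fm[i] <- paste0(fm[i - 1], " + I(hp^", i, ")")
results <- lapply(fm, function(f) lm(as.formula(f), data = mtcars))

# Predict each model over an evenly spaced grid (hp_grid is an illustrative name).
hp_grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 200))
curves  <- lapply(results, predict, newdata = hp_grid)

# Draw the data, then one curve per polynomial degree.
plot(mpg ~ hp, data = mtcars)
for (k in seq_along(curves)) lines(hp_grid$hp, curves[[k]], col = k, lty = k)
```

Wrapping lm() in a small function with as.formula() is a design choice here: it makes each stored call show a real formula object, at the cost of one extra line.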
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Aug 22, 2018 at 4:43 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Richard,
> This may be what you want:
> [...]
Dear Bert,

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
> Sent: Wednesday, August 22, 2018 8:38 PM
> To: Jim Lemon <drjimlemon at gmail.com>
> Cc: rss.pdx at gmail.com; R-help <r-help at r-project.org>
> Subject: Re: [R] graphing repeated curves
>
> I do not think this does what the OP wants -- it does not produce polynomials
> of the form desired.
>
> John Fox's solution using poly() seems to me to be the right approach, but I
> [...]

Actually, I didn't do a good job of graphing the polynomials between the observed x-values. Here's a better solution:

x <- with(mtcars, seq(min(hp), max(hp), length=500))
plot(mpg ~ hp, data=mtcars)
for (p in 1:6){
    m <- lm(mpg ~ poly(hp, p), data=mtcars)
    lines(x, predict(m, newdata=data.frame(hp=x)), lty=p, col=p)
}
legend("top", legend=1:6, lty=1:6, col=1:6, title="order", inset=0.02)

Best,
John
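
For readers comparing the two approaches in this thread, here is a small check (a sketch, not part of John's post) that poly(hp, p) fits the same curve as explicit I(hp^k) terms, shown for degree 3. By default poly() uses orthogonal polynomials, so the coefficients differ while the fitted values agree; with raw = TRUE it should reproduce the raw-power coefficients. The object names are illustrative.

```r
# Three parameterisations of the same cubic in hp (names are illustrative).
fit_raw_terms <- lm(mpg ~ hp + I(hp^2) + I(hp^3), data = mtcars)
fit_poly_orth <- lm(mpg ~ poly(hp, 3), data = mtcars)
fit_poly_raw  <- lm(mpg ~ poly(hp, 3, raw = TRUE), data = mtcars)

# Same fitted curve, different parameterisation.
all.equal(fitted(fit_raw_terms), fitted(fit_poly_orth))

# raw = TRUE gives the same coefficients as the explicit powers (names differ).
all.equal(unname(coef(fit_raw_terms)), unname(coef(fit_poly_raw)))
```

The orthogonal form is generally better conditioned numerically, which matters more as the degree grows.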
These are great, thanks. I always forget about paste().

==
Richard Sherman
rss.pdx at gmail.com

> On Aug 22, 2018, at 17:56, Fox, John <jfox at mcmaster.ca> wrote:
>
> fm <- vector("character",6)
> fm[1]<- "mpg ~ hp"
> for(i in 2:6)fm[i]<- paste0(fm[i-1]," + I(hp^", i,")")