Hi all,

I have a simple graphing question that is not really a graphing question, but a question about repeating a task.

I'm fiddling with some of McElreath's Statistical Rethinking, and there's a graph illustrating extreme overfitting (a number of polynomial terms in x equal to the number of observations), a subject I know well having taught it to grad students for many years.

The plot I want to reproduce has, in effect:

m1 <- lm( y ~ x)
m2 <- lm( y ~ x + x^2)

...etc., through lm( y ~ x + x^2 + x^3 + x^4 + x^5 + x^6 ), followed by some plot() or lines() or ggplot() call to render the data and fitted curves.

Obviously I don't want to run such regressions for any real purpose, but I think it might be useful to learn how to do such a thing in R without writing down each lm() call individually. It's not obvious where I'd want to apply this, but I like learning how to repeat things in a compact way.

So, something like:

data( mtcars )
d <- mtcars
v <- c( 1 , 2 , 3 , 4 , 5 , 6 )
m1 <- lm( mpg ~ hp , data = d )

and then somehow use for() with an index or some flavor of apply() with the vector v to repeat this process yielding

m2 <- lm( mpg ~ hp + I( hp ^2 ) , data=d)
m3 <- lm( mpg ~ hp + I( hp^2 ) + I(hp^3) , data=d )

... and the rest through

m6 <- lm( mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5) + I(hp^6) , data=d )

But finding a way to index these values including not just each value but each value+1 , then value+1 and value+2, and so on escapes me. Obviously I don't want to include index values below zero.

==
Richard Sherman
rss.pdx at gmail.com
Hi Richard,
This may be what you want:

data(mtcars)
m<-list()
for(i in 1:6) {
 rhterms<-paste(paste0("I(hp^",1:i,")"),sep="+")
 lmexp<-paste0("lm(mpg~",rhterms,",mtcars)")
 cat(lmexp,"\n")
 m[[i]]<-eval(parse(text=lmexp))
}
plot(mpg~hp,mtcars,type="n")
for(i in 1:6) abline(m[[i]],col=i)

Jim

On Thu, Aug 23, 2018 at 9:07 AM, Richard Sherman <rss.pdx at gmail.com> wrote:
> Hi all,
>
> I have a simple graphing question that is not really a graphing question, but a question about repeating a task.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
I do not think this does what the OP wants -- it does not produce polynomials of the form desired.

John Fox's solution using poly() seems to me to be the right approach, but I will show what I think is a considerably simpler way to build up the polynomial expressions just as an example of one way to do this sort of thing in more general circumstances:

fm <- vector("character",6)
fm[1]<- "mpg ~ hp"
for(i in 2:6)fm[i]<- paste0(fm[i-1]," + I(hp^", i,")") ## yielding:

> fm
[1] "mpg ~ hp"
[2] "mpg ~ hp + I(hp^2)"
[3] "mpg ~ hp + I(hp^2) + I(hp^3)"
[4] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4)"
[5] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5)"
[6] "mpg ~ hp + I(hp^2) + I(hp^3) + I(hp^4) + I(hp^5) + I(hp^6)"

Although fm is a character vector, the character strings will be automatically coerced by lm to formulas (see ?lm), so, e.g.

results <- lapply(fm, lm,data = mtcars)

would yield a list of regressions which could then be summarized, plotted or whatever (again using lapply). e.g.

> results[[3]]

Call:
FUN(formula = X[[i]], data = ..1)

Coefficients:
(Intercept)           hp      I(hp^2)      I(hp^3)
  4.422e+01   -2.945e-01    9.115e-04   -8.701e-07

One could also choose to do the plotting or whatever within the lapply call, but I prefer to keep things simple if possible.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Aug 22, 2018 at 4:43 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Richard,
> This may be what you want:
> [...]
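The character-vector approach above stops at a list of fitted models. As a rough sketch of the remaining step (drawing the curves the original question asked for), the following builds the same formula strings with paste(..., collapse = " + ") and then evaluates each fit on a grid. The names terms, grid and the grid size of 200 are illustrative choices, not something taken from the thread, and the explicit as.formula() call is optional since lm() also accepts the strings directly.

```r
data(mtcars)

# Candidate terms in order of increasing power (names are illustrative)
terms <- c("hp", paste0("I(hp^", 2:6, ")"))

# One formula string per degree: the first k terms joined with " + "
fm <- sapply(1:6, function(k) paste("mpg ~", paste(terms[1:k], collapse = " + ")))

# Fit all six models; as.formula() makes the string-to-formula step explicit
results <- lapply(fm, function(f) lm(as.formula(f), data = mtcars))

# Evaluate each fitted polynomial on a fine grid so the curves are smooth
grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 200))
plot(mpg ~ hp, data = mtcars)
for (k in seq_along(results)) {
  lines(grid$hp, predict(results[[k]], newdata = grid), col = k, lty = k)
}
```

Using predict() on a grid rather than abline() matters here: abline() draws a straight line from the first two coefficients only, so it cannot show the curvature of the higher-order fits.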
Dear Bert,

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
> Sent: Wednesday, August 22, 2018 8:38 PM
> To: Jim Lemon <drjimlemon at gmail.com>
> Cc: rss.pdx at gmail.com; R-help <r-help at r-project.org>
> Subject: Re: [R] graphing repeated curves
>
> I do not think this does what the OP wants -- it does not produce polynomials
> of the form desired.
>
> John Fox's solution using poly() seems to me to be the right approach, but I
> [...]

Actually, I didn't do a good job of graphing the polynomials between the observed x-values. Here's a better solution:

x <- with(mtcars, seq(min(hp), max(hp), length.out=500))
plot(mpg ~ hp, data=mtcars)
for (p in 1:6){
    m <- lm(mpg ~ poly(hp, p), data=mtcars)
    lines(x, predict(m, newdata=data.frame(hp=x)), lty=p, col=p)
}
legend("top", legend=1:6, lty=1:6, col=1:6, title="order", inset=0.02)

Best,
 John
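A possible point of confusion is that poly(hp, p) uses an orthogonal polynomial basis by default, so its coefficients do not match those of the hand-written hp + I(hp^2) + ... formulas, even though the fitted curve is the same. The sketch below checks this for degree 3 only; the object names and the tolerance of 1e-6 are illustrative assumptions.

```r
data(mtcars)

m_orth <- lm(mpg ~ poly(hp, 3), data = mtcars)             # orthogonal basis (default)
m_raw  <- lm(mpg ~ hp + I(hp^2) + I(hp^3), data = mtcars)  # raw powers written out

# Both bases span the same cubic polynomials, so the fitted values agree
# up to rounding error even though the coefficients differ
same_fit <- isTRUE(all.equal(unname(fitted(m_orth)), unname(fitted(m_raw)),
                             tolerance = 1e-6))

# raw = TRUE makes poly() use plain powers, which reproduces the
# coefficients of the hand-written formula
m_raw2 <- lm(mpg ~ poly(hp, 3, raw = TRUE), data = mtcars)
same_coef <- isTRUE(all.equal(unname(coef(m_raw2)), unname(coef(m_raw)),
                              tolerance = 1e-6))
```

So for plotting fitted curves either form should do; raw = TRUE is mainly useful when the coefficients themselves need to be read as powers of hp.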
These are great, thanks. I always forget about paste().

==
Richard Sherman
rss.pdx at gmail.com

> On Aug 22, 2018, at 17:56, Fox, John <jfox at mcmaster.ca> wrote:
>
> fm <- vector("character",6)
> fm[1]<- "mpg ~ hp"
> for(i in 2:6)fm[i]<- paste0(fm[i-1]," + I(hp^", i,")")
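Since paste() carries most of the replies in this thread, here is a small sketch of how its sep and collapse arguments differ. The eval(parse()) loop earlier in the thread passes sep="+" to a call with a single vector argument, where collapse="+" appears to have been intended; the variable names below are illustrative.

```r
terms <- paste0("I(hp^", 1:3, ")")

# sep goes between the *arguments* of one paste() call; with a single
# vector argument there is nothing to separate, so 3 strings come back
with_sep <- paste(terms, sep = "+")

# collapse joins the *elements* of the result into one string
with_collapse <- paste(terms, collapse = "+")

length(with_sep)       # 3
length(with_collapse)  # 1
with_collapse          # "I(hp^1)+I(hp^2)+I(hp^3)"
```

With sep, the loop ends up parsing several separate lm() calls per iteration and eval() keeps only the last one, which is consistent with the remark that it does not produce polynomials of the desired form.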