How can one possibly answer this without knowing the structure of your
dataset?
-- Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Mon, Jul 30, 2018 at 8:24 AM, Baojun Sun <bsun1 at students.towson.edu>
wrote:
> The book "An Introduction to Statistical Learning" gives R scripts for its
> labs. I found a script for ridge regression that works on the dataset the
> book uses (Hitters) but is unusable on other datasets I own unless I clean
> the data first.
>
>
> I'm trying to understand the syntax I need for data cleaning and am stuck.
> I want to learn to do ridge regression, but when I run this script on my
> own data set rather than the book's Hitters example I get errors unless the
> data are formatted the way the script expects. How do I change this script,
> or clean any dataset, so that the ridge regression below is usable for all
> datasets?
>
>
> library(ISLR)
>
> fix(Hitters)
>
> names(Hitters)
>
> dim(Hitters)
>
> sum(is.na(Hitters$Salary))
>
> Hitters=na.omit(Hitters)
>
> dim(Hitters)
>
> sum(is.na(Hitters))
>
> library(leaps)
>
>
>
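> # Build the predictor matrix (model.matrix dummy-codes factors; [,-1] drops the intercept column) and the response vector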
> x=model.matrix(Salary~.,Hitters)[,-1]
>
> y=Hitters$Salary
>
>
>
> # Ridge Regression
>
>
>
> library(glmnet)
>
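> # Fit ridge regression (alpha = 0) over a grid of 100 lambda values from 10^10 down to 10^-2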
> grid=10^seq(10,-2,length=100)
>
> ridge.mod=glmnet(x,y,alpha=0,lambda=grid)
>
> dim(coef(ridge.mod))
>
> ridge.mod$lambda[50]
>
> coef(ridge.mod)[,50]
>
> sqrt(sum(coef(ridge.mod)[-1,50]^2))
>
> ridge.mod$lambda[60]
>
> coef(ridge.mod)[,60]
>
> sqrt(sum(coef(ridge.mod)[-1,60]^2))
>
> predict(ridge.mod,s=50,type="coefficients")[1:20,]
>
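> # Split the observations into a training half and a test half, then refit ridge on the training set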
> set.seed(1)
>
> train=sample(1:nrow(x), nrow(x)/2)
>
> test=(-train)
>
> y.test=y[test]
>
> ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid, thresh=1e-12)
>
> ridge.pred=predict(ridge.mod,s=4,newx=x[test,])
>
> mean((ridge.pred-y.test)^2)
>
> mean((mean(y[train])-y.test)^2)
>
> ridge.pred=predict(ridge.mod,s=1e10,newx=x[test,])
>
> mean((ridge.pred-y.test)^2)
>
> ridge.pred=predict(ridge.mod,s=0,newx=x[test,],exact=T)
>
> mean((ridge.pred-y.test)^2)
>
> lm(y~x, subset=train)
>
> predict(ridge.mod,s=0,exact=T,type="coefficients")[1:20,]
>
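> # Choose lambda by cross-validation (10-fold by default) on the training data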
> set.seed(1)
>
> cv.out=cv.glmnet(x[train,],y[train],alpha=0)
>
> plot(cv.out)
>
> bestlam=cv.out$lambda.min
>
> bestlam
>
> ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
>
> mean((ridge.pred-y.test)^2)
>
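> # Refit on the full dataset and show the coefficients at the lambda chosen by cross-validation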
> out=glmnet(x,y,alpha=0)
>
> predict(out,type="coefficients",s=bestlam)[1:20,]
>
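A minimal sketch of how the data-preparation step in the lab script might be
generalized to an arbitrary data frame. The function name prep_ridge_data, the
data frame mydata, and the column name "response" are hypothetical placeholders
and are not part of the ISLR lab; the only real requirements are those of
glmnet itself -- a numeric predictor matrix with no missing values and a
response vector of matching length:

library(glmnet)

prep_ridge_data <- function(mydata, response) {
  # drop rows with missing values so x and y stay aligned
  mydata <- na.omit(mydata)
  # ridge regression as used in the lab needs a numeric response
  stopifnot(is.numeric(mydata[[response]]))
  # build e.g. Salary ~ . and let model.matrix dummy-code any factors;
  # [,-1] drops the intercept column, as in the lab script
  form <- reformulate(".", response = response)
  x <- model.matrix(form, data = mydata)[, -1]
  y <- mydata[[response]]
  list(x = x, y = y)
}

# For example, with the book's Hitters data:
# dat <- prep_ridge_data(Hitters, "Salary")
# cv  <- cv.glmnet(dat$x, dat$y, alpha = 0)   # alpha = 0 gives ridge
# coef(cv, s = "lambda.min")

The rest of the script only needs x and y after this point, though indexing
such as [1:20,] assumes the 19 predictors of the Hitters data and would need
adjusting for a dataset with a different number of columns.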