varin sacha
2018-Oct-29 22:50 UTC
[R] MSE Cross-validation with factor interactions terms MARS regression
Hi Bert,

Many thanks. I have fixed it, but it still doesn't work.

Best,

On Monday, 29 October 2018 at 22:07:26 UTC+1, Bert Gunter <bgunter.4567 at gmail.com> wrote:

I did no analysis of your code or thought process, but noticed that you had the following two successive lines in your code:

y=Testing$wage
y=Wage[-sam,]$wage

This obviously makes no sense, so maybe you should fix this first and then proceed.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Oct 29, 2018 at 1:46 PM varin sacha via R-help <r-help at r-project.org> wrote:
>
> Dear R-experts,
>
> I am having trouble doing cross-validation with a MARS regression that includes an interaction term between a factor variable (education) and one continuous variable (age). How can I solve my problem?
>
> Here is my reproducible example.
>
> #######
> install.packages("ISLR")
> library(ISLR)
> install.packages("earth")
> library(earth)
> a<-as.factor(Wage$education)
> # Create a list to store the results
> lst<-list()
> # This statement does the repetitions (looping)
> for(i in 1:200) {
> n=dim(Wage)[1]
> p=0.667
> sam=sample(1:n,floor(p*n),replace=FALSE)
> Training =Wage [sam,]
> Testing = Wage [-sam,]
> mars5<-earth(wage~age+education+year+age*a, data=Wage)
> ypred=predict(mars5,newdata=Testing)
> y=Testing$wage
> y=Wage[-sam,]$wage
> MSE = mean(y-ypred)^2
> MSE
> lst[i]<-MSE
> }
> mean(unlist(lst))
> summary(mars5)
> #######
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
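A side note on the line MSE = mean(y-ypred)^2 in the posted code: as written, R averages the signed errors first and squares the result, which is not the mean squared error. The base R sketch below uses made-up toy numbers (not taken from the Wage data) to show the difference; the variable names are only illustrative.

```r
# Toy observed and predicted values, invented for illustration only.
y     <- c(10, 12, 14, 16)
ypred <- c(11, 11, 15, 15)

# As posted: mean of the signed errors, then squared.
# Positive and negative errors cancel before squaring.
squared_mean_error <- mean(y - ypred)^2

# Mean squared error: square each error first, then average.
mse <- mean((y - ypred)^2)

print(c(squared_mean_error = squared_mean_error, mse = mse))
```

With these toy values the errors alternate in sign, so the first quantity is zero even though every prediction is off by one unit.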
peter dalgaard
2018-Oct-29 23:30 UTC
[R] MSE Cross-validation with factor interactions terms MARS regression
The two lines did the same thing, so little wonder...

More likely, the culprit is that a is assigned in the global environment, and then used in a prediction on a subset.

Also,

- you are defining Training, but as far as I can tell, you're not using it. Not likely to be an issue in itself, but wouldn't you want to fit on the Training set and evaluate on the Testing?

- your model de facto contains both education as a numeric predictor and as.factor(education) as well as the interaction term age:as.factor(education). Does that make sense modelling-wise?

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
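The points above can be sketched as a revised loop. This is one possible reading of the advice, not the code the original poster ended up with: it fits on Training only, keeps every model variable inside the data frame instead of a global a, and squares the errors before averaging. Using degree = 2 to let earth search for the age-by-education interaction is an assumption about what was intended, as are the seed and the names n_rep, fit and mse. In the ISLR Wage data, education appears to be stored as a factor already, so no as.factor() call is used.

```r
library(ISLR)   # provides the Wage data
library(earth)  # MARS implementation

set.seed(1)     # arbitrary seed, only so the sketch is reproducible
n_rep <- 200    # number of random train/test splits, as in the post
p     <- 0.667  # share of rows used for training
n     <- nrow(Wage)
mse   <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  sam      <- sample(n, floor(p * n))
  Training <- Wage[sam, ]
  Testing  <- Wage[-sam, ]

  # Fit on the training rows only. All predictors live in `data`,
  # so predict() can rebuild them from `newdata` with matching length.
  # degree = 2 (an assumption) allows pairwise interaction terms.
  fit <- earth(wage ~ age + education + year, data = Training, degree = 2)

  # predict() on an earth fit returns a one-column matrix; flatten it.
  ypred  <- as.numeric(predict(fit, newdata = Testing))
  mse[i] <- mean((Testing$wage - ypred)^2)
}

mean(mse)       # average test MSE over the random splits
summary(fit)    # terms selected in the last fit
```

Storing the results in a preallocated numeric vector rather than a list avoids the unlist() step at the end.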
varin sacha
2018-Oct-30 20:38 UTC
[R] MSE Cross-validation with factor interactions terms MARS regression
Dear Prof. Dalgaard,

Thank you very much for your comments and responses. It works perfectly!

Many thanks.