On Dec 26, 2010, at 17:54 , Tiina Hakanen wrote:
> Hi!
>
> I have some questions about the MARS model's coefficient of determination.
> I use the MARS method in my master's thesis and I have noticed some problems
> with the MARS model's R^2.
>
> You can see in the following example that the MARS model's R^2 is too big
> when I build the MARS model with the mars() function, and that when I fit
> the same MARS model using linear regression, it gives a much smaller R^2.
>
> So can you please tell me why the MARS model's R^2 is so big? How can I get
> the MARS model's correct R^2 in R in some other way than in the following
> example or by calculating it myself using the R^2 formula?

This isn't really to do with MARS as such. You have two equivalent linear
models, one without and one with an intercept term: in the first, the column
m$x1 is the constant 1, so the two fits are identical. R computes R^2 so that
it is consistent with the overall F test, which you can see has three numerator
DF in marsmodel, but only two in the corresponding linear model. Put
differently, the null model is zero in one case and a constant in the other.
This sometimes catches people out, but without such a convention, no-intercept
models could get a negative R^2.
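
To make the two conventions concrete, here is a minimal sketch. It assumes the
built-in airquality data as a stand-in for the ElemStatLearn ozone data, and
the knot at 85 and the names X, fit_noint and fit_int are purely illustrative.
Both fits share the same residuals; only the null model in the denominator
differs.

```r
# Stand-in data (assumption): built-in airquality, not the ElemStatLearn ozone set
d <- na.omit(airquality[, c("Ozone", "Temp")])
y <- d$Ozone

# Design matrix with an explicit constant column, mimicking m$x
X <- cbind(1, pmax(d$Temp - 85, 0), pmax(85 - d$Temp, 0))

fit_noint <- lm(y ~ X - 1)     # the constant is "just another column"
fit_int   <- lm(y ~ X[, -1])   # R supplies the intercept itself

rss <- sum(resid(fit_int)^2)   # the same for both fits

# R^2 measured against the null model y = 0
r2_zero_null  <- 1 - rss / sum(y^2)
# R^2 measured against the null model y = mean(y)
r2_const_null <- 1 - rss / sum((y - mean(y))^2)

# Each should agree with what summary() reports for the matching fit
all.equal(summary(fit_noint)$r.squared, r2_zero_null)
all.equal(summary(fit_int)$r.squared,   r2_const_null)
```

Since sum(y^2) exceeds sum((y - mean(y))^2) whenever the mean of y is nonzero,
the zero-null R^2 will be the larger of the two.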
Pragmatically, if you are sure that the marsmodel will always contain the
intercept-only model, does lm(data[,1]~m$x) not provide the desired R^2, with a
warning that one parameter is aliased?
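
A rough sketch of that suggestion, again assuming the built-in airquality data
as a stand-in and an illustrative knot at 85 (X, fit_aliased and fit_ref are
hypothetical names). When the constant column is kept and R also adds its own
intercept, one coefficient should come back as NA, and summary() notes that it
is not defined because of singularities.

```r
# Stand-in data (assumption): built-in airquality, not the ElemStatLearn ozone set
d <- na.omit(airquality[, c("Ozone", "Temp")])
y <- d$Ozone

# Design matrix whose first column is the constant 1, as in m$x
X <- cbind(1, pmax(d$Temp - 85, 0), pmax(85 - d$Temp, 0))

# Keep R's own intercept; the constant column of X is then aliased
fit_aliased <- lm(y ~ X)
coef(fit_aliased)              # the aliased coefficient is reported as NA

# Reference fit with the constant column dropped by hand
fit_ref <- lm(y ~ X[, -1])

# Both use the constant-only null model, so the R^2 values should agree
all.equal(summary(fit_aliased)$r.squared, summary(fit_ref)$r.squared)
```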
>
> I hope you can reply soon.
>
> Best regards,
>
> Tiina Hakanen
>
>
> library(ElemStatLearn)
> library(mda)
> data<-ozone
> m<-mars(data[,-1], data[,1], nk=4)
> m$factor[m$s,]
> m$cuts[m$s,]
> m$coef
> marsmodel<-lm(data[,1]~m$x-1)
> summary(marsmodel)
>
> Call:
> lm(formula = data[, 1] ~ m$x - 1)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>      Estimate Std. Error t value Pr(>|t|)
> m$x1  52.9783     3.8894  13.621  < 2e-16 ***
> m$x2   4.7383     0.9599   4.936 2.92e-06 ***
> m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
> F-statistic: 158.2 on 3 and 108 DF, p-value: < 2.2e-16
>
> knot1 <- function (x,k) ifelse(x > k, x-k, 0)
> knot2 <- function(x, k) ifelse(x < k, k-x, 0)
> reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)
>
> summary(reg)
>
> Call:
> lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
> 85), data = data)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>                        Estimate Std. Error t value Pr(>|t|)
> (Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
> knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
> knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
> F-statistic: 57.42 on 2 and 108 DF, p-value: < 2.2e-16
>
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com