I'm trying to understand how the earth package treats linearly
dependent regressors. I was surprised that swapping a predictor for a
linearly dependent one gave different results. Here's an example:
> library(earth)
> cars2 <- transform(cars, speed2=100-speed)
> earth(dist ~ speed, data=cars2)
Selected 3 of 7 terms, and 1 of 1 predictors
Importance: speed
Number of terms at each degree of interaction: 1 2 (additive model)
GCV 255.4974 RSS 10347.64 GRSq 0.622945 RSq 0.6819924
> earth(dist ~ speed2, data=cars2)
Selected 3 of 7 terms, and 1 of 1 predictors
Importance: speed2
Number of terms at each degree of interaction: 1 2 (additive model)
GCV 246.9339 RSS 10000.82 GRSq 0.6355828 RSq 0.692651
Naively, I expected these two fits to be identical, since at each
step of the forward pass, the hinge functions considered for speed2
are the same as those for speed, just with the two members of each
pair exchanged and the knot moved from c to 100 - c. Then, in the
backward pass, removing a given term should change the GCV by the
same amount in both fits.
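
To make that expectation concrete, here is a minimal sketch of the
check I have in mind. The knot value 15 is an arbitrary choice for
illustration, not a knot that earth reported, and the variable names
are my own:

```r
speed  <- cars$speed
speed2 <- 100 - speed
knot   <- 15  # arbitrary knot, chosen only for illustration

# Hinge pair on speed at the knot
h_pos <- pmax(0, speed - knot)
h_neg <- pmax(0, knot - speed)

# Hinge pair on speed2 at the mirrored knot 100 - knot
g_pos <- pmax(0, speed2 - (100 - knot))
g_neg <- pmax(0, (100 - knot) - speed2)

# The two pairs coincide, with the members swapped
stopifnot(isTRUE(all.equal(h_pos, g_neg)),
          isTRUE(all.equal(h_neg, g_pos)))
```

So the candidate basis functions span the same space in both cases,
which is why I expected the same selected model.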
Can anyone shed any light on what's going on here?
Thanks,
Johann