On 28-Aug-08 11:04:47, Williams, Robin wrote:
> Hi all,
> When using lm to model a response with 8 explanatory variables,
> one of the variables is not defined due to "singularities".
> I have checked the CSV file from which the data come; there are
> no NAs in the dataset, etc. What should I be looking for in this
> variable to correct the problem?
> Thanks for any help.
>
> Robin Williams
> Met Office summer intern - Health Forecasting
> robin.williams at metoffice.gov.uk
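You should be looking at whether the variable is expressible,
for every case in the data, as an exact linear combination of the
other explanatory variables (and, since lm() fits an intercept by
default, possibly of the constant term too: a variable that never
changes value is one such case).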
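To see the symptom and a built-in diagnostic side by side, here is a small sketch on made-up data (the data frame `dat` and the variables `x1` to `x4`, `y` are hypothetical, not Robin's): `x4` is constructed as an exact linear combination of `x1`, `x2`, `x3`, so lm() reports its coefficient as NA, and `alias()` from the stats package shows which combination of the other terms it is aliased with.

```r
# Hypothetical data: x4 is an exact linear combination of x1, x2 and x3
set.seed(1)
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dat$x4 <- dat$x1 + 0.5 * dat$x2 - 0.25 * dat$x3
dat$y  <- rnorm(10)

fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

# The coefficient for x4 is NA ("not defined because of singularities")
coef(fit)

# alias() expresses each aliased term in terms of the estimable ones;
# here the row for x4 should read roughly 0, 1, 0.5, -0.25
alias(fit)$Complete
```

On real data, `alias(fit)` on the fitted model is often the quickest first check, before going to the SVD approach below.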
Try svd(X) where X is the transpose of the matrix of explanatory
variables from the CSV file (i.e. X has variables as rows, cases
as columns) -- if svd(X)$d contains a value that is very small
compared with the largest one, that points to your problem. Example:
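# three random "explanatory variables", 10 cases each
X <- rbind(rnorm(10), rnorm(10), rnorm(10))
# a fourth variable that is an exact linear combination of the first three
X <- rbind(X, X[1,] + 0.5*X[2,] - 0.25*X[3,])
svd(X)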
# $d
# [1] 4.355094e+00 3.717386e+00 2.101743e+00 1.842137e-16
# $u
# [,1] [,2] [,3] [,4]
# [1,] -0.71645227 -0.1715990 -0.15753569 -0.657596
# [2,] 0.47501937 -0.8106696 0.09520123 -0.328798
# [3,] 0.09347153 -0.1262667 -0.97380325 0.164399
# [4,] -0.50231047 -0.5453671 0.13351574 0.657596
# (plus $v, the right singular vectors)
Note the very small value of $d[4] -- this is zero to within machine
precision. The corresponding left singular vector $u[,4] annihilates X:
V <- svd(X)$u[,4]
t(V) %*% X
# [,1] [,2] [,3] [,4]
# [1,] 4.228388e-17 -3.894996e-17 -8.020386e-17 -9.790346e-17
# [,5] [,6] [,7] [,8]
# [1,] -2.710505e-17 -2.683400e-18 -3.876700e-17 8.250779e-17
# [,9] [,10]
# [1,] 1.439278e-16 1.015762e-17
all of which are machine approximations to zero!
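The same check can be wrapped up for a whole data set. What follows is a minimal sketch, not a function from R or any package: the name `find_dependencies`, the relative threshold `tol = 1e-7` and the `intercept` option are all assumptions. The threshold echoes, but is not the same test as, the `tol = 1e-07` default of `lm.fit`. It adds a row of 1s for the intercept (so a constant variable is caught too) and assumes there are at least as many cases as variables.

```r
# Hypothetical helper: report near-exact linear dependencies among the
# explanatory variables. 'vars' has cases as rows and variables as
# columns, e.g. read.csv("mydata.csv") (file name is hypothetical).
find_dependencies <- function(vars, intercept = TRUE, tol = 1e-7) {
  M <- as.matrix(vars)
  # lm() fits an intercept by default, so include the constant term
  if (intercept) M <- cbind("(Intercept)" = 1, M)
  X <- t(M)                            # variables as rows, cases as columns
  s <- svd(X)
  # singular values that are negligible relative to the largest one
  small <- which(s$d < tol * s$d[1])
  lapply(small, function(k) {
    v <- s$u[, k]                      # left singular vector: t(v) %*% X is ~0
    v <- v / v[which.max(abs(v))]      # scale so the largest entry is +/-1
    v[abs(v) < tol] <- 0               # tidy round-off noise
    setNames(v, rownames(X))
  })
}

# Usage on simulated data in the spirit of the example above
set.seed(1)
vars <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
vars$x4 <- vars$x1 + 0.5 * vars$x2 - 0.25 * vars$x3
find_dependencies(vars)
```

Each returned vector gives the weights of one combination of variables that is (numerically) zero for every case. Here it should be proportional to (0, 1, 0.5, -0.25, -1) over (Intercept), x1, x2, x3, x4, up to an overall sign. Equivalently, in the example above, `-V[1:3]/V[4]` recovers the coefficients 1, 0.5, -0.25 used to build the fourth row of X, because t(V) %*% X = 0 says V[1]*X[1,] + V[2]*X[2,] + V[3]*X[3,] + V[4]*X[4,] = 0.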
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-08 Time: 12:42:31
------------------------------ XFMail ------------------------------