You are using a method that needs to estimate the covariance matrix of all the variables. If you have 80 variables, there are (80+1)*80/2 = 3240 variances and covariances to estimate. How many data points do you think you need to do that? Some people assume the covariance matrix is diagonal (i.e., that all the variables are uncorrelated). Even then you still have 80 variances to estimate. Besides, I don't think lda() makes that assumption. If you really have to discriminate 40 data points with 80 variables, use methods that do not rely on the estimation of the covariance matrix.

Andy

> -----Original Message-----
> From: Adaikalavan Ramasamy [mailto:ramasamy at stats.ox.ac.uk]
> Sent: Friday, July 12, 2002 4:08 PM
> To: r-help at r-project.org; allstat at jiscmail.ac.uk
> Subject: [R] meaning of error message about collinearity
>
> Just a quick question. I am trying to fit an LDA model with a restricted
> subsample. My X is a numerical matrix and Y is a vector of factor responses.
>
>     fit <- lda(Y[1:50] ~ X[1:50, ])
>
> gives the following error message:
>
>     variables are collinear in: lda.default(x, grouping, ...)
>
> I am guessing this is a rank-deficiency problem, as I have about 80
> variables [since lda works with subsamples of size 80 and above].
>
> Q1: Is my interpretation of the error message correct?
> Q2: I am using the fit for prediction purposes etc. Is this likely to be
> affected, i.e. how serious is this problem?
> Q3: Is there a good website on sound statistical theory and practice for
> overcoming rank deficiency, if this is indeed the source of the error?
>
> Many thanks, Adai.
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it.
==============================================================================
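Andy's arithmetic, and the rank deficiency Adai suspects, can be checked numerically. The following is a minimal sketch in Python with NumPy (not R, and not the code inside MASS's lda()), using hypothetical random data of the thread's shape: 40 cases, 80 variables, two groups of 20. It counts the p(p+1)/2 distinct entries of a covariance matrix and computes a pooled within-group covariance matrix of the kind classical LDA has to invert. Because each group's deviations from its own mean sum to zero, that matrix has rank at most n - g = 38, so it is singular whenever there are fewer cases than variables. The helper names `n_cov_params` and `pooled_within_cov` are illustrative only.

```python
import numpy as np


def n_cov_params(p: int) -> int:
    """Number of distinct variances and covariances in a p x p covariance matrix."""
    return p * (p + 1) // 2


def pooled_within_cov(X: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Pooled within-group covariance: the matrix classical LDA needs to invert."""
    n, _ = X.shape
    labels = np.unique(groups)
    # Centre every observation on the mean of its own group.
    centred = np.vstack([X[groups == g] - X[groups == g].mean(axis=0) for g in labels])
    # Each group loses one degree of freedom to its estimated mean.
    return centred.T @ centred / (n - len(labels))


rng = np.random.default_rng(0)
n, p = 40, 80                        # 40 cases and 80 variables, as in the thread
X = rng.normal(size=(n, p))          # hypothetical synthetic data
groups = np.repeat([0, 1], n // 2)   # two classes of 20 cases each

S = pooled_within_cov(X, groups)
print("parameters to estimate:", n_cov_params(p))
print("rank of S:", np.linalg.matrix_rank(S), "of", p)   # at most n - 2 = 38
```

A singular within-group matrix has no inverse, which is consistent with the collinearity message from lda(): the discriminant directions cannot be computed from it without first dropping, combining, or otherwise constraining the variables.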
Adaikalavan Ramasamy wrote asking about a collinearity problem where he had 40 cases and 80 variables.

Andy Liaw replied (in part):

>>> If you really have to discriminate 40 data points with 80 variables,
>>> use methods that do not rely on the estimation of the covariance matrix.

My reply:

I am curious as to how this could be meaningfully done. I would have thought that having more variables than data points would be a problem with ANY technique, but I am eager to be enlightened about ways to do this.

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)
> From: Peter Flom [mailto:flom at ndri.org]
> Sent: Monday, July 15, 2002 11:09 AM
>
> Adaikalavan Ramasamy wrote asking about a collinearity
> problem where he had 40 cases and 80 variables.
>
> Andy Liaw replied (in part)
> >>> If you really have to discriminate 40 data points with 80 variables,
> >>> use methods that do not rely on the estimation of the covariance matrix.
>
> My reply
>
> I am curious as to how this could be meaningfully done. I
> would have thought that having more variables than data
> points would be a problem with ANY technique, but I am eager
> to be enlightened about ways to do this.

Have you not heard of PLS (partial least squares) or classification trees, to mention two of many? How do you think people deal with microarray data, where there are typically dozens of data points but more than 7000 variables? It is a difficult problem, but people get by with what they can cobble together.

Andy
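To make Andy's PLS suggestion concrete, here is a minimal sketch of one common way to use PLS for two-class discrimination (often called PLS-DA): regress a 0/1 class indicator on X with the standard NIPALS PLS1 algorithm and classify with a 0.5 cut-off. It is written in Python with NumPy rather than R, and the function names `pls1_fit`, `pls1_predict` and `make_data`, the choice of two components, and the synthetic data (40 training cases, 80 variables, a mean shift in the first 10 variables) are all assumptions for illustration. The point is that it never estimates or inverts an 80 x 80 covariance matrix; the only matrix it inverts is k x k, where k is the number of components.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """NIPALS PLS1: regress a single response y on X through a few latent components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred working copies, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # unit weight vector maximising cov(Xr w, yr)
        t = Xr @ w                             # component scores
        tt = t @ t
        p_load = Xr.T @ t / tt                 # X loadings
        q = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p_load)          # deflate X
        yr = yr - q * t                        # deflate y
        W.append(w)
        P.append(p_load)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Coefficients on the original variables; only the k x k matrix P'W is inverted.
    B = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, B


def pls1_predict(model, X, threshold=0.5):
    """Classify as 1 when the fitted 0/1 indicator exceeds the threshold."""
    x_mean, y_mean, B = model
    return ((X - x_mean) @ B + y_mean > threshold).astype(int)


def make_data(rng, n_per_class, p=80, shift=1.5, n_informative=10):
    """Hypothetical two-class data: class 1 is shifted in the first few variables."""
    X = rng.normal(size=(2 * n_per_class, p))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :n_informative] += shift
    return X, y


rng = np.random.default_rng(1)
X_train, y_train = make_data(rng, 20)          # 40 cases, 80 variables
X_test, y_test = make_data(rng, 100)           # fresh cases for an honest check

model = pls1_fit(X_train, y_train.astype(float), n_components=2)
train_acc = (pls1_predict(model, X_train) == y_train).mean()
test_acc = (pls1_predict(model, X_test) == y_test).mean()
print(f"training accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

With more components the fit tracks the 40 training cases ever more closely, and with 80 variables it can eventually reproduce the training labels almost exactly, so the number of components would normally be chosen by cross-validation rather than fixed in advance as it is here.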