Hi all,

I am using the lda function from the MASS library to measure the discriminance of different variables with respect to different grouping variables by using

    lda( RESULTVARS[, 1:750] , GROUPVAR , tol=0 )

where RESULTVARS contains some 750 different variables.

Occasionally there is a variable within RESULTVARS that has the same value for every observation, whatever its GROUPVAR level, i.e. no variance, so I get the error:

    Error in svd(X, nu = 0) : NA/NaN/Inf in foreign function call (arg 1)

As I understand it, this is due to a division by zero in the svd function that is used by lda. The nature of my results is such that every now and then I will get a case where all the values for a RESULTVARS variable are constant. Is there a way of getting past this problem? For example, by using the tol=0 parameter I can avoid problems when the variables are the same within a particular group. As far as I am concerned, a variable that is constant across all groups simply has zero discriminance.
Example values of the grouping variable are:

> d$subject
 [1] E D C B A H G K F I J E D C B A H G K F I J E D C B A H G K F I J E D C B A G H K F I J E D C B A H G K F I J E D C B A H G K F I J E D C B A H G K F I J
Levels: A B C D E F G H I J K

Example values of the results variables with no errors are:

> d[,104]
 [1] 2.312308 2.957263 2.979431 2.764650 2.877694 3.078302 3.112324 2.906696 3.045316 1.995411 2.488661 2.976581 2.917944 3.089677 2.850058 2.758467
[17] 2.898870 2.966295 3.123338 3.130935 2.729223 2.831621 2.222380 2.461088 2.539655 2.267584 2.599100 2.575934 2.858999 2.311193 2.515690 2.490992
[33] 2.230635 2.846939 3.091381 3.072407 3.097286 2.878738 3.097788 3.155828 3.250491 3.095101 2.956129 3.157974 3.093765 2.682200 3.072632 2.931168
[49] 2.469290 2.909947 2.682943 2.985903 2.738458 2.828025 2.860262 3.112574 2.890100 2.813462 2.694520 3.058201 2.761940 2.835700 2.829152 2.834158
[65] 3.029300 2.870694 3.024452 2.909192 2.926210 2.530717 2.875842 2.798146 2.576489 2.690214 2.865670 2.499521 2.900491

and finally, example values of the results variables WITH ERRORS are:

> d[,105]
 [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

Can anyone suggest a workaround for this problem? Your help would be greatly appreciated.

Many Thanks For Your Help

Rishabh

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 3 Jul 2002, Rishabh Gupta wrote:

> [...]
> For example, by using the tol=0 parameter I can avoid problems when the variables are the same within a
> particular group. As far as I am concerned, cases where the variables are constant across all groups is saying that that variable
> has zero discriminance.

You are wrong: that is what the tol argument is for. You need to delete the constant variables, which contradict the lda model.

Do you really believe in multivariate normality for 750 dimensions? I wouldn't!

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
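The advice above (delete the constant variables before fitting) could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy GROUPVAR and RESULTVARS here are made-up stand-ins for the poster's data, the threshold `tol` is a hypothetical choice, and the within-group spread is computed with plain base R rather than by whatever lda does internally. A variable that is constant across all groups is necessarily constant within each group too, so a single within-group check catches both cases.

```r
# Hypothetical stand-in data: 3 groups, 10 observations each.
set.seed(42)
GROUPVAR <- factor(rep(c("A", "B", "C"), each = 10))
RESULTVARS <- data.frame(
  v1 = rnorm(30),                  # varies within groups: keep
  v2 = rep(3, 30),                 # constant across all groups: drop
  v3 = rep(c(1, 2, 3), each = 10)  # constant within each group: drop
)

# Spread of each variable after subtracting its group mean.
# ave(x, g) returns the group mean for every observation.
within_sd <- vapply(
  RESULTVARS,
  function(x) sd(x - ave(x, GROUPVAR)),
  numeric(1)
)

tol <- 1e-4                        # assumed threshold, not taken from the thread
keep <- within_sd >= tol
X <- RESULTVARS[, keep, drop = FALSE]
```

The reduced data frame X could then be passed to lda() together with GROUPVAR, leaving tol at its default so that the function's own check stays active.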
Here is one idea: use try(), as in try(lda(...)). When try() indicates an error, use a function (which you will have to write) to search through all the variables in RESULTVARS, find out which one(s) are offending, and then call lda() again omitting those.

Here's an untested sketch of the search function (some debugging probably needed):

    findgood <- function(df, cols = 1:750) {
      ok <- numeric()
      for (j in cols) {
        # keep column j only if it has more than one distinct value
        if (sum(!duplicated(df[, j])) > 1) ok <- c(ok, j)
      }
      ok
    }

It is supposed to return a vector of the column numbers that are ok to use. Use it like this:

    ok <- findgood(RESULTVARS, 1:750)

then

    lda(RESULTVARS[, ok], GROUPVAR)

sapply() instead of for() might be faster, though perhaps not easier to understand.

-Don

At 7:12 PM +0100 7/3/02, Rishabh Gupta wrote:
> [...]
>Occasionally there is a variable within RESULTVARS that has the same
>values for all values of GROUPVAR, ie no variance so I get the
>error:
>
>Error in svd(X, nu = 0) : NA/NaN/Inf in foreign function call (arg 1)
> [...]
>Can anyone suggest a workaround for this problem. Your help would be
>greatly appreciated.

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
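The try()-then-retry idea in the message above might look something like the sketch below. It is not the author's code: `find_good` is a vectorised rewrite of the column search, and `fit_with_fallback` and its `fit_fun` argument are hypothetical names introduced here so that the example runs without MASS installed. It only screens out columns that are constant over the whole data set, not columns that are constant within each group.

```r
# Vectorised column search: return the indices of columns
# that contain more than one distinct value.
find_good <- function(df, cols = seq_len(ncol(df))) {
  varies <- vapply(cols, function(j) length(unique(df[[j]])) > 1, logical(1))
  cols[varies]
}

# Hypothetical wrapper: try the fit once and, if it fails,
# refit using only the non-constant columns.
# fit_fun is any function called as fit_fun(x, grouping, ...).
fit_with_fallback <- function(fit_fun, x, grouping, ...) {
  fit <- try(fit_fun(x, grouping, ...), silent = TRUE)
  if (inherits(fit, "try-error")) {
    ok <- find_good(x)
    fit <- fit_fun(x[, ok, drop = FALSE], grouping, ...)
  }
  fit
}

# Toy data (made up): column b is constant.
set.seed(1)
toy <- data.frame(a = rnorm(12), b = rep(3, 12), c = rnorm(12))
ok <- find_good(toy)
```

With the poster's objects, a call along the lines of fit_with_fallback(MASS::lda, RESULTVARS[, 1:750], GROUPVAR) would be the intended use, assuming the first lda() call really does fail with an error rather than a warning.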