Hi all,
I am using the lda function from the MASS library to measure the
discriminating power of different variables with respect to different
grouping variables, by calling
lda( RESULTVARS[, 1:750] , GROUPVAR , tol=0 ), where RESULTVARS contains
some 750 different variables.
Occasionally there is a variable within RESULTVARS that has the same value
for all values of GROUPVAR, i.e. no variance, and then I get the error:
Error in svd(X, nu = 0) : NA/NaN/Inf in foreign function call (arg 1)
As I understand it, this is caused by a division by zero in the computation
that feeds the svd call inside lda. The nature of my results is such that
every now and then a RESULTVARS variable turns out to be constant. Is there
a way of getting past this problem? By using the tol=0 parameter I can
already avoid problems when a variable is constant within a particular
group, and as far as I am concerned a variable that is constant across all
groups simply has zero discriminating power.
Example values of the grouping variable are:
> d$subject
[1] E D C B A H G K F I J E D C B A H G K F I J E D C B A H G K F I J E D C
B A G H K F I J E D C B A H G K F I J E D C B A H G
K F I J E D C B A H G K F I J
Levels: A B C D E F G H I J K
Example values of a results variable with no errors are:
> d[,104]
[1] 2.312308 2.957263 2.979431 2.764650 2.877694 3.078302 3.112324 2.906696
3.045316 1.995411 2.488661 2.976581 2.917944 3.089677
2.850058 2.758467
[17] 2.898870 2.966295 3.123338 3.130935 2.729223 2.831621 2.222380 2.461088
2.539655 2.267584 2.599100 2.575934 2.858999 2.311193
2.515690 2.490992
[33] 2.230635 2.846939 3.091381 3.072407 3.097286 2.878738 3.097788 3.155828
3.250491 3.095101 2.956129 3.157974 3.093765 2.682200
3.072632 2.931168
[49] 2.469290 2.909947 2.682943 2.985903 2.738458 2.828025 2.860262 3.112574
2.890100 2.813462 2.694520 3.058201 2.761940 2.835700
2.829152 2.834158
[65] 3.029300 2.870694 3.024452 2.909192 2.926210 2.530717 2.875842 2.798146
2.576489 2.690214 2.865670 2.499521 2.900491
and finally, example values of a results variable WITH ERRORS are:
> d[,105]
[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3
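Here is a small artificial example that, as far as I can tell, reproduces
the same error (the data are made up, not my real results):

library(MASS)
set.seed(1)
g  <- factor(rep(LETTERS[1:11], 7))   # 11 subjects, 77 observations
x1 <- rnorm(length(g))                # an ordinary, varying variable
x2 <- rep(3, length(g))               # constant for every observation
lda(cbind(x1, x2), g, tol = 0)        # gives the svd() NA/NaN/Inf error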
Can anyone suggest a workaround for this problem? Your help would be greatly
appreciated.
Many Thanks For Your Help
Rishabh
On Wed, 3 Jul 2002, Rishabh Gupta wrote:

> As far as I am concerned, a variable that is constant across all groups
> simply has zero discriminating power.

You are wrong: that is what the tol argument is for. You need to delete the
constant variables, which contradict the lda model. Do you really believe in
multivariate normality for 750 dimensions? I wouldn't!

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
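In code, dropping the constant variables up front might look something like
this (an untested sketch, using the poster's RESULTVARS and GROUPVAR names):

X    <- RESULTVARS[, 1:750]
keep <- apply(X, 2, function(col) length(unique(col)) > 1)  # FALSE where a column is constant
lda(X[, keep], GROUPVAR)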
Here is one idea: use try(), as in try(lda(...)); then, when try() indicates
an error, use a function (which you will have to write) to search through all
the variables in RESULTVARS, find out which one(s) are offending, and call
lda() again omitting those.
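The outer step might look something like this (untested; it uses the
findgood() helper sketched next):

fit <- try(lda(RESULTVARS[, 1:750], GROUPVAR, tol = 0))
if (inherits(fit, "try-error")) {
    # some variable must be constant; screen the columns and retry
    ok  <- findgood(RESULTVARS, 1:750)
    fit <- lda(RESULTVARS[, ok], GROUPVAR)
}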
Here's an untested sketch of the search function (some debugging
probably needed):
findgood <- function(df, cols = 1:750) {
  ## Return the numbers of the columns in 'cols' that take more than
  ## one distinct value, i.e. that are not constant.
  ok <- numeric()
  for (j in cols)
    if (sum(!duplicated(df[, j])) > 1) ok <- c(ok, j)
  ok
}
It is supposed to return a vector of the numbers of the columns that are OK
to use. Use it like this:
ok <- findgood(RESULTVARS,1:750)
then
lda(RESULTVARS[,ok] , GROUPVAR)
sapply() instead of for() might be faster, though perhaps not easier
to understand.
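For example (again untested), the same screen could be written with sapply():

# TRUE/FALSE for each column: does it take more than one distinct value?
keep <- sapply(1:750, function(j) sum(!duplicated(RESULTVARS[, j])) > 1)
ok   <- which(keep)        # column numbers that are OK to use
lda(RESULTVARS[, ok], GROUPVAR)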
-Don
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------