Hi all,
I am using the lda function from the MASS library to measure the
discriminating power of different variables with respect to different
grouping variables, by calling
lda( RESULTVARS[, 1:750] , GROUPVAR , tol=0 ), where RESULTVARS contains
some 750 different variables.
Occasionally there is a variable within RESULTVARS that has the same value
for all values of GROUPVAR, i.e. no variance, and then I get the error:
Error in svd(X, nu = 0) : NA/NaN/Inf in foreign function call (arg 1)
As I understand it, this is caused by a division by zero in the computation
that feeds the svd call inside lda. The nature of my results is such that
every now and then a RESULTVARS variable turns out to be constant. Is there
a way of getting past this problem? By using the tol=0 parameter I can
already avoid problems when a variable is constant within a particular
group, and as far as I am concerned a variable that is constant across all
groups simply has zero discriminating power.
Example values of the grouping variable are:
> d$subject
[1] E D C B A H G K F I J E D C B A H G K F I J E D C B A H G K F I J E D C
B A G H K F I J E D C B A H G K F I J E D C B A H G
K F I J E D C B A H G K F I J
Levels: A B C D E F G H I J K
Example values of a results variable with no errors are:
> d[,104]
[1] 2.312308 2.957263 2.979431 2.764650 2.877694 3.078302 3.112324 2.906696
3.045316 1.995411 2.488661 2.976581 2.917944 3.089677
2.850058 2.758467
[17] 2.898870 2.966295 3.123338 3.130935 2.729223 2.831621 2.222380 2.461088
2.539655 2.267584 2.599100 2.575934 2.858999 2.311193
2.515690 2.490992
[33] 2.230635 2.846939 3.091381 3.072407 3.097286 2.878738 3.097788 3.155828
3.250491 3.095101 2.956129 3.157974 3.093765 2.682200
3.072632 2.931168
[49] 2.469290 2.909947 2.682943 2.985903 2.738458 2.828025 2.860262 3.112574
2.890100 2.813462 2.694520 3.058201 2.761940 2.835700
2.829152 2.834158
[65] 3.029300 2.870694 3.024452 2.909192 2.926210 2.530717 2.875842 2.798146
2.576489 2.690214 2.865670 2.499521 2.900491
and finally, example values of a results variable WITH ERRORS are:
> d[,105]
[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3
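Here is a small artificial example that, as far as I can tell, reproduces
the same error (the data are made up, not my real results):

library(MASS)
set.seed(1)
g  <- factor(rep(LETTERS[1:11], 7))   # 11 subjects, 77 observations
x1 <- rnorm(length(g))                # an ordinary, varying variable
x2 <- rep(3, length(g))               # constant for every observation
lda(cbind(x1, x2), g, tol = 0)        # gives the svd() NA/NaN/Inf error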
Can anyone suggest a workaround for this problem? Your help would be greatly
appreciated.
Many Thanks For Your Help
Rishabh
On Wed, 3 Jul 2002, Rishabh Gupta wrote:

> As far as I am concerned, a variable that is constant across all groups
> simply has zero discriminating power.

You are wrong: that is what the tol argument is for. You need to delete the
constant variables, which contradict the lda model. Do you really believe in
multivariate normality for 750 dimensions? I wouldn't!

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
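In code, dropping the constant variables up front might look something like
this (an untested sketch, using the poster's RESULTVARS and GROUPVAR names):

X    <- RESULTVARS[, 1:750]
keep <- apply(X, 2, function(col) length(unique(col)) > 1)  # FALSE where a column is constant
lda(X[, keep], GROUPVAR)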
Here is one idea: use try(), as in try(lda(...)); then, when try() indicates
an error, use a function (which you will have to write) to search through all
the variables in RESULTVARS, find out which one(s) are offending, and call
lda() again omitting those.
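The outer step might look something like this (untested; it uses the
findgood() helper sketched next):

fit <- try(lda(RESULTVARS[, 1:750], GROUPVAR, tol = 0))
if (inherits(fit, "try-error")) {
    # some variable must be constant; screen the columns and retry
    ok  <- findgood(RESULTVARS, 1:750)
    fit <- lda(RESULTVARS[, ok], GROUPVAR)
}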
Here's an untested sketch of the search function (some debugging
probably needed):
findgood <- function(df, cols = 1:750) {
  ## Return the numbers of the columns in 'cols' that take more than
  ## one distinct value, i.e. that are not constant.
  ok <- numeric()
  for (j in cols)
    if (sum(!duplicated(df[, j])) > 1) ok <- c(ok, j)
  ok
}
It is supposed to return a vector of the numbers of the columns that are OK
to use. Use it like this:
ok <- findgood(RESULTVARS,1:750)
then
lda(RESULTVARS[,ok] , GROUPVAR)
sapply() instead of for() might be faster, though perhaps not easier
to understand.
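For example (again untested), the same screen could be written with sapply():

# TRUE/FALSE for each column: does it take more than one distinct value?
keep <- sapply(1:750, function(j) sum(!duplicated(RESULTVARS[, j])) > 1)
ok   <- which(keep)        # column numbers that are OK to use
lda(RESULTVARS[, ok], GROUPVAR)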
-Don
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------