thr3ads.net - R help - [R] error using daisy() in library(cluster). Bug? [Aug 2004]

javier garcia - CEBAS

2004-Aug-12 10:53 UTC

[R] error using daisy() in library(cluster). Bug?

Hi,
I'm using the cluster library to examine multivariate data.
The data come from a connection to a Postgres database, and I wrote a short R
script to do the analysis. With the cluster version included in R 1.8.0, daisy()
worked well for my data, but now, when I call daisy(), I get the following
messages:
---------
Error in if (any(sx == 0)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
binary variable(s) 116 treated as interval scaled in: 
daisy(concentracion.data.frame, stand = TRUE)
---------

All the variables in my data frame are numeric, although some contain NA values.
If I run the analysis on a subset of the data frame, selecting only columns
with no NA, the result is fine.
Could this be a bug?

Thanks, and best regards

Javier

Martin Maechler

2004-Aug-12 12:24 UTC

¡Hola Javier!

Since I am the maintainer of the cluster
*package* (not "library"), I'm interested in finding out more about
this problem.  I assume you are now using R 1.9.1.

Can you give us an example we can reproduce?
Give the exact R commands you use and 
maybe attach the save()d data file (*.rda) in a private e-mail?

Or do this on R-help and give a URL where the data can be
downloaded (you can't attach such binary files on R-help).
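
A minimal sketch of preparing such a save()d file, assuming a toy data frame in place of the real one (the values and the file name below are hypothetical; only the object name comes from the error message above):

```r
# Hypothetical stand-in for the real data frame (values are made up).
concentracion.data.frame <- data.frame(v1 = c(0.1, 0.4, NA),
                                       v2 = rep(NA_real_, 3))

# Write the object to an .rda file; the file name is an assumption.
rda_file <- file.path(tempdir(), "concentracion.rda")
save(concentracion.data.frame, file = rda_file)

# The recipient restores the object under its original name.
env <- new.env()
loaded <- load(rda_file, envir = env)  # returns the names of restored objects
print(loaded)
print(identical(env$concentracion.data.frame, concentracion.data.frame))
```

Loading into a separate environment, as done here, avoids overwriting an object of the same name in the recipient's workspace.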

Thank you,
Martin Maechler

Martin Maechler

2004-Aug-12 15:59 UTC

[Reverted back to R-help, after private exchange]
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Thu, 12 Aug 2004 17:12:01 +0200 writes:
>>>>> "javier" == javier garcia - CEBAS <rn001 at cebas.csic.es>
>>>>>     on Thu, 12 Aug 2004 16:28:27 +0200 writes:
    javier> Martin; yes, I know that there are variables with all
    javier> five values 'NA'. I've left them as they are simply
    javier> to save a couple of lines in the script, and because
    javier> I like to see that they are there, although all
    javier> values are 'NA'.  I don't expect them to be used in
    javier> the analysis, but are they the source of the problem?

    MM> Yes, but only because of "stand = TRUE".

    MM> One could argue that standardizing these "all NA"
    MM> variables ought to work.

    MM> I'll think a bit more about it.  Thank you for the
    MM> example.

Ok. I've thought (and looked at the R code) a bit longer.
Also considered the fact (you mentioned) that this worked in R 1.8.0.
Hence, I'm considering the current behavior a bug.
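
For readers following along, here is a rough base-R sketch of why the condition fails. It mirrors the two pre-patch lines shown in the diff below, but the toy matrix and the variable names cond and res are assumptions, not part of the cluster code:

```r
# Toy stand-in for the 5 x 180 data: column "b" is entirely NA (hypothetical values).
x <- data.matrix(data.frame(a = c(1, 2, 3, 4, 5),
                            b = rep(NA_real_, 5),
                            c = c(2, NA, 4, 6, 8)))

# The same two steps as the pre-patch code path for stand = TRUE.
x  <- scale(x, center = TRUE, scale = FALSE)  # centre each column
sx <- colMeans(abs(x))                        # NA for every column containing an NA

# No comparison is TRUE and some are NA, so any() returns NA.
cond <- any(sx == 0)
print(sx)
print(cond)

# if() refuses an NA condition; capture the error message instead of stopping.
res <- tryCatch(if (cond) "constant column" else "ok",
                error = function(e) conditionMessage(e))
print(res)
```

Without stand = TRUE this branch is never reached, which matches the observation that the all-NA columns only cause trouble when standardizing.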

Here is the patch (apply it to cluster/R/daisy.q in the *source*,
 or at the appropriate place in <cluster_installed>/R/cluster):

--- daisy.q	2004/06/25 16:17:47	1.17
+++ daisy.q	2004/08/12 15:23:26
@@ -78,8 +78,8 @@
     if(all(type2 == "I")) {
 	if(stand) {
             x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
-            sx <- colMeans(abs(x))
-            if(any(sx == 0)) {
+	    sx <- colMeans(abs(x), na.rm = TRUE)# can still have NA's
+	    if(0 %in% sx) {
                 warning(sQuote("x"), " has constant columns ",
                         pColl(which(sx == 0)), "; these are standardized to 0")
                 sx[sx == 0] <- 1


Thank you for helping to find and fix this bug.
Martin Maechler, ETH Zurich, Switzerland
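
A small base-R sketch of what the two patched lines change, assuming a toy matrix with one all-NA column and one constant column (the data and the name has_const are hypothetical; only the two expressions come from the patch):

```r
# Toy data: "b" is all NA, "d" is constant (hypothetical values).
x <- data.matrix(data.frame(a = c(1, 2, 3, 4, 5),
                            b = rep(NA_real_, 5),
                            c = c(2, NA, 4, 6, 8),
                            d = rep(7, 5)))

x  <- scale(x, center = TRUE, scale = FALSE)  # centre each column
sx <- colMeans(abs(x), na.rm = TRUE)          # "b" becomes NaN, "c" gets a value

# %in% is built on match(), so it yields TRUE or FALSE, never NA.
has_const <- 0 %in% sx
print(sx)
print(has_const)

# which() skips NA/NaN comparisons, so only the constant column is reported.
print(names(which(sx == 0)))
```

The all-NA column still ends up with a non-finite entry in sx, in line with the "can still have NA's" comment in the patch, but the if() condition is now always a proper TRUE or FALSE.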

    javier> On Thu, 12 Aug 2004 15:11, MM wrote:

    >>> Javier, I could well read your .RData and try your
    >>> script to produce the same error from daisy().
    >>> 
    >>> Your dataframe is of dimension 5 x 180 and has many
    >>> variables that have all five values 'NA' (see below).
    >>> 
    >>> You can't expect to use these, do you?  Martin
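
For anyone on an unpatched version, one possible workaround is to drop the all-NA variables before calling daisy(). This is only a sketch under assumptions: the toy data frame and the names all_na and df_clean are hypothetical.

```r
# Toy stand-in for the real data frame: column "b" is entirely NA.
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = rep(NA_real_, 5),
                 c = c(2, NA, 4, 6, 8))

# Flag the variables whose values are all NA.
all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
print(names(df)[all_na])

# Keep the remaining variables; drop = FALSE preserves the data frame class.
df_clean <- df[, !all_na, drop = FALSE]
print(names(df_clean))

# df_clean could then be passed on, e.g. daisy(df_clean, stand = TRUE).
```

Columns with only some NA values, such as "c" here, are kept; only the variables with no usable values are removed.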
