Gundala Viswanath
2013-Dec-08 07:11 UTC
[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
Hi,

According to daisy function from cluster documentation, it can compute dissimilarity when NA (missing) value(s) is present.

http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

But why when I tried this code

library(cluster)
x <- c(1.115,NA,NA,0.971,NA)
y <- c(NA,1.006,NA,NA,0.645)
df <- as.data.frame(rbind(x,y))
daisy(df,metric="gower")

It gave this message:

Dissimilarities :
   x
y NA

Metric : mixed ; Types = I, I, I, I, I
Number of objects : 2
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

I welcome other alternative than gower.

I expect the dissimilarity output gives a non-NA value e.g. 0. What's the right way to do it?

G.V.
Sarah Goslee
2013-Dec-08 14:15 UTC
[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
Hi,

Please don't cross-post. It's also not necessary to post more than once to the same list if you don't get an immediate response, especially if you've posted on the weekend.

On Sunday, December 8, 2013, Gundala Viswanath wrote:

> Hi,
>
> According to daisy function from cluster documentation, it can compute
> dissimilarity when NA (missing) value(s) is present.
>
> http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html
>
> But why when I tried this code
>
> library(cluster)
> x <- c(1.115,NA,NA,0.971,NA)
> y <- c(NA,1.006,NA,NA,0.645)
> df <- as.data.frame(rbind(x,y))
> daisy(df,metric="gower")
>
> It gave this message:
>
> Dissimilarities :
>    x
> y NA
>
> Metric : mixed ; Types = I, I, I, I, I
> Number of objects : 2
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf

The third column of your data frame (note that df() is already an R function, the F density in the stats package, and thus a bad name for a user object) is all NA, so it's impossible to apply the Gower standardization. daisy() does handle NA values, but it can't read minds to figure out what you expect if all values are NA.

> I welcome other alternative than gower.
>
> I expect the dissimilarity output gives a non-NA value e.g. 0. What's
> the right way to do it?

If a column has all NA values then it adds nothing to the analysis except problems, and you need to remove it.

Sarah

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
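Sarah's suggestion to drop all-NA columns could be sketched roughly as follows. This is only an illustration under a few assumptions: `dat`, `all_na`, and `dat_clean` are names invented here, and the `requireNamespace()` guard is just a precaution in case cluster is not installed. For this particular two-row example, removing the empty column addresses the source of the min()/max() warnings, but it should not be expected to yield a non-NA dissimilarity, because the two rows still share no observed variable.

```r
# Data from the original post; 'dat' is used instead of 'df' to avoid
# confusion with the existing function df().
x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)
dat <- as.data.frame(rbind(x, y))

# Flag columns in which every value is missing.
all_na <- vapply(dat, function(col) all(is.na(col)), logical(1))

# Keep only the columns that have at least one observed value.
dat_clean <- dat[, !all_na, drop = FALSE]
print(names(dat_clean))

# Hand the cleaned data to daisy() if the cluster package is available.
if (requireNamespace("cluster", quietly = TRUE)) {
  print(cluster::daisy(dat_clean, metric = "gower"))
}
```

Using `drop = FALSE` keeps the result a data frame even if only one column survives the filter.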
Martin Maechler
2013-Dec-09 10:36 UTC
[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
>>>>> Gundala Viswanath <gundalav at gmail.com>
>>>>> on Sun, 8 Dec 2013 16:11:12 +0900 writes:

> Hi, According to daisy function from cluster
> documentation, it can compute dissimilarity when NA
> (missing) value(s) is present.

> http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

> But why when I tried this code

> library(cluster)
> x <- c(1.115,NA,NA,0.971,NA)
> y <- c(NA,1.006,NA,NA,0.645)
> df <- as.data.frame(rbind(x,y))
> daisy(df,metric="gower")

> It gave this message:

> Dissimilarities :
>    x
> y NA

> Metric : mixed ; Types = I, I, I, I, I
> Number of objects : 2
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf

> I welcome other alternative than gower.

> I expect the dissimilarity output gives a non-NA value e.g. 0. What's
> the right way to do it?

Thank you, Gundala, for using a simple reproducible example.

Reading the documentation about Gower's distance a bit more, you'd have found that it works by basically giving weight zero to *pairs* of variable values where at least one of the two values is missing. In situations like yours, *all* pairs have at least one missing, so there's no way to get a non-NA distance.

*AND* the documentation already contains this, at the very end of the section 'Details' :

    If all weights w_k delta(ij;k) are zero, the dissimilarity is set to 'NA'.

I.e., we have

> install.packages("fortunes")
> library(fortunes)
> fortune("WTFM")

This is all documented in TFM. Those who WTFM don't want to have to WTFM
again on the mailing list. RTFM.
   -- Barry Rowlingson
      R-help (October 2003)

... which I now did in spite of Barry's excellent point ... let's say it's because of approaching Christmas !

Martin Maechler, ETH Zurich
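The weighting Martin describes can be illustrated with a small hand-rolled function. This is a simplified sketch for numeric variables only, not daisy()'s actual implementation: `gower_pair()` and its `ranges` argument (the per-variable range used for scaling) are hypothetical names introduced here, and the second pair of vectors is made-up data added for contrast.

```r
# Simplified Gower dissimilarity between two numeric observations.
# 'ranges' holds the (assumed known) range of each variable.
gower_pair <- function(a, b, ranges) {
  # delta is 1 where both values are observed, 0 otherwise.
  delta <- as.numeric(!is.na(a) & !is.na(b))
  # No usable pair at all: the dissimilarity is undefined.
  if (sum(delta) == 0) return(NA_real_)
  # Range-scaled absolute differences, averaged over usable pairs only.
  d <- abs(a - b) / ranges
  sum(d[delta == 1]) / sum(delta)
}

# The two rows from the original post: no variable is observed in both.
x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)
shared <- sum(!is.na(x) & !is.na(y))
print(shared)                                 # 0 shared variables
print(gower_pair(x, y, ranges = rep(1, 5)))   # NA

# Made-up pair where the first variable is observed in both rows.
a <- c(1, NA, 3)
b <- c(2, 5, NA)
print(gower_pair(a, b, ranges = c(2, 4, 2)))  # 0.5
```

Counting the shared observed variables first, as `shared` does, is a quick way to see in advance whether any pair of rows can get a non-NA Gower dissimilarity.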
James W. MacDonald
2013-Dec-09 14:57 UTC
[R] [BioC] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
Hi Gundala,

This question isn't about a Bioconductor package, so should be asked on R-help instead.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099