The following code in clusplot.default (package cluster) is in error:

    x1 <- cmdscale(x, k = 2, eig = TRUE)
    var.dec <- sum(x1$eig)/sum(diag(x1$x))
    if (var.dec < 0)
        var.dec <- 0
    if (var.dec > 1)
        var.dec <- 1
    x1 <- x1$points

x1 has components with names "points" and "eig", not "x", so
sum(diag(x1$x)) returns 0, the division gives Inf which is later replaced
by 1.
So in the plot it is reported (always) that "These two components explain
100% of the variability".

Besides, is it reasonable that sum(NULL) returns 0 without at least a
warning?

Another small point about the cluster package: it loads automatically
mva, but that is not mentioned in the Depends field in the description file.

Kjetil Halvorsen

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
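The failure described above can be reproduced without cmdscale() at all. The following is a minimal sketch, assuming only what the report states: the returned list has components "points" and "eig" but no "x". The list and its numbers are made up for illustration.

```r
# Stand-in for the list described in the report: components "points" and
# "eig", but no "x". The values are invented for illustration only.
x1 <- list(points = matrix(0, nrow = 3, ncol = 2), eig = c(4.2, 1.3))

is.null(x1$x)                    # a missing list component is NULL
denom <- sum(diag(x1$x))         # diag(NULL) is a 0 x 0 matrix; its sum is 0
var.dec <- sum(x1$eig) / denom   # positive number / 0 gives Inf

# The clamping from clusplot.default then turns Inf into 1
if (var.dec < 0) var.dec <- 0
if (var.dec > 1) var.dec <- 1
var.dec
```

Because the denominator is silently 0 whenever the eigenvalue sum is positive, the clamp always fires and the plot label always reads 100%.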
On Mon, 7 Jan 2002 kjetilh@umsanet.edu.bo wrote:

> The following code in clusplot.default (package cluster) is in error:
>
>     x1 <- cmdscale(x, k = 2, eig = TRUE)
>     var.dec <- sum(x1$eig)/sum(diag(x1$x))
>     if (var.dec < 0)
>         var.dec <- 0
>     if (var.dec > 1)
>         var.dec <- 1
>     x1 <- x1$points
>
> x1 has components with names "points" and "eig", not "x", so
> sum(diag(x1$x)) returns 0, the division gives Inf which is later replaced
> by 1.
> So in the plot it is reported (always) that "These two components explain
> 100% of the variability".
>
> Besides, is it reasonable that sum(NULL) returns 0 without at least a
> warning?

Yes: an empty sum is zero in mathematics. S code relies on such things.

> Another small point about the cluster package: it loads automatically
> mva, but that is not mentioned in the Depends field in the description file.

It depends on base too, but that's not mentioned. The convention is to
only include packages that are not in the standard tar bundle. We can (and
do) move things around between those, and indeed have talked about some
major reorganization of them (back as far as DSC99, in fact). You'll find
most packages follow that convention: MASS uses almost all the standard
packages somewhere.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
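A small illustration of the point about empty sums, using only base R. The sum over nothing is the additive identity, which lets code that sums over a possibly empty subset run without special cases. The vector w and the threshold 10 are made up for the example.

```r
# Empty sums and products return the identity element
sum(NULL)          # 0
sum(numeric(0))    # 0
prod(numeric(0))   # 1

# Typical reliance: summing over a subset that may turn out to be empty
w <- c(0.5, 1.5, 2.0)   # made-up values
big <- w[w > 10]        # no element qualifies, so this is numeric(0)
total_big <- sum(big)   # 0, with no error and no warning

# The total still splits cleanly into the two parts
stopifnot(sum(w) == sum(w[w > 10]) + sum(w[w <= 10]))
```

The price of this convention is the one the bug report ran into: a NULL produced by a misspelled component name is summed just as quietly as a legitimately empty vector.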
maechler@stat.math.ethz.ch
2002-Jan-10 08:25 UTC
[Rd] cluster - clusplot.default (PR#1249)
>>>>> "kjetil" == kjetil halvorsen <kjetilh@umsanet.edu.bo> writes:

    kjetil> The following code in clusplot.default (package cluster) is in error:
    kjetil> x1 <- cmdscale(x, k = 2, eig = TRUE)
    kjetil> var.dec <- sum(x1$eig)/sum(diag(x1$x))
    kjetil> if (var.dec < 0) var.dec <- 0
    kjetil> if (var.dec > 1) var.dec <- 1
    kjetil> x1 <- x1$points

    kjetil> x1 has components with names "points" and "eig", not
    kjetil> "x", so sum(diag(x1$x)) returns 0, the division
    kjetil> gives Inf which is later replaced by 1.  So in the
    kjetil> plot it is reported (always) that "These two
    kjetil> components explain 100% of the variability".

Thank you Kjetil.
Yes, there's definitely a problem there.
However the solution is not as easy:
Doing the replacement you suggest is not enough, since var.dec still is
not scaled to [0,1].

Before the lines you cite above, there is

    ##x1 <- cmd(x, k = 2, eig = T, add = T)
    ##if(x1$ac < 0)
    ##    x1 <- cmd(x, k = 2, eig = T)

which was Rousseeuw et al's original code -- instead of the
x1 <- cmdscale(...) line above.
And cmd() was an internal function calling directly into undocumented
S-plus internal Fortran code...
The original porter of the cluster package had replaced the cmd() by
cmdscale() which seemed but was not ok.

I'll have a look.

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
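To make the scaling issue concrete, here is one common way to obtain a proportion in [0,1] for classical scaling: divide the two leading eigenvalues of the doubly centered squared-distance matrix by the sum of all its positive eigenvalues. This is a sketch under that assumption, with made-up data, and is not necessarily the fix that went into cluster. It also checks that the trace of the doubly centered matrix equals the sum of its eigenvalues, which is presumably what the sum(diag(...)) denominator was meant to supply.

```r
# Made-up data: 10 points in 4 dimensions (dist() is in the default search path)
set.seed(1)
X <- matrix(rnorm(40), nrow = 10)
D2 <- as.matrix(dist(X))^2          # squared Euclidean distances

# Double centering: B = -1/2 * J %*% D2 %*% J, with J = I - (1/n) 11'
n <- nrow(D2)
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% D2 %*% J

# Eigenvalues of B, returned in decreasing order
ev <- eigen(B, symmetric = TRUE, only.values = TRUE)$values

# Share of the variability carried by the first two components
var.dec <- sum(ev[1:2]) / sum(ev[ev > 0])
var.dec

# The trace of B equals the sum of all eigenvalues
all.equal(sum(diag(B)), sum(ev))
```

Dividing by only the first two eigenvalues, or by a missing component, cannot give a meaningful share; the denominator has to cover the full spectrum.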
maechler@stat.math.ethz.ch
2002-Jan-30 08:11 UTC
[Rd] cluster - clusplot.default (PR#1249)
>>>>> "MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
>>>>> "kjetil" == kjetil halvorsen <kjetilh@umsanet.edu.bo> writes:

    kjetil> The following code in clusplot.default (package
    kjetil> cluster) is in error:
    kjetil> x1 <- cmdscale(x, k = 2, eig = TRUE)
    kjetil> var.dec <- sum(x1$eig)/sum(diag(x1$x))
    kjetil> if (var.dec < 0) var.dec <- 0
    kjetil> if (var.dec > 1) var.dec <- 1
    kjetil> x1 <- x1$points

    kjetil> x1 has components with names "points" and "eig", not
    kjetil> "x", so sum(diag(x1$x)) returns 0, the division
    kjetil> gives Inf which is later replaced by 1.  So in the
    kjetil> plot it is reported (always) that "These two
    kjetil> components explain 100% of the variability".

    MM> Thank you Kjetil.  Yes, there's definitely a problem
    MM> there.  However the solution is not as easy: Doing the
    MM> replacement you suggest is not enough, since var.dec
    MM> still is not scaled to [0,1].

    MM> Before the lines you cite above, there is

    MM> ##x1 <- cmd(x, k = 2, eig = T, add = T)
    MM> ##if(x1$ac < 0)
    MM> ##    x1 <- cmd(x, k = 2, eig = T)

    MM> which was Rousseeuw et al's original code -- instead of
    MM> the x1 <- cmdscale(...) line above.  And cmd() was an
    MM> internal function calling directly into undocumented
    MM> S-plus internal Fortran code...  The original porter of
    MM> the cluster package had replaced the cmd() by cmdscale()
    MM> which seemed but was not ok.

    MM> I'll have a look.

This is fixed in the now current version of cluster, 1.4-0.

Note that cmdscale() will have an "add =" option from R 1.5.0 on.

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
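For readers on a current R, the commented-out S logic quoted in the thread can be sketched with cmdscale() and its add argument. This assumes the interface documented in present-day R, where the result includes an "ac" component when add = TRUE; the thread itself only says the option would arrive in R 1.5.0, and this is not necessarily what clusplot.default does. The data are made up.

```r
# Made-up data: 8 points in 3 dimensions
set.seed(42)
d <- dist(matrix(rnorm(24), nrow = 8))

# Classical MDS with an additive constant, mirroring the old cmd(..., add = T)
x1 <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)

# Fall back to the plain solution if the constant is negative,
# as the original S code did
if (x1$ac < 0)
    x1 <- cmdscale(d, k = 2, eig = TRUE)

dim(x1$points)   # coordinates of the 8 points in 2 dimensions
x1$ac            # the additive constant that was used
```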