thr3ads.net - similar to: "Any better way of optimizing time for calculating distances in the mentioned scenario??"

Displaying 20 results from an estimated 1000 matches similar to: "Any better way of optimizing time for calculating distances in the mentioned scenario??"

cclust causes R to crash when using manhattan kmeans

2006 Apr 07

Dear R users, When I run the following code, R crashes:

    require(cclust)
    x <- matrix(c(0,0,0,1.5,1,-1), ncol=2, byrow=TRUE)
    cclust(x, centers=x[2:3,], dist="manhattan", method="kmeans")

While this works:

    cclust(x, centers=x[2:3,], dist="euclidean", method="kmeans")

I'm posting this here because I am not sure if it is a bug. I've been searching

[cluster package question] What is the "sum of the dissimilarities" in the pam command ?

2009 Mar 29

Hello Martin Maechler and All, A simple question (I hope): How can I compute the "sum of the dissimilarities" that appears in the pam command (from the cluster package)? Is it the "manhattan" distance (such as the one implemented by "dist")? I am asking since I am running clustering on a dataset. I found 7 medoids with the pam command, and from it I have the

bug (?!) in "pam()" clustering from fpc package ?

2008 Dec 17

Hello all. I wish to run k-means with "manhattan" distance. Since this is not supported by the function "kmeans", I turned to the "pam" function in the "fpc" package. Yet, when I tried to have the algorithm run with different starting points, I found that pam ignores them and keeps starting the algorithm from the same starting points (medoids). For my

title for plot contain 4 subplots

2003 Sep 14

Hi, I'm plotting 4 graphs on one page (2x2 matrix) but I can't seem to get the title for the whole page right. I'm doing:

    op <- par(mfrow = c(2,2), pty="s")
    hist(var$V2, breaks="FD",main="Euclidean Metric", xlab="Sum of 3NN ...
    hist(var$V2, breaks="FD",main="Manhattan Metric", xlab="Sum of 3NN ...
    hist(var$V2,

Document clustering for R

2005 Sep 12

I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but it only supports two distance metrics: euclidean and manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the clustering package in R to support other distance metrics, such as cosine distance, or if there was an API for

Manhattan Plot

2011 Sep 09

To whom it may concern: My name is Jillian Weinfeld. I am currently an undergraduate student at New York University and working at Mount Sinai School of Medicine doing research with epilepsy patients. At the moment I am creating a manhattan plot with my data set. After reading many forums and such, I have appropriately plotted my data; however, I wanted to see how I can change the colors of the

mahalanobis distance

2004 Sep 12

Is there a function that calculates the Mahalanobis distance in R? The dist function calculates "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Thanks ../Murli

Clustering

2007 Nov 28

Hello all! I am performing some clustering analysis on microarray data using agnes{cluster} and I have created my own dissimilarity matrix according to a distance measure different from "euclidean" or "manhattan" etc. My question is, if I choose for example method="complete", how are the distances between the elements calculated? Are they taken from the dissimilarity

problem with cclust[er] package

2003 Mar 05

I have checked that section already. Sorry, I should have mentioned that. Memory limit increase does not work. Installation of msvcrt.dll does not work either. Thank you.

-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Wednesday, March 05, 2003 2:44 PM
To: Igor Oleinik
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] problem with cclust[er]

Manhattan Plot

2013 Apr 27

Hi, Tenfei, I have two groups of data composed of gene mutation and deletion on specific sites. Will it be possible for me to use the Manhattan Plot for comparison? Thank you for your attention! Li-Wu Guo, Ph.D.

Manhattan Plot

2013 Jan 08

Hello, I am trying to create a simple Manhattan plot for a small list of 200 SNPs spread out in the genome in different genes. I have tried different functions (using ggplot2 and a function created by Stephen Turner, mhtplot etc.), but none of them works smoothly. Does anyone have a simple way to create the plot (not for all 22 chromosomes), with the x axis showing the gene names and not the

dist() {"mva" package} bug: treats +/- Inf as NA

2002 Oct 21

Vince Carey found this (thank you!). Since the fix to the problem is not entirely obvious, I post this to R-devel as RFC: help(dist) says:

>> Missing values are allowed, and are excluded from all computations
>> involving the rows within which they occur. If some columns are
>> excluded in calculating a Euclidean, Manhattan or Canberra
>> distance, the sum is

problem with ccluster package

2003 Mar 05

Hello, I am calling the cclust function in the cclust package repeatedly until some certain conditions for a cluster are met. Unfortunately, the system crashes on the second call (after debugging).

    # kmeans res1 is a well defined matrix
    cl <- cclust(res1, as.numeric(ncntrs), iter.max = 20, verbose = FALSE, dist="manhattan", method="kmeans")

RGui has generated errors and will

cutree (PR#1067)

2001 Aug 22

Full_Name: Anja von Heydebreck
Version: 1.3.0
OS: Alpha Unix
Submission from: (NULL) (141.14.19.61)

Hi, I repeatedly obtained meaningless results from the function 'cutree' in the 'mva' package, when the argument 'h' was greater than or equal to the maximum height occurring:

    > library('mva')
    > y
         [,1] [,2] [,3] [,4]
    [1,]    0    1   -1    1
    [2,]    0   -1

hclust title and paste - messed up

2004 Oct 11

I use the following code to scan a (limited) parameter space of clustering strategies ...

    data <- read.table(...
    dataTranspose <- t(data)
    distMeth <- c("euclidean", "maximum", "manhattan", "canberra", "binary" )
    clustMeth <- c("ward",

hierarchical clustering within a size limit

2011 May 11

Hello List, I am trying to implement hierarchical clustering with hclust, using agglomerative single linkage, with a small wrinkle. I would like to cluster a set of numbers on a number line only if they are within a distance of 500. I would then like to print out the members of this list. So far I can put a vector:

    > x<-c(2,10,200,300,600,700)

into a distance matrix: >

cophenetic matrix

2001 Jun 12

Hello, I analyse some free-sorting data so I use hierarchical clustering. I want to compare my proximity matrix with the tree representation to evaluate the fit (stress, cophenetic correlation (Pearson's correlation)...). "The cophenetic similarity of two objects a and b is defined as the similarity level at which objects a and b become members of the same cluster during the course of

indexing and regression testing

2007 Aug 23

Dear all, It was a pleasure to meet you at Iowa State University. Two days ago I submitted two experimental packages to CRAN (hope they will be there soon):

rindex: quick indexing of large objects (currently only character, see ?index)
regtest: some first support for automated regression testing (heavily used in \dontshow{} section of ?index)

With rindex you can for example i <-

Problem with the cluster package

2006 Apr 24

Hi everybody, I want to use the cluster package (Cluster Analysis Extended Rousseeuw et al.). I downloaded it from CRAN and installed it on my Linux system (Fedora Core 4). All seemed to be all right. But when trying to run the examples, I obtained the following message:

    > library(cluster)
    > data(votes.repub)
    > agn1 <- agnes(votes.repub, metric = "manhattan",

Erratic behaviour of sammon()

2001 Nov 01

I'm not sure this list is the right place for this. I noticed some erratic behaviour in sammon(). Running sammon on two nearly identical sets of data gives very different results. Below is an example. I create an initial configuration with cmdscale() and store it into 'vec1'. I write this to file, and read it back in again to 'vec2'. According to cor() on the three
