similar to: problems with a large data set

Displaying 20 results from an estimated 4000 matches similar to: "problems with a large data set"

2001 Apr 27
0
weithed clustering (was: Re: problems with a large data set)
kmeans and clara work great. Thank you for the tip. I have another question: Is it possible to weight the observations in a cluster analysis? I haven't found any mention of this in the kmeans or clara help texts. Moritz Lennert Chargé de recherche IGEAT - ULB tél: 32-2-650.65.16 fax: 32-2-650.50.92 email: mlennert at ulb.ac.be > On Wed, 25 Apr 2001, Moritz Lennert wrote: >
2001 Apr 06
2
automatic levels
Hello, I've imported a semicolon-separated CSV file with read.csv2, containing one column of rownames and one column of floating point numbers. When I look at the column of data with framename$columnname, I get the values of the column plus level values. Are these level values created automatically? The problem is when I try to calculate the correlation coefficient between this set of data
2010 Apr 26
2
Cluster analysis: dissimilar results between R and SPSS
Hello everyone! My data is composed of 277 individuals measured on 8 binary variables (1=yes, 2=no). I did two similar cluster analyses, one on SPSS 18.0 and one on R 2.9.2. The objective is to have the means for each variable per retained cluster. 1) the R analysis ran as follows: > call data > dist=dist(data,method="euclidean") >
2009 Feb 20
2
cluster analysis: mean values for each variable and cluster
Hi all! I'm new to R and don't know much about it. Because it is free, I managed to learn it a little bit. Here is my problem: I did a cluster analysis on 30 observations and 16 variables (monde, figaro, liberation, etc.). Here is the .txt data file:
2010 May 05
2
custom metric for dist for use with hclust/kmeans
Hi guys, I've been using the kmeans and hclust functions for some time now and was wondering if I could specify a custom metric when passing my data frame into hclust as a distance matrix. Actually, kmeans doesn't even take a distance matrix; it takes the data frame directly. I was wondering if there's a way or if there's a package that lets you create distance matrices from
2012 Jul 04
1
Error in hclust?
Dear R users, I have noted a difference in the merge distances given by hclust using centroid method. For the following data: x<-c(1009.9,1012.5,1011.1,1011.8,1009.3,1010.6) and using Euclidean distance, hclust using centroid method gives the following results: > x.dist<-dist(x) > x.aah<-hclust(x.dist,method="centroid") > x.aah$merge [,1] [,2] [1,] -3 -6
2004 Jan 04
5
Analyzing dendograms??
I have used heatmap to visualize my microarray data. I have a matrix of M-values. I do the following. #The distance between the columns. sampdist <- dist(t(matrix[,]), method="euclidean") sclus <- hclust(sampdist, method="average") #The distance between the rows. genedist <- dist(matrix[,], method="euclidean") gclus <- hclust(genedist,
2003 Nov 04
1
hclust doesn't return merge details [Solved]
Thanks to Andy and Thomas, Reading help(hclust) more carefully would have done it but sometimes you do not see the wood for the trees... So hc$merge does exactly what I want. I have never been aware of the command str to get the structure of an R-object. It seems pretty useful to me. Thanks, Arne > -----Original Message----- > From: Liaw, Andy [mailto:andy_liaw at merck.com] >
2007 Dec 07
1
pvclust warning message
Hi all, I am trying to perform the following: fit<-pvclust(wq, method.hclust="ward", method.dist="euclidean") but get a strange error message that I just can't figure out. Has anyone come across this? Any help would be most appreciated. Error in hclust(distance, method = method.hclust) : NA/NaN/Inf in foreign function call (arg 11) In addition: Warning message: NAs
2011 Dec 12
1
Is there a way to print branch distances for hclust function?
The R function hclust is used to do cluster analysis, but based on R help I see no way to print the actual fusion distances (that is, the vertical distances for each connected branch pair seen in the cluster dendrogram). Any ideas? I'd like to use them to test for significant differences from the mean fusion distance (i.e. The Best Cut Test). To perform a cluster analysis I'm using: x
2017 Jun 21
1
getting error while trying to make dendogram based on gene expression
I am trying to make a dendrogram based on a gene expression matrix, but I am getting an error: countMatrix = read.table("count.row.txt",header=T,sep='\t',check.names=F) colnames(countMatrix) count_matrix <- countMatrix[,-1] # remove first column (gene names) rownames(count_matrix) <- countMatrix[,1] # add first column (gene names) as rownames >
2011 May 17
1
simprof test using jaccard distance
Dear All, I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for presence/absence community data. Here is the core of the function: > simprof function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform =
2010 May 25
1
Hierarchical clustering using own distance matrices
Hey Everyone! I wanted to carry out hierarchical clustering using distance matrices I have calculated (instead of Euclidean distance etc.). I understand as.dist is the function for this, but the distances in the dendrogram I got by using the following script (1) were not the distances defined in my distance matrices. script: var<-read.table("the distance matrix i calculated",
2013 Dec 07
1
How to perform clustering without removing rows where NA is present in R
I have data that contain some NA values. What I want to do is to **perform clustering without removing rows** where the NA is present. I understand that the `gower` distance measure in `daisy` allows this. But why doesn't my code below work? __BEGIN__ # plot heat map with dendrogram together. library("gplots") library("cluster")
2013 Dec 12
2
method default for hclust function
I could not figure out what the default was when I ran hclust() without specifying the method. For example, I just have code like: hclust(dist(data)) Any input would be appreciated :)
2004 Oct 11
2
hclust title and paste - messed up
I use the following code to scan a (limited) parameter space of clustering strategies ... data <- read.table(... dataTranspose <- t(data) distMeth <- c("euclidean", "maximum", "manhattan", "canberra", "binary" ) clustMeth <- c("ward",
2007 Jun 13
2
Formatted Data File Question for Clustering -Quickie Project
I am trying to learn how to format ASCII data files for scan or read into R. Specifically, for a quickie project, I found some code (at end of this email) to do exactly what I need: to cluster and graph a dendrogram using the stats package. I am stuck on how to format a text file to run the script. I looked at the dataset USArrests (which would be replaced by my data and labels) using UltraEdit. That
2001 Jun 12
1
cophenetic matrix
Hello, I am analysing some free-sorting data, so I use hierarchical clustering. I want to compare my proximity matrix with the tree representation to evaluate the fit (stress, cophenetic correlation (Pearson's correlation)...). "The cophenetic similarity of two objects a and b is defined as the similarity level at which objects a and b become members of the same cluster during the course of
2006 Jul 10
2
pvclust missing values problem
Hello all, I posted a question to this list last week and received no response. I am unsure if this means no-one knows the answer or if I posed the question badly. I'm going to assume I posed the question badly and try again. I am new to R so it is quite likely it's a very naive question, however if there is something blindingly obvious that I am missing or if there is another resource I
2001 Apr 10
5
Similarity matrix
I frequently use hclust on a similarity matrix. In R only a distance matrix is allowed. Is there a simple reliable transformation of a similarity matrix that will result in a distance matrix making hclust work the same as S-Plus with a similarity matrix? Venables & Ripley 3rd edition implies that a simple reversal of values will suffice. Thanks -Frank -- Frank E Harrell Jr