thr3ads.net - similar to: "problems with a large data set"

Displaying 20 results from an estimated 4000 matches similar to: "problems with a large data set"

weighted clustering (was: Re: problems with a large data set)

2001 Apr 27

kmeans and clara work great. Thank you for the tip. I have another question: is it possible to weight the observations in a cluster analysis? I haven't found any mention of this in the kmeans or clara help texts.
Moritz Lennert, Chargé de recherche, IGEAT - ULB, tél: 32-2-650.65.16, fax: 32-2-650.50.92, email: mlennert at ulb.ac.be
> On Wed, 25 Apr 2001, Moritz Lennert wrote:

automatic levels

2001 Apr 06

Hello, I've imported a semicolon-separated CSV file with read.csv2, containing one column of row names and one column of floating point numbers. When I look at the column of data with framename$columnname, I get the values of the column plus level values. Are these level values created automatically? The problem is when I try to calculate the correlation coefficient between this set of data

Cluster analysis: dissimilar results between R and SPSS

2010 Apr 26

Hello everyone! My data is composed of 277 individuals measured on 8 binary variables (1=yes, 2=no). I did two similar cluster analyses, one in SPSS 18.0 and one in R 2.9.2. The objective is to have the means for each variable per retained cluster. 1) The R analysis ran as follows:
> call data
> dist=dist(data,method="euclidean")
>

cluster analysis: mean values for each variable and cluster

2009 Feb 20

Hi all! I'm new to R and don't know much about it. Because it is free, I managed to learn it a little bit. Here is my problem: I did a cluster analysis on 30 observations and 16 variables (monde, figaro, liberation, etc.). Here is the .txt data file:

custom metric for dist for use with hclust/kmeans

2010 May 05

Hi guys, I've been using the kmeans and hclust functions for some time now and was wondering if I could specify a custom metric when passing my data frame into hclust as a distance matrix. Actually, kmeans doesn't even take a distance matrix; it takes the data frame directly. I was wondering if there's a way or if there's a package that lets you create distance matrices from
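One possible way to approach the question above, sketched under stated assumptions: build the pairwise matrix by hand and convert it with as.dist before calling hclust. The data and the metric (1 minus the Pearson correlation between two rows) are hypothetical and are not taken from the thread.

```r
# Toy data; both the data and the metric are hypothetical illustrations.
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

# Hypothetical custom metric: 1 - Pearson correlation between two rows.
custom_metric <- function(a, b) 1 - cor(a, b)

# Fill a full n x n matrix of pairwise values.
n <- nrow(x)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- custom_metric(x[i, ], x[j, ])
  }
}

# as.dist() keeps the lower triangle; hclust() accepts the resulting object.
hc <- hclust(as.dist(d), method = "average")
print(hc$merge)
```

kmeans works on the coordinates themselves, so a precomputed matrix like this cannot be passed to it. If a partitioning method is needed, pam in the cluster package does accept a dissimilarity object.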

Error in hclust?

2012 Jul 04

Dear R users, I have noted a difference in the merge distances given by hclust using the centroid method. For the following data:
x<-c(1009.9,1012.5,1011.1,1011.8,1009.3,1010.6)
and using Euclidean distance, hclust using the centroid method gives the following results:
> x.dist<-dist(x)
> x.aah<-hclust(x.dist,method="centroid")
> x.aah$merge
     [,1] [,2]
[1,]   -3   -6

Analyzing dendrograms??

2004 Jan 04

I have used heatmap to visualize my microarray data. I have a matrix of M-values. I do the following.
#The distance between the columns.
sampdist <- dist(t(matrix[,]), method="euclidean")
sclus <- hclust(sampdist, method="average")
#The distance between the rows.
genedist <- dist(matrix[,], method="euclidean")
gclus <- hclust(genedist,

hclust doesn't return merge details [Solved]

2003 Nov 04

Thanks to Andy and Thomas. Reading help(hclust) more carefully would have done it, but sometimes you do not see the wood for the trees... So hc$merge does exactly what I want. I had never been aware of the command str for getting the structure of an R object. It seems pretty useful to me. Thanks, Arne
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
>
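As a small illustration of what the message above describes, the following sketch inspects an hclust result with str and reads its merge component. The built-in USArrests data set and the average linkage are assumptions chosen only for the example.

```r
# USArrests ships with base R; the choice of data and linkage is illustrative.
hc <- hclust(dist(USArrests), method = "average")

# str() shows the components of the object (merge, height, order, labels, ...).
str(hc)

# Each row of hc$merge is one agglomeration step: negative entries are single
# observations, positive entries refer to clusters formed at earlier rows.
print(head(hc$merge))

# hc$height holds the dissimilarity at which each merge took place.
print(head(hc$height))
```

The height component is also where the fusion levels drawn on the dendrogram's vertical axis come from.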

pvclust warning message

2007 Dec 07

Hi all, I am trying to perform the following:
fit<-pvclust(wq, method.hclust="ward", method.dist="euclidean")
but I get a strange error message that I just can't figure out. Has anyone come across this? Any help would be most appreciated.
Error in hclust(distance, method = method.hclust) : NA/NaN/Inf in foreign function call (arg 11)
In addition: Warning message: NAs

Is there a way to print branch distances for hclust function?

2011 Dec 12

The R function hclust is used to do cluster analysis, but based on the R help I see no way to print the actual fusion distances (that is, the vertical distances for each connected branch pair seen in the cluster dendrogram). Any ideas? I'd like to use them to test for significant differences from the mean fusion distance (i.e. the Best Cut Test). To perform a cluster analysis I'm using: x

getting error while trying to make dendrogram based on gene expression

2017 Jun 21

I am trying to make a dendrogram based on a gene expression matrix, but I am getting an error:
countMatrix = read.table("count.row.txt",header=T,sep='\t',check.names=F)
colnames(countMatrix)
count_matrix <- countMatrix[,-1] # remove first column (gene names)
rownames(count_matrix) <- countMatrix[,1] # add first column (gene names) as rownames
>

simprof test using jaccard distance

2011 May 17

Dear All, I would like to use the simprof function (clustsig package), but the available distances do not include the Jaccard distance, which is the most appropriate for presence/absence community data. Here is the core of the function:
> simprof
function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform =

Hierarchical clustering using own distance matrices

2010 May 25

Hey everyone! I wanted to carry out hierarchical clustering using distance matrices I have calculated (instead of Euclidean distance etc.). I understand as.dist is the function for this, but the distances in the dendrogram I got by using the following script (1) were not the distances defined in my distance matrices. Script:
var<-read.table("the distance matrix i calculated",

How to perform clustering without removing rows where NA is present in R

2013 Dec 07

I have data that contain some NA values. What I want to do is to **perform clustering without removing rows** where an NA is present. I understand that the `gower` distance measure in `daisy` allows for this situation. But why doesn't my code below work?
__BEGIN__
# plot heat map with dendrogram together.
library("gplots")
library("cluster")

method default for hclust function

2013 Dec 12

I could not figure out what the default was when I ran hclust() without specifying the method. For example, I just have code like: hclust(dist(data)) Any input would be appreciated :)

hclust title and paste - messed up

2004 Oct 11

I use the following code to scan a (limited) parameter space of clustering strategies ...
data <- read.table(...
dataTranspose <- t(data)
distMeth <- c("euclidean", "maximum", "manhattan", "canberra", "binary" )
clustMeth <- c("ward",

Formatted Data File Question for Clustering -Quickie Project

2007 Jun 13

I am trying to learn how to format ASCII data files for scan or read into R. For a quickie project, I found some code (at the end of this email) that does exactly what I need: cluster and graph a dendrogram from package (stats). I am stuck on how to format a text file to run the script. I looked at the dataset USArrests (which would be replaced by my data and labels) using UltraEdit. That

cophenetic matrix

2001 Jun 12

Hello, I analyse some free-sorting data, so I use hierarchical clustering. I want to compare my proximity matrix with the tree representation to evaluate the fit (stress, cophenetic correlation (Pearson's correlation)...). "The cophenetic similarity of two objects a and b is defined as the similarity level at which objects a and b become members of the same cluster during the course of

pvclust missing values problem

2006 Jul 10

Hello all, I posted a question to this list last week and received no response. I am unsure if this means no-one knows the answer or if I posed the question badly. I'm going to assume I posed the question badly and try again. I am new to R so it is quite likely it's a very naive question, however if there is something blindingly obvious that I am missing or if there is another resource I

Similarity matrix

2001 Apr 10

I frequently use hclust on a similarity matrix. In R only a distance matrix is allowed. Is there a simple reliable transformation of a similarity matrix that will result in a distance matrix making hclust work the same as S-Plus with a similarity matrix? Venables & Ripley 3rd edition implies that a simple reversal of values will suffice. Thanks -Frank -- Frank E Harrell Jr
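A minimal sketch of the "simple reversal" idea mentioned above, assuming similarities that lie in [0, 1] with 1 on the diagonal, so that 1 - s can serve as a dissimilarity. The matrix values are hypothetical, and whether the result matches S-Plus output is not verified here.

```r
# Hypothetical similarity matrix: symmetric, values in [0, 1], 1 on the diagonal.
s <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.4,
              0.2, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Reverse the values so that high similarity becomes low distance.
d <- as.dist(1 - s)

hc <- hclust(d, method = "average")
print(hc$merge)
print(hc$height)
```

For similarities on another scale, max(s) - s is one common variant of the same reversal. Either way, the choice of transformation changes the merge heights, so it is worth checking against a case with a known answer.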
