thr3ads.net - similar to: "maxitems in cluster validity"

Displaying 20 results from an estimated 3000 matches similar to: "maxitems in cluster validity"

2010 Oct 22

(no subject)

I am running a cluster analysis of 8768 respondents on 5 lifestyle variables and am having difficulty constructing the dissimilarity matrix that I will use for PAM. I always get the error “cannot allocate vector of size 293.3 Mb”, even though I have already increased my memory limit to its maximum of 4000. I ran this on a 32-bit OS with 2 GB of RAM. I tried ff and filehash and still get the same error. Can you please

2008 Jun 13

cluster.stats

Dear list, I just tried to use the function cluster.stats in the package fpc and have a couple of questions about the syntax:

cluster.stats(d,clustering,alt.clustering=NULL, silhouette=TRUE,G2=FALSE,G3=FALSE)

1) Is the distance object (d) an object obtained by applying the function dist() to my own original matrix?
2) Is clustering the cluster vector that results from one of the many clustering methods?

2010 Oct 29

transposing a column table

Dear R users, I need help exporting a clustering vector with 8768 entries, taken from PAM clustering output, to an Excel file as a single vertical column.

Clustering vector:
 [1] 1 1 2 2 1 2 1 2 1 1 2 2 1 2 2 2 2 1 1 1 1 2 2 1 2 2 1 2 2 2 2 2 2 2 2 1 2
[38] 2 1 1 2 2 2 2 2 1 2 1 2 2 2 2 1 2 1 2 2 1 2 2 2 2 2 2 1 2 1 2 2 2 1 1 2 2
[75] 2 1 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2
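One possible approach, sketched here under assumptions: put the vector into a one-column data frame and write it to a CSV file, which Excel opens with one value per row. The short `clustering` vector holds only the first ten values quoted above, and the temporary file path is a placeholder. With a `pam` fit, the full vector would be the `clustering` component of the result.

```r
# First ten cluster assignments from the post, used as stand-in data.
clustering <- c(1, 1, 2, 2, 1, 2, 1, 2, 1, 1)

# One row per respondent: a running id plus the assigned cluster.
out <- data.frame(id = seq_along(clustering), cluster = clustering)

# Hypothetical output location; replace with a real path as needed.
path <- tempfile(fileext = ".csv")
write.csv(out, path, row.names = FALSE)

# Read the file back to confirm that the column layout survived.
back <- read.csv(path)
print(back)
```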

2005 Sep 29

Regression slope confidence interval

Hi list, is there any direct way to obtain confidence intervals for the regression slope from lm, predict.lm or the like? (If not, is there any reason? This is also missing in some other statistics software, and I thought this would be quite a standard application.) I know that it's easy to implement, but I need it for explaining things to people who faint if they have to do their own programming...
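For reference, base R's `confint()` has a method for `lm` fits that returns such intervals directly. A minimal sketch, assuming the built-in `cars` data set as stand-in data (it is not part of the original question):

```r
# Stand-in data: 'cars' ships with R (datasets package).
fit <- lm(dist ~ speed, data = cars)

# 95% confidence interval for the slope coefficient only.
ci <- confint(fit, "speed", level = 0.95)
print(ci)
```

Omitting the second argument returns intervals for all coefficients, including the intercept.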

2010 Apr 24

DICE Coefficient of similarity measure

Hi, I wanted to calculate the DICE coefficient (a similarity measure for binary variables) in R and found that the "igraph" package offers "similarity.dice" for this. However, this command requires an igraph object as input, and I have a data frame whose columns contain 1's and 0's. Can I convert this data frame into an igraph object, so that

2005 Aug 08

selecting outliers

Hi everybody, I'd like to know if there's an easy way to extract outlier records from a dataset, in order to perform further analysis on them. Thanks, Alessandro
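A minimal sketch of one way to do this, assuming that "outlier" means a value beyond the boxplot whiskers (1.5 times the interquartile range); the post does not define the term, and the data frame `dat` is simulated for illustration:

```r
set.seed(1)

# Simulated data: 50 standard normal values plus two planted extremes.
dat <- data.frame(id = 1:52, value = c(rnorm(50), 8, -9))

# boxplot.stats() returns the values lying beyond the whiskers in $out.
out_values <- boxplot.stats(dat$value)$out

# Keep the full records (rows) whose value was flagged.
outliers <- dat[dat$value %in% out_values, ]
print(outliers)
```

Other definitions (for example, distance-based rules for multivariate data) would need a different flagging step, but the row-subsetting idea stays the same.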

2011 Jun 09

k-nn hierarchical clustering

Hi there, is there any R-function for k-nearest neighbour agglomerative hierarchical clustering? By this I mean standard agglomerative hierarchical clustering as in hclust or agnes, but with the k-nearest neighbour distance between clusters used on the higher levels where there are at least k>1 distances between two clusters (single linkage is 1-nearest neighbour clustering)? Best regards,

2006 Aug 09

R CMD check error

Dear list, R CMD check on my updated package now generated the following error: "LaTeX errors when creating DVI version. This typically indicates Rd problems." But the Rd files (and everything else) were checked as "OK" (I removed the problem about which I asked the list some hours ago, but answers are still appreciated because I rather created a rough workaround than

2011 Feb 28

mixture models/latent class regression comparison

Dear list, I have been comparing the outputs of two packages for latent class regression, namely 'flexmix', and 'mmlcr'. What I have noticed is that the flexmix package appears to come up with a much better fit than the mmlcr package (based on logLik, AIC, BIC, and visual inspection). Has anyone else observed such behaviour? Has anyone else been successful in using the mmlcr

2005 Aug 08

computationally singular

Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix

> S<-cov(mymatrix) # this is fine
> D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally

2010 Sep 01

Rd-file error: non-ASCII input and no declared encoding

Dear list, I came across the following error for three of my newly written Rd-files:

non-ASCII input and no declared encoding

I can't make sense of this. Below I copied in one of the three files. Can anybody please tell me what's wrong with it? Thank you, Christian

\name{tetragonula}
\alias{tetragonula}
\alias{tetragonula.coord}
\docType{data}
% \non_function{}
\title{Microsatellite

2006 Nov 01

cluster analysis using Dmax

Dear All, a long time ago I ran a cluster analysis where the dissimilarity matrix used consisted of Dmax (or Kolmogorov-Smirnov distance) values. In other words the maximum difference between two cumulative proportion curves. This all worked very well indeed. The matrix was calculated using Dbase III+ and took a day and a half and the clustering was done using MV-ARCH, with the resultant

2009 May 18

Save Cluster results to data frame

If I cluster my data into 3 sets, using pam for instance, is there a way to save the resulting cluster assignments to the originating data frame? Related to that, how do I change the cluster names to something a bit more meaningful than 1, 2, 3? So it goes like this: Data ---> cluster into 3 groups ---> give them meaningful names
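A minimal sketch of the data ---> clusters ---> names workflow, under stated assumptions: `kmeans` from base R stands in for `pam` here so that the example needs no extra package, the built-in `iris` measurements serve as stand-in data, and the group labels are hypothetical. With a `pam` fit, the assignments would come from the `clustering` component instead of `cluster`.

```r
set.seed(1)

# Stand-in data: the four numeric columns of the built-in iris data.
dat <- iris[, 1:4]

# Cluster into 3 groups (kmeans used here in place of pam).
fit <- kmeans(dat, centers = 3)

# Save the assignments back to the originating data frame.
dat$cluster <- fit$cluster

# Hypothetical, more meaningful names for clusters 1, 2 and 3.
labels <- c("group A", "group B", "group C")
dat$cluster_name <- factor(dat$cluster, levels = 1:3, labels = labels)

print(table(dat$cluster_name))
```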

2010 Oct 10

Package "prabclus" not available?

Hi there, I just tried to install the package prabclus on a computer running Ubuntu Linux 9.04 using install.packages from within R. This gave me a message: Warning message: In install.packages("prabclus") : package ‘prabclus’ is not available I tried to do this selecting two different CRAN mirrors (same result) and with other packages (installing them works fine). Looking up the

2010 Jul 02

K-means result - variance between cluster

Hi, I would like to present the results of k-means clustering in terms of variances within and between clusters. The k-means object gives only the within-cluster sum of squares by cluster, so the between-cluster part needed for the following table, which I am trying to build, is missing.

Number of | Variance within | Var between | Var total | F-value
Cluster k | cluster | cluster
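A sketch of how such a table could be assembled, under assumptions: in current versions of R the object returned by `kmeans` carries `totss`, `tot.withinss` and `betweenss`, so the between-cluster sum of squares is the total minus the within part. The columns below are sums of squares rather than variances, and the "F-value" is taken to mean the pseudo-F ratio (between / (k - 1)) / (within / (n - k)), which is an interpretation of the truncated table header, not something the post states. The `iris` data are stand-in data.

```r
set.seed(1)

# Stand-in data: numeric columns of the built-in iris data.
x <- as.matrix(iris[, 1:4])
n <- nrow(x)

# One table row per number of clusters k.
rows <- lapply(2:5, function(k) {
  fit <- kmeans(x, centers = k, nstart = 10)
  within  <- fit$tot.withinss
  between <- fit$totss - within   # equals fit$betweenss
  data.frame(k = k,
             within = within,
             between = between,
             total = fit$totss,
             F = (between / (k - 1)) / (within / (n - k)))
})
tab <- do.call(rbind, rows)
print(tab)
```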

2011 Mar 31

Cluster analysis, factor variables, large data set

Dear R helpers, I have a large data set with 36 variables and about 50,000 cases. The variables represent labour market status during 36 months, and there are 8 different variable values (e.g. Full-time Employment, Student,...). Only cases with at least one change in labour market status are included in the data set. To analyse subsets of the data, I have used daisy in the cluster package to create

2005 Jul 25

cluster

Dear listers: I have a question on the clustering methods available in R. I am trying to down-sample the majority class in a classification problem on an imbalanced dataset. Since I don't want to lose information from the original dataset, I don't want to use naive down-sampling: I think using clustering on the majority class to select "representative" samples might

2009 Dec 11

cluster size

Hi r-help, I am doing k-means clustering with kmeans from stats. I tried a five-cluster solution using: kcl1 <- kmeans(as1[,c("contlife","somlife","agglife","sexlife", "rellife","hordlife","doutlife","symtlife","washlife",

2006 Aug 18

R-update - what about packages and ESS?

Hi there, it seems that if I update R, it doesn't find previously installed packages anymore and is also not found by ESS. Actually the update has been done by our system administrator who assumed that there would be no problems with these things (I don't have root access to this system) and will perhaps not be too keen on installing everything else again. Is there any simple way how

2010 Feb 11

cluster/distance large matrix

Hi all, I've run into some memory limitations in the analysis that I want to run. I have a matrix of distances between 38000 objects. These distances were calculated outside of R. I want to cluster these objects. For smaller sets (e.g. n=100) this is how I proceed:

A<-matrix(scan(file, n=100*100),100,100, byrow=TRUE)
ad<-as.dist(A)
