thr3ads.net - similar to: "Cluster Analysis"

2010 Dec 02

kmeans() compared to PROC FASTCLUS

Hello all, I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS and I'm getting drastically different results with a real-life data set. Even with a simulated data set with very well separated clusters, starting from the same seeds, the resulting cluster means are still different. I was hoping to look at the source code of kmeans(), but it's in C and FORTRAN and

2017 Mar 09

GSoC 2017 Project Proposal

Hello devs. I would like to propose how I plan to go about improving and getting a system that can be integrated into Xapian in this GSoC for the clustering branch. I have identified three areas of work which were not touched last time. 1) Automated Performance Analysis I had roughly implemented 2 evaluation techniques previously (Distance b/w document and centroids within clusters and

2011 Aug 10

Clustering Large Applications..sort of

Hello all, I am using the clustering functions in R to work with large masses of binary time series data; however, the clustering functions do not seem able to handle a practical problem of this size. Library 'hclust' is good (though it may be subpar for this size of problem, thus doubly poor for this application) in that I do not want to make assumptions about the number of

2012 Mar 23

how to cluster rows of words in a text file

Hi: I am trying to cluster the rows of a text file with kmeans. I load the data as follows:

    file1 <- read.csv("somefile.csv")

and the file can be viewed having the following lines of words:

    > file1
    1 word1 word3 word4 word1
    2 word1 word4 word3 word1
    3 word4 word2 word4 word3
    4 word4 word2 word1 word3
    5 word2 word2 word4 word2
    file_as_matrix <- as.matrix(file1);

Now,

2010 Apr 24

DICE Coefficient of similarity measure

Hi, I wanted the DICE coefficient (similarity measure for binary variables) to be calculated in R and found that the "igraph" package has the option of "similarity.dice" to do this. But, for this command, the input object should be an igraph object. But, I have a dataframe of columns containing 1's and 0's. Can I convert this dataframe into an igraph object, so that
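A minimal base-R sketch of one way to get Dice coefficients directly from 0/1 columns, without converting to an igraph object. It uses the standard definition 2a / (2a + b + c), where a counts positions where both variables are 1 and b, c count positions where only one is. The data frame `df` and the helper `dice()` are hypothetical names introduced for illustration, not part of any package.

```r
# Hypothetical data frame of binary (0/1) columns.
df <- data.frame(v1 = c(1, 1, 0, 1),
                 v2 = c(1, 0, 0, 1),
                 v3 = c(0, 0, 1, 0))

# Dice coefficient between two 0/1 vectors: 2a / (2a + b + c),
# which equals 2 * (shared 1s) / (1s in u + 1s in v).
# Returns NaN if both vectors contain no 1s at all.
dice <- function(u, v) {
  both <- sum(u == 1 & v == 1)
  2 * both / (sum(u == 1) + sum(v == 1))
}

# Pairwise similarity matrix over the columns.
m <- as.matrix(df)
p <- ncol(m)
sim <- matrix(NA_real_, p, p, dimnames = list(colnames(m), colnames(m)))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    sim[i, j] <- dice(m[, i], m[, j])
  }
}
sim
```

To compare rows (observations) instead of columns, the same loop can be run over `t(m)`.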

2012 Apr 13

Help with stemDocument

Hi, All: I am new to R and the tm package. I'm trying to do the stemming using tm_map() and it doesn't seem to work. I used: > stemDocument(t_cmts[[100]]) where t_cmts is the corpus object. The result is: bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor

2008 Apr 13

Calinski and Harabasz Index for Cluster Determination with Diana in R

Hello all, I have a set of data points for which I have pairwise distances. I managed to create a dendrogram for this data set using diana() in R; however, this only gives me the tree and not the clusters themselves. I am trying to determine clusters using the Calinski and Harabasz Index (CH Index). I, however, cannot find how to accomplish this using R. Is there anyone who could help me with this? I

2002 Feb 20

Clustering and Calinski's index

I have to solve a clustering problem. My first step is to determine the number of clusters, which is why I'm using the Calinski index ( [tr(B)/(k-1)]/[tr(W)/(n-k)] ), which I try to maximize to find the best number of clusters. A function is already implemented in R to calculate this index: clustIndex(cl, x, index="calinski") where cl is the result of a clustering method,
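As a rough illustration of the index mentioned above, here is a base-R sketch that computes it from scratch using the standard Calinski-Harabasz definition [tr(B)/(k-1)] / [tr(W)/(n-k)], with n observations and k clusters. The function name `ch_index` and the simulated data are assumptions made for this example; this is not the code behind clustIndex().

```r
# Calinski-Harabasz index for a given partition (base R only).
# x: numeric matrix or data frame; cluster: vector of cluster labels.
ch_index <- function(x, cluster) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cluster))
  # Total sum of squares around the grand mean: tr(B) + tr(W).
  tss <- sum(sweep(x, 2, colMeans(x))^2)
  # Within-cluster sum of squares: tr(W).
  wss <- sum(sapply(split(as.data.frame(x), cluster), function(g) {
    g <- as.matrix(g)
    sum(sweep(g, 2, colMeans(g))^2)
  }))
  bss <- tss - wss  # between-cluster sum of squares: tr(B)
  (bss / (k - 1)) / (wss / (n - k))
}

# Simulated two-group data (hypothetical), scanned over candidate k.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
ch_by_k <- sapply(2:5, function(k) {
  ch_index(x, kmeans(x, centers = k, nstart = 10)$cluster)
})
names(ch_by_k) <- 2:5
ch_by_k
```

The candidate k with the largest value would be the one the index favours; the index is undefined for k = 1.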

2010 Apr 05

Agnes in Cluster Package and index.G1 in the clusterSim package questions

Dear R Users: I am new to R and I am trying to do a cluster analysis on a single continuous variable using agnes() [Agglomerative Nesting (Hierarchical Clustering)] in the package ‘cluster’. I was able to apply this clustering method to my data: ward1 <- agnes(balances, diss = FALSE, metric = "euclidean", stand = TRUE, method = "ward", keep.diss = TRUE, keep.data =

2011 Feb 05

different results in MASS's mca and SAS's corresp

Dear list: I have tried MASS's mca function and SAS's PROC corresp on the farms data (included in MASS, also used as mca's example), the results are different:

    R: mca(farms)$rs:
                1             2
    1 0.059296637  0.0455871427
    2 0.043077902 -0.0354728795
    3 0.059834286  0.0730485572
    4 0.059834286  0.0730485572
    5 0.012900181 -0.0503121890
    6

2007 Mar 06

R and SAS proc format

Dear all, Is there an R equivalent to SAS's proc format? Best regards J. Lamack

2010 Jul 13

Equivalent of SAS's FIRST. And LAST. Variable in R?

Hi all, I'm just wondering if there is an equivalent of SAS's FIRST. and LAST. variables in R? For example, suppose this is a snapshot of the data:

      ClientCode CaseCode       open      close Important
    1         37       28 2003-07-08 2003-09-02         1
    2         37      310 2003-11-01 2004-09-10         1
    3         37     1562 2007-04-03 2007-07-27         1
    4
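One possible base-R approach to the question above, sketched with duplicated(): a row is the first of its group if its key has not been seen before, and the last if it has not been seen when scanning from the end. The first three rows below reuse the ClientCode and CaseCode values from the snapshot; the fourth row and the column names `first` and `last` are hypothetical, added only for illustration.

```r
# Small example data; the fourth row is made up so there are two groups.
cases <- data.frame(
  ClientCode = c(37, 37, 37, 40),
  CaseCode   = c(28, 310, 1562, 5)
)

# Analogue of FIRST.ClientCode: TRUE on the first row of each ClientCode.
cases$first <- !duplicated(cases$ClientCode)

# Analogue of LAST.ClientCode: TRUE on the last row of each ClientCode.
cases$last <- !duplicated(cases$ClientCode, fromLast = TRUE)

cases
```

Unlike a SAS BY group, duplicated() does not require the data to be sorted, but sorting by the key first keeps the flags aligned with what FIRST. and LAST. would report.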

2011 Sep 09

NMDS plot and Adonis (PerMANOVA) of community composition with presence absence and relative intensity

Hi! Thanks for providing great help in R-related statistics. Now, however, I'm stuck. I'm not a statistics person but I was recommended to use R to perform an NMDS plot and PerMANOVA of my dataset. Samples (treatments) are in the columns and species (OTUs) in the rows. I have 4 treatments (Ambient temperature, Ambient temperature+low pH, High temperature, High temperature+low pH), and I have 16

2004 Apr 05

Selecting Best Regression Equation

Dear all, Does R or S-plus or any of their packages provide a command to perform any of the following procedures for finding the Best Regression Equation - 1. 'All Possible Regressions Procedures' (is there an automated command to perform the 2^p regressions and order them according to the criteria R2(adj), Mallows' Cp, s2, without setting up all the regression models manually), 2. 'Backward

2004 Mar 09

Package cclust error

Hello, here is my problem. After looking at the mail archives, I found a description of the error I get when I use this package. At first I even thought that they were showing how to solve it. But the thing is that saying "the programmer forgot drop=FALSE" doesn't show me how I should get rid of the problem. I have looked inside the package very quickly and I found three

2005 Apr 07

about mantelhaen.test (PR#7779)

Full_Name: Chien-yu Peng
Version: 2.0.1
OS: Windows XP Professional
Submission from: (NULL) (140.109.72.181)
Dear all: Although I don't know you, I am thankful for your help. When I use the function mantelhaen.test for an R x C x K (R, C > 2) table, the output is not the same as SAS's. I don't know whether the result is consistent with any one of SAS's. But it works correctly for 2

2008 Feb 08

R version of SAS Proc Varclus

I am interested in finding an R version of SAS "Proc Varclus". SAS's Proc Varclus implements an oblique cluster analysis based on principal components. How can I find out if R has a package that runs the same algorithm implemented in SAS "Proc Varclus"? Thank you, Mary Helen Black, M.S. Keck School of Medicine of USC

2010 Nov 11

Consistency of Logistic Regression

Dear R developers, I have noticed a discrepancy between the coefficients returned by R's glm() for logistic regression and SAS's PROC LOGISTIC. I am using dist = binomial and link = logit for both R and SAS. I believe R uses IRLS whereas SAS uses Fisher's scoring, but the difference is something like 100 SE on the intercept. What accounts for such a huge difference? Thank you for

2007 Mar 09

piecing together statements (macro?)

Hi All, I am pretty new to R but saw Stata's and SAS's macro facilities and am looking for how such things work in R. I am trying to piece together a series of statements:

    n = 5 #want to have it dynamic with respect to n
    for (j in 1:n) { eval(paste("x", j, "=x[", j, "]", sep="")) }

I want the created statements 'x1=x[1]' immediately executed
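For context, eval() applied to a character string simply returns the string; the text has to be turned into an expression with parse() before it can be executed. The sketch below shows that route and an alternative with assign() that avoids building code as text. The vector `x` and the variable names `y1`, `y2`, ... are hypothetical values chosen for this example.

```r
# Hypothetical input vector.
x <- c(10, 20, 30, 40, 50)
n <- length(x)

# Option 1: build each statement as text, parse it, then evaluate it.
# After this loop the variables x1, ..., x5 exist in the calling environment.
for (j in seq_len(n)) {
  stmt <- paste0("x", j, " <- x[", j, "]")
  eval(parse(text = stmt))
}

# Option 2: create the variables by name without any string parsing.
for (j in seq_len(n)) {
  assign(paste0("y", j), x[j])
}

x3
y5
```

In many cases keeping the values in a vector or list and indexing it (x[j]) is simpler than creating numbered variables.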

2004 Mar 15

R equiv to proc gremove in maps package

Is there an R equivalent to SAS's proc gremove? You would use this procedure to combine the units on an existing map, for example to build a map of Metropolitan Statistical Areas (MSAs) from the [US] counties dataset where the internal boundaries surround the MSAs (which are groups of counties) rather than the individual counties. I can imagine the mechanism would be to find and erase the