thr3ads.net - similar to: "K MEANS clustering"

Displaying 20 results from an estimated 2000 matches similar to: "K MEANS clustering"

2016 Jul 27

K MEANS clustering

Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean
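A minimal Python sketch of the two ideas in this message: cosine distance between sparse document vectors, and a centroid computed as the mean of sparse vectors. It assumes each document is a dict from term to weight. The names cosine_distance and sparse_centroid are illustrative only and are not part of Xapian's API.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity for sparse vectors stored as {term: weight}."""
    # Iterate over the smaller vector so the dot product costs O(min(|a|, |b|)).
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an empty vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def sparse_centroid(vectors):
    """Return the mean of sparse vectors, storing only terms that occur."""
    totals = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] += weight
    n = len(vectors)
    return {term: total / n for term, total in totals.items()}
```

Keeping the centroid as a dict means its size is bounded by the number of distinct terms in the cluster rather than by the full vocabulary, although a centroid is still usually much denser than any single document.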

2nd week progress

2016 Jun 09

Hello devs, I have filled out the repo link on TRAC as suggested. I'll also keep the journal updated on TRAC from now on. I am almost done with defining all the base classes required for the clusterer and have started coding the Euclidean distance metric. This should be completed by tomorrow, after which I'll spend one day testing to make sure everything functions as expected, so

GSOC-2016 Project : Clustering of search results

2016 Mar 06

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >

KMeans - Evaluation Results

2016 Aug 19

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

GSOC-2016 Project : Clustering of search results

2016 Mar 05

Hello devs, I am Richhiey Thomas, pursuing my third year of undergraduate studies in Computer Science from Mumbai University. I had gone through the project list for this year and the project idea based on clustering caught my attention. I spoke to Assem Chelli on IRC who guided me to the code and got me started. I started going through the code and have successfully built Xapian on my machine.

GSOC-2016 Project : Clustering of search results

2016 Mar 07

On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote: > My questions are: > 1) Can you direct me on how to convert this raw idea into a proposal in > context to Xapian with more detail? What areas do I focus on? Our GSoC guide has an application template <https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you should use to structure your proposal. It has some

KMeans - Evaluation Results

2016 Aug 18

> > > > Actually, you're doing something slightly unusual there: making the > internal member public. Protected would be better, and private is I think > most usual; library clients aren't going to have access to the Internal > class declaration, so they can't call things on it. This means it's > actually difficult right now to subclass Feature. > > I

GSOC-2016 Project : Clustering of search results

2016 Mar 12

On Sat, Mar 12, 2016 at 04:27:55PM +0530, Richhiey Thomas wrote: > Below I write a raw version of my proposal for Clustering of Search Results > based on our previous mails. Hi, Richhiey. Thanks for putting this together ahead of the formal start of applications, and sharing it with us -- and it's really not too long! Project proposals for something that will last the summer are

GSoC 2016 - Introduction

2016 May 01

Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher-level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a normal K-means clustering and then add the PSO module as functionality that can be used to improve the quality of clustering, with speed as the trade-off.

KMeans - Evaluation Results

2016 Aug 15

Hello, I've recently finished with an implementation of KMeans with two initialization techniques, random initialization and KMeans++. I would like to share my findings after evaluating the same. I have tested this implementation of KMeans with a BBC news article dataset. I am currently working on evaluating the same with FIRE datasets. Currently, clustering more than 500 documents
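For readers unfamiliar with the second initialization technique named above, here is a hedged Python sketch of KMeans++ seeding. It is not the code evaluated in this thread: it assumes dense points given as tuples and squared Euclidean distance, and the function names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Pick k starting centroids from points using KMeans++ seeding."""
    rng = rng or random.Random()
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # The first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]
    # nearest_sq[i] is the squared distance from points[i] to its closest centroid.
    nearest_sq = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(nearest_sq) == 0.0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            chosen = rng.choice(points)
        else:
            # Sample the next centroid with probability proportional to nearest_sq.
            chosen = rng.choices(points, weights=nearest_sq, k=1)[0]
        centroids.append(chosen)
        nearest_sq = [
            min(d, squared_distance(p, chosen)) for p, d in zip(points, nearest_sq)
        ]
    return centroids
```

Random initialization corresponds to replacing the weighted draw with a uniform one. The weighted draw spreads the starting centroids out, which tends to reduce the number of iterations needed afterwards.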

k-means with euclidian distance but no coordinates

2001 Dec 13

Hi, I'm trying to build a thesaurus that will give sensible values for rare words. I suspect the best algorithm to use is k-means although I'm not sure about that -- I would have preferred a k dimensional space with a binary cluster in each dimension so a word can belong to 0..k clusters, but I digress... I can measure the strength of correlation between words fairly easily by counting

KMeans - Evaluation Results

2016 Aug 17

On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org> wrote: > >> How long does 200-300 documents take to cluster? How does it grow as > more documents are included in the MSet? We'd expect an MSet of 1000 > documents to take longer to cluster than one with 100, but the important > thing is _how_ the time increases as the number of documents

KMeans - Evaluation Results

2016 Aug 17

> How long does 200-300 documents take to cluster? How does it grow as more > documents are included in the MSet? We'd expect an MSet of 1000 documents > to take longer to cluster than one with 100, but the important thing is > _how_ the time increases as the number of documents grows. > > Currently, the number of seconds taken for clustering a set of documents for varying

GSOC-2016 Project : Clustering of search results

2016 Mar 14

On Mon, Mar 14, 2016 at 02:09:13AM +0530, Richhiey Thomas wrote: > The way the paper has been written I guess is the main source of your > confusion. Let me provide a paper that explains this same concept in a way > that is easier to understand. I was confused by eq (3) that you mentioned > too. Here it is : > http://www.sau.ac.in/~vivek/softcomp/clustering%20PSO+K-means.pdf Ah,

Otpmial initial centroid in kmeans

2008 Jul 03

Hello there. I am using kmeans from the base package to cluster my customers. As the result of kmeans depends on the initial centroids, may I know: 1) how can we specify the centroids in the R function? (I don't want a random starting point) 2) how to determine the optimal (if not, a good) centroid to start with? (I am not after the fixed seed solution as it only ensures that the
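To illustrate question 1, the following sketch runs the standard Lloyd iteration from caller-supplied starting centroids instead of random ones. It is written in Python purely as an illustration and is not R's kmeans; the function name and the max_iter parameter are assumptions made for the example.

```python
def kmeans_from_centroids(points, initial_centroids, max_iter=100):
    """Run Lloyd's k-means starting from explicitly given centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels
```

Because nothing in this procedure is random, the same starting centroids always give the same clustering, so reproducibility does not depend on fixing a seed.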

GSoC 2016 - Introduction

2016 May 05

Hello, Thanks James for the reply. That cleared a few things up. Apologies for replying late because of exams going on. I was going through the previous clustering API to understand how it worked, and it seems that the approach for constructing the termlists used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need

keep the centre fixed in K-means clustering

2013 May 21

Dear R users, I have the matrix of the centres of some clusters, e.g. 20 clusters each with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric values. I have collected new data (each with 100 numeric values) and would like to keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data fit in this grouping system, e.g. if the data is close to cluster 1
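What this question describes is the assignment step of k-means without the update step: each new observation is labelled with its nearest existing centre and the centres never move. A minimal sketch in Python (used here only for illustration, with a hypothetical function name and squared Euclidean distance assumed):

```python
def assign_to_fixed_centres(centres, new_points):
    """Label each new point with the index of its nearest centre.

    The centres are only read, never updated, so the grouping stays fixed.
    """
    labels = []
    for point in new_points:
        distances = [
            sum((x - c) ** 2 for x, c in zip(point, centre)) for centre in centres
        ]
        # Ties are resolved in favour of the lowest centre index.
        labels.append(distances.index(min(distances)))
    return labels
```

With 20 centres of 100 dimensions each, centres would be a list of 20 tuples of length 100, and the returned labels run from 0 to 19.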

GSoC 2017 Project Proposal

2017 Mar 09

Hello devs. I would like to propose how I plan to go about improving the clustering branch in this GSoC and getting it to a state where it can be integrated into Xapian. I have identified three areas of work which were not touched last time. 1) Automated Performance Analysis I had roughly implemented 2 evaluation techniques previously (Distance between document and centroids within clusters and

K-Means clustering Algorithm

2012 Aug 28

I was wondering if there was an R equivalent to the two-phase approach that MATLAB uses in performing the k-means algorithm. If not, is there a way that I can determine if the kmeans in R and the kmeans in MATLAB are essentially giving me the same clustering information within a small amount of error?

k-means clustering

2007 Sep 12

Dear list, first, apologies, as this is not strictly an R question but a theoretical one. I have read that use of k-means clustering assumes sphericity of the data distribution. Can anyone explain to me what this means? My statistical background is too poor. Is it another kind of distribution, like Gaussian or binomial? What happens if the distribution is not spherical? Could you give me an
