thr3ads.net - similar to: "GSoC 2017 Project Proposal"

Displaying 20 results from an estimated 2000 matches similar to: "GSoC 2017 Project Proposal"

2016 Aug 19

KMeans - Evaluation Results

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:
> I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class.
I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

2016 Aug 18

KMeans - Evaluation Results

> Actually, you're doing something slightly unusual there: making the internal member public. Protected would be better, and private is I think most usual; library clients aren't going to have access to the Internal class declaration, so they can't call things on it. This means it's actually difficult right now to subclass Feature.
> I

2016 Aug 17

KMeans - Evaluation Results

On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org> wrote:
> How long does 200-300 documents take to cluster? How does it grow as more documents are included in the MSet? We'd expect an MSet of 1000 documents to take longer to cluster than one with 100, but the important thing is _how_ the time increases as the number of documents

2016 Jun 09

2nd week progress

Hello devs, I have filled out the repo link on TRAC as suggested. I'll also keep the journal updated on TRAC from now on. I am almost done with defining all the base classes required for the clusterer and have started coding the Euclidean distance metric. This should be completed by tomorrow, after which I'll be spending one day to test and make sure everything functions as expected, so

2016 Mar 09

Introduction and Doubts

Hello All, I am Nirmal Singhania from NIIT University, India. I am interested in the Clustering of search results topic. I have been in the field of practical machine learning and information retrieval for quite some time. I took various courses/MOOCs on Information Retrieval and Text Mining and have been working on real life datasets (KDD99, AWID, MovieLens). Because the problems you face in real life ML/IR

2016 Aug 17

KMeans - Evaluation Results

I've gone through the link that you sent me and I currently understand how this helps and works to some extent, but I am not too sure how I should start with converting the current interface to the PIMPL design. I'm not used to this design pattern so it's taking some time to sink in :) Say I start with the Clusterer class, I create a ClustererImpl class which is the internal class that

2017 Jun 14

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step is to start forming document vectors that are reduced and more useful. This helps considerably in reducing run time (since the time for distance calculation depends on the number of terms). Getting the useful terms within a

2016 Jul 26

K MEANS clustering

Hello, I've been working on the KMeans clustering algorithm recently, and for the past week I have been stuck on a problem which I'm not able to find a solution to. Since we are representing documents as TF-IDF vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be
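
One common way to hold such sparse TF-IDF vectors is a map from term to weight that stores only the non-zero entries. The sketch below assumes that representation; the function name `sparse_euclidean` and the sample weights are illustrative and are not taken from the thread or from Xapian's API.

```python
import math

def sparse_euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as {term: weight}."""
    # Only terms present in at least one vector can contribute to the distance,
    # so the cost depends on the non-zero entries, not on the vocabulary size.
    terms = a.keys() | b.keys()
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))

# Hypothetical documents with made-up weights.
doc1 = {"xapian": 0.8, "cluster": 0.6}
doc2 = {"cluster": 0.6, "kmeans": 0.5}
distance = sparse_euclidean(doc1, doc2)
```

With a 5000-term vocabulary and a few dozen distinct terms per document, this touches only the terms that actually occur in either document.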

2016 Aug 17

KMeans - Evaluation Results

> How long does 200-300 documents take to cluster? How does it grow as more documents are included in the MSet? We'd expect an MSet of 1000 documents to take longer to cluster than one with 100, but the important thing is _how_ the time increases as the number of documents grows.
Currently, the number of seconds taken for clustering a set of documents for varying

2016 May 05

GSoC 2016 - Introduction

Hello, Thanks James for the reply. That cleared a few things up. Apologies for the late reply; I have exams going on. I was going through the previous clustering API to understand how it worked, and it seems like the approach for construction of the termlists which are used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need
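
As a rough illustration of TF-IDF weighting combined with cosine similarity, here is a minimal textbook sketch. It assumes raw term counts and a plain `log(N / df)` inverse document frequency, which is not necessarily the weighting the previous clustering API used. The names `tfidf_vectors` and `cosine_similarity` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenised document."""
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy corpus, already tokenised.
docs = [
    ["search", "engine", "index"],
    ["search", "cluster", "kmeans"],
    ["cluster", "kmeans", "centroid"],
]
vecs = tfidf_vectors(docs)
```

Documents that share no terms get a similarity of zero, and because the cosine is normalised by vector length, long and short documents on the same topic can still score as similar.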

2001 Aug 21

difference between trees in R?

Hi. I am wondering if anybody has studied and/or written code in R to calculate the distance between 2 "trees". For example, if one does a hierarchical agglomerative clustering and, say, a hierarchical divisive clustering (represented as trees) and wishes to compute a metric on them. I am thinking of something like the symmetric difference as mentioned in Margush and McMorris (1982).

2016 Mar 10

Introduction and Doubts

I was not sharing it on the mailing list because I thought that someone could use all the ideas I proposed in their GSoC proposal. Surely I will contribute to the Xapian project. Sorry if that was against the rules. The algorithm is not developed by me, but after much research on various clustering techniques I found that there is a new algorithm called CLUBS (Clustering Using Binary Splitting) which gives

2016 Jul 27

K MEANS clustering

Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes in with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean
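
To make the centroid question concrete, here is a minimal sketch of taking the mean of sparse document vectors, assuming each vector is stored as a `{term: weight}` map. The name `sparse_centroid` is hypothetical, and this is not the representation the thread eventually settled on.

```python
from collections import defaultdict

def sparse_centroid(vectors):
    """Mean of sparse {term: weight} vectors; a missing term counts as zero."""
    if not vectors:
        return {}
    sums = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            sums[term] += weight
    n = len(vectors)
    # Divide by the number of vectors, not by how many vectors contain the term.
    return {term: total / n for term, total in sums.items()}

# Hypothetical cluster of two documents.
centroid = sparse_centroid([{"a": 2.0}, {"a": 4.0, "b": 6.0}])
```

The centroid carries every term that occurs in any member of the cluster, so it is usually much denser than the individual document vectors, which is one reason an efficient representation matters here.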

2016 Aug 15

KMeans - Evaluation Results

Hello, I've recently finished with an implementation of KMeans with two initialization techniques, random initialization and KMeans++. I would like to share my findings after evaluating the same. I have tested this implementation of KMeans with a BBC news article dataset. I am currently working on evaluating the same with FIRE datasets. Currently, clustering more than 500 documents
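
For readers unfamiliar with the second initialization technique, the following is a minimal sketch of KMeans++ seeding over sparse `{term: weight}` vectors. It is not the implementation evaluated in the thread; `kmeans_pp_init`, `squared_distance` and the `rng` parameter are illustrative names.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two sparse {term: weight} vectors."""
    terms = a.keys() | b.keys()
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)

def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from points using KMeans++ seeding."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()
    # The first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Each point is weighted by its squared distance to the nearest
        # centroid chosen so far, so far-away points are more likely picks.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0.0:
            # Every point coincides with a chosen centroid: fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical data: two identical documents and one distant document.
points = [{"x": 0.0}, {"x": 0.0}, {"x": 10.0}]
seeds = kmeans_pp_init(points, 2, random.Random(0))
```

Random initialization would instead draw all k centroids uniformly, which can put several seeds in the same dense region. The distance weighting above is what spreads the seeds out.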

2007 Aug 18

Problem with lsa package (data.frame) on Windows XP

Dear R team, The following piece of code (to use the lsa package) works fine on my mac os x, but when I run the same code on Windows XP, it doesn't work any more. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,

2015 Jan 03

Xapian-discuss Digest, Vol 127, Issue 1

Hey Richhiey, Most probably Xapian is used with Cygwin on Windows, and the Windows-specific code in Xapian is based on Cygwin. However, we would be able to help you out with this issue if you could pastebin the whole 'gnu-make' generated report. Regards, Abhishek
On Sat, Jan 3, 2015 at 5:30 PM, <xapian-discuss-request at lists.xapian.org> wrote:
> Send Xapian-discuss mailing list

2015 Feb 15

Bitsize project: Krovetz Stemmer

Hello Xapian devs, I had shown interest in writing a Krovetz stemmer for Xapian and spoke to James Aylett about it. Since it was hard to code the stemmer in Snowball, I came up with a C++ implementation of the stemmer. But since it is a dictionary-based stemmer, I'm having problems deciding how to create the dictionary. I did check out some of the implementations of the Krovetz stemmer online

2015 Feb 10

Bitsize project - Krovetz stemmer

Hello Xapian devs,

2015 Mar 28

Weighting schemes for Xapian

Hello Xapian devs, Sorry for not getting back sooner. I was caught up with coursework. I would like to work on LDA-based document modelling and Hiemstra's language modelling and would like to form a concrete plan on how to proceed. It would be really helpful if I could have a mentor to assist me with this. Looking forward to your reply. Thanks. :)

2005 Jan 25

agglomerative coefficient in agnes (cluster)

I haven't read the book, but could anyone explain more about this parameter? help(agnes) says that ac measures the amount of clustering structure found. From the definition given in help(agnes.object), however, it seems that as long as the dissimilarity of the merger in the final step of the algorithm is large enough, the ac value will be close to 1. So what does ac really mean? Thank
