Displaying 20 results from an estimated 2000 matches similar to: "KMeans - Evaluation Results"
2016 Aug 19
2
KMeans - Evaluation Results
On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:
> I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class.
I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a
2016 Aug 17
2
KMeans - Evaluation Results
On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org>
wrote:
> >> How long does 200–300 documents take to cluster? How does it grow as
> more documents are included in the MSet? We'd expect an MSet of 1000
> documents to take longer to cluster than one with 100, but the important
> thing is _how_ the time increases as the number of documents
2016 Aug 17
2
KMeans - Evaluation Results
I've gone through the link that you sent me and I currently understand how
this helps and works to some extent, but I am not too sure of how I should
start with converting the current interface to PIMPL design. I'm not used
to this design pattern so it's taking some time to sink in :)
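A minimal, generic PIMPL sketch may help here. It borrows the Clusterer and ClustererImpl names from this thread, but everything else (the constructor argument, the cluster_count() method, the use of std::unique_ptr) is an assumption made for illustration and is not taken from the Xapian sources, which may follow a different convention for their internal classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// ---- public header (hypothetical clusterer.h): no implementation details ----
class Clusterer {
  public:
    explicit Clusterer(std::size_t k);
    ~Clusterer();  // defined below, where ClustererImpl is a complete type
    Clusterer(Clusterer&&) noexcept;
    Clusterer& operator=(Clusterer&&) noexcept;

    std::size_t cluster_count() const;

  private:
    class ClustererImpl;                  // forward declaration only
    std::unique_ptr<ClustererImpl> impl;  // the "pointer to implementation"
};

// ---- private source file (hypothetical clusterer.cc): real state lives here ----
class Clusterer::ClustererImpl {
  public:
    explicit ClustererImpl(std::size_t k_) : k(k_) {}
    std::size_t k;  // number of clusters requested (illustrative member)
};

Clusterer::Clusterer(std::size_t k) : impl(new ClustererImpl(k)) {
    assert(k > 0);
}
Clusterer::~Clusterer() = default;
Clusterer::Clusterer(Clusterer&&) noexcept = default;
Clusterer& Clusterer::operator=(Clusterer&&) noexcept = default;

// Public methods simply forward to the hidden implementation.
std::size_t Clusterer::cluster_count() const { return impl->k; }
```

The point of the split is that members can be added to ClustererImpl without changing the layout of Clusterer, so code compiled against the public header keeps working.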
Say I start with the Clusterer class, I create a ClustererImpl class which
is the internal class that
2016 Aug 17
2
KMeans - Evaluation Results
> How long does 200–300 documents take to cluster? How does it grow as more
> documents are included in the MSet? We'd expect an MSet of 1000 documents
> to take longer to cluster than one with 100, but the important thing is
> _how_ the time increases as the number of documents grows.
>
> Currently, the number of seconds taken for clustering a set of documents
for varying
2016 Aug 15
2
KMeans - Evaluation Results
Hello,
I've recently finished with an implementation of KMeans with two
initialization techniques, random initialization and KMeans++. I would like
to share my findings after evaluating the same.
I have tested this implementation of KMeans with a BBC news article
dataset. I am currently working on evaluating the same with FIRE datasets.
Currently, clustering more than 500 documents
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step going forward is to start with forming document vectors that
are reduced and more useful. This helps considerably in saving run time (since
time for distance calculation depends on number of terms). Getting the
useful terms within a
2017 Mar 09
2
GSoC 2017 Project Proposal
Hello devs.
I would like to propose how I plan to go about improving and getting a
system that can be integrated into Xapian in this GSoC for the clustering
branch.
I have identified three areas of work which were not touched last time.
1) Automated Performance Analysis
I had roughly implemented 2 evaluation techniques previously (Distance between
document and centroids within clusters and
2016 Jul 26
3
K MEANS clustering
Hello,
I've been working on the KMeans clustering algorithm recently and for the
past week, I have been stuck on a problem which I'm not able to find a
solution to.
Since we are representing documents as TF-IDF vectors, they are really
sparse vectors (a typical corpus can have around 5000 terms). So it gets
really difficult to represent these sparse vectors in a way that would be
2008 Jul 03
1
Optimal initial centroid in kmeans
Hello there. I am using kmeans from the base package to cluster my customers. As
the results of kmeans depend on the initial centroids, may I know:
1) how can we specify the centroids in the R function? (I don't want a random
starting point)
2) how to determine the optimal (if not, a good) centroid to start with? (I
am not after the fixed seed solution as it only ensures that the
2004 May 28
6
distance in the function kmeans
Hi,
I want to know which distance is used in the function kmeans
and if we can change this distance.
Indeed, in the function pam, we can put a distance matrix in
parameter (by the line "pam<-pam(dist(matrixdata),k=7)" ) but
we can't do it in the function kmeans, we have to put the
matrix of data directly ...
Thanks in advance,
Nicolas BOUGET
2016 Jun 29
2
xapian-letor: FeatureVector discussion
>
>
>
> The approach I was thinking would look something like this:
>
> * instead of Features, which is really a namespace implemented as a
> class, we separate out the calculation of the different features
> into distinct subclasses of Feature, whose only job is to calculate
> a single feature. Currently the FeatureManager calls these (via
>
2006 Aug 07
5
kmeans and incomplete distance matrix concern
Hi there
I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
i.e:
[
mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
dimnames = list(levels(DF$V1), levels(DF$V2)))
mat[cbind(DF$V1, DF$V2)] <- DF$V3
This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
My query
2016 May 05
2
GSoC 2016 - Introduction
Hello,
Thanks James for the reply. That cleared a few things up. Apologies for
replying late because of exams going on.
I was going through the previous clustering API to understand how it worked
and it seems like the approach for construction of the termlists which
are used for distance metrics uses TF-IDF weighting with cosine similarity,
which is very similar to the approach I would need
2005 Mar 31
2
Using kmeans given cluster centroids and data with NAs
Hello,
I have used the functions agnes and cutree to cluster my data (4977
objects x 22 variables) into 8 clusters. I would like to refine the
solution using a k-means or similar algorithm, setting the initial
cluster centres as the group means from agnes. However my data matrix
has NA's in it and the function kmeans does not appear to accept this?
> dim(centres)
[1] 8 22
> dim(data)
2016 Jul 27
2
K MEANS clustering
Hey Parth,
Thanks for the reply.
I am considering implementing a cosine distance metric too, along with
Euclidean distance because of the dimensionality issue that comes in with
K-Means and the Euclidean distance metric.
That does help when we deal with sparse vectors for documents. The
particular problem I'm having is representing centroids in an efficient way.
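As a rough illustration of the sparse representation and cosine distance discussed here, the sketch below stores each document as a map from term id to weight and computes a centroid as the per-term mean. The SparseVec alias, the numeric term ids, and the function names are assumptions made for this example; they are not the data structures used in the clustering branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sparse vector: term id -> weight; absent terms are zero.
using SparseVec = std::unordered_map<unsigned, double>;

// Dot product that iterates over the shorter vector only.
double dot(const SparseVec& a, const SparseVec& b) {
    const SparseVec& fewer = a.size() <= b.size() ? a : b;
    const SparseVec& more = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& kv : fewer) {
        auto it = more.find(kv.first);
        if (it != more.end()) sum += kv.second * it->second;
    }
    return sum;
}

// Cosine distance = 1 - cosine similarity. An all-zero vector is treated
// as maximally distant (a convention chosen for this sketch).
double cosine_distance(const SparseVec& a, const SparseVec& b) {
    double na = std::sqrt(dot(a, a));
    double nb = std::sqrt(dot(b, b));
    if (na == 0.0 || nb == 0.0) return 1.0;
    return 1.0 - dot(a, b) / (na * nb);
}

// Centroid as the per-term mean over all documents in the cluster.
SparseVec centroid(const std::vector<SparseVec>& docs) {
    assert(!docs.empty());
    SparseVec mean;
    for (const auto& doc : docs)
        for (const auto& kv : doc) mean[kv.first] += kv.second;
    for (auto& kv : mean) kv.second /= static_cast<double>(docs.size());
    return mean;
}
```

One thing this makes visible: the mean holds the union of its members' terms, so a centroid is typically much less sparse than any single document.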
For example, when we find the mean
2006 Jul 09
2
distance in kmeans algorithm?
Hello.
Is it possible to choose the distance in the kmeans algorithm?
I have m vectors of n components and I want to cluster them using kmeans
algorithm but I want to use the Mahalanobis distance or another distance.
How can I do it in R?
If I use kmeans, I have no option to choose the distance.
Thanks in advance,
Arnau.
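For readers wondering what "choosing the distance" would mean algorithmically, here is a small language-neutral sketch written in C++ rather than R. It shows only the assignment step of k-means with the distance passed in as a parameter; the Point and Distance aliases and all function names are hypothetical, and this is not how R's kmeans is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using Point = std::vector<double>;
using Distance = std::function<double(const Point&, const Point&)>;

double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

double manhattan(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::fabs(a[i] - b[i]);
    return sum;
}

// Assignment step: index of the centroid closest to p under `dist`.
std::size_t nearest_centroid(const Point& p,
                             const std::vector<Point>& centroids,
                             const Distance& dist) {
    assert(!centroids.empty());
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < centroids.size(); ++i) {
        double d = dist(p, centroids[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

Swapping the distance changes only the assignment. The update step of standard k-means takes the arithmetic mean, which minimises squared Euclidean distance, so with another metric the mean is no longer guaranteed to be the best cluster centre. Medoid-based methods, which accept arbitrary dissimilarities, avoid that mismatch.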
2016 Mar 05
2
GSOC-2016 Project : Clustering of search results
Hello devs,
I am Richhiey Thomas, pursuing my third year of undergraduate studies in
Computer Science from Mumbai University. I had gone through the project
list for this year and the project idea based on clustering caught my
attention. I spoke to Assem Chelli on IRC who guided me to the code and got
me started.
I started going through the code and have successfully built Xapian on my
machine.
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org>
wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
>
2016 Jun 09
2
2nd week progress
Hello devs,
I have filled out the repo link on TRAC as suggested. I'll also keep the
journal updated on TRAC from now on.
I am almost done with defining all the base classes required for the
clusterer and have started coding the Euclidean distance metric. This
should be completed by tomorrow after which I'll be spending one day to
test and make sure everything functions as expected, so
2016 Mar 12
2
GSOC-2016 Project : Clustering of search results
On Sat, Mar 12, 2016 at 04:27:55PM +0530, Richhiey Thomas wrote:
> Below I write a raw version of my proposal for Clustering of Search Results
> based on our previous mails.
Hi, Richhiey. Thanks for putting this together ahead of the formal
start of applications, and sharing it with us -- and it's really not
too long! Project proposals for something that will last the summer
are