thr3ads.net - Xapian devel - GSoC 2016 - Introduction [Apr 2016]

Richhiey Thomas

2016-Apr-25 12:01 UTC

GSoC 2016 - Introduction

Hello devs,

My name is Richhiey Thomas, and I've been selected for GSoC 2016 for the
project Clustering of Search Results. I would like to thank the Xapian GSoC
admins for giving me this opportunity, and James and Olly for helping me
with my first merge request.

In the next two to three days, I'll go through every aspect of the project
that I have doubts about and resolve them, so that I have a clear idea of
how to start before the coding period.

Most of my doubts are around writing tests and how to start implementing
the proposed solution to clustering in the current codebase. I'll form a
few concrete questions and get back in a day or two.

Thanks.

James Aylett

2016-Apr-29 11:03 UTC

GSoC 2016 - Introduction

On Mon, Apr 25, 2016 at 05:31:45PM +0530, Richhiey Thomas wrote:
> Most of my doubts are around writing tests and how to start
> implementing the proposed solution to clustering in the current
> codebase. I'll form a few concrete questions and get back in a day
> or two.
Hi, Richhiey -- hopefully you're beginning to get a feeling for how
tests work within Xapian. What I'd suggest you do is to start by
writing a single test that does a very simple cluster (perhaps just a
single-document database). The test won't compile initially, but it
will help you think about the API, because you'll be writing client
code to implement the test.

You can probably open a PR containing this as a place to discuss that
API, and then move from there to declaring the classes you need (at
which point the tests will compile but not link), some stub
implementations (link but not pass) and then finally on to building
the actual implementation, at which point you'll also have ideas for
other tests to write.
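
To make the progression above concrete, here is a minimal, self-contained
sketch of the "test first, stub second" stage. It is only an illustration:
`DocId`, `Cluster`, `Clusterer` and `test_single_document_cluster` are
hypothetical names invented for this example, not the Xapian API and not
the Xapian test harness. The stub does just enough to link and to pass the
single-document case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; none of this is Xapian's API.
using DocId = unsigned;

struct Cluster {
    std::vector<DocId> docids;
};

class Clusterer {
    std::size_t k_;

  public:
    explicit Clusterer(std::size_t k) : k_(k) {}

    // Stub implementation: every document goes into the first cluster.
    // This is enough to link and to pass the single-document test below;
    // a real algorithm would replace the body without changing the test.
    std::vector<Cluster> cluster(const std::vector<DocId>& docs) const {
        std::vector<Cluster> result(k_);
        if (!result.empty()) result[0].docids = docs;
        return result;
    }
};

// Written before the classes above existed: it is client code, so it
// forces decisions about constructor arguments and return types.
bool test_single_document_cluster() {
    std::vector<DocId> docs{1};
    Clusterer clusterer(1);
    std::vector<Cluster> clusters = clusterer.cluster(docs);
    return clusters.size() == 1 &&
           clusters[0].docids.size() == 1 &&
           clusters[0].docids[0] == 1;
}
```

Calling `test_single_document_cluster()` from a test runner returns true
with the stub in place; a second test with two documents and two clusters
would fail against it, which is the cue to start on the real implementation.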

I'd probably put the tests in a new xapian-core/tests/api_cluster.cc
file (and add it into collated_apitest_sources in
xapian-core/tests/Makefile.am).

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org

Richhiey Thomas

2016-May-01 16:23 UTC

GSoC 2016 - Introduction

Before going ahead with the tests as you mentioned above, I would just like
to clarify a few higher-level things that I am still in doubt about.

1) As discussed during the IRC interview, it was suggested that I first
implement a normal K-means clustering implementation and then add the PSO
module as functionality that can be used to improve the quality of
clustering, with speed as the trade-off. This is the way I should see the
project, right?

2) Isn't it easier to first think about the API for the clustering
functionality rather than deriving it through test cases? (I'm not used to
working this way, so it is kind of hard for me to think in reverse.) Do
correct me if writing the tests first is the better way.

3) The fitness measure I plan to use for the PSO part and also for
evaluating the clustering results is ADDC (average distance of documents to
the cluster centroid). Is this the best fit?

4) For the parameters in K-means and PSO, can default values be set which
can then be overridden for a special use case?

5) There is already a clustering branch that was created before. Do I have
to continue work with the existing implementation or do I start afresh?

Currently I'm looking at the previous clustering branch and the test API
and getting used to the things I am not familiar with in the codebase. Once
I am confident, I'll go ahead with a simple test for the clustering as you
suggested.

Thanks
