thr3ads.net - Xapian devel - [Xapian-devel] [GSoC 2014] About "Clustering of Search Results" [Mar 2014]

Chi Liu

2014-Mar-11 02:11 UTC

[Xapian-devel] [GSoC 2014] About "Clustering of Search Results"

2014-03-11 8:47 GMT+08:00 Olly Betts <olly at survex.com> wrote:
> Most applications of Xapian are interactive, so to actually be
> practically useful clustering needs to complete in a reasonable amount
> of time (a fraction of a second ideally).  I think that needs to be a key
> aim of the project.
> If by "find new approaches" you mean a different approach to that used by
> the existing clustering branch, then sure.  If you're talking about
> doing original research, I'd be a little cautious about that, as
> clustering is a relatively mature field, and I'm a bit dubious a student
> could develop and implement a new approach in the GSoC timescale.
> But if that aim is addressed, exactly what else the project consists of
> is largely up to you.


Thank you for your patient explanation about the project. My
understanding about the project "Clustering of Search Results" is that
we mainly focus on the processing speed of the existing code.

By "find new approaches" I mean trying other known clustering algorithms.
What I am concerned about is whether the low efficiency is caused by
an improper algorithm. I am reading the existing clustering branch code
and have not completely finished yet. I might be able to talk more
about the existing code in my GSoC application. But for now, I really
cannot comment before fully understanding the existing code.



> That's a good question - I'm not sure how clustering effectiveness is
> typically measured.  But if we're implementing known approaches,
> a formal evaluation of effectiveness is probably less necessary.


My idea for measuring clustering effectiveness is that when we try
other known clustering algorithms, we can use the old clustering
result as a baseline.  If the difference between the clustering results
is acceptable and the new clustering algorithm has high efficiency, we
may have found a better approach. I will give more details about this
in my GSoC application.
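
One possible way to put a number on "the difference of clustering results" is the Rand index, which counts the fraction of document pairs that two clusterings treat the same way (both together or both apart). The sketch below is only an illustration under stated assumptions: each clustering is assumed to be a dict mapping a document ID to a cluster label, and `rand_index` is a hypothetical helper name, not part of Xapian or of the existing clustering branch.

```python
from itertools import combinations


def rand_index(baseline, candidate):
    """Return the fraction of document pairs on which two clusterings agree.

    Both arguments are assumed to map document ID -> cluster label.
    A pair "agrees" when both clusterings put the two documents in the
    same cluster, or both put them in different clusters.
    """
    if set(baseline) != set(candidate):
        raise ValueError("both clusterings must cover the same documents")

    agree = 0
    total = 0
    for a, b in combinations(sorted(baseline), 2):
        same_in_baseline = baseline[a] == baseline[b]
        same_in_candidate = candidate[a] == candidate[b]
        if same_in_baseline == same_in_candidate:
            agree += 1
        total += 1

    # With fewer than two documents there are no pairs to disagree on.
    return agree / total if total else 1.0


if __name__ == "__main__":
    old = {1: "a", 2: "a", 3: "b", 4: "b"}   # baseline clustering
    new = {1: "x", 2: "x", 3: "x", 4: "y"}   # result of another algorithm
    print(rand_index(old, new))
```

A value of 1.0 means the two clusterings are identical up to relabelling. What threshold counts as "acceptable" is a judgment call that this sketch does not settle.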



Thanks
Liu Chi


2014-03-11 8:47 GMT+08:00 Olly Betts <olly at survex.com>:
> On Mon, Mar 10, 2014 at 08:50:14PM +0800, Chi Liu wrote:
> > The topic of "Clustering of Search Results" looks interesting and I think
> > it suits me. I have been involved in a project that aims to cluster
> > tweets based on text similarity and user profile. I noticed that
> > "Clustering of Search Results" mentions disappointing performance. I
> > am puzzled: is this project only concerned with improving the performance
> > of the old code, or also with trying to find new approaches?
>
> Most applications of Xapian are interactive, so to actually be
> practically useful clustering needs to complete in a reasonable amount
> of time (a fraction of a second ideally).  I think that needs to be a key
> aim of the project.
>
> But if that aim is addressed, exactly what else the project consists of
> is largely up to you.
>
> If by "find new approaches" you mean a different approach to that used by
> the existing clustering branch, then sure.  If you're talking about
> doing original research, I'd be a little cautious about that, as
> clustering is a relatively mature field, and I'm a bit dubious a student
> could develop and implement a new approach in the GSoC timescale.
>
> > Besides clustering speed, how to evaluate clustering effect?
>
> That's a good question - I'm not sure how clustering effectiveness is
> typically measured.  But if we're implementing known approaches,
> a formal evaluation of effectiveness is probably less necessary.
>
> Cheers,
>     Olly
>


-- 
Chi Liu
+86-15210624786
Undergraduate Student
Team of Search Engine and Web Mining
School of Electronic Engineering  and Computer Science
Peking University, Beijing, 100871, P.R.China

Olly Betts

2014-Mar-11 13:33 UTC

[Xapian-devel] [GSoC 2014] About "Clustering of Search Results"

On Tue, Mar 11, 2014 at 10:11:31AM +0800, Chi Liu wrote:
> Thank you for your patient explanation about the project. My
> understanding about the project "Clustering of Search Results" is that
> we mainly focus on the processing speed of the existing code.

We need something which can cluster larger result sets faster than the
current code.  Speeding up the existing code might be the best way to do
that, but we could start again.  If we start again, I'd suggest it would
be prudent to try to understand why the previous attempt didn't succeed.
We don't want to end up repeating that.

> By "find new approaches" I mean trying other known clustering algorithms.

OK - that's fine then.

> What I am concerned about is whether the low efficiency is caused by
> an improper algorithm. I am reading the existing clustering branch code
> and have not completely finished yet. I might be able to talk more
> about the existing code in my GSoC application. But for now, I really
> cannot comment before fully understanding the existing code.

Sure.

> My idea for measuring clustering effectiveness is that when we try
> other known clustering algorithms, we can use the old clustering
> result as a baseline.  If the difference between the clustering results
> is acceptable and the new clustering algorithm has high efficiency, we
> may have found a better approach. I will give more details about this
> in my GSoC application.

Great.

Cheers,
    Olly

Chi Liu

2014-Mar-16 22:13 UTC

[Xapian-devel] [GSoC 2014] About "Clustering of Search Results"

Hello,
I have submitted my proposal for GSoC.
But I have little idea about the timeline. Many things are difficult to
determine.


Cheers,
   Liu Chi




-- 
Chi Liu
+86-15210624786
Undergraduate Student
Team of Search Engine and Web Mining
School of Electronic Engineering  and Computer Science
Peking University, Beijing, 100871, P.R.China
