thr3ads.net - similar to: "2nd week progress"

Displaying 20 results from an estimated 2000 matches similar to: "2nd week progress"

2016 Jul 26

K MEANS clustering

Hello, I've been working on the KMeans clustering algorithm recently, and for the past week I have been stuck on a problem that I can't find a solution to. Since we are representing documents as TF-IDF vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be

2016 Jul 27

K MEANS clustering

Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean
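The sparse-vector and centroid question raised in this thread can be illustrated with a minimal sketch. It assumes a document is stored as a map from term id to TF-IDF weight, with absent terms treated as zero. The names `SparseVec`, `dot`, `cosine_distance` and `centroid` are hypothetical and are not taken from the Xapian clustering branch.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sparse vector: term id -> TF-IDF weight. Absent terms are 0.
using SparseVec = std::unordered_map<unsigned, double>;

// Dot product that iterates only over the smaller of the two vectors.
double dot(const SparseVec& a, const SparseVec& b) {
    const SparseVec& small = a.size() <= b.size() ? a : b;
    const SparseVec& large = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& entry : small) {
        auto it = large.find(entry.first);
        if (it != large.end()) sum += entry.second * it->second;
    }
    return sum;
}

// Cosine distance = 1 - cosine similarity.
// Defined here as 1 when either vector has zero length (an assumption).
double cosine_distance(const SparseVec& a, const SparseVec& b) {
    double denom = std::sqrt(dot(a, a)) * std::sqrt(dot(b, b));
    if (denom == 0.0) return 1.0;
    return 1.0 - dot(a, b) / denom;
}

// Centroid as the component-wise mean of the documents, kept sparse:
// only terms that occur in at least one document are stored.
SparseVec centroid(const std::vector<SparseVec>& docs) {
    SparseVec mean;
    if (docs.empty()) return mean;
    for (const auto& doc : docs)
        for (const auto& entry : doc) mean[entry.first] += entry.second;
    for (auto& entry : mean)
        entry.second /= static_cast<double>(docs.size());
    return mean;
}
```

One caveat with this layout: a centroid holds the union of the terms of its member documents, so it becomes denser as a cluster grows, which is the efficiency concern described in the message.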

2016 Aug 19

KMeans - Evaluation Results

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

2016 May 05

GSoC 2016 - Introduction

Hello, Thanks James for the reply. That cleared a few things up. Apologies for replying late because of exams going on. I was going through the previous clustering API to understand how it worked, and it seems like the approach for construction of the termlists which are used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need

2013 Sep 25

Is the project learning to rank need to be improved?

As Olly has already pointed out, the 2012 branch is not merged. I think there are some compilation errors in the branch. The code in the branch is better refactored. The Ranker and FeatureManager classes are well defined and implemented. Parth. On Wed, Sep 25, 2013 at 9:02 AM, Olly Betts <olly at survex.com> wrote: > On Tue, Sep 24, 2013 at 08:34:10PM +0800, jiangwen jiang wrote: >

2016 Aug 17

KMeans - Evaluation Results

I've gone through the link that you sent me and I currently understand how this helps and works to some extent, but I am not too sure how I should start converting the current interface to the PIMPL design. I'm not used to this design pattern, so it's taking some time to sink in :) Say I start with the Clusterer class, I create a ClustererImpl class which is the internal class that
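As a rough illustration of the PIMPL layout being asked about, here is a minimal sketch. It reuses the `Clusterer` and `ClustererImpl` names from the message, but the member functions and the round-robin placeholder logic are invented for the example, and it uses `std::unique_ptr` rather than whatever pointer type the Xapian code base uses internally. It is not the actual Xapian interface.

```cpp
#include <memory>
#include <vector>

// ---- What would sit in the public header ----
class Clusterer {
  public:
    explicit Clusterer(unsigned k);
    ~Clusterer();  // defined below, where ClustererImpl is a complete type
    Clusterer(Clusterer&&) noexcept;
    Clusterer& operator=(Clusterer&&) noexcept;

    unsigned cluster_count() const;
    // Hypothetical operation: returns a cluster index for each of n items.
    std::vector<unsigned> assign(unsigned n_items) const;

  private:
    class ClustererImpl;                  // forward declaration only
    std::unique_ptr<ClustererImpl> impl;  // clients never see the layout
};

// ---- What would sit in the private source file ----
class Clusterer::ClustererImpl {
  public:
    explicit ClustererImpl(unsigned k_) : k(k_ == 0 ? 1 : k_) {}

    // Placeholder logic (round-robin), standing in for real clustering.
    std::vector<unsigned> assign(unsigned n) const {
        std::vector<unsigned> out(n);
        for (unsigned i = 0; i < n; ++i) out[i] = i % k;
        return out;
    }

    unsigned k;
};

Clusterer::Clusterer(unsigned k) : impl(new ClustererImpl(k)) {}
Clusterer::~Clusterer() = default;
Clusterer::Clusterer(Clusterer&&) noexcept = default;
Clusterer& Clusterer::operator=(Clusterer&&) noexcept = default;

unsigned Clusterer::cluster_count() const { return impl->k; }

std::vector<unsigned> Clusterer::assign(unsigned n_items) const {
    return impl->assign(n_items);
}
```

The public class only forwards to the implementation object, so data members and helper classes can change in the source file without altering the header that library clients compile against.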

2014 Mar 09

[GSOC 2014] Some questions about Letor module

Thanks for your reply! For the third question: In https://inex.mmci.uni-saarland.de/data/documentcollection.jsp, I can find inex2010-article.qrels in 2010 assessment, but can't find query files. Could you send me the link? I have registered on INEX website. And I also need to download ``INEX 2009 collection without annotation tags: (unofficial)`` on

2017 Mar 09

GSoC 2017 Project Proposal

Hello devs. I would like to propose how I plan to go about improving and getting a system that can be integrated into Xapian in this GSoC for the clustering branch. I have identified three areas of work which were not touched last time. 1) Automated Performance Analysis I had roughly implemented 2 evaluation techniques previously (Distance b/w document and centroids within clusters and

2016 Aug 17

KMeans - Evaluation Results

On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org> wrote: > How long does 200-300 documents take to cluster? How does it grow as more documents are included in the MSet? We'd expect an MSet of 1000 documents to take longer to cluster than one with 100, but the important thing is _how_ the time increases as the number of documents
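The question of how clustering time grows with the number of documents could be checked with a small timing harness along these lines. This is a sketch only: `time_scaling` is a hypothetical helper, and the `work` callback stands in for whatever call actually clusters an MSet of the given size.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Runs work(n) once for each n in sizes and records the elapsed wall time.
// Returns (n, seconds) pairs in the same order as sizes.
std::vector<std::pair<std::size_t, double>>
time_scaling(const std::vector<std::size_t>& sizes,
             const std::function<void(std::size_t)>& work) {
    std::vector<std::pair<std::size_t, double>> results;
    for (std::size_t n : sizes) {
        auto start = std::chrono::steady_clock::now();
        work(n);
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = stop - start;
        results.emplace_back(n, elapsed.count());
    }
    return results;
}
```

Feeding in doubling sizes (for example 100, 200, 400) makes the trend easy to read: if the time roughly doubles with each step the growth is about linear, and if it roughly quadruples the growth is about quadratic. Single runs are noisy, so repeating each size several times and taking the median would give a steadier picture.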

2016 Mar 06

GSOC-2016 Project : Clustering of search results

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >

2016 Mar 05

GSOC-2016 Project : Clustering of search results

Hello devs, I am Richhiey Thomas, pursuing my third year of undergraduate studies in Computer Science from Mumbai University. I had gone through the project list for this year and the project idea based on clustering caught my attention. I spoke to Assem Chelli on IRC who guided me to the code and got me started. I started going through the code and have successfully built Xapian on my machine.

2014 Mar 22

[GSOC 2014] Indexing INEX dataset

For unsupervised approaches like BM25 this approach works well, but letor does not need special weighting for title in this form as it itself assigns weights to title features separately. But I see your concern: it would be a problem when BM25 is used on the index with this setup. Hence it's preferable to take a note of this uplift in title weight for xapian-letor and normalize it everywhere

2016 May 01

GSoC 2016 - Introduction

Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a normal K-means clustering implementation and then add the PSO module as a functionality that can be used to improve quality of clustering for speed as a trade off.

2013 Sep 23

Is the project learning to rank need to be improved?

Hi, guys: I found this project idea http://trac.xapian.org/wiki/GSoCProjectIdeas#Project:LearningtoRank If it needs to be improved, I will try to handle it. Thanks. Regards

2014 Feb 26

GSOC 2014

Just to add on top of what Olly has already mentioned. > > Now, I'm reading the resources provided on ideas' page. Do you have > > any other suggestions of knowing more about the letor? > > And I'd like to test the function of letor. But I can't find code > > example. Can u give me some suggestions? > > Hopefully Parth can help here. > In order

2014 Mar 04

Test Dataset for performance and accuracy analysis

Hi Parth, I implemented DFR algorithms in Xapian as a part of GSOC last year under the mentorship of Olly. This year, I want to work on analyzing and optimizing the performance of the DFR algorithms and comparing them with BM25. I also want to work on profiling the query expansion schemes and test the relevance (precision and recall) / speed (time taken) of the

2016 Jun 27

xapian-letor: FeatureVector discussion

Hello James, Parth, Following our discussion on IRC and on code review, the way the FeatureVector class works needs some discussion. Presently, the FeatureVector class is defined as follows, with a fixed feature count (19): class FeatureVector::Internal : public Xapian::Internal::intrusive_base { friend class FeatureVector; double label; double score;

2016 Aug 18

KMeans - Evaluation Results

> Actually, you're doing something slightly unusual there: making the internal member public. Protected would be better, and private is I think most usual; library clients aren't going to have access to the Internal class declaration, so they can't call things on it. This means it's actually difficult right now to subclass Feature. > > I

2014 Jan 23

Xapian and GSoC 2014

Hi Tejas, Thank you for your interest in the Letor project in Xapian. We would definitely like to consider Letor for this year's GSoC project. What I would suggest is that you start playing with the code and get acquainted with it. The latest code can be obtained from http://trac.xapian.org/wiki/GSoC2012/LTR Regards, Parth. On Wed, Jan 22, 2014 at 10:14 PM, Tejas Nikumbh <tejasnikumbh at

2012 Jul 12

Mid-term progress

Hi Rishabh, As per our last progress meeting, I am off for some days, and as it's now time for the mid-term evaluation, it would be better to generate a progress report. For that, you should first commit the code as it is and then write a 2 to 3 page summary explaining the deliverables so far and then the future plan. Tomorrow is the last day, so better by tomorrow morning, send this across and by
