thr3ads.net - Xapian devel - GSoC: Weighting Schemes [May 2016]


Vivek Pal

2016-May-08 11:06 UTC

GSoC: Weighting Schemes

Hi James,

Thanks for clearing up the doubts I had earlier.
>>if we can introduce the variants using optional parameters that default
>>to (effectively) 'off' that might be better than distinct ones,
Yes, this will definitely be the better approach for introducing the
variants of existing weighting functions.
Thanks for the suggestion.
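To make the idea concrete, here is a minimal sketch of a variant that is controlled by an optional parameter defaulting to 'off'. The class name SketchTfIdfWeight, the parameter delta and the formula are hypothetical illustrations only; they are not Xapian's actual API or an agreed design.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified weighting class (NOT Xapian's real Weight API).
class SketchTfIdfWeight {
    // Variant parameter: 0.0 means the variant is effectively "off".
    double delta;

  public:
    // Existing callers that pass no argument keep the plain behaviour.
    explicit SketchTfIdfWeight(double delta_ = 0.0) : delta(delta_) {}

    // wdf: within-document frequency of the term.
    // termfreq: number of documents containing the term.
    // doc_count: number of documents in the collection.
    double get_weight(unsigned wdf, unsigned termfreq,
                      unsigned doc_count) const {
        assert(termfreq <= doc_count);
        if (wdf == 0 || termfreq == 0) return 0.0;
        double tf = 1.0 + std::log(double(wdf));
        double idf = std::log(double(doc_count) / double(termfreq));
        // With delta == 0.0 this reduces to the plain tf * idf product.
        return (tf + delta) * idf;
    }
};
```

Constructing SketchTfIdfWeight() gives the unmodified scheme, while SketchTfIdfWeight(0.5) switches the variant on, so no separate class is needed for it.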
Next, I will try to come up with draft pseudo-code for each of those
variants in the next few days. It would be helpful if you could review it
before the coding period begins, as that will give me a clear picture of
the implementation in advance.
>>you need to independently calculate, or independently
>>verify, the correct outputs for some test sets (you should be able to
>>use the existing test databases).
So, careful manual testing of implemented code and automated testing
through xapian-core/tests/api_weight.cc
using the existing test databases is what I'd need to perform for complete
testing of implemented weighting functions.
Please correct me if I am wrong or missing something here.
>>You should talk to Gaurav about that, in particular looking at the evaluation
>>work he did previously (https://github.com/samuelharden/xapian-evaluation)
I've started exploring and trying to get this evaluation module running on
my system.
Facing some issues initially so trying to sort out those issues with the
help from Gaurav on IRC.
>>We may want to take the opportunity to discuss whether parts or all of
>>this evaluation framework can be moved into the main Xapian repo, and
>>if there are changes that will make it easier to use for evaluation in future.

Yes, it'd be a huge plus for us as it would help to compare
Xapian's performance based on the different weighting functions.
I'll add this under "Additional tasks" in my project wiki and would like to
work with Gaurav after completing my GSoC project.
>>If Nishad doesn't find time to take this forward,
>>it should be fine for you to pick up and complete this normalisation.
Sure, I'll do it as part of the additional tasks after the GSoC period :)
>>Yes, that's a good idea. You might want, at the end of the project, to
>>transfer any remaining ideas and thoughts either into the bug tracker
>>or to somewhere on the wiki
I've got 3 ideas for this section so far, after all the discussions:
1. Implement the remaining SMART normalizations of the tf-idf weighting function.
2. Work with Gaurav to get parts of the evaluation module into the main repo,
to start with.
>>Good luck with them!
Thanks :)

Regards,
Vivek

James Aylett

2016-May-09 10:33 UTC

On Sun, May 08, 2016 at 04:36:16PM +0530, Vivek Pal wrote:
> >>you need to independently calculate, or independently
> >>verify, the correct outputs for some test sets (you should be able to
> >>use the existing test databases).
> 
> So, careful manual testing of implemented code and automated testing
> through xapian-core/tests/api_weight.cc
> using the existing test databases is what I'd need to perform for complete
> testing of implemented weighting functions.
Almost -- the manual step should just be in calculating the correct
outputs. All the actual testing, verifying that the weights come out
correctly, should be automated.
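As a rough illustration of that split, the sketch below hard-codes an expected value that was worked out by hand and leaves the comparison to code. The function weight_under_test is a hypothetical stand-in for whatever the weighting scheme returns on a test database, and the formula and tolerance are assumptions made for the example, not taken from api_weight.cc.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the implementation under test. In a real test
// this value would come from running a query against a test database.
double weight_under_test(unsigned wdf, unsigned termfreq, unsigned doc_count) {
    assert(wdf > 0 && termfreq > 0 && termfreq <= doc_count);
    return (1.0 + std::log(double(wdf))) *
           std::log(double(doc_count) / double(termfreq));
}

// Compare two doubles using a relative tolerance.
bool approx_equal(double a, double b, double rel_tol = 1e-9) {
    return std::fabs(a - b) <=
           rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// The automated part: no human inspects the output, the test just
// passes or fails.
bool check_known_weight() {
    // The manual part, done once on paper for wdf=1, termfreq=1, doc_count=4:
    //   tf  = 1 + ln(1) = 1
    //   idf = ln(4 / 1) = 1.3862943611...
    //   weight = tf * idf = 1.3862943611...
    const double expected = 1.3862943611198906;
    return approx_equal(weight_under_test(1, 1, 4), expected);
}
```

The hand calculation lives only in the constant and its comment, so if the implementation later changes by mistake the check fails without anyone having to recompute anything.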
> I've started exploring and trying to get this evaluation module
> running on my system.  Facing some issues initially so trying to
> sort out those issues with the help from Gaurav on IRC.
Great -- I note that Olly has dropped something in IRC about this, so
hopefully you're able to keep moving forward.
> >>We may want to take the opportunity to discuss whether parts or all of
> >>this evaluation framework can be moved into the main Xapian repo, and
> >>if there are changes that will make it easier to use for evaluation in
> >>future.
> 
> Yes, it'd be a huge plus for us as it would help to compare
> Xapian's performance based on the different weighting functions.
> I'll add this under "Additional tasks" in my project wiki and would like to
> work with Gaurav after completing my GSoC project.
Perfect.

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org

Gaurav Arora

2016-May-09 11:07 UTC

Hi Vivek,

I saw your comments on IRC. As Olly noted:
<olly> vivekp: (if you check the logs) - you want: ./trec_index config
<olly> you want to run the compiled binary (no ".cc") not the source file...

But I guess you are not able to compile the setup. I can write up the steps
for compiling it and send them across, along with sample files from config.

Most of the files in config are from a test collection taken from FIRE, so we
would need to ask the FIRE team for permission to access those files
(http://fire.irsi.res.in/fire/static/data). A form needs to be signed by the
organization to gain access and permission to use the data.

@olly and @James
Earlier, being part of the FIRE team, I was able to use this data. I'm not
sure I still have it. Should we fill in the form and ask the FIRE team for
permission to use the data?

Thanks,
Gaurav



-- 
Regards,
Gaurav Arora
