Maciej Zięba
2009-Feb-25 16:10 UTC
[Xapian-discuss] Index-time weight of a document and weight per document field
Hello :-)

When indexing documents, I would like to influence the order of future search results. I've used the verb "influence" because I don't want to change the ordering completely, only to give a "hint" about it.

There are two ways in which I would like to do that:

1. Weight of a document

I would like to be able to say that some documents are more important than others and should therefore end up higher in the results. An example:
- Document A has a weight of 2
- Document B has a weight of 1
- Document C has a weight of 3
- We search for "xyz" and find it in all 3 documents
- The order in which results are given would be: C, A, B

(Of course this is just an example, so I'm disregarding all other things that influence relevance, like the number of "xyz" occurrences, document length, etc.)

2. Weight of a field (per document, not in general)

I would like to be able to say that a given field in a particular document is more important than the same field in another document. An example:
- Let's say that we have a "keywords" field
- Document A has a weight of 1 and its keywords field has a weight of 3
- Document B has a weight of 1 and its keywords field has a weight of 1
- Document C has a weight of 1 and its keywords field has a weight of 2
- We search for "xyz" and find it in the "keywords" fields of all 3 documents
- The order in which results are given would be: A, C, B

I've tried searching for information on how to do something like this, but without success (not giving up yet, though ;-) ). I would be really, really grateful for any suggestions on how I could achieve this, or on whether it is possible at all.

I guess this can't be done with any existing tool (for example with scriptindex) and I would have to write my own indexer (I will try to use the Python bindings). Am I right?

Please excuse me if my explanations are not clear enough (English is not my mother tongue). I'm glad to answer any questions :-)

Best regards,
Maciej
Thomas Viehmann
2009-Feb-25 19:37 UTC
[Xapian-discuss] Index-time weight of a document and weight per document field
Hi Maciej,

Maciej Zięba wrote:
> I guess this can't be done with any existing tool (for example with
> scriptindex) and I would have to write my own indexer (I will try to use
> Python bindings). Am I right?

Internally, Xapian assigns weights to terms attached to a document. In the Xapian API, a TermGenerator's index_text method takes an optional weight parameter, and (at a lower level) the Document's add_term and add_posting methods take an optional wdfinc argument to specify the weight increase.

For scriptindex, the weight parameter is specified as part of the field definitions in the .script file (see the examples). For a quick test, you could use different .script files to vary the weights, though this is likely too cumbersome unless you have a very limited number of classes of documents.

Kind regards
T.
-- 
Thomas Viehmann, http://thomas.viehmann.net/
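To make the wdf idea above concrete, here is a small toy model in plain C++. It does not use the Xapian library: ToyDocument, index_field and wdf_of are hypothetical names invented for this sketch, and the "K" prefix for the keywords field is an assumption. It only illustrates how indexing the same text with a larger weight increment leaves a larger wdf on the resulting terms, which is what index_text's weight parameter and add_term's wdfinc argument are described as doing.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy stand-in for a document's term list: term -> wdf
// (within-document frequency). Hypothetical type, not the Xapian API.
struct ToyDocument {
    std::map<std::string, unsigned> wdf;

    // Models the idea of add_term(term, wdfinc): each call raises the
    // stored wdf of the term by wdfinc.
    void add_term(const std::string& term, unsigned wdfinc = 1) {
        assert(!term.empty());  // this toy does not allow empty terms
        wdf[term] += wdfinc;
    }
};

// Models the idea of index_text(text, weight, prefix): every
// whitespace-separated word is added as prefix + word, with the wdf
// raised by `weight` per occurrence.
void index_field(ToyDocument& doc, const std::string& text,
                 unsigned weight = 1, const std::string& prefix = "") {
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        doc.add_term(prefix + word, weight);
    }
}

// Returns the wdf stored for a term, or 0 if the term is absent.
unsigned wdf_of(const ToyDocument& doc, const std::string& term) {
    std::map<std::string, unsigned>::const_iterator it = doc.wdf.find(term);
    return it == doc.wdf.end() ? 0 : it->second;
}
```

In this model, calling index_field(a, "xyz", 3, "K") for document A and index_field(b, "xyz", 1, "K") for document B leaves the term "Kxyz" with a wdf of 3 in A and 1 in B, so a weighting scheme that grows with wdf would favour A for a keywords search on "xyz".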
Olly Betts
2009-Feb-26 00:31 UTC
[Xapian-discuss] Index-time weight of a document and weight per document field
On Wed, Feb 25, 2009 at 05:10:56PM +0100, Maciej Zięba wrote:
> Hello :-)
>
> When indexing documents, I would like to influence future search results order.
> I've used the verb "influence", because I don't want to change the ordering
> completely but only to give a "hint" about it.
>
> There are two ways in which I would like to do that:
>
> 1. Weight of a document
> I would like to be able to say that some documents are more important than the
> other and should therefore end up higher in the results. An example:
> - Document A has weight of 2
> - Document B has weight of 1
> - Document C has weight of 3
> - We search for "xyz" and find it in all 3 documents
> - The order in which results are given would be: C, A, B

There isn't really a clean way to do this in 1.0.x. The best I can think of is to add a term (say XBOOST) to all documents you want to give a weight boost to, with a wdf which models this weight boost, and then combine this with the parsed query like so:

    Xapian::Query q = queryparser.parse_query(query_string);
    q = Xapian::Query(Xapian::Query::OP_AND_MAYBE, q, Xapian::Query("XBOOST"));

With SVN trunk, you can use Xapian::PostingSource to do this:

http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst

> 2. Weight of a field (per document, not in general)
> I would like to be able to say that a given field in a particular document is
> more important than in another. An example:
> - Let's say that we have a "keywords" field
> - Document A has weight of 1 and its keywords field has weight of 3
> - Document B has weight of 1 and its keywords field has weight of 1
> - Document C has weight of 1 and its keywords field has weight of 2
> - We search for "xyz" and find it in "keywords" fields of all 3 documents
> - The order in which results are given would be: A, C, B

http://trac.xapian.org/wiki/FAQ/ExtraWeight

> I guess this can't be done with any existing tool (for example with
> scriptindex) and I would have to write my own indexer (I will try to use
> Python bindings). Am I right?

The "XBOOST" technique could be done by massaging the input file to scriptindex and using a suitable index script. The index-time extra weight technique described in the FAQ is supported by scriptindex directly (weight=FACTOR).

Cheers,
Olly
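As a rough illustration of why the XBOOST term combined with OP_AND_MAYBE acts as a "hint" rather than a filter, here is a toy model in plain C++. It does not call Xapian: ToyDoc, boost_weight and rank_and_maybe are hypothetical names, the per-document query weights are made-up inputs, and the saturating formula is only an assumed BM25-like shape (document length and collection statistics are ignored). The left-hand query decides which documents match; the boost term only adds weight to documents that already match.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-document summary for this sketch.
struct ToyDoc {
    std::string name;
    bool matches_query;   // does the parsed query match this document?
    double query_weight;  // weight contributed by the parsed query
    unsigned boost_wdf;   // wdf of the XBOOST term, 0 if the term is absent
};

// Assumed saturating weight for the boost term: grows with wdf but
// flattens out, so a large wdf cannot dominate the score without limit.
double boost_weight(unsigned wdf, double k1 = 1.0) {
    assert(k1 > 0.0);
    return (k1 + 1.0) * wdf / (k1 + wdf);
}

// Models "query AND_MAYBE XBOOST": only documents matching the query are
// returned; the boost weight is added on top when the term is present.
std::vector<std::string> rank_and_maybe(const std::vector<ToyDoc>& docs) {
    std::vector<std::pair<double, std::string> > scored;
    for (const ToyDoc& d : docs) {
        if (!d.matches_query) continue;  // the left side is required
        scored.push_back(std::make_pair(
            d.query_weight + boost_weight(d.boost_wdf), d.name));
    }
    // Highest score first; stable_sort keeps input order for ties.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<double, std::string>& x,
                        const std::pair<double, std::string>& y) {
                         return x.first > y.first;
                     });
    std::vector<std::string> names;
    for (const std::pair<double, std::string>& s : scored) {
        names.push_back(s.second);
    }
    return names;
}
```

With documents A, B and C all matching the query with equal weight and carrying boost wdfs of 2, 1 and 3, this model ranks them C, A, B. A document that carries XBOOST but does not match the query is not returned at all, and a document with a sufficiently higher query weight can still outrank a boosted one.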