similar to: Changing weights per field

Displaying 20 results from an estimated 3000 matches similar to: "Changing weights per field"

2013 Mar 11
1
Implementation of the PL2 weighting scheme of the DFR Framework
Hello guys. I am working on implementing the PL2 weighting scheme of the DFR framework by Gianni Amati. It uses the Poisson approximation of the Binomial as the probabilistic model (P), the Laplace law of succession to calculate the after-effect of sampling or the risk gain (L), and within-document frequency normalization H2 (2) (as proposed by Amati in his PhD thesis). The formula for w(t,d) in
2006 May 15
1
term / posting question
Hi guys, sorry to take up your time with this. I have been stuck on a little problem with Xapian for a few days and I can't seem to figure it out for myself. I have created a Xapian index (using the PHP bindings). I have added documents to it, with values, terms and postings. I can successfully search in this index on anything that is in a posting, but if I search on a word that
2016 Jul 24
3
Xapian 1.4.0 released
On Fri, Jul 22, 2016 at 07:19:43PM -0700, Kevin Duraj wrote: > I would like to propose to change the following code while indexing a > term that is larger than 245 characters and then crashing and aborting > the entire index, we could rather truncate the term to 245 characters > and continue with indexing. Kevin -- I wonder what others are currently doing when this comes up (or if
2006 Jan 31
1
retrieving attributes of searchresults
I use the Perl interface, Search::Xapian, to index documents. I now have metadata that I store with the index, like title, date, author, ..., and I wonder how to retrieve it from the index again without pulling it from the database. I am pretty sure this is a stupid question and that the answer is obvious, but I don't seem to be able to find it. Regards, m
2010 Feb 02
1
How to use a custom stemmer from Python bindings?
Hi, I'm using Xapian bindings for Python in my project. How could I use a custom stemmer instead of the included one (Snowball)? The one I'm looking at right now is Hunspell (http://hunspell.sourceforge.net/) which has Python bindings (http://code.google.com/p/pyhunspell/). Thanks in advance, Eugene
2007 May 15
1
Document ID 0 is invalid... but not always...
Note: this is rather long and not very important and I don't want to prevent the team from releasing version 1.0, so go on reading only if you have too much free time !!! ;-) 0 is not a valid document ID, never, ever, but I just found a special case in which xapian will create a record and return 0 for the newly created record. In fact, I was "hacking", trying to store metadata
2008 Jan 15
7
PHP indexing, what's the PHP method for indexscript
Currently I have the following indexscript:
pid : unique=Q boolean=Q field=pid
postdate : field=startdate
author_name: unhtml boolean=XAUTHORNAME field=author
author_id: boolean=XAUTHORID field=authorid
url : field=url
sample : weight=1 index field=sample
How can I create the same indexing using PHP? With this, I can get a searchable index, but I have no idea how to set the fields, so that I
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2008 Sep 27
3
Query::MatchAll
Why is there still a ranking when using Query::MatchAll()?
2009 Feb 12
1
problem when using xapian's static libs in windows
I have downloaded the source (1.10) from the internet and built it into a lib. Then I created a project as the help doc said, using VC2005 (VC8). The source in my test project is as follows (copied from the help doc): #include <xapian.h> #include <iostream> using namespace std; int main(int argc, char **argv) { // Simplest possible options parsing: we just require three or more
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement one in Xapian (with some frequently used normalizations), as it will also give me a good feel for implementing a weighting scheme before I start working on the DFR schemes. I read the following as references, and I think I've understood them well and can write the hack: 1.)
2012 Nov 21
1
about index speed of xapian
Hi, I use Xapian to index a text file whose size is 268M. I take each line as a document, and each line has two fields, like 13445511 | 111115151. The record size is 10000000. XAPIAN_FLUSH_THRESHOLD is set to 1000000. It takes 1026544ms to index the file, which is much slower than Lucene. The Lucene speed is about 40000 records per second. code: try { Xapian::WritableDatabase
2008 Jul 12
1
add_term
I used to use document.add_term("term"); to associate a document with a term that did not appear in the HTML, but the add_term function might have changed, as I no longer get results for associated terms. What would be the new way to do it? Thank you
2016 May 16
2
Weighting recent results
I was thinking about this some more: Is there a reason I can't just weight by some function of recency at indexing time? $weight = get_weight_based_on_recency(...); $tg->index_text($txt,$weight); If I wanted to allow the user the option of searching either in recency-weighted mode or not, I could index each document into 2 different databases, one with and one without. This avoids
2010 Jun 07
2
Is there a 64 character term size limit? In Ruby bindings?
I've just found some items in my Xapian database which aren't being indexed, when the terms are quite long. Example term: Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust It represents that the Freedom of Information request was made to a particular public body. It results in pages like this not correctly showing results:
2012 Jun 11
2
Define a variable on a non-standard year interval (Water Years)
Hello, I am trying to define a different interval for a "year". In hydrology, a "water year" is defined as the period between October 1st and September 30 of the following year. I was wondering how I might do this in R. Say I have a data.frame like the following and I want to extract a variable with the water year specs as defined above:
2008 Sep 16
1
Some Questions From the beginner of Xapian
Dear guys: I am a beginner with Xapian, and while reading the documentation I encountered the following questions. (1) I see that Xapian::Document has a method void add_value (Xapian::valueno valueno, const std::string &value). What is the purpose of this method? A document is related to its terms, but what is the purpose of this? (2) The add_posting method will add a term to a document. void
2005 Feb 25
2
Bug in TermIterator::skip_to() ?
Hi all, I've been toying with xapian (mostly using the Python bindings) and I think I've hit a bug in the TermIterator::skip_to() method (or maybe in QuartzAllTermsList::skip_to()). I've attached a c++ source file that demonstrates the issue. In short, if you have a WritableDatabase, ask for the all-terms TermIterator with db.allterms_begin(), and then skip_to() a word that is itself
2007 Mar 21
1
scoring question
Hi All I have just realized that if I set a query like 'green jelly bean' xapian will turn that query into 'green OR jelly OR bean' This causes documents containing just one of the words to be considered a 100% hit. The behavior I would like to see is that each word gives a 33.3% hit, so that a document containing all 3 words gets placed above a document with only 1 or 2
2014 Mar 26
3
about sort_by_value
Hello, I have found that using sort_by_value is very slow. With 16800 results, returning the top 10 with sorting takes about 25ms. Without sorting, returning 10 takes only about 0.3ms. How can I make the sort faster?