Displaying 20 results from an estimated 23 matches for "termfreq".

2007 Mar 06
1
Merging stats from multiple databases for expand
In matcher/expandweight.cc we have: OmExpandBits operator+(const OmExpandBits &bits1, const OmExpandBits &bits2) { OmExpandBits sum(bits1); sum.multiplier += bits2.multiplier; sum.rtermfreq += bits2.rtermfreq; // FIXME - try to share this information rather than pick half of it if (bits2.dbsize > sum.dbsize) { DEBUGLINE(WTCALC, "OmExpandBits::operator+ using second operand: " << bits2.termfreq << "/" << bi...
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
...ccur in a few documents) should be able to give a higher weight to the documents they index than terms which occur in many documents. Also, the higher the within-document frequency of a term, the more weight the term gives to the document. The basic formula is W(t,d) = wdf * log(N/termfreq). However, various normalizations can be applied to both wdf and idf. The extra per-document component will be 0 here, so get_maxextra() will return 0. Moreover, an upper bound on W(t,d) for get_maxpart() can be found easily for a particular normalization (if I have all the required m...
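The basic formula quoted in this snippet, W(t,d) = wdf * log(N/termfreq), can be sketched in plain Python. This is only an illustration of the unnormalized formula, not Xapian's TfIdfWeight implementation; the function name `tfidf_weight` and its parameter names are hypothetical.

```python
import math


def tfidf_weight(wdf: int, termfreq: int, num_docs: int) -> float:
    """Unnormalized W(t, d) = wdf * log(N / termfreq).

    wdf      -- within-document frequency of the term in document d
    termfreq -- number of documents the term occurs in
    num_docs -- N, the total number of documents in the collection
    """
    if not 1 <= termfreq <= num_docs:
        raise ValueError("termfreq must be between 1 and num_docs")
    return wdf * math.log(num_docs / termfreq)


# A rare term outweighs a common one at the same wdf.
rare = tfidf_weight(wdf=3, termfreq=10, num_docs=1000)
common = tfidf_weight(wdf=3, termfreq=900, num_docs=1000)
```

A term that occurs in every document gets weight 0 under this formula, since log(N/N) = 0.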
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
...y record table structure checked OK termlist: baseB blocksize=8K items=1886756 lastblock=417475 revision=6207 levels=3 root=83720 B-tree checked okay termlist table structure checked OK postlist: baseB blocksize=8K items=8872525 lastblock=524452 revision=6207 levels=3 root=238 B-tree checked okay termfreq 197211 != # of entries 197210 collfreq 10861536 != sum wdf 10861533 termfreq 14189 != # of entries 14188 collfreq 98354 != sum wdf 98344 termfreq 9866 != # of entries 9865 collfreq 56453 != sum wdf 56443 termfreq 195141 != # of entries 195137 collfreq 8126093 != sum wdf 8126079 postlist table error...
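The `termfreq ... != # of entries ...` and `collfreq ... != sum wdf ...` lines in the output above appear to report two invariants being violated: the stored term frequency should equal the number of posting-list entries, and the stored collection frequency should equal the sum of wdf over those entries. A minimal sketch of such a check follows, assuming a posting list is simply a list of (docid, wdf) pairs; `check_term` is a hypothetical helper, not xapian-check's actual code.

```python
def check_term(termfreq, collfreq, postings):
    """Return consistency errors for one term.

    postings is assumed to be a list of (docid, wdf) pairs.
    """
    errors = []
    entries = len(postings)
    if termfreq != entries:
        errors.append(f"termfreq {termfreq} != # of entries {entries}")
    sum_wdf = sum(wdf for _docid, wdf in postings)
    if collfreq != sum_wdf:
        errors.append(f"collfreq {collfreq} != sum wdf {sum_wdf}")
    return errors


# Stored statistics claim 3 documents and 10 occurrences,
# but the posting list only holds 2 entries with wdf 4 + 3.
problems = check_term(termfreq=3, collfreq=10, postings=[(1, 4), (2, 3)])
```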
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings, Using latest svn. I've noticed the following error when performing index merging: postlist: baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459 B-tree checked okay Tag containing meta information is corrupt. postlist table errors found: 1 I can still search on this index (I've only checked very small indexes), but merging is now a problem since I check
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in > Lucene backends. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound information
2017 May 24
0
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
...an in-core issue, which points to a Xapian bug or > > memory issues. > > The output of xapian-check follows. > xapian-check ~/.recoll/xapiandb [...] > postlist: > baseB blocksize=8K items=8872525 lastblock=524452 revision=6207 levels=3 root=238 > B-tree checked okay > termfreq 197211 != # of entries 197210 > collfreq 10861536 != sum wdf 10861533 > termfreq 14189 != # of entries 14188 > collfreq 98354 != sum wdf 98344 > termfreq 9866 != # of entries 9865 > collfreq 56453 != sum wdf 56443 > termfreq 195141 != # of entries 195137 > collfreq 8126093 != s...
2013 Oct 13
2
trouble with user's right indexing with omega
...in "users" group. If the file permissions are: -rw-r------ 1 ftp users 13 2013-10-06 16:26 test.txt # delve -t "I*" dbb/ term `I*' not in database That's OK, "other" cannot read this file. # delve -t "I#users" dbb/ Posting List for term `I#users' (termfreq 1, collfreq 0): 1 That's OK, group 'users' can read this file. # delve -t "I@ftp" dbb/ term `I@ftp' not in database That's wrong, user "ftp" can read this file. As this user is not in the "users" group, this user cannot find this file. In omi...
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi, I have a user reporting the following error during recoll indexing: flush() failed: Db block overwritten - are there multiple writers? "flush() failed" is from recoll, the rest is, I think the text of the Xapian exception. This is with Xapian 1.4.3 on Linux (I asked for more details, should be coming). I don't think that I've ever seen this error, and I also
2012 Apr 15
1
Patch for Initial Prototype implementation of Unigram Language Modelling in xapian-core.
Hi, I have implemented an initial prototype of a Xapian::Weight subclass for Unigram Language Modelling to support UnigramLM weighting in Xapian. Other changes include adding collection_frequency to the TermFreqs struct to store the collection frequency of terms, some changes to support it in the Xapian framework, and changing simplesearch.cc to search using the UnigramLMWeight class. The following issues have not been addressed in this patch (I am working on them): 1. Log trick for handling multiplication for LM...
2018 Jan 22
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
...ht() cannot work on a single PostingSource. So some optimizations don't work and waste time instead. What do you think about this? Also, we found that the BM25 algorithm is fast in Xapian, so we wonder whether we can modify our get_weight() function to adapt the BM25 algorithm. If so, the type of a document's termfreq would need to be double. Would it work to simply re-typedef Xapian::termcount to double? Would that have a negative impact elsewhere in the Xapian source? Thanks.
2014 Jun 17
2
It is not a tm problem, your doc.corpus is empty
...of the linguistic corpus gives me the following error: > *Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character" In addition: Lost warning messages In mclapply(unname(content(x)), termFreq, control) : all scheduled cores encountered errors in user code* > I am copying the script and attaching the data just in case: > *T...
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining
2012 Jul 09
1
Question about Document and TermIterator.get_termfreq()
Hi, While porting the unit tests from perl for the node binding I noticed a test failed. I basically create a document, add a few terms, add the document to a database and then call doc->termlist_begin().get_termfreq(). This throws "Can't get term frequency from a document termlist which is not associated with a database." What I think this means is that I can not call get_termfreq from a TermIterator obtained from a document (this is only available for a TermIterator obtained from a Database obj...
2018 Jan 24
0
How to get the serialise score returned in Xapian::KeyMaker->operator().
...uilding for every matching document, like how the PostingSource will need to calculate the weight for every matching document. > Also, we found that the BM25 algorithm is fast in Xapian, so we wonder whether we > can modify our get_weight() function to adapt the BM25 algorithm. If > so, the type of a document's termfreq would need to be double. Would it > work to simply re-typedef Xapian::termcount to double? Would that have a > negative impact elsewhere in the Xapian source? It'll stop it compiling, which is fairly negative. Xapian::termcount needs to be an unsigned integer, and there are as...
2017 Oct 16
2
Current master unit test errors
...e() [...] Running test: uninitdb1... InvalidArgumentError: Can't make an Enquire object from an uninitialised Database object. Running test: rset4... FAILED Running test: valuesetmatchdecider1... FAILED Running test: emptymset1... InvalidOperationError: Can't get termfreq from an MSet which is not derived from a query. Running test: nonutf8docdesc1... FAILED Running test: orphaneddoctermitor1... Invalid read of size 8 Running test: serialise_query2... SIGSEGV at (nil) Running test: serialise_query3... SIGSEGV at (nil) ./apitest backend none: 134 t...
2011 Sep 12
1
findFreqTerms vs minDocFreq in Package 'tm'
I am using the 'tm' package for text mining and facing an issue with finding the frequently occurring terms. From the definition it appears that findFreqTerms and minDocFreq are equivalent commands and both try to identify the documents with terms appearing more than a specified threshold. However, I am getting drastically different results with both. I have given the results from both the
2014 Jun 18
2
It is not a tm problem, your doc.corpus is empty
... > >> *Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character" In addition: Lost warning messages In mclapply(unname(content(x)), termFreq, control) : all scheduled cores encountered errors in user code* > >> I am copying the script and attaching the data just in case: > >> ...
2014 Jun 18
3
It is not a tm problem, your doc.corpus is empty
... > > > >> *Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character" In addition: Lost warning messages In mclapply(unname(content(x)), termFreq, control) : all scheduled cores encountered errors in user code* > > > >> I am copying the script and attaching the data just in case: > > > >> ...
2017 Dec 15
5
How to get the serialise score returned in Xapian::KeyMaker->operator().
Hi all, I am a Xapian user and have run into a problem. After using boolean terms to get a set of candidate documents (still too many), we want to sort them with a self-defined function used in Xapian::KeyMaker->operator(). But how can I get the serialised score from a Xapian::MSetIterator object? The C++ code looks like this: class SortKeyMaker : public Xapian::KeyMaker { std::string
2020 Aug 23
2
MultiDatabase shard count limitations
....so [.] malloc 0.21% /mnt/btr/public libc-2.28.so [.] __memchr_sse2 0.21% /mnt/btr/public perl [.] Perl_pad_alloc 0.21% perl perl [.] Perl_Slab_Alloc 0.20% script/public-i libxapian.so.30.8.0 [.] OrPostList::get_termfreq_est 0.19% /mnt/btr/public perl [.] Perl_op_lvalue_flags 0.19% script/public-i libc-2.28.so [.] cfree@GLIBC_2.2.5 0.19% /mnt/btr/public libc-2.28.so [.] __libc_calloc 0.19% /mnt/btr/public perl [.] Perl_pad_leavemy 0...