search for: termfreqs

Displaying 20 results from an estimated 23 matches for "termfreqs".

2007 Mar 06
1
Merging stats from multiple databases for expand
In matcher/expandweight.cc we have: OmExpandBits operator+(const OmExpandBits &bits1, const OmExpandBits &bits2) { OmExpandBits sum(bits1); sum.multiplier += bits2.multiplier; sum.rtermfreq += bits2.rtermfreq; // FIXME - try to share this information rather than pick half of it if (bits2.dbsize > sum.dbsize) { DEBUGLINE(WTCALC,
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement one in Xapian (with some frequently used normalizations), as it will also give me a good feel for implementing a weighting scheme before I start working on the DFR schemes. I read the following references, and I think I've understood them well and can write the hack: 1.)
2017 May 22
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Olly Betts writes: > On Wed, May 17, 2017 at 09:08:32PM +0200, Jean-Francois Dockes wrote: > > I have a user reporting the following error during recoll indexing: > > > > flush() failed: Db block overwritten - are there multiple writers? > > > > "flush() failed" is from recoll, the rest is, I think the text of the Xapian > > exception.
2010 Jan 18
3
postlist: Tag containing meta information is corrupt.
Greetings, Using latest svn. I've noticed the following error when performing index merging: postlist: baseB blocksize=8K items=33962 lastblock=534 revision=1 levels=2 root=459 B-tree checked okay Tag containing meta information is corrupt. postlist table errors found: 1 I can still search on this index (I've only checked very small indexes), but merging is now a problem since I check
2013 Sep 02
2
Backend for Lucene format indexes-How to get doclength
On Mon, Sep 02, 2013 at 09:21:48AM +0800, jiangwen jiang wrote: > TfIdfWeight and BM25(b=0) also need wdf_upper_bound, which does not exist in > Lucene backends. If you don't provide an implementation of wdf_upper_bound(), the default is to use the collection frequency of the term, so provided that information is available in the lucene files, the lack of wdf_upper_bound information
2017 May 24
0
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
On Mon, May 22, 2017 at 07:45:59AM +0200, Jean-Francois Dockes wrote: > Olly Betts writes: > > Assuming nobody deleted the log file, this could be a Xapian bug. This I meant "lock file" not "log file" here. > > isn't something we're drowning in reports of, so presumably it doesn't > > trigger easily, so finding a way to reproduce would be
2013 Oct 13
2
trouble with user's right indexing with omega
Hi, I'm using omindex to index files and I want to make queries with the user/group boolean prefixes (I*, I@... and I#...). That works well with the "other" and "group" rights, but not in all cases for the "user" right. Here is an example: assume that we have a user "ftp" who is not in the "users" group. If the file rights are: -rw-r------ 1 ftp users 13 2013-10-06
2017 May 17
2
Xapian 1.4.3 "Db block overwritten - are there multiple writers?"
Hi, I have a user reporting the following error during recoll indexing: flush() failed: Db block overwritten - are there multiple writers? "flush() failed" is from recoll, the rest is, I think the text of the Xapian exception. This is with Xapian 1.4.3 on Linux (I asked for more details, should be coming). I don't think that I've ever seen this error, and I also
2012 Apr 15
1
Patch for Initial Prototype implementation of Unigram Language Modelling in xapian-core.
Hi, I have implemented an initial prototype of the Xapian::Weight subclass for Unigram Language Modelling to support UnigramLM weighting in Xapian. Other changes include adding collection_frequency to the TermFreqs struct to store the collection frequency of terms, some changes to support it in the Xapian framework, and changing simplesearch.cc to search using the UnigramLMWeight class. The following issues have not been addressed in this patch (I am working on them): 1. Log trick for handling multiplication for LM...
2018 Jan 22
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
>A possible workaround (and perhaps a better approach) would be to >set BoolWeight as the weighting scheme, then feed in your score as >a weight using a PostingSource. Then it's available via get_weight() >on the MSetIterator object: > >https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/postingsource.html > >You may find that's faster because
2014 Jun 17
2
It's not a tm problem, your doc.corpus is empty
It's not a problem with tm, SnowfallC, or mcapply (judging by the path you are on Linux; on Windows, according to the manual, mcapply does not work well). You are not defining the objects you pass correctly. You pass doc.corpus instead of corpus (or you assign to corpus instead of to doc.corpus). Debug your programs when an object error appears, as the Error you posted tells you. As a temporary fix, you have it sorted out in
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining
2012 Jul 09
1
Question about Document and TermIterator.get_termfreq()
Hi, While porting the unit tests from perl for the node binding I noticed a test failed. I basically create a document, add a few terms, add the document to a database and then call doc->termlist_begin().get_termfreq(). This throws "Can't get term frequency from a document termlist which is not associated with a database." What I think this means is that I can not call
2018 Jan 24
0
How to get the serialise score returned in Xapian::KeyMaker->operator().
On Tue, Jan 23, 2018 at 12:55:31AM +0800, 张少华 wrote: > We realise our score function using PostingSource instead of using > KeyMaker, we reference your python example and source code of xapian, > the simple demo is here. > https://github.com/xiangqianzsh/xapian_leaning/blob/master/postingsource/ExternalWeightPostingSource.h I'd just put the get_weight() and get_maxweight()
2017 Oct 16
2
Current master unit test errors
I'm preparing a pull request for the master branch and noticed that `make check` on a clone of the xapian repository fails badly. I haven't merged my changes and built from e24cc6018de0. Is it just me or is there something broken in the master branch? Running test './apitest' under valgrind Running tests with backend "none"... Running test: defaultctor1...
2011 Sep 12
1
findFreqTerms vs minDocFreq in Package 'tm'
I am using the 'tm' package for text mining and am facing an issue with finding the frequently occurring terms. From the definition it appears that findFreqTerms and minDocFreq are equivalent commands and that both try to identify the documents with terms appearing more than a specified threshold. However, I am getting drastically different results with both. I have given the results from both the
2014 Jun 18
2
It's not a tm problem, your doc.corpus is empty
I think what you want to do needs this line of code right after loading the tm package: inmortal = unlist(strsplit(inmortal, " ", fixed = T)) That way you work with words, and NOT with whole sentences... Regards, Isidro Hidalgo Arellano Observatorio Regional de Empleo Consejería de Empleo y Economía http://www.jccm.es > -----Original Message----- > From:
2014 Jun 18
3
It's not a tm problem, your doc.corpus is empty
Many thanks Isidro, tonight I'll reinstall R and let you know whether it worked. Forgive my novice ignorance, but I didn't quite understand the part about notifying the developer. I take it that means the package developers, right? Regards! ruben On 18 June 2014 at 13:10, Isidro Hidalgo <ihidalgo@jccm.es> wrote: > I've seen that it doesn't work that way either. > What I can tell you is that it has let me
2017 Dec 15
5
How to get the serialise score returned in Xapian::KeyMaker->operator().
Hi all, I am a user of Xapian, and now I have a problem in using it. After using boolean terms to get some candidate documents (still too many), we want to sort them by a self-defined function which is used in Xapian::KeyMaker->operator(). But how can I get the serialised score from the Xapian::MSetIterator object? The C++ code looks like this: class SortKeyMaker : public Xapian::KeyMaker { std::string
2020 Aug 23
2
MultiDatabase shard count limitations
..._pp_aassign 0.01% script/public-i libxapian.so.30.8.0 [.] std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, TermFreqs>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, TermFreqs> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<st...