Displaying 20 results from an estimated 700 matches similar to: "Integrated Chinese tokenizer SCWS in xapian-core"
2011 Sep 19
2
New scws patch for xapian-core based on svn trunk
Hi, I have remade the patch file, based on the trunk code in the SVN repo.
But I cannot build Xapian, because there are many errors when building 'languages/' for the stemmer. Therefore, I could not test the new patch.
The patch can be downloaded from:
http://www.xunsearch.com/download/xapian-scws-1.3.x-trunk.patch
SCWS needs to be installed first, using the steps below:
1.
2007 Sep 20
3
Incorrect get_matches_estimated() of Xapian::Mset
Hello, As far as I know, get_matches_estimated() returns an estimate of the number of documents that match the query.
But I have found a disparity between the return value and the real number of matches. For example: the real number of matches is 58, but the return value is 458; so when users click through to a later page, they get a blank page ... and they often complain to me.
I found that the main reason is
2012 Jan 04
2
[issue] The difference between QueryParser::FLAG_AUTO_SYNONYMS and QueryParser::FLAG_AUTO_MULTIWORD_SYNONYMS
I don't know whether this is a bug or intended behaviour...
According to the definitions in "xapian/queryparser.h", FLAG_AUTO_MULTIWORD_SYNONYMS contains the bit of
FLAG_AUTO_SYNONYMS.
Therefore, as long as I set the parse flag with FLAG_AUTO_SYNONYMS, the query parser will automatically activate
the function of FLAG_AUTO_MULTIWORD_SYNONYMS. See the source code excerpt below, from
2007 Nov 14
1
Problem indexing text with spelling enabled in Perl
Hi All,
I'm using the TermGenerator::index_text() on version 1.0.4 with the
FLAG_SPELLING turned on, because the new spelling suggestion stuff
seems awesome, but I'm getting a segv.
(gdb) bt
#0 0xb7ae153c in Xapian::WritableDatabase::add_spelling
(this=0xa553988, word=@0xbff97724, freqinc=1) at ./include/xapian/base.h:154
#1 0xb7becf47 in
2007 Dec 17
1
Crashes with spelling enabled and perl.
Hi Guys,
Here's a simple test case that causes a segfault with the perl
bindings patched to enable spelling correction:
use strict;
use warnings;
use Search::Xapian;
my $db = Search::Xapian::WritableDatabase->new("test.db",
Search::Xapian::DB_CREATE_OR_OPEN);
if (!defined($db)) {
die("Failed to open xapian_database: $!");
}
my $indexer =
2015 Jul 26
1
Get term from document by position
mple (see attachment).
>
> Attachments get stripped out by the mailing list, so I've made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>.
>
> Actually, when I run it I get 0 matches, which would explain why you're just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to
2007 Dec 29
3
Term-Flags
Hi,
Is it necessary to set the flag below on the TermGenerator
if I want the "Did you mean ..." spelling corrections?
Xapian::TermGenerator::flags::FLAG_SPELLING
Thank you very much
Markus
2015 Jun 10
1
make check xapian-bindings-1.2.21 & Search-Xapian-1.2.21.0
Eric Lindblad
http://www.ericlindblad.blogspot.com
- - -
Slackware-14.0
bash-4.2# make check
Making check in perl
make[1]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
make check-am
make[2]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
make check-TESTS
make[3]: Entering directory `/home/eric/xapian-bindings-1.2.21/perl'
./t/01use.t .. ok
All tests
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem('german2');
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2010 Oct 24
1
Cannot index with dynamic spelling data (Perl/Search::Xapian)
This is my test case, what am I doing wrong? It seems that the API is used
incorrectly, but I cannot find the problem...
--- 8< ---
#!/usr/bin/perl
use Search::Xapian qw(:all);
use strict;
my $xa = new Search::Xapian::WritableDatabase ("/tmp/xapian",
DB_CREATE_OR_OVERWRITE);
my $indexer = Search::Xapian::TermGenerator->new();
2010 Jun 09
1
TermGenerator incorrectly tokenizes German text which contains special characters
Dear Xapian users,
I am trying to index some German text with Xapian using the xapian_php bindings. I
run Apache 2.2 on Windows using PHP 5.2.13 with the prebuilt Xapian
bindings from Flax:
Xapian Support: enabled
Xapian Compiled Version: @PACKAGE_VERSION@
Xapian Linked Version: 1.2.0
The problem is that after indexing text which contains special characters
like ä, ö, ü and ß, using
2018 Nov 30
1
Xapian Benchmark results
Hi,
I am currently trying to benchmark a multithreaded Xapian implementation, written
in C++, on a Chameleon bare-metal instance. My workload is a 3 GB
Wikipedia XML dump consisting of ~286 files of different sizes. My results
are showing me that indexing with Xapian is an order of magnitude faster than
my Lucene and lucene plusplus implementations. This is a result that I did
not expect. Just want to
2014 Jan 27
4
Perl Search::Xapian
Hi,
While trying to learn Search::Xapian and get better at Perl at the same time,
I'm stuck at a DB_CREATE_OR_OPEN error. Perl says this:
~/dev/sandbox/Xapian-perl$ ./Index1-Xap.pl 100-objects-v1.csv db
"db" is not exported by the Search::Xapian module
Can't continue after import errors at ./Index1-Xap.pl line 7.
BEGIN failed--compilation aborted at ./Index1-Xap.pl line 7.
What I
2008 Sep 16
1
Some Questions From the beginner of Xapian
Dear guys,
I am a beginner with Xapian; while reading the documentation, I ran into the following questions.
(1) I see that Xapian::Document has a method
void add_value (Xapian::valueno valueno, const std::string &value)
What's the purpose of this method? A Document is related to its terms, but what's the purpose of this?
(2) The add_posting method adds a term to a document.
void
2011 Jul 27
3
Searching using prefixes
Hi guys
I'm trying to figure out how I can use probabilistic searching on a
given field within a document; I've written to the list about this
before, but haven't quite figured out what's required and, following a
little research, I think I understand what I need to do but I'd like a
clarification on this.
o We have a database of a number of documents, with fields: title,
2010 May 27
1
Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote:
> On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote:
> > I try to remove stop words during the index process and I have no
> > stemming. I have tried with a simple example but it does not work
> > at all.
> >
> > I have my writableDatabase and my termGenerator (indexer) and they
2014 Feb 27
2
Summer of Code help
I think there is a development in the bug #616.
The exception obtained is:
Exception in thread "main" java.lang.IllegalArgumentException: No enum
class org.xapian.TermGenerator$flags with value 0
at org.xapian.TermGenerator$flags.swigToEnum(TermGenerator.java:143)
at org.xapian.TermGenerator.setFlags(TermGenerator.java:71)
at org.xapian.examples.SimpleIndex.main(SimpleIndex.java:54)
2012 Nov 26
1
Word missing after stemmed with Norwegian in Search::Xapian::TermGenerator
Hi all Xapian-devel,
Gist: https://gist.github.com/10d2222d8bffe8d7631d
I'm using Xapian's TermGenerator to convert Norwegian sentences to a vsm
(vector space model). But when I test generating the vsm
from 'Truet med å stevne misfornøyd PC-kunde - PC-leverandøren Asus likte
svært dårlig kundens misfornøyde leserbrev.', it doesn't return an 'asus' result
in the vsm.
2019 Mar 07
3
Ask for advice on exact requirements to fix #699 mixed CJK numbers
I am working on "#699 Better tokenisation of mixed CJK numbers",
and have implemented a partial patch for Chinese for this ticket.
The current code works well with special test cases, and
all tests in xapian-core still pass.
But I'm unsure about the exact requirements of the ticket,
about how much performance we can afford to pay to cover more cases,
and if there are better methods to
2012 Jun 04
1
Search not finding queries with stop words.
I have a search in perl that looks a bit like:
my $qp = new Search::Xapian::QueryParser();
$qp->set_stemmer(new Search::Xapian::Stem("english"));
$qp->set_stemming_strategy(STEM_SOME);
$qp->set_default_op($defaultop);
...
my $par = $qp->parse_query($query);
my $enq = $xDatabase->enquire( $par );
and in the db create script:
my $stopper =