On Tue, Sep 09, 2025 at 06:04:13PM +0300, ?????? wrote:
> How to operate with non-English languages?
> I installed version 1.4.29 for Visual Studio 17. It works fine with
> English, but with other languages synonyms and
> get_spelling_suggestion() are ignored. get_spelling_suggestion()
> only returns a value when add_spelling() has been applied to that
> particular non-English word.
Both should work for any language.
> I've also tried to pass an internal stemmer to Xapian::Stem but got
> a critical error.
You shouldn't try to do anything directly with internal stemmers -
they're internal implementation details, not part of the public
API.
If you want a Russian stemmer, use Xapian::Stem("ru") or
Xapian::Stem("russian") to create one.
> Code
Unhelpfully, this isn't a complete program, so I can't easily build it to
test it. My comments are just from reviewing the code.
> Xapian::WritableDatabase db("./index_data",
>                             Xapian::DB_CREATE_OR_OPEN);
> db.add_synonym("?????", "?????");
> Xapian::TermGenerator indexer;
> indexer.set_database(db);
> indexer.set_flags(indexer.FLAG_SPELLING);
> std::string ru_doc_id1 = "id1";
> std::string ru_doc_content1 = "?????? ?????";
> std::string ru_doc_keylist1 = "?????? ?????";
> Xapian::Document doc1;
> doc1.add_term(ru_doc_id1);
Not directly relevant to your problems, but you really ought to prefix
your id term (prefix `Q` is the usual convention, so e.g. `Q1` not
`id1`). The problem with using `id1` is that a document containing the
word `id1` will also get indexed by term `id1`. Terms generated from text
are folded to lower case, so the word `Q1` in a document gets indexed by
term `q1` and there's no collision with the id term `Q1`.
It's also better to use add_boolean_term() here, as otherwise your id
term counts towards the document length.
> doc1.set_data(ru_doc_content1);
> indexer.set_document(doc1);
> indexer.index_text(ru_doc_keylist1);
> db.replace_document(ru_doc_id1, doc1);
> db.commit();
> db.close();
> Xapian::Database db1("./index_data");
> std::string word = "~?????";
> std::string corrected2 = db1.get_spelling_suggestion(word);
The get_spelling_suggestion() method takes a single word and the `~`
shouldn't be included here (it'll just be interpreted as part of the
misspelled word, so suggestions will need to be one edit closer to be
considered).
I'm guessing you're thinking of the syntax `~?????`? That's:
* A QueryParser syntax which is recognised in parsed user query strings
* A syntax for explicit expansion of *synonyms* not *spelling
correction* (the two are separate features)
* (Also, it's only enabled if you pass FLAG_SYNONYM to
QueryParser::parse_query(), as you do.)
If you want to check what the synonym(s) for term `?????` is/are then
you want:
    std::string word = "?????";
    for (auto t = db1.synonyms_begin(word); t != db1.synonyms_end(word); ++t) {
        std::cout << "A synonym for " << word << " is " << *t << "\n";
    }
If you want to test a spelling correction then you want a misspelling of
a word in the text you indexed:
// "??????" with first two characters transposed:
std::string word = "??????";
std::string corrected2 = db1.get_spelling_suggestion(word);
> Xapian::Query query = qp.parse_query(word,
>                                      Xapian::QueryParser::FLAG_SYNONYM);
Are you saying the synonym doesn't get expanded here? If it doesn't,
what does get_description() report?
std::cout << query.get_description() << "\n";
If you want spelling correction in the QueryParser, you also need to
pass FLAG_SPELLING_CORRECTION. You probably also want FLAG_DEFAULT,
assuming you want to keep the flags which are on by default, so:
Xapian::Query query = qp.parse_query(word,
Xapian::QueryParser::FLAG_DEFAULT |
Xapian::QueryParser::FLAG_SPELLING_CORRECTION |
Xapian::QueryParser::FLAG_SYNONYM);
Cheers,
Olly