Hi,
since Trac doesn't respond to my subscription request, I'll try it this
way.
TLDR: echo "krankenkassen" > i/input.txt; omindex i/ --stemmer=german; quest "krankenkassen" -> not found.
See https://gist.github.com/anonymous/609f82a065f3d0ac6b1d077073be286f for
the full script & output.
LONG:
- create a directory containing a text file with the lowercase word "krankenkassen"
- index it with omindex, once without and once with --stemmer=german
- search each database for "Krankenkassen" and "krankenkassen" with quest
Result: with --stemmer=german, "Krankenkassen" is found, "krankenkassen" isn't.
------------------ rrrrrrrrrrrrrrrip -----------------
#!/bin/bash -x
# directory with txt files
INPUT=./testinput
LOWER=krankenkassen
UPPER=Krankenkassen
mkdir ${INPUT}
# store one lowercase word
echo ${LOWER} > ${INPUT}/lower.txt
# who am i
omindex --version
quest --version
# clean up database 8)
rm -rf testdb
# create omega index, url doesn't matter
omindex --verbose --db=testdb --url=/bla ${INPUT}
# query database for word in Upper and lower case
quest --db=testdb ${UPPER} | tee test-nostem.out
quest --db=testdb ${LOWER} | tee -a test-nostem.out
# both queries should have found the document.
# now ... clean up the database 8)
rm -rf testdb
# create omega index, use german stemmer
omindex --verbose --db=testdb --url=/bla --stemmer=german ${INPUT}
# try again and query database for word in Upper and lower case
quest --db=testdb ${UPPER} | tee test-stem.out
quest --db=testdb ${LOWER} | tee -a test-stem.out
# the lowercase query finds nothing here, which is weird.
diff test-nostem.out test-stem.out
------------------ rrrrrrrrrrrrrrrap -----------------
and the resulting output:
------------------ rrrrrrrrrrrrrrrip -----------------
+ INPUT=./testinput
+ LOWER=krankenkassen
+ UPPER=Krankenkassen
+ mkdir ./testinput
mkdir: cannot create directory ‘./testinput’: File exists
+ echo krankenkassen
+ omindex --version
omindex - xapian-omega 1.4.3
+ quest --version
quest - xapian-core 1.4.3
+ rm -rf testdb
+ omindex --verbose --db=testdb --url=/bla ./testinput
[Entering directory ""]
Indexing "lower.txt" as text/plain ... added
+ quest --db=testdb Krankenkassen
+ tee test-nostem.out
Parsed Query: Query(krankenkassen@1)
MSet:
1: [0.154151]
url=/bla/lower.txt
sample=krankenkassen
type=text/plain
modtime=1491300443
size=14
+ quest --db=testdb krankenkassen
+ tee -a test-nostem.out
Parsed Query: Query(Zkrankenkassen@1)
MSet:
1: [0.154151]
url=/bla/lower.txt
sample=krankenkassen
type=text/plain
modtime=1491300443
size=14
+ rm -rf testdb
+ omindex --verbose --db=testdb --url=/bla --stemmer=german ./testinput
[Entering directory ""]
Indexing "lower.txt" as text/plain ... added
+ quest --db=testdb Krankenkassen
+ tee test-stem.out
Parsed Query: Query(krankenkassen@1)
MSet:
1: [0.154151]
url=/bla/lower.txt
sample=krankenkassen
type=text/plain
modtime=1491300443
size=14
+ quest --db=testdb krankenkassen
+ tee -a test-stem.out
Parsed Query: Query(Zkrankenkassen@1)
MSet:
+ diff test-nostem.out test-stem.out
11,16d10
< 1: [0.154151]
< url=/bla/lower.txt
< sample=krankenkassen
< type=text/plain
< modtime=1491300443
< size=14
------------------ rrrrrrrrrrrrrrrap -----------------
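A more compact view than the diff, in case it helps: a small awk-based sketch that prints each "Parsed Query" line together with the number of hits listed after it. The `summarize` helper is hypothetical (my own name, not part of Xapian), and it assumes that hit lines always look like `1: [0.154151]`, as they do in the output above.

```shell
# summarize FILE: for every "Parsed Query" line in a saved quest output,
# print that line followed by the number of hits reported after it.
# Assumption: each hit starts with a line of the form "<rank>: [<weight>]".
summarize() {
    awk '
        /^Parsed Query:/ { if (q != "") print q " -> " n " hit(s)"; q = $0; n = 0 }
        /^[0-9]+: \[/    { n++ }
        END              { if (q != "") print q " -> " n " hit(s)" }
    ' "$1"
}
```

Running `summarize test-nostem.out` and `summarize test-stem.out` after the script should show the same two parsed queries in both files, with only the hit count of the lowercase query differing.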
and for completeness:
# xapian-delve -a testdb
All terms in database: D20170404 Etxt Flower I* J/bla M201704 Owwwutz P/bla Ttext/plain U/bla/lower.txt Y2017 ZFlow Zkrankenkass krankenkassen
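For what it's worth, here is a quick plain-shell check (no Xapian needed) of the term list above against the term that quest printed in its "Parsed Query" line for the lowercase search. The `has_term` helper is made up for illustration; the term list and query term are copied from the output in this mail.

```shell
# Term list copied from the xapian-delve output above.
terms='D20170404 Etxt Flower I* J/bla M201704 Owwwutz P/bla Ttext/plain U/bla/lower.txt Y2017 ZFlow Zkrankenkass krankenkassen'

# Term that quest reported in "Parsed Query" for the lowercase search.
query_term='Zkrankenkassen'

# has_term TERM: succeed if TERM is exactly one of the terms in $terms.
# tr is used instead of word splitting so that "I*" is not glob-expanded.
has_term() {
    printf '%s\n' "$terms" | tr ' ' '\n' | grep -Fqx -- "$1"
}

for t in "$query_term" Zkrankenkass krankenkassen; do
    if has_term "$t"; then
        echo "$t: in index"
    else
        echo "$t: NOT in index"
    fi
done
```

So the stemmed term in the database is `Zkrankenkass`, while the lowercase query looks for `Zkrankenkassen`, which is not in the list. I don't know why the two differ.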
Funnily enough, the singular ("Krankenkasse") works fine 8)
I'm a complete Xapian noob, so what am I doing wrong?
cheers,
Peter Marquardt