thr3ads.net - Xapian discuss - omega issues/notes [Sep 2016]

John Bankert

2016-Sep-27 20:32 UTC

omega issues/notes

All,

I've run into a couple of things using omega/omindex under cygwin. I don't
think I'd attribute them to xapian, omega or omindex, but wanted to get
them out to the list so that if anyone else should run into these things
down the road, hopefully someone will remember and be able to help.

1) after compiling and building omega, and doing make install, I get a set
violation when trying to run omindex from its installed location under
cygwin. I worked around this by copying various required windows dll files
into the same directory as omindex.exe and presto, success.
2) There appears to be some sort of weird path issue in using omindex in
the cygwin bash shell. Using the path /www/example/product should, in cygwin
bash, act as a fully defined path to the directory to be indexed by omindex.
This is not the case. I had to produce a relative path from where
omindex.exe was running in order to successfully index the files in
/www/example/product.
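
One way to investigate the path issue in item 2 is to compare the POSIX form of the path with the form a native Windows program would receive. The sketch below assumes the `cygpath` utility that ships with Cygwin is on PATH; the helper name `show_path_forms` is hypothetical and not part of omega. Cygwin resolves a leading `/` relative to its own installation root, whereas a program that is not linked against the Cygwin DLL treats it as the root of the current drive, so the two forms can point at different directories.

```shell
# Hypothetical helper: print a path as given and, when cygpath is
# available, as the Windows path that Cygwin maps it to.
show_path_forms() {
    posix_path=$1
    printf 'posix:   %s\n' "$posix_path"
    if command -v cygpath >/dev/null 2>&1; then
        # cygpath -w prints the Windows form of a Cygwin path.
        printf 'windows: %s\n' "$(cygpath -w "$posix_path")"
    else
        printf 'windows: (cygpath not available on this system)\n'
    fi
}
```

For example, `show_path_forms /www/example/product` prints both forms, which can then be compared with the directory omindex reports entering.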

This next bit is me wondering about the output I've gotten.

John at win-7-test ~/xapian-omega-1.4.0
$ ls -al ../../../www/example/msproducts/
total 357
drwx------+ 1 John None      0 Sep 27 16:25 .
drwx------+ 1 John None      0 Sep 27 15:41 ..
-rwx------+ 1 John None  32476 Sep 14 15:18 100-objects-v1.csv
-rwx------+ 1 John None  32477 Sep 14 15:19 100-objects-v2.csv
-rwx------+ 1 John None  14228 Aug 31 11:41 burger.docx
-rwx------+ 1 John None  19034 Jun 30 12:15 hotdog.docx
-rwx------+ 1 John None  10538 Sep 14 15:30 index.html
-rwx------+ 1 John None 137728 Jun 30 12:15 sausage.doc
-rwx------+ 1 John None  71536 Sep 14 15:21 states.csv
-rwx------+ 1 John None    541 Sep 14 15:21 us_states_on_wikipedia.html
-rwx------+ 1 John None  29824 Aug 31 15:08 zlib_how.html

John at win-7-test ~/xapian-omega-1.4.0
$ ./omindex -v --db omtest.db --url msproducts ../../../www/example/msproducts/

John at win-7-test ~/xapian-omega-1.4.0
$ [Entering directory ""]
Indexing "100-objects-v1.csv" as text/csv ... added
Indexing "100-objects-v2.csv" as text/csv ... added
Indexing "burger.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... The system cannot find the path specified.
Skipping - "unzip -p "..\..\..\www\example\msproducts\burger.docx" word/document.xml 'word/header*.xml' 'word/footer*.xml' 2>/dev/null" failed
Indexing "hotdog.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... The system cannot find the path specified.
Skipping - "unzip -p "..\..\..\www\example\msproducts\hotdog.docx" word/document.xml 'word/header*.xml' 'word/footer*.xml' 2>/dev/null" failed
Indexing "index.html" as text/html ... added

omindex stops when it hits sausage.doc, and echo $? returns 0, so I've no
idea why it doesn't want to process an MS Word .doc file, although I
suspect it may be related to the inability to process the .docx files. I
should note that I am performing this work on a Windows VM that does not have
MS Office or OpenOffice installed, if that makes a difference.
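
Because the extraction command in the "Skipping" messages above sends stderr to /dev/null, the underlying error is hidden. A minimal sketch for rerunning the same `unzip` step by hand with stderr left visible follows; the helper name `try_docx_extract` is hypothetical, and the arguments are copied from the message in the output above.

```shell
# Hypothetical helper: rerun the .docx extraction step by hand so that
# any error message from unzip (or from the shell) stays visible.
try_docx_extract() {
    docx=$1
    # Same arguments as in the "Skipping" message, minus the 2>/dev/null;
    # the extracted XML itself is discarded.
    unzip -p "$docx" word/document.xml 'word/header*.xml' 'word/footer*.xml' >/dev/null
    status=$?
    echo "unzip exit status: $status"
    return "$status"
}
```

Running `try_docx_extract ../../../www/example/msproducts/burger.docx` from the same directory as omindex should show whether the failure comes from a missing `unzip`, from the path, or from the file itself.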

Again, thanks to all who have been offering help

John

Olly Betts

2016-Oct-18 13:02 UTC

omega issues/notes

On Tue, Sep 27, 2016 at 04:32:33PM -0400, John Bankert wrote:
> I've run into a couple of things using omega/omindex under cygwin. I don't
> think I'd attribute them to xapian, omega or omindex, but wanted to get
> them out to the list so that if anyone else should run into these things
> down the road, hopefully someone will remember and be able to help.
>
> 1) after compiling and building omega, and doing make install, I get a set
> violation when trying to run omindex from its installed location under
> cygwin. I worked around this by copying various required windows dll files
> into the same directory as omindex.exe and presto, success.

I've no idea what a "set violation" is - is that a typo for "seg violation"
(short for "segmentation violation")?

Not sure I can offer much insight into this though - I haven't had to wrangle
DLLs for several decades.
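
As a possible alternative to copying DLLs next to omindex.exe (an assumption, not something tested in this thread): Windows also searches the directories listed in PATH when loading DLLs, so running omindex with the DLL directory prepended to PATH may have the same effect. The helper name `with_dll_dir` and the directory used in the usage line are hypothetical.

```shell
# Hypothetical helper: run a command with an extra directory placed at the
# front of PATH. The subshell keeps the change from leaking into the
# calling shell.
with_dll_dir() {
    dll_dir=$1
    shift
    (
        PATH="$dll_dir:$PATH"
        export PATH
        "$@"
    )
}

# Usage (directory is an assumption; use wherever the required DLLs live):
#   with_dll_dir /usr/local/bin omindex -v --db omtest.db --url msproducts /www/example/msproducts/
```

Under Cygwin, `cygcheck ./omindex.exe` lists the DLLs an executable depends on, which helps identify the directory to pass in.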
> 2) There appears to be some sort of weird path issue in using omindex in
> the cygwin bash shell. Using the path /www/example/product should, in cygwin
> bash, act as a fully defined path to the directory to be indexed by omindex.
> This is not the case. I had to produce a relative path from where
> omindex.exe was running in order to successfully index the files in
> /www/example/product.

I tried to set up a build on appveyor to reproduce this, but it works for me:

https://ci.appveyor.com/project/ojwb/xapian/build/1.0.30

In particular:

    bash -c 'xapian-applications/omega/omindex -v --db omtest.db --url msproducts /www/example/products/'
    [Entering directory ""]
    Indexing "example.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... Skipping - "unzip -p '/www/example/products/example.docx' word/document.xml 'word/header*.xml' 'word/footer*.xml' 2>/dev/null" failed
    Indexing "html.htm" as text/html ... added
    Indexing "sample.doc" as application/msword ... Skipping - "antiword -mUTF-8.txt '/www/example/products/sample.doc'" failed
    Indexing "text.txt" as text/plain ... added

I didn't install "unzip" or "antiword", so that's what I'd expect to happen.

> This next bit is me wondering about the output I've gotten.
> 
> John at win-7-test ~/xapian-omega-1.4.0
> $ ls -al ../../../www/example/msproducts/
> total 357
> drwx------+ 1 John None      0 Sep 27 16:25 .
> drwx------+ 1 John None      0 Sep 27 15:41 ..
> -rwx------+ 1 John None  32476 Sep 14 15:18 100-objects-v1.csv
> -rwx------+ 1 John None  32477 Sep 14 15:19 100-objects-v2.csv
> -rwx------+ 1 John None  14228 Aug 31 11:41 burger.docx
> -rwx------+ 1 John None  19034 Jun 30 12:15 hotdog.docx
> -rwx------+ 1 John None  10538 Sep 14 15:30 index.html
> -rwx------+ 1 John None 137728 Jun 30 12:15 sausage.doc
> -rwx------+ 1 John None  71536 Sep 14 15:21 states.csv
> -rwx------+ 1 John None    541 Sep 14 15:21 us_states_on_wikipedia.html
> -rwx------+ 1 John None  29824 Aug 31 15:08 zlib_how.html
> 
> John at win-7-test ~/xapian-omega-1.4.0
> $ ./omindex -v --db omtest.db --url msproducts ../../../www/example/msproducts/

Hmm, I notice here you have "www/example/msproducts", but above you said
"/www/example/product" - "msproducts" vs "product".  Could that be the
problem, or was the earlier one just a typo or hypothetical example?

> John at win-7-test ~/xapian-omega-1.4.0
> $ [Entering directory ""]
What was the exact command line you used to run the indexer here?  It seems to
have got lost from the paste, and would be useful to know.
> Indexing "100-objects-v1.csv" as text/csv ... added
> Indexing "100-objects-v2.csv" as text/csv ... added
> Indexing "burger.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... The system cannot find the path specified.
> Skipping - "unzip -p "..\..\..\www\example\msproducts\burger.docx" word/document.xml 'word/header*.xml' 'word/footer*.xml' 2>/dev/null" failed
> Indexing "hotdog.docx" as application/vnd.openxmlformats-officedocument.wordprocessingml.document ... The system cannot find the path specified.
> Skipping - "unzip -p "..\..\..\www\example\msproducts\hotdog.docx" word/document.xml 'word/header*.xml' 'word/footer*.xml' 2>/dev/null" failed
> Indexing "index.html" as text/html ... added
> 
> omindex stops when it hits sausage.doc, and echo $? returns 0, so I've no
> idea why it doesn't want to process an MS Word .doc file, although I
> suspect it may be related to the inability to process the .docx files. I
> should note that I am performing this work on a Windows VM that does not have
> MS Office or OpenOffice installed, if that makes a difference.

That shouldn't matter.  By default omindex will try to use unzip (and internal
XML parsing) for .docx and antiword for .doc.

I can't see why it shouldn't try to handle the other files in the directory
- in my test it continues after both the .docx and .doc failures.
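
Since omindex hands .docx and .doc files to the external programs named above, a quick check that those programs are reachable from the shell can rule out one cause of the "Skipping" messages. This is a minimal sketch; the helper name `missing_filters` is hypothetical, and the tool list should be adjusted to the file types being indexed.

```shell
# Hypothetical helper: print the name of each given program that cannot
# be found on PATH; prints nothing when all of them are available.
missing_filters() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing_filters unzip antiword
```

An empty result means both helpers were found; any name printed would need to be installed before omindex can extract text from the matching file type.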

Cheers,
    Olly
