thr3ads.net - similar to: "omindex => Unknown extension"

Displaying 20 results from an estimated 400 matches similar to: "omindex => Unknown extension"

2009 May 19

omindex options

Hi. I am writing a Python equivalent of omindex (we are using scriptindex currently, but I wanted to use omindex instead and extend it to work with our internal file format, BUT did not want to compile code if possible... so anyway). I have tried to keep the code as close as possible to the omindex native code, but am facing a bit of confusion: what exactly is the reason for omindex to take

2009 Jul 15

XAPIAN_FLUSH_THRESHOLD

I'm playing around with a machine that has 2 GB of memory, indexing about 5 GB of data, an average of 2 MB per document. The documents are plain text. I notice that omindex's memory footprint gets bigger and bigger, then the machine starts to swap and it all slows down to a crawl. Regarding export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000. Am I right in saying that for my setup

2009 Apr 29

antiword

Hi guys, I've been noticing more and more that antiword has trouble with many Word documents. It may look like it has converted a document, but it leaves out headings and bits of text. I've been looking into getting OpenOffice to do it in headless mode but still have a way to go before it's stable. I was wondering if anyone else has had any luck on this front? One quick fix I have found

2016 Sep 27

omega issues/notes

All, I've run into a couple of things using omega/omindex under cygwin. I don't think I'd attribute them to xapian, omega or omindex, but wanted to get them out to the list so that if anyone else should run into these things down the road, hopefully someone will remember and be able to help. 1) after compiling and building omega, and doing make install, I get a set violation when

2012 Dec 13

omindex one file at a time?

Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping between my documents' filenames and the URLs where I hope search users will find them is, uh..., strange. The simplest thing (to me) would be to run omindex for each document, e.g.

    omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf
    omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt

... and so on...
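The per-document approach described in this post could be scripted roughly as follows. This is only a sketch: the helper name `omindex_commands` and the mapping are hypothetical, the flags are copied from the post as written, and the script only prints the command lines rather than running them. Whether omindex accepts a single file this way is the poster's open question, not something this sketch confirms.

```python
import shlex


def omindex_commands(url_for_file):
    """Build one omindex invocation per (file path -> URL) pair.

    Hypothetical helper: it only assembles argument lists using the
    flags quoted in the post; it does not execute anything.
    """
    return [
        ["omindex", "--no-delete", "-U", url, path]
        for path, url in url_for_file.items()
    ]


# Example mapping taken from the two commands in the post.
mapping = {
    "/funky/doc/file-blah.pdf": "/cool-url-1",
    "/doc/funky/ohmy/blah-file.txt": "/cool-url-7",
}

for cmd in omindex_commands(mapping):
    # shlex.join quotes arguments safely for display in a shell.
    print(shlex.join(cmd))
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems if a filename contains spaces.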

2011 Oct 18

patch proposal: omindex library or daemon

Olly (looking at commit logs, I think this is your dept :-) For apps which re/index files frequently and need format conversion, I'd like to propose a patch for one of... Omindex library (thread safe): Omindex::init(options) // struct Omindex::options { ... } initialize mime_map, store default options session = new Omindex::Session(db_pathname) user threads use different sessions

2009 Jun 20

omindex hangs while scanning

Hello, I was looking for a search engine for a small internal documentation site and found Xapian and Omega. I downloaded and compiled it using MSYS and MinGW on a German Windows XP system. Finally I installed Apache on the same box. Following the Omega example, I copied the book to .../apache/htdocs and started omindex, which hung on the first document found. Even on very short doc with

2009 Feb 02

Ticket #282: omindex-assorted-enhancements.patch woes

I would really like to try out the features in the patch above. But I can't ever seem to get the resulting omindex.cc to "make". I tried updating to rev 10801 from SVN and then ran /bootstrap, but then I seem to get errors compiling everything when I try to do "make" (I'm using Ubuntu 8.10). So I thought I'd try to apply the patch to the latest stable version

2013 May 15

How to omindex some sub-directories?

Given a directory tree like ...

    /foo
     |
     +-- A
     |
     +-- B
     |
     +-- C

... what is the best way to index A and C into a single Xapian database? AFAIK the alternatives are:

    omindex --db /my_db --no-delete /foo /foo/A
    omindex --db /my_db --no-delete /foo /foo/B

or

    omindex --db /my_A_db /foo /foo/A
    omindex --db /my_B_db /foo /foo/B
    xapian-compact /my_A_db /my_B_db /my_db

The first alternative does not

2007 Jul 12

omega: omindex behaviour with duplicate files

Hi all, I need a little clarification with regard to Omega's behaviour with 'duplicate' files when running 'omindex'. How is a duplicate recognised? Is it simply by file path? How is an unmodified file detected, if at all? I would like to set up a Subversion post-commit hook to update my index. If possible I would like to just update the index with the newly committed files.

2017 Apr 20

Question about the ticket #743 omindex: delay libmagic checks

Hi, I'm working on ticket #743, omindex: delay libmagic checks <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, a call to libmagic is more expensive than a call to stat, so we can check the size by calling stat before calling libmagic to get a MIME type. But how about the timestamp check? Since the timestamp check needs to iterate the DB to check if

2010 Dec 15

excluding child folders in omindex search

hi there, is there an option to exclude child folders when running omindex? For example: omindex -p --db /var/blah/default --url /something /var/www --exclude /var/www/ignore Thanks, Jeff

2008 Oct 15

Extract text from Microsoft PowerPoint files

Hello CentOS people, I'm wondering if there are command tools like antiword and docx2txt for Microsoft PowerPoint files (.ppt and .pptx). The idea is to extract text from PowerPoint files. Sorry this isn't exactly about CentOS, but I'd really like it if Yum has something. I tried xlhtml, but it hasn't been updated in a while and isn't exactly wanting to work on CentOS

2005 Mar 31

omindex and scriptindex question

Hi, I was researching indexing of text in omindex and scriptindex. While indexing text with omindex.cc, the position of terms is saved with a gap. This does not happen with scriptindex.cc. Why is this happening? Another question is why in omindex.cc the term position starts at 0 while in scriptindex it starts at 1? Code snippet from omindex.cc: // Add postings for terms to the document

2006 Aug 11

Proposed changes to omindex

Proposed changes to omindex

Currently Available Items
=========================

1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0)
   a. Currently I store the timestamp

2006 Oct 02

Omindex.cc BSD bug

Hi guys: I was trying to index a large set of PDF documents using omindex and the system started to run out of forks (sh: fork temporarily unavailable), making the system unusable and probably skipping documents. I'm using Mac OS X Server 10.4.3 (Darwin/BSD) and GCC 4.0. The problem: in function stdout_to_string, popen is called but the stream is not closed properly (according to the popen
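To illustrate the general issue raised in this post (a child process whose pipe is never closed, so the process is never reaped), here is a minimal Python sketch of the same idea. It is not omindex's code: the function name `stdout_to_string` is borrowed from the post purely for illustration, and the Python standard library's `subprocess.run` stands in for the C `popen`/`pclose` pair.

```python
import subprocess
import sys


def stdout_to_string(argv):
    """Run a command and return its standard output as text.

    subprocess.run waits for the child to exit and closes the pipe
    before returning, so no process or file descriptor is left behind.
    check=True raises CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Usage: run a trivial child process with the current interpreter.
print(stdout_to_string([sys.executable, "-c", "print('hello')"]))
```

Calling this in a loop over many documents should not accumulate child processes, because each call finishes only after its child has been waited on.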

2019 Jun 14

Text-Extraction Libraries for Omindex

This is a list of some libraries that I have been looking at. The idea is to discuss the advantages and disadvantages of adding some of these libraries to Xapian. If anyone knows another library that could be added to the list, that would be great!

Libfreexl:
* For Excel (.xls)
* Last release: 2018-02
* Info: gaia-gis.it/fossil/freexl/index
* License: MPL tri-license

2003 Aug 18

strange smbstatus output after update from 2.2.5 to 2.2.8a

Hi, I updated from Samba 2.2.5 to 2.2.8a under Redhat 6.2, and since then smbstatus shows strange values. The dates and the PIDs are wrong. Any ideas? Was there a change in locking.tdb? Do I have to remove it? Greetings, Hansjörg 28643 DENY_NONE 0x3f407996 RDONLY LEVEL_II ?7??W? Tue Jan 6 22:38:12 1970 1633824626 0x706d622e NONE ?7??W? Thu Jan 1 02:00:00 1970 1836017711 0x56414143

2009 Nov 11

[python indexer] add meta informations

Hello, I'm trying to index some blog stuff through the Python bindings. I'd like to know how to add some information (url, title, date, and so on) so that I can reach it through a xapian.Enquire object. I believe it's something to be set in xapian.TermGenerator(), but... I can't manage to find which function. I'm expecting something like: xtermgen.add_meta('url',

2007 Apr 01

indexing mostly-binary documents (.ppt)

Here's an interesting problem: In my app, we are indexing various types of documents, including Microsoft PowerPoint. PowerPoint documents are mostly binary, but have a bunch of text (all of the text in the document?) as well. My thinking is that the binary will never get searched for, and the proper text will be indexed and queried as expected, so the indexed binary will never
