similar to: omindex options

Displaying 20 results from an estimated 3000 matches similar to: "omindex options"

2009 Apr 06
2
omindex => Unknown extension
Hi all, I'm having a recurrent problem with Omega's indexing. When I run omindex, it sometimes fails to recognize the extension of some files (.doc, .pdf) and skips them. In the same run, omindex is otherwise perfectly able to index other files with the same extensions. The reason is not clear, but it should occur before it selects a content converter since, for example, if I manually run
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
I'm playing around with a machine that has 2 GB of memory, indexing about 5 GB of data at an average of 2 MB per document. The documents are plain text. I notice that omindex's memory footprint gets bigger and bigger, then the machine starts to swap and it all slows down to a crawl. Regarding export XAPIAN_FLUSH_THRESHOLD, I know the default is 10000. Am I right in saying that for my setup
2009 Feb 02
2
Ticket #282: omindex-assorted-enhancements.patch woes
I would really like to try out the features in the patch above, but I can't ever seem to get the resulting omindex.cc to "make". I tried updating to rev 10801 from SVN and then ran /bootstrap, but then I seem to get errors compiling everything when I try to do "make" (I'm using Ubuntu 8.10). So I thought I'd try to apply the patch to the latest stable version
2008 Jul 30
3
Dealing with image PDF's
Guys, I was just playing around and added a bit of code to omindex.cc so I could OCR tiff and tif files with gocr, which seems to work. Here's what it looks like: // Tiff: } else if (startswith(mimetype, "image/tif")) { // Inspired by http://mjr.towers.org.uk/comp/sxw2text string safefile = shell_protect(file); string cmd = "tifftopnm " + safefile + "
2011 Oct 18
2
patch proposal: omindex library or daemon
Olly (looking at commit logs, I think this is your dept :-) For apps which re/index files frequently and need format conversion, I'd like to propose a patch for one of... Omindex library (thread safe): Omindex::init(options) // struct Omindex::options { ... } initialize mime_map, store default options session = new Omindex::Session(db_pathname) user threads use different sessions
2013 May 15
1
How to omindex some sub-directories?
Given a directory tree like ... /foo | +-- A | +-- B | +-- C ... what is the best way to index A and C into a single Xapian database? AFAIK the alternatives are: omindex --db /my_db --no-delete /foo /foo/A omindex --db /my_db --no-delete /foo /foo/B or omindex --db /my_A_db /foo /foo/A omindex --db /my_B_db /foo /foo/B xapian-compact /my_A_db /my_B_db /my_db The first alternative does not
2017 Apr 20
2
Question about the ticket #743 omindex: delay libmagic checks
Hi, I'm working on ticket #743 omindex: delay libmagic checks <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, a call to libmagic is more expensive than a call to stat, so we can check the size by calling stat before calling libmagic to get the MIME type. But what about the timestamp check? Since the timestamp check needs to iterate the DB to check if
2009 Jun 20
3
omindex hangs while scanning
Hello, I was looking for a search engine for a small internal documentation site and found Xapian and Omega. I downloaded and compiled it using MSYS and MinGW on a German Windows XP system, and finally installed Apache on the same box. Following the Omega example, I copied the book to .../apache/htdocs and started omindex, which hung on the first document found. Even on very short doc with
2010 Dec 15
2
excluding child folders in omindex search
hi there, is there an option to exclude child folders when running omindex? For example: omindex -p --db /var/blah/default --url /something /var/www --exclude /var/www/ignore Thanks, Jeff
2009 Apr 29
1
"DatabaseCorruptError: Cannot open tables at consistent revisions"
Occasionally when I'm searching using Omega I get: "DatabaseCorruptError: Cannot open tables at consistent revisions" If I click reload it's all OK. Is this the database being updated? Is there a way to avoid the message? Frank
2007 Jul 12
1
omega: omindex behaviour with duplicate files
Hi all, I need a little clarification with regard to Omega's behaviour with 'duplicate' files when running 'omindex'. How is a duplicate recognised? Is it simply by file path? How is an unmodified file detected, if at all? I would like to set up a Subversion post-commit hook to update my index. If possible I would like to just update the index with the newly committed files.
2012 Dec 13
1
omindex one file at a time?
Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping between my documents' filenames and the URLs where I hope search users will find them is, uh..., strange. The simplest thing (to me) would be to run omindex for each document, e.g. omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt ... and so on...
2009 Feb 04
2
wildcard support (left truncation)
Does Xapian support wildcards (left truncation)? E.g. *ildcard.doc or *.doc or Wild*.doc. I read a post from Olly in 2005 that said it wasn't supported yet, and I was wondering if there had been any progress or an easy workaround since. I mainly need this when users want to search by the filename extension. Thanks, Frank
2005 Mar 31
1
omindex and scriptindex question
Hi, I was researching indexing of text in omindex and scriptindex. While indexing text with omindex.cc, the position of terms is saved with a gap. This does not happen with scriptindex.cc. Why is this happening? Another question is why in omindex.cc the term position starts at 0 while in scriptindex it starts at 1? Code snippet from omindex.cc // Add postings for terms to the document
2019 Jun 14
2
Text-Extraction Libraries for Omindex
This is a list of some libraries that I have been looking at. The idea is to discuss the advantages and disadvantages of adding some of these libraries to Xapian. If anyone knows another library that could be added to the list, it would be great! Libfreexl: * For Excel (.xls) * Last release: 2018-02 * Info: gaia-gis.it/fossil/freexl/index * License: MPL tri-license
2017 Apr 23
2
Question about the ticket #743 omindex: delay libmagic checks
> > I'd suggest to start with you just look at moving the libmagic check after > the filesize checks, so you don't need to get into whether libmagic or > the database check is cheaper on average. Hi Olly, I have moved the libmagic check directly after the filesize check: https://github.com/caiyulun/xapian/commit/3a97d9ee5397fa900a473aa9b3d8eeb720177a4e can you provide
2016 Sep 27
1
omega issues/notes
All, I've run into a couple of things using omega/omindex under cygwin. I don't think I'd attribute them to xapian, omega or omindex, but wanted to get them out to the list so that if anyone else should run into these things down the road, hopefully someone will remember and be able to help. 1) after compiling and building omega, and doing make install, I get a set violation when
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex Currently Available Items ========================= 1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing. 2) Add the document's last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0) a. Currently I store the timestamp
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
Hi Parth, I've implemented the SVMRanker class and also sorted out most of the current Letor APIs. Now I'm trying to use the INEX dataset to verify my implementation, but I'm stuck on the indexing part. You said in the documentation that we have to add a prefix when indexing. I also notice that you set some metadata in omindex.cc in your version. But omindex.cc has changed since 2011. I think that's why my result