similar to: what about the efficiency of building indexes

Displaying 20 results from an estimated 20000 matches similar to: "what about the efficiency of building indexes"

2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 12:02:15PM +0100, Parth Gupta wrote: > During the indexing with omindex, only you need to make sure is indexing > with prefix 'S' for title as explained here in Letor documentation: > xapian-letor/docs/letor.rst > > Previously when I edited omindex.cc it was modified as can be seen >
2014 Mar 17
2
[GSOC 2014] Indexing INEX dataset
Hi Olly, Wouldn't setting the weight of terms in title back to normal (e.g. 5 to 1) by below line, automatically adjust the wdfs and field lengths? indexer.index_text(title, 5, "S"); -> indexer.index_text(title, 1, "S"); if it does not then we should include that part in the patch too. I like to create a patch for xapian-letor for resolving common code of xapian.
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 03:20:31PM +0100, Parth Gupta wrote: > > > > On current trunk, we index the title with prefix "S" by default in > > omindex, though with a wdf inc of 5 rather than 1: > > > > indexer.index_text(title, 5, "S"); > > > > So I don't think you need that change to omindex now. > > Yes, but please
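To make the wdf increment discussed in this thread concrete, here is a minimal toy model in Python. It is not Xapian code: `index_text` and `doc_length` below are hypothetical stand-ins that only mimic the idea of adding a fixed increment per term occurrence and summing the wdf values into a document length.

```python
from collections import Counter

def index_text(postings, text, wdf_inc=1, prefix=""):
    """Toy stand-in for a term generator: add wdf_inc for every term occurrence."""
    for word in text.lower().split():
        postings[prefix + word] += wdf_inc
    return postings

def doc_length(postings):
    # In this toy model the document length is the sum of all wdf values.
    return sum(postings.values())

title = "indexing inex dataset"
body = "indexing the inex dataset with omindex"

# Title indexed with an increment of 5 (boosted) versus 1 (plain).
boosted = index_text(index_text(Counter(), title, 5, "S"), body)
plain = index_text(index_text(Counter(), title, 1, "S"), body)

print(boosted["Sinex"], doc_length(boosted))
print(plain["Sinex"], doc_length(plain))
```

In this model, changing the increment changes both the per-term wdf and the total length, because the length is derived from the same counts. Whether a real index behaves the same way depends on how it was built.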
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
Hi Parth, I've implemented the SVMRanker class and also sorted out most of the current Letor APIs. Now I'm trying to use the INEX dataset to verify my implementation. But I'm stuck on the indexing part. You said in the documentation that we have to add a prefix when indexing. Also, I notice that you set some metadata in omindex.cc of your version. But omindex.cc has changed since 2011. I think that's why my result
2016 Sep 27
1
omega issues/notes
All, I've run into a couple of things using omega/omindex under cygwin. I don't think I'd attribute them to xapian, omega or omindex, but wanted to get them out to the list so that if anyone else should run into these things down the road, hopefully someone will remember and be able to help. 1) after compiling and building omega, and doing make install, I get a set violation when
2004 Jun 28
2
[Fwd: Irix install of omega fails.]
OK, I'll try again. Thanks, Jim. -------------- next part -------------- An embedded message was scrubbed... From: Jim Lynch <jwl@sgi.com> Subject: Irix install of omega fails. Date: Mon, 28 Jun 2004 14:16:46 -0400 Size: 2057 Url: http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20040628/212669c1/Irixinstallofomegafails.eml
2014 Mar 22
2
[GSOC 2014] Indexing INEX dataset
For unsupervised approaches like BM25 this approach works well but letor does not need special weighting for title in this form as it itself assigns weights to title features separately. But I see your concern it would be a problem when BM25 is used on the index with this setup. Hence its preferable to take a note of this uplift in title weight for xapian-letor and normalize it everywhere
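One way to picture the normalization mentioned here is to divide the indexing-time boost back out of the title terms before using them as features. The sketch below is only an illustration under assumptions: the helper name `normalized_title_wdf` is hypothetical, the postings are a plain dict rather than a real database, and the increment of 5 is taken from the value quoted elsewhere in the thread.

```python
TITLE_WDF_INC = 5  # assumed increment; adjust to match how the index was built

def normalized_title_wdf(postings, prefix="S", wdf_inc=TITLE_WDF_INC):
    """Return title-term counts with the indexing-time boost divided out."""
    return {
        term: wdf // wdf_inc
        for term, wdf in postings.items()
        if term.startswith(prefix)
    }

# Toy postings: two title terms (prefix "S") and one unprefixed body term.
postings = {"Sinex": 10, "Sdataset": 5, "inex": 3}
print(normalized_title_wdf(postings))
```

The body term is left out because only prefixed title terms carry the uplift in this toy setup.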
2009 Jun 20
3
omindex hangs while scanning
Hello, I was looking for a search engine for a small internal documentation site and found xapian and omega. Downloaded and compiled it using msys and ming on a German Windows XP system. Finally installed Apache on the same box. Following the omega example I copied the book to .../apache/htdocs and started omindex, which hung on the first document found. Even on a very short doc with
2017 Apr 20
2
Question about the ticket #743 omindex: delay libmagic checks
Hi, I'm working on ticket #743, omindex: delay libmagic checks <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, a call to libmagic is more expensive than a call to stat, so we can check the size by calling stat before calling libmagic to get the MIME type. But what about the timestamp check? Since the timestamp check needs to iterate the DB to check if
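The ordering idea in this question (cheap stat-based checks first, expensive type detection only if they pass) can be sketched in Python as follows. This is not omindex code: `should_index` and `counting_detect` are hypothetical names, and `mimetypes.guess_type` is only a stand-in for a libmagic call, since it looks at the file name rather than the content.

```python
import mimetypes
import os
import tempfile

def detect_mime_type(path):
    # Stand-in for an expensive libmagic call; mimetypes only inspects the name.
    return mimetypes.guess_type(path)[0] or "application/octet-stream"

def should_index(path, max_size, detect=detect_mime_type):
    """Run the cheap stat()-based size check first; detect the type only if it passes."""
    size = os.stat(path).st_size
    if size == 0 or size > max_size:
        return None  # skipped without paying for type detection
    return detect(path)

calls = []

def counting_detect(path):
    # Record each detection call so the saving is visible.
    calls.append(path)
    return detect_mime_type(path)

with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.txt")
    big = os.path.join(tmp, "big.txt")
    with open(small, "w") as f:
        f.write("hello")
    with open(big, "w") as f:
        f.write("x" * 1000)
    small_result = should_index(small, 100, counting_detect)
    big_result = should_index(big, 100, counting_detect)

print(small_result, big_result, len(calls))
```

Only the file that passes the size check reaches the detection step, which is the saving the ticket is after. A timestamp check could be slotted in at the same point, but its cost depends on the database lookup the question raises.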
2009 Feb 02
2
Ticket #282: omindex-assorted-enhancements.patch woes
I would really like to try out the features in the patch above. But I can't ever seem to get the resulting omindex.cc to "make". I tried updating to rev 10801 from SVN and then running /bootstrap, but then I seem to get errors compiling everything when I try to do "make" (I'm using Ubuntu 8.10). So I thought I'd try to apply the patch to the latest stable version
2009 Apr 06
2
omindex => Unknown extension
Hi all, I'm having a recurrent problem with Omega's indexing. When I run omindex, it sometimes fails to recognize the extension of some files (.doc, .pdf) and skips them. In the same run, omindex is otherwise perfectly able to index other files with the same extensions. The reason is not clear, but it should occur before it selects a content converter since, for example, if I manually run
2013 Oct 13
2
trouble with user's right indexing with omega
Hi, I'm using omindex to index files and I want to make queries with the user/group boolean prefixes (I*, I@... and I#...). That works well with the "other" and "group" rights, but not in all cases for the "user" right. Here is an example: assume that we have a user "ftp" not in the "users" group. If the file rights are: -rw-r------ 1 ftp users 13 2013-10-06
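As a rough illustration of the permission filtering described in this message, the sketch below derives boolean terms from a file's read bits. It is a guess at the general idea, not omindex's actual logic: the function `permission_terms` is hypothetical, the mapping of `I*` to "other", `I#` to group and `I@` to user is an assumption, and the mode 0o640 is one reading of the listing quoted above.

```python
import stat

def permission_terms(mode, owner, group):
    """Map read bits to boolean filter terms (assumed prefixes: I@ user, I# group, I* other)."""
    terms = []
    if mode & stat.S_IRUSR:
        terms.append("I@" + owner)
    if mode & stat.S_IRGRP:
        terms.append("I#" + group)
    if mode & stat.S_IROTH:
        terms.append("I*")
    return terms

# Owner "ftp" and group "users", readable by owner and group but not by others.
terms = permission_terms(0o640, "ftp", "users")
print(terms)
```

A query for a given user would then be filtered on that user's own term plus the terms of their groups and `I*`. The exact rules the real indexer applies may differ from this sketch.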
2024 Apr 22
2
How to use Xapian Omega directly (i.e., without using `recoll` and `xapiandb`) ... Full Set Of Questions Below:
Dear senior ML members and developers of Xapian Omega, Mr. Olly has helped me cross the bump of the initial learning curve. (ref: https://lists.xapian.org/pipermail/xapian-discuss/2024-April/010034.html) How can I use Xapian Omega directly (i.e., without using `recoll` and `xapiandb`) to index a directory of text files with all strings greater than 3 characters, to create an index text file
2013 Feb 27
2
Reading a password-protected PDF
Hello respected developers, I was wondering if it is possible for xapian to read a password-protected PDF. Searches of the archives and Google have yielded 0 results. I also tried looking at the source code but I could not find the specific part related to this issue. The characteristics of the set of PDFs are: 1. a set of password-protected PDF documents 2. all PDFs are set with the same password. 3.
2009 Jul 24
2
redhat rpm install and quick start
I installed Xapian and Omega following the instructions for the RHEL 5 RPM package found at xapian.org. I was going to perform the quick start instructions, but there is not "omnidex" for the omindex --db DBPATH --url / WEBPATH command. The quick start also mentions running omega from usr/lib/omega/bin; however that was not created as well using the RHEL 5 RPM. Is there a walkthrough on
2013 May 15
1
How to omindex some sub-directories?
Given a directory tree like ...

/foo
  |
  +-- A
  |
  +-- B
  |
  +-- C

... what is the best way to index A and C into a single Xapian database? AFAIK the alternatives are:

    omindex --db /my_db --no-delete /foo /foo/A
    omindex --db /my_db --no-delete /foo /foo/B

or

    omindex --db /my_A_db /foo /foo/A
    omindex --db /my_B_db /foo /foo/B
    xapian-compact /my_A_db /my_B_db /my_db

The first alternative does not
2011 Apr 27
2
Omindex: what are the default numbered indexes?
> -----Original Message----- > Date: Tue, 26 Apr 2011 13:35:20 +0100 > From: James Aylett <james-xapian at tartarus.org> > Subject: Re: [Xapian-discuss] Omindex: what are the default numbered > indexes? > To: <xapian at catcons.co.uk> <xapian at catcons.co.uk> > Cc: 'Xapian Discussion' <xapian-discuss at lists.xapian.org> > Message-ID:
2004 Jun 01
1
Searching without flush?
Hi, I am using the Xapian-0.8.0 snapshot from 15-Apr-2004 02:14, and I am using the same Xapian::WritableDatabase instance for indexing and searching. Currently each search causes a database flush, which is slow. How can I avoid this flush? It seems that I have to modify Xapian to either - search only the already flushed data (eventually missing some hits) or - search the un-flushed data, too.
2007 Feb 09
1
PHP Binding and dbi2omega questions
Hi All, I've installed Xapian and the php module. I've set up a script for use with scriptindex and dbi2omega for getting data from the db into the index easily, the script file is as follows: =============================== id : field=id title : index title: field=title description : index description : truncate=50 field=content ============================= However, when querying
2004 May 20
3
Debian stuff
I've now got working (but not necessarily policy compliant) debian packages for xapian-core, the xapian python bindings, and omega and omindex. I will be sorting out a public apt repository of these shortly. Is it appropriate to add the debian control files (ie, those files in the debian directories in CVS) to the distribution tarballs? I think yes - they don't take up much space,