thr3ads.net - similar to: "overlapping docids when searching on multiple databases?"

Displaying 20 results from an estimated 700 matches similar to: "overlapping docids when searching on multiple databases?"

xapian-check on "crashed" index?

2010 Oct 14

Hi. Is xapian-check aware of the uncommitted data that could be sitting in a Xapian index if the indexer has crashed during indexing? Could errors be falsely reported by xapian-check in this situation? -- Jesper

Are stub databases still supported in 1.0.21?

2010 Dec 01

I have the following setup:

Databases:
/var/lib/xapian-omega/data/db1
/var/lib/xapian-omega/data/db2
/var/lib/xapian-omega/data/db3

Stub: /var/lib/xapian-omega/data/default

The stub file "default" is a text file that contains the following:

auto /var/lib/xapian-omega/data/db1
auto /var/lib/xapian-omega/data/db2
auto /var/lib/xapian-omega/data/db3

Using the following returns nothing:

MSet order

2011 Mar 08

Hello, I defined a weighting scheme to simulate a kind of "euclidean" distance. To test it, I used a database with 1000 documents. If I run: enquire.set_weighting_scheme(MyWeight()); Xapian::MSet matches = enquire.get_mset(0, 1000); I get a correct list of results. But if I run Xapian::MSet matches = enquire.get_mset(0, 10); I don't get the top 10 results. If I run Xapian::MSet

Is it possible to reset the parameters in BM25 each time a new query enters?

2011 Feb 18

Hi guys, I'm trying to improve the search results of our collection by tuning the parameters of the BM25 weighting scheme. Since our collection includes several databases, such as for pictures, websites, etc., I would like to use different parameter values of the same scheme to calculate the weights. Yet rebuilding each time after a change is made to the head file does not seem an optimal approach and

hyphens in words + NEAR + 3 terms + AND_MAYBE => crash

2010 Oct 28

Probably an uncaught malformed query - the following form of search queries causes a crash for me (core 1.2.3, Perl API, 64bit Debian Lenny, self-compiled): x-y NEAR test NEAR test The first term can be anything with a hyphen in it but word characters at the beginning and end ("3--3" will do). The other 2 terms can be anything. "test NEAR x-y NEAR test" will not cause a

if condition doesn't evaluate to True/False

2009 Apr 29

Hi friends, Please help me with this bug. *Bug in my code:* In the variable sub_grp_whr_cls_data[sbgrp_no,1] I store the where clause. Every subgroup has a where condition linked with it. Database1: the where clause was not found for a particular subgroup, so the sub_grp_whr_cls_data[sbgrp_no,1] value was NULL. So the condition (*sub_grp_whr_cls_data[sbgrp_no,1]=="NULL" ||

Search::Xapian add_database'd search results are odd?

2004 Dec 21

Sorry if this is the wrong forum to discuss Search::Xapian issues -- this just seems like the best place.. Anyways, I've been testing out using $db->add_database() when searching, and it seems like the docids I'm getting out of it are incorrect, almost as though they're "double" what they should be (numerically)... the docids that exist should be around 950,000 and

stub-file and get_doccount

2015 Mar 11

Hello, I switched from one big index to a stub file with many indexes and am running into a problem. I have a tool to fetch a random document via: get_doccount, a random id up to get_doccount, then get_document with that id. After changing to a stub file this fails. Is there a nice way to get a random document from a stub file? MfG Felix Ostmann
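
One possible approach, not taken from the thread: when several databases are combined, their docids are interleaved, so the combined docid range can contain gaps and can extend beyond get_doccount(). A sampler can therefore draw from 1 up to the last docid in use (Xapian exposes this as get_lastdocid()) and retry on misses. The sketch below is pure Python so that it runs without Xapian; `pick_random_docid` and the `exists` callback are hypothetical names, and with the real bindings `exists` would typically wrap get_document() and catch the "document not found" error.

```python
import random


def pick_random_docid(last_docid, exists, rng=None, max_tries=1000):
    """Return a random docid in [1, last_docid] for which exists(docid) is true.

    `exists` is a hypothetical callback standing in for a real lookup.
    Raises LookupError if no existing docid is hit within max_tries draws.
    """
    if last_docid < 1:
        raise LookupError("database is empty")
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = rng.randint(1, last_docid)  # inclusive on both ends
        if exists(candidate):
            return candidate
    raise LookupError("no existing docid found in %d tries" % max_tries)


if __name__ == "__main__":
    # Stand-in for a sparse docid space: only these ids exist.
    present = {2, 5, 9}
    print(pick_random_docid(9, present.__contains__))
```

Rejection sampling like this stays uniform over the existing documents, but it becomes slow when the docid space is very sparse, hence the `max_tries` cap.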

Xapian wiki: typo in docid to sub-db translation?

2013 Mar 26

On the Xapian wiki page: http://trac.xapian.org/wiki/FAQ/MultiDatabaseDocumentID It seems to me that: subdatabase_number = docid_combined % number_of_databases; Should read: subdatabase_number = (docid_combined - 1) % number_of_databases; Otherwise I'm seriously confused ... Cheers, jf
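
A small sketch of the arithmetic behind the proposed correction, assuming Xapian's usual interleaving of docids across sub-databases, with sub-databases numbered from 0 and docids starting at 1. The function names `split_docid` and `join_docid` are illustrative only and are not part of the Xapian API.

```python
def split_docid(combined_docid, num_databases):
    """Map a combined docid to (sub-database index, docid within it).

    Assumes 1-based docids and 0-based sub-database indexes.
    """
    if combined_docid < 1 or num_databases < 1:
        raise ValueError("docids and database counts start at 1")
    index = (combined_docid - 1) % num_databases
    sub_docid = (combined_docid - 1) // num_databases + 1
    return index, sub_docid


def join_docid(index, sub_docid, num_databases):
    """Inverse of split_docid: rebuild the combined docid."""
    if not 0 <= index < num_databases or sub_docid < 1:
        raise ValueError("index or sub_docid out of range")
    return (sub_docid - 1) * num_databases + index + 1


if __name__ == "__main__":
    # With 3 databases, combined docids 1, 2, 3 are the first document
    # of sub-databases 0, 1, 2; docid 4 wraps back to sub-database 0.
    for docid in range(1, 7):
        print(docid, split_docid(docid, 3))
```

Without the "- 1", combined docid 3 in a three-database setup would be assigned to sub-database 0 instead of 2. The same arithmetic shows why combined docids look roughly doubled with two databases: document 950000 of the first sub-database gets combined docid 1899999.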

configure a rails app for multiple databases

2006 Nov 17

Hello Rails community, I cannot seem to find via Google what I had hoped would be a simple issue. On a single DB system (currently, postgres 8.1.4), I have two databases, each containing multiple tables. I would like to configure my app and database.yml to recognize these two databases. What is the correct config for the database.yml? Is it something like: > production: > adapter:

critical feature from version 1 not migrated to version 2 = authentication configuration database per IP

2011 Feb 09

It is not possible to do with dovecot version 2.x what was possible in version 1.x. Requirements: when connecting to the dovecot service on IP1, dovecot must serve users related to domain1 located in database1; when connecting to the dovecot service on IP2, dovecot must serve users related to domain2 located in database2. Login must be with a username formed not as "user at domain" but

Project: Posting list encoding improvements

2012 Mar 31

Hi Xapianers: My name is Weixian Zhou, a Computer Science student at the University at Buffalo, State University of New York. I am interested in the projects on posting list encoding improvements and weighting schemes, and I have some questions about them. 1) After reading the comments in brass_postlist.cc, I am still not very clear about the detailed structure of the postings list. If you can provide some simple

manual flushing thresholds for deletes?

2023 May 03

On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >

some trouble when devising skiplist

2014 May 10

Hi, I ran into some trouble, which I describe in my journal: http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10 The corresponding code is in my git. Could you give me some help? ------------------ Shangtong Zhang, Second Year Undergraduate, School of Computer Science, Fudan University, China.

manual flushing thresholds for deletes?

2023 May 03

Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > 10 seems too long. You want the mean word length weighted by frequency
> > > of occurrence. For English that's typically around 5 characters, which
> > > is 5 bytes. If we go for +1 that's:

Chow test(1960)/Structural change test

2009 May 17

Hi, a question on something which normally should be easy! I perform a linear regression using the lm function: > reg1 <- lm(a ~ b+c+d, data = database1) Then I try to perform the Chow (1960) test (structural change test) on my regression. I know the breakpoint date. I try the following code as described in the "Examples" section of the "strucchange" package: > sctest(reg1,

Compact databases and removing stale records at the same time

2013 Jun 19

On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote:
> On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote:
> > The advantage of compact - it runs approximately 8 times as fast (we
> > are CPU limited in each case - writing to tmpfs first, then rsyncing
> > to the destination) and it takes approximately 75% of the space of a
> > fresh database with maximum

problem adding curve/abline

2013 Jan 09

Hey, I'm stuck on something I already did before (just with a different kind of database), and whatever I try, it doesn't work anymore. So thanks for your help. Here's approximately what my data looks like:

year season replicate size freq weight
2000 summer ch1 6 1 45
2000 summer ch1

prioritizing aggregated DBs

2020 Feb 19

Olly Betts <olly at survex.com> wrote:
> On Sat, Feb 08, 2020 at 06:04:42PM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > > > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > > > calls on a per-DB basis?
> > > >

Logging the click data

2017 Jun 05

Hi James, > ID: some identifier for each query > QUERY: text of the query (when the query is run) > URLs: every URL displayed (or alternatively, the Xapian docid — this > might be easier) > OFFSET: otherwise you'll have difficulty coping with result pages other > than the first page (when this happens, the query ID should probably > remain the same, and when you aggregate
