thr3ads.net - Xapian discuss - [Xapian-discuss] Returning "fresh" results only from multiple DBs [Jan 2009]

Henry

2009-Jan-12 08:26 UTC

[Xapian-discuss] Returning "fresh" results only from multiple DBs

Greetings,

Let's say you have the following scenario:


DB1:  large corpus with rarely changing data (typically split across a cluster).

DB2:  small corpus with frequently changing data (to update pages in DB1).

DBn:  ditto.


Since DB1 is so large and heavily accessed, we want to keep things simple and
foolproof, so its contents are rarely changed; newer, fresher copies of the
same DB1 pages go into DB2..n.  Each duplicate page (fresher, so preferred)
has a numeric field which increments with each refresh (1, 2, 3, ...) and
identifies the most up-to-date page across all DBs.

How can I perform an enquiry, collapsing on a key (as currently done) to remove
duplicate pages, but yielding the freshest of those duplicate pages?

Similar to SQL:    SELECT MAX(freshness_num),*  FROM  table...


I know we can perform updates on DB1, but I don't want to go down that path
because of the volumes/sizes involved.

Any ideas?

Thanks
Henry
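
The behaviour asked for above can be sketched outside of Xapian as a
post-processing step over merged results.  This is only an illustration of
the desired semantics, not Xapian API: the Hit record, its field names and
freshest_per_key() are all hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search result; every field name here is hypothetical."""
    db: str          # which database the hit came from, e.g. "DB1"
    page_key: str    # value used as the collapse key (identifies the page)
    freshness: int   # refresh counter: 1, 2, 3, ...
    weight: float    # relevance score assigned by the search engine


def freshest_per_key(hits: Iterable[Hit]) -> List[Hit]:
    """Keep, for each page_key, only the hit with the highest freshness."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.page_key)
        if current is None or hit.freshness > current.freshness:
            best[hit.page_key] = hit
    # Present the survivors in relevance order, best first.
    return sorted(best.values(), key=lambda h: h.weight, reverse=True)


hits = [
    Hit("DB1", "page-a", 1, 0.90),
    Hit("DB2", "page-a", 3, 0.70),
    Hit("DB3", "page-a", 2, 0.80),
    Hit("DB1", "page-b", 1, 0.50),
]
result = freshest_per_key(hits)
```

Note that a filter like this has to see every duplicate of a page before it
can pick the freshest one, so it only works on the full merged result list,
not on a single page of results.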



Henry

2009-Jan-16 06:26 UTC

How about extending set_collapse_key() to accept two or more arguments (à la
MultiValueSorter)?  Or, more cleanly I suppose, add a new method that creates
a "collapse_key" object holding a composite list of keys to collapse on,
which is then passed as the argument to set_collapse_key()?

Which one would require the least amount of coding (and API disruption)?

Thoughts?

Cheers
Henry
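
As a rough illustration of the composite-key idea, the sketch below collapses
on a tuple built from several value slots.  It is plain Python rather than
Xapian API; the Doc model, the slot numbers and both function names are
assumptions made up for the example.

```python
from typing import Dict, List, Sequence, Tuple

# A document is modelled as a mapping from value slot number to stored value.
Doc = Dict[int, str]


def composite_key(doc: Doc, slots: Sequence[int]) -> Tuple[str, ...]:
    """Build one collapse key from several slots (missing slots become '')."""
    return tuple(doc.get(slot, "") for slot in slots)


def collapse(docs: Sequence[Doc], slots: Sequence[int]) -> List[Doc]:
    """Keep the first (best-ranked) document seen for each composite key."""
    seen = set()
    kept = []
    for doc in docs:  # docs are assumed to arrive in rank order
        key = composite_key(doc, slots)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


# Hypothetical slots: 0 = host, 1 = path, 2 = freshness counter.
ranked = [
    {0: "example.org", 1: "/index", 2: "1"},
    {0: "example.org", 1: "/index", 2: "2"},
    {0: "example.org", 1: "/about", 2: "1"},
]
kept = collapse(ranked, slots=(0, 1))
```

In this sketch, adding the freshness slot to the key (slots=(0, 1, 2)) makes
every refresh a distinct key, so no copies are dropped at all; a composite
key on its own does not select the freshest copy.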


Henry

2009-Jan-18 13:17 UTC

On Sun 18/01/09  1:06 PM, Olly Betts <olly at survex.com> wrote:
> We probably should support something like this.

We'll talk about sponsoring something along these lines later.

> > Which one would require the least amount of coding (and API
> > disruption)?
>
> Xapian::Sorter could just be used as-is to build a key for collapsing
> too.  It's a shame that we didn't think about this possible reuse before
> adding it to the API as the name seems rather less good now.
>
> But this still won't change that you couldn't implement the "fresh
> results" thing using collapsing.  Or is your question not related to
> that?

Not following you here, so I'll study Xapian::Sorter.
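
The reuse described in the quoted reply (one key-building object serving both
sorting and collapsing) might look roughly like the sketch below.  It is
plain Python, not the Xapian::Sorter interface; KeyMaker, make_key_maker(),
sort_by() and collapse_by() are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

Doc = Dict[str, str]             # hypothetical document: field name -> value
KeyMaker = Callable[[Doc], str]  # maps a document to a single string key


def make_key_maker(fields: Sequence[str]) -> KeyMaker:
    """Return a callable that joins the named fields into one key."""
    def key_for(doc: Doc) -> str:
        # NUL separates the parts; this assumes field values never contain it.
        return "\x00".join(doc.get(f, "") for f in fields)
    return key_for


def sort_by(docs: Sequence[Doc], key_maker: KeyMaker) -> List[Doc]:
    """Use the generated key for ordering."""
    return sorted(docs, key=key_maker)


def collapse_by(docs: Sequence[Doc], key_maker: KeyMaker) -> List[Doc]:
    """Use the same kind of key for de-duplication, keeping the first of each."""
    seen = set()
    kept = []
    for doc in docs:
        key = key_maker(doc)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    {"host": "b.example", "path": "/x"},
    {"host": "a.example", "path": "/y"},
    {"host": "b.example", "path": "/x"},
]
by_page = make_key_maker(["host", "path"])
ordered = sort_by(docs, by_page)
unique = collapse_by(docs, by_page)
```

The same by_page callable drives both operations, which is the sense in
which a name like "Sorter" becomes too narrow once the key is also used for
collapsing.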
