I was just having a look over the API notes:

https://github.com/mtibeica/node-xapian/blob/master/docs

Some feedback:

I wouldn't bother wrapping WritableDatabase::flush(). It's only there
for compatibility with older code, so for a new binding you can just
wrap commit().

Generally, uint32 isn't necessarily the right type to use everywhere,
and means things will go wrong if someone patches Xapian and rebuilds
it to use, for example, 64-bit document ids. Maybe it's hard to use the
appropriate Xapian::docid, Xapian::doccount, Xapian::termcount, etc.
typedefs here though.

> A query consisting of two or more subqueries, opp-ed together.
> AND, OR, SYNONYM, NEAR and PHRASE can take any number of subqueries.
> Other operators take only the first two subqueries.
> { op: string, queries: [ object_querystructure1, ...] }

XOR can also take any number of subqueries. And on trunk, OP_FILTER,
OP_AND_NOT, and OP_AND_MAYBE can also take any number of subqueries
(with OP(A, B, C) being interpreted as OP(OP(A, B), C)).

Also, it would be nice to support a mixture of strings and query objects
as the subqueries (like we do in most of the dynamically typed languages).

I'm dubious about wrapping the various iterators as methods which read
all the entries from the iterator and return an array. That's potentially
a huge amount of data to read and store in memory when the user may only
want a small subset, or to be able to process it as a stream.

Or are these actually implemented like Perl tied arrays?

Cheers,
    Olly
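If a binding accepts any number of subqueries for OP_FILTER, OP_AND_NOT and OP_AND_MAYBE but has to run against a Xapian build where those operators take only two, it could rewrite the structure using the OP(A, B, C) = OP(OP(A, B), C) rule described above. This is a minimal sketch, assuming the `{ op, queries }` shape from the API notes with operator names given as plain strings; `foldBinaryOps` and `LEFT_FOLD_OPS` are hypothetical names, not part of node-xapian.

```javascript
// Hypothetical helper, not part of node-xapian.
// Operators that trunk treats as left-associative when given more than
// two subqueries: OP(A, B, C) means OP(OP(A, B), C).
const LEFT_FOLD_OPS = new Set(["FILTER", "AND_NOT", "AND_MAYBE"]);

function foldBinaryOps(node) {
  // Leaves such as { tname: "..." } (or anything without a queries array)
  // are returned unchanged.
  if (node === null || typeof node !== "object" || !Array.isArray(node.queries)) {
    return node;
  }
  // Normalise the children first so nested structures are folded too.
  const queries = node.queries.map(foldBinaryOps);
  if (!LEFT_FOLD_OPS.has(node.op) || queries.length <= 2) {
    return { ...node, queries };
  }
  // Fold left: ((A op B) op C) op D ...
  let acc = { ...node, queries: [queries[0], queries[1]] };
  for (let i = 2; i < queries.length; i++) {
    acc = { ...node, queries: [acc, queries[i]] };
  }
  return acc;
}

const folded = foldBinaryOps({
  op: "AND_NOT",
  queries: [{ tname: "a" }, { tname: "b" }, { tname: "c" }],
});
console.log(JSON.stringify(folded));
// {"op":"AND_NOT","queries":[{"op":"AND_NOT","queries":[{"tname":"a"},{"tname":"b"}]},{"tname":"c"}]}
```

Operators that already take any number of subqueries (AND, OR, XOR, SYNONYM, NEAR, PHRASE, ELITE_SET) pass through untouched, so the same structure can be sent unchanged to a trunk build.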
On Mon, May 28, 2012 at 2:44 PM, Olly Betts <olly at survex.com> wrote:

> Generally, uint32 isn't necessarily the right type to use everywhere,
> and means things will go wrong if someone patches Xapian and rebuilds
> it to use (e.g. 64 bit document ids).

JavaScript doesn't currently support int64: its Number type is a double,
so integers are only represented exactly up to 2^53. We should probably
raise an error if the Xapian build we're running against uses 64-bit doc
ids.

> XOR can also take any number of subqueries. And on trunk, OP_FILTER,
> OP_AND_NOT, and OP_AND_MAYBE can also take any number of subqueries
> (with OP(A, B, C) being interpreted as OP(OP(A, B), C)

XOR is missing from the online docs for Query(Query::op, Iterator,
Iterator, termcount). We can include support for the other ops you
mention and leave it commented out for now.

Marius, that Query object is missing a parameter:uint32 member.

> Also, it would be nice to support a mixture of strings and query objects
> as the subqueries (like we do in most of the dynamically typed languages).

You can include a term query in the list by writing {tname:'string'}, but
certainly we could let 'string' be a shorthand for that.

> I'm dubious about wrapping the various iterators as methods which read
> all the entries from the iterator and return an array.

We'll take optional start & count arguments for those guys.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120528/c758aed0/attachment.html>
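Start and count arguments can still give callers stream-style processing if the binding (or the caller) pages through results lazily. A minimal sketch, assuming a hypothetical `fetchPage(start, count)` callback that stands in for any binding method returning at most `count` entries beginning at `start`; neither `fetchPage` nor `iterateInPages` exists in node-xapian. Only one page is held in memory at a time, and breaking out of the loop early means later pages are never fetched.

```javascript
// Hypothetical helper: fetchPage(start, count) stands in for any binding
// method that returns an array of at most `count` entries from `start`.
function* iterateInPages(fetchPage, pageSize = 100) {
  let start = 0;
  for (;;) {
    const page = fetchPage(start, pageSize);
    for (const entry of page) {
      yield entry;
    }
    // A short page means the underlying iterator is exhausted.
    if (page.length < pageSize) {
      return;
    }
    start += pageSize;
  }
}

// Usage with an in-memory stand-in for, say, a term list.
const terms = ["apple", "banana", "cherry", "date", "elder"];
const fetchTerms = (start, count) => terms.slice(start, start + count);

for (const term of iterateInPages(fetchTerms, 2)) {
  if (term === "cherry") break; // stop early: the third page is never fetched
  console.log(term); // prints "apple", then "banana"
}
```

The page size trades the number of native calls against peak memory; wrapping it in a generator keeps that choice out of the caller's loop.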
On Mon, May 28, 2012 at 10:23:12PM -0700, Liam wrote:
> On Mon, May 28, 2012 at 2:44 PM, Olly Betts <olly at survex.com> wrote:
> > XOR can also take any number of subqueries. And on trunk, OP_FILTER,
> > OP_AND_NOT, and OP_AND_MAYBE can also take any number of subqueries
> > (with OP(A, B, C) being interpreted as OP(OP(A, B), C)
>
> XOR is missing from the online docs for Query(Query::op, Iterator,
> Iterator, termcount)

Thanks for pointing that out - it's been wrong for a while then (since
r3194, 2001-02-26):

    * Some modifications to XOR handling: should now behave like OR and
      AND - doesn't need to be binary. (*untested*)

Now fixed on the 1.2 branch.

It was also missing that ELITE_SET can take any number of subqueries in
that comment, though it clearly says it can elsewhere, and ELITE_SET would
be rather useless if it only took 2 subqueries...

> We can include support for the other ops you mention and leave it commented
> out for now.

I would strongly recommend developing against trunk at this point anyway.
You don't want to be wrapping anything which has been deprecated in the C++
API, and it would be good to have wrappers done for new features.

Once you have trunk wrapped, tweaking the wrappers to work against 1.2
should be a simple matter of disabling a few parts.

> > Also, it would be nice to support a mixture of strings and query objects
> > as the subqueries (like we do in most of the dynamically typed languages).
>
> You can include a term query in the list by writing {tname:'string'}, but
> certainly we could let 'string' be a shorthand for that.

It's largely syntactic sugar, but even syntactic sugar is still sweet.

Cheers,
    Olly
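The string shorthand could be handled by normalising the query structure in JavaScript before it is handed to the C++ side. A minimal sketch, assuming the `{ op, queries }` and `{ tname }` shapes discussed in this thread; `normalizeQuery` is a hypothetical name, not part of node-xapian.

```javascript
// Hypothetical normaliser: lets callers write a bare string anywhere a
// term query ({ tname: "..." }) is accepted, including nested subqueries.
function normalizeQuery(q) {
  if (typeof q === "string") {
    return { tname: q };
  }
  if (q !== null && typeof q === "object" && Array.isArray(q.queries)) {
    // Keep op and any other fields; normalise each subquery recursively.
    return { ...q, queries: q.queries.map(normalizeQuery) };
  }
  return q; // already a term query or another leaf form
}

const q = normalizeQuery({
  op: "OR",
  queries: ["xapian", { op: "AND", queries: ["node", { tname: "binding" }] }],
});
console.log(JSON.stringify(q));
// {"op":"OR","queries":[{"tname":"xapian"},{"op":"AND","queries":[{"tname":"node"},{"tname":"binding"}]}]}
```

Doing this in a JavaScript wrapper keeps the native layer dealing with a single, explicit shape.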
> On Thu, May 31, 2012 at 1:34 AM, Liam <xapian at networkimprov.net> wrote:
>
>> On Tue, May 29, 2012 at 7:24 PM, Olly Betts <olly at survex.com> wrote:
>>
>>> If you change sizeof(Xapian::docid) (and/or the sizes of other types)
>>> then that's an ABI change, so something built against xapian-core built
>>> with one docid size simply won't work with xapian-core built with a
>>> different docid size.
>>>
>>
>> So what happens when our lib tries to load or invoke the incompatible
>> Xapian? Is it possible to prevent a crash?
>>
>>>
>>> > In what context are int64 doc ids necessary? What % of installations use
>>> > them?
>>>
>>> I doubt many people use them currently, quite possibly nobody does. But
>>> that's likely to change in the foreseeable future. We're probably near
>>> the point where you could conceivably build an index with this many
>>> documents on commodity hardware.
>>>
>>
>> We can support more than 2^32 values by converting to double (JS type
>> Number) for 2^53. But beyond that the values stop converting correctly,
>> meaning we'd throw an overflow and the user would have to hack the binding
>> himself.
>>
>> Marius can you make a note to treat docid as a Number instead of uint32,
>> and check the values from Xapian for overflow?
>>
>
> Will do.
>
>>
>>> > Seriously, lazy-loading is oversold from what I've seen. If you have data
>>> > from real-world Xapian sites that shows a material advantage for it, I'd
>>> > love to read...
>>>
>>> Any site searching a large Xapian database is relying heavily on lazy
>>> loading.
>>>
>>
>> For an array, it's necessary, so we'll take start & count args when
>> building arrays. For objects, I question the value of lazy loading, save
>> for very large fields.
>>
>
> For example we currently have (in quickstartsync.js)
>
>     var mset=enquire.get_mset_sync(0, 10);
>
> I will update docs.md as I write the code for every method that requires
> this.
>
>>
>>
>> _______________________________________________
>> Xapian-devel mailing list
>> Xapian-devel at lists.xapian.org
>> http://lists.xapian.org/mailman/listinfo/xapian-devel
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120531/f9e695b3/attachment.html>
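The request to treat docid as a Number and check values from Xapian for overflow could be handled at the point where values are converted for JavaScript. A minimal sketch, assuming (hypothetically) that the native layer passes docids across as BigInt values so nothing is rounded before the check; `docidToNumber` is an illustrative name, not part of node-xapian. `Number.MAX_SAFE_INTEGER` is 2^53 - 1; above it, distinct integers can round to the same Number, which is exactly the "stop converting correctly" case described above.

```javascript
// Hypothetical conversion helper: assumes the native layer hands docids
// (or other Xapian count types) to JavaScript as BigInt values.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER); // 2^53 - 1

function docidToNumber(id) {
  if (typeof id !== "bigint") {
    throw new TypeError("expected a BigInt docid");
  }
  if (id < 0n || id > MAX_SAFE) {
    // Beyond 2^53 - 1 a Number cannot hold every integer exactly,
    // so fail loudly instead of silently returning a wrong docid.
    throw new RangeError(`docid ${id} cannot be represented exactly as a Number`);
  }
  return Number(id);
}

console.log(docidToNumber(4294967296n)); // 2^32 fits: prints 4294967296
try {
  docidToNumber(2n ** 53n + 1n);
} catch (e) {
  console.log(e.name); // prints "RangeError"
}
```

Throwing a RangeError rather than rounding matches the plan of raising an overflow and leaving larger-id setups to a patched binding.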