On Mon, Mar 18, 2013 at 11:50:31PM +0530, Ankit Bhatnagar wrote:
> *(1) Protocol usage :* The backend functions in two modes - prog and tcp.
> Prog is the mode where the entire program-space is spawned for every
> request and tcp functions normally as TCP sockets do.
I don't think that's a helpful way to think about it.
I would say that currently Xapian's remote backend works by passing
messages back and forth across a file descriptor. The difference
between the tcp and prog variants is simply in how the file descriptor
is created - for the tcp variant, a socket is opened, while for the prog
variant, a specified program is run in a subprocess, and its stdin and
stdout are connected to pipes.
What the prog variant really provides is an easy way to do
things like tunnelling the remote protocol across an encrypted
connection - you just make the program to run something like:
    ssh remotehost xapian-progsrv /path/to/db
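
To make the file descriptor point concrete, here is a rough Python sketch.
It is not Xapian's code: the length-prefix framing, the helper names and the
echo child standing in for xapian-progsrv are all made up for the example.
The only thing it aims to show is that the two variants differ in how the
byte streams are obtained, while the message code on top is shared.

```python
import socket
import struct
import subprocess
import sys

def send_message(wfile, payload):
    # Hypothetical framing: 4-byte big-endian length, then the payload.
    wfile.write(struct.pack(">I", len(payload)) + payload)
    wfile.flush()

def recv_message(rfile):
    (length,) = struct.unpack(">I", rfile.read(4))
    return rfile.read(length)

def open_tcp_style(sock):
    # "tcp" variant: the descriptor is a connected socket.
    return sock.makefile("rb"), sock.makefile("wb")

def open_prog_style(argv):
    # "prog" variant: run a program and talk to its stdin/stdout via pipes.
    child = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    return child, child.stdout, child.stdin

def demo():
    # Stand-in for a TCP connection: a connected socket pair.
    a, b = socket.socketpair()
    _, a_out = open_tcp_style(a)
    b_in, _ = open_tcp_style(b)
    send_message(a_out, b"hello")
    over_socket = recv_message(b_in)
    a.close()
    b.close()

    # Stand-in for the remote program: a child that copies stdin to stdout.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    child, rfile, wfile = open_prog_style(echo)
    send_message(wfile, b"hello")
    wfile.close()  # EOF lets the echo child finish and flush its output
    over_pipe = recv_message(rfile)
    rfile.close()
    child.wait()
    return over_socket, over_pipe
```

Replacing the echo command with an ssh command line like the one above would
give the tunnelled case, with send_message() and recv_message() unchanged.
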
> But both of them remain quite different from the standard HTTP
> framework/s. Any insight on how ZeroMQ would suffice?
The remote backend protocol passes messages back and forth across
a file descriptor. ZeroMQ allows you to pass messages back and forth,
so ideally you don't need to change the current messages and can just
replace the transport layer. If the current "conversations" don't
fit the message passing supported by ZeroMQ, then the messages may
need changing to suit, but I would suggest starting on the basis that
you're just trying to change the layers below that.
We'd probably lose the "prog" variant - I guess people would have
to set up an SSH tunnel or VPN first, then run the remote backend
over it.
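
As a sketch of what "just replace the transport layer" could look like,
assuming the protocol code only ever needs "send a message" and "receive a
message": the class names, the framing and the PING/PONG exchange below are
invented for illustration and are not Xapian's or ZeroMQ's API.
QueueTransport is only a stand-in for a message-oriented library, where
whole messages go in and come out and no framing is needed.

```python
import queue
import socket
import struct

class FdTransport:
    """Messages over a stream socket, using hypothetical length-prefix framing."""

    def __init__(self, sock):
        self.sock = sock

    def send(self, payload):
        self.sock.sendall(struct.pack(">I", len(payload)) + payload)

    def recv(self):
        (length,) = struct.unpack(">I", self._read_exactly(4))
        return self._read_exactly(length)

    def _read_exactly(self, n):
        # A stream socket may return fewer bytes than asked for, so loop.
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise EOFError("connection closed mid-message")
            buf += chunk
        return buf

class QueueTransport:
    """Stand-in for a message-oriented transport: no framing required."""

    def __init__(self, outbox, inbox):
        self.outbox = outbox
        self.inbox = inbox

    def send(self, payload):
        self.outbox.put(payload)

    def recv(self):
        return self.inbox.get()

def ping(client, server):
    # Protocol-level code: it only uses send() and recv(), so it does not
    # care which transport is underneath.
    client.send(b"PING")
    request = server.recv()
    server.send(b"PONG:" + request)
    return client.recv()
```

If the existing conversations map onto one of ZeroMQ's socket patterns, a
ZeroMQ-backed class with the same two methods could slot in without the
protocol-level code changing; if they don't, that is where the messages
would need rethinking.
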
> *(2) Object handling :* Within the scope of the project, object handling is
> yet another important feature to take care of.
>
> >>> What kind of objects are handled or not handled within Xapian?
>
> >>> Is there the need for serialization/deserialization of objects before
> processing takes place?
This is all done already - the messages sent already contain serialised
objects in many cases.
> *(3) Idea Deliverables :* As most other ideas have a set of deliverables
> defined, what are those for this idea? Kindly enlist a few, if possible.
This idea is probably more of an experiment than most - if using ZeroMQ
works well, it would replace quite a lot of existing code (so less for
us to maintain) and give us IPv6 support, which the remote backend lacks
currently.
ZeroMQ claims to be faster than using TCP sockets, but even if that's a
true claim, it might not translate into a faster remote backend. If
performance is worse, I don't think we'd want to make the switch. If
they're pretty much the same, I think we'd want to look carefully at the
pros and cons of the two approaches.
So I think the deliverables are something like a version of Xapian where
the network layer used by the remote backend and by replication uses
ZeroMQ instead, plus a set of benchmark test results showing how it
compares with the existing code.
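
For the benchmark side, the basic shape is timing request/response round
trips and comparing distributions between the two implementations. The
Python sketch below only times an echo over a local socket pair, so the
numbers mean nothing for Xapian; the function names, message size and
summary statistics are arbitrary choices for the example. A real benchmark
would run actual searches against the existing remote backend and against
the ZeroMQ version.

```python
import socket
import statistics
import threading
import time

def recv_exactly(sock, n):
    # A stream socket may return fewer bytes than asked for, so loop.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed early")
        buf += chunk
    return buf

def echo_server(sock, size, count):
    # Toy "server": send back each fixed-size request unchanged.
    for _ in range(count):
        sock.sendall(recv_exactly(sock, size))

def round_trip_times(size=64, count=1000):
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(server, size, count))
    worker.start()
    payload = b"x" * size
    times = []
    for _ in range(count):
        start = time.perf_counter()
        client.sendall(payload)
        recv_exactly(client, size)
        times.append(time.perf_counter() - start)
    worker.join()
    client.close()
    server.close()
    return times

def summarise(times):
    ordered = sorted(times)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[int(0.95 * len(ordered))],
        "max": ordered[-1],
    }
```

Reporting a median and a high percentile rather than just a mean makes it
easier to see whether one transport has worse tail latency than the other.
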
The next obvious step once you have benchmarks set up would be to
profile and see if you can identify hot spots which could be sped up.
There's not been much profiling work done on the remote backend or on
replication. So that would make a good stretch goal for the project I
think.
Dan may have some further thoughts (this was his idea originally).
One issue I just noticed is that ZeroMQ is LGPL - we're trying to
get to a position where we can relicense Xapian as MIT/X, so adding
an LGPL dependency to the core library wouldn't be ideal. This
isn't a show-stopper for the idea, but certainly it's a factor to
consider when deciding whether to merge at the end.
Cheers,
Olly