On Thu, Jul 23, 2009 at 05:19:17PM +0300, Sergey Kozhukhov wrote:
> I'm trying to use xapian from php script with TCP daemon.
> 
> I started TCP daemon: xapian-tcpsrv --port 5050 --writable /path-to-db
> 
> When I trying to get access to it within my php-script with
> remote_open_writable("localhost", 5050) or remote_open("localhost",
> 5050) apache fails with signal 11:
> 
> dev kernel: pid 35923 (httpd), uid 80: exited on signal 11

Signal 11 is SIGSEGV ("Segmentation fault") on Linux.  I think it is on
other Unix platforms too, but you can check with "kill -l".

I can't see why you would get that though.  Could you post the smallest
PHP test script which fails?
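
If it helps, here's a quick way to check what signal 11 maps to on the
machine running Apache.  It's only a sketch, assuming a POSIX-style shell
where "kill -l" accepts a signal number; the variable name is just for
illustration.

```shell
#!/bin/sh
# Ask the shell for the name of signal 11 (printed without the SIG prefix).
sig_name=$(kill -l 11)
echo "signal 11 is SIG${sig_name}"
```

Running "kill -l" with no argument lists every signal the platform knows
about, which is handy if the number in the kernel log is unfamiliar.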

> TCP demon also shows me different errors:
> Connection from 127.0.0.1, port 62652
> Got exception NetworkError: write failed (context: /path-to-db) (Broken
> pipe)
> Closing connection.
> Connection from 127.0.0.1, port 45245
> Got exception NetworkError: read failed (context: /path-to-db)
> (Connection reset by peer)
> Closing connection.
> Connection from 127.0.0.1, port 44209
> Got exception NetworkError: Received EOF (context: /path-to-db)
> Closing connection.

These suggest that the client is dying at different points
mid-conversation.

Actually, one possible cause of random segmentation faults is dodgy RAM
- it might be worth running a RAM tester to check for that possibility.
I've used memtest86+ before:
http://www.memtest.org/

> There are also another question: How it's better to organize xapian
> search while using cluster of 5 servers?

It depends what you're scaling for.  If each server is capable of
servicing single queries fast enough and you're adding more servers to
cope with the query load and provide redundancy, you can just give each
server a copy of the database.  1.1.x provides a new database
replication feature which helps such setups.  The documentation for
it also describes other possible approaches, and you could use those
with 1.0.x:
http://trac.xapian.org/browser/trunk/xapian-core/docs/replication.rst

If you're scaling to try to reduce the time a single query takes, then
you could give each server a subset of the data and combine searches
over multiple remote databases.
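
For that second approach, one way to combine the shards is a stub
database file listing each remote server.  A rough sketch follows; the
hostnames, the port and the stub filename are made up for illustration,
and I'm assuming each server runs xapian-tcpsrv on its own subset and
that your Xapian version accepts "remote host:port" lines in stub files,
so check the documentation for your release.

```shell
#!/bin/sh
# Sketch: write a stub database file with one remote shard per server.
# Hostnames, port and filename are placeholders, not a real setup.
stub=./all-shards.stub
: > "$stub"                     # start with an empty file
for n in 1 2 3 4 5; do
    # Each server is assumed to run: xapian-tcpsrv --port 5050 /path-to-db
    echo "remote search${n}.example.com:5050" >> "$stub"
done
cat "$stub"
```

Opening that file as a database should then search all the listed
shards together.  The same thing can be done in code by opening each
remote database and adding it to a single Database object with
add_database().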

Cheers,
    Olly