John Levon
2007-Jan-17 01:04 UTC
[Xen-devel] intermittent problems with legacy xmlrpc server in 3.0.4
I've been having intermittent problems with xm talking to xend over the
legacy xmlrpc server. In theory it should be reproducible under Linux with
an xm list loop, though you might need a heavy load. DTrace says:

  0  24244  recv:entry   12954192948474  xend  recv(8192)
  0  24245  recv:return  12954192971001  xend  recv() ret -1 errno 11
  0  24250  send:entry   12954196092353  xm    send(132, POST /RPC2 HTTP/1.0
                                               Host: 
                                               User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)
                                               Content-Type: text/xml
                                               Content-Length: 268
                                               )
  0  24251  send:return  12954196113363  xm    send() ret -1 errno 32

11 = EAGAIN (EWOULDBLOCK): the socket is marked non-blocking and the
requested operation would block.

32 = EPIPE: the socket is shut down for writing, or the socket is
connection-mode and is no longer connected. In the latter case, if the
socket is of type SOCK_STREAM, the SIGPIPE signal is generated to the
calling thread.

So for some reason the server is trying to process a request before xm has
sent it, and the EWOULDBLOCK is causing the EPIPE it seems.

changeset 12062:5fe8e9ebcf5c made this change:

+        try:
+            self.server.socket.settimeout(1.0)
+            while self.running:
+                self.server.handle_request()

which places xmlrpc.sock in non-blocking mode. SocketServer.py actually
does this on init:

    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        try:
            self.setup()
            self.handle()
            self.finish()

This self.handle() ends up as the recv() that craps itself when it gets
EAGAIN. This doesn't always happen; presumably the race is between
creating the request thread in SocketServer and xm writing the data.

I've hacked up SocketServer a bit to handle EAGAIN, but this obviously
isn't a good fix. Suggestions welcome; I'm not really familiar with all
this server code.

regards
john

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
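The failure mode John describes can be reproduced in a few lines of Python. This is a minimal sketch, not xend code: the socket names, port, and the explicit setblocking(False) are illustrative (in xend the accepted socket ended up non-blocking as a side effect of the settimeout(1.0) on the listener). A recv() on a non-blocking connection before the client has written anything fails with errno 11, matching the DTrace output above:

```python
import errno
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
srv.settimeout(1.0)               # the same call the changeset added

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())    # connect, but send nothing yet

conn, _ = srv.accept()
conn.setblocking(False)           # force the non-blocking case explicitly

got_eagain = False
try:
    conn.recv(8192)               # server reads before the client writes
except socket.error as exc:
    got_eagain = (exc.errno == errno.EAGAIN)
print(got_eagain)                 # → True
```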
Daniel P. Berrange
2007-Jan-17 01:29 UTC
Re: [Xen-devel] intermittent problems with legacy xmlrpc server in 3.0.4
On Wed, Jan 17, 2007 at 01:04:01AM +0000, John Levon wrote:
> So for some reason the server is trying to process a request before xm
> has sent it, and the EWOULDBLOCK is causing the EPIPE it seems.
>
> changeset 12062:5fe8e9ebcf5c made this change:
>
> +        try:
> +            self.server.socket.settimeout(1.0)
> +            while self.running:
> +                self.server.handle_request()
>
> which places xmlrpc.sock in non-blocking mode. SocketServer.py actually
> does this on init:

So from reading that changeset, it looks as if the socket is being put in
non-blocking mode so that when XenD shuts down it doesn't wait forever for
active clients to finish. An alternate way to do this would be to simply
set all the client connection handling threads to be daemonized threads
and not bother calling join() on them at all - just rely on the automatic
thread cleanup. This means that the leader process can just quit & any
outstanding client handling threads will simply be killed off without
delay.

>     def __init__(self, request, client_address, server):
>         self.request = request
>         self.client_address = client_address
>         self.server = server
>         try:
>             self.setup()
>             self.handle()
>             self.finish()
>
> This self.handle() ends up as the recv() that craps itself when it gets
> EAGAIN. This doesn't always happen; presumably the race is between
> creating the request thread in SocketServer and xm writing the data.
>
> I've hacked up SocketServer a bit to handle EAGAIN, but this obviously
> isn't a good fix. Suggestions welcome; I'm not really familiar with all
> this server code.

Having had a cursory glance at the code, as you say, none of it is
expecting the socket to be in non-blocking mode, so it easily breaks.
You'd probably see the same thing if network congestion caused a data
stall of > 1 second. IMHO the sockets should be put back to blocking mode
and another way found of dealing with any possible shutdown issues.

Regards,
Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505  F3C9 553F A1DA 4AC2  5648 23C1 B3DF F742 7D3B 9505  -=|
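Daniel's daemonized-thread suggestion can be sketched in a few lines (the handle_request function here is a hypothetical stand-in for a per-client connection handler, not the real SocketServer code). A daemon thread does not keep the interpreter alive, so the main process can exit immediately without join()ing it:

```python
import threading
import time

def handle_request():
    # stand-in for a long-running per-client connection handler
    time.sleep(60)

t = threading.Thread(target=handle_request)
t.daemon = True   # daemonized: process exit kills it, no join() needed
t.start()
print(t.daemon and t.is_alive())  # → True
```

With this in place, the listening socket no longer needs the 1-second timeout that was only there to let the accept loop notice shutdown.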
Alastair Tse
2007-Jan-18 15:55 UTC
Re: [Xen-devel] intermittent problems with legacy xmlrpc server in 3.0.4
On Wed, 2007-01-17 at 01:29 +0000, Daniel P. Berrange wrote:
> So from reading that changeset, it looks as if the socket is being put
> in non-blocking mode so that when XenD shuts down it doesn't wait
> forever for active clients to finish. An alternate way to do this would
> be to simply set all the client connection handling threads to be
> daemonized threads and not bother calling join() on them at all - just
> rely on the automatic thread cleanup. This means that the leader process
> can just quit & any outstanding client handling threads will simply be
> killed off without delay.

I've just committed a patch based on your suggestion, setting all the
threads to daemonic, which gets rid of the join() and settimeout() on the
socket. Hopefully this should solve the problems John is seeing with
intermittent failures.

Thanks,
Alastair
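For reference, the stock library already supports this pattern: ThreadingMixIn has a daemon_threads class attribute that makes every per-connection handler thread daemonic, so the listening socket can stay in blocking mode with no settimeout(1.0) at all. A self-contained sketch in the modern spelling (socketserver; the echo handler and class names are illustrative, not the xend patch):

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # blocking readline: no EAGAIN race against the client's send()
        self.wfile.write(self.rfile.readline())

class Server(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # replaces the join()/settimeout() dance
    allow_reuse_address = True

srv = Server(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

with socket.create_connection(srv.server_address) as s:
    s.sendall(b"ping\n")
    echoed = s.makefile().readline().strip()
print(echoed)                    # → ping
srv.shutdown()
```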