Kirk Haines
2008-Jan-17 07:54 UTC
[Eventmachine-talk] Performance with lots of small send_data events
Backstory: the current client for Analogger was written with regular Ruby socket programming, because I wanted it to be usable without an EM loop.

Works great. But I thought that a client implemented with EM might be faster for programs that are already using an EM event loop anyway.

So, I built it and benchmarked it.

To my surprise, while messages will queue to be sent at an insane rate of speed, they drained at a much slower rate -- about 36k to 50k per second, which is about 40%-60% of the speed of the pure Ruby, regular-sockets client.

This, I thought, was quite strange. So I did some digging.

It seems that EM could be more efficient when dealing with lots and lots of small sends.

By changing the client so that it coalesces the messages itself and sends everything once a second, my message queuing rate dropped to 300k/second, but my throughput went up to 190k/second.

So, that's good. 190k messages per second is really fast. It's a valid workaround for the problem.

But it raises the question -- what is the bottleneck? It seems like the EM side should be able to handle queuing of a bunch of short messages more efficiently than I can in the Ruby code.

Since you are porting to C from C++, should I bother tinkering with this on the C++ side at all?

Kirk Haines
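In sketch form, the "one send_data per message" client being described looks roughly like this. It is an illustration only, not the actual Analogger client; the module name, port, and framing are assumptions:

    require 'eventmachine'

    # Hypothetical handler; the real Analogger client and wire format differ.
    module PerMessageLoggerClient
      # Each log call turns into its own send_data call (and ultimately
      # its own small write to the socket).
      def log(severity, message)
        send_data("#{severity}:#{message}\n")
      end
    end

    EventMachine.run do
      conn = EventMachine.connect('127.0.0.1', 6766, PerMessageLoggerClient)

      # Messages queue almost instantly; the reactor then drains the
      # outbound queue one small send at a time.
      500_000.times { |n| conn.log(:info, "message #{n}") }
    end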
Francis Cianfrocca
2008-Jan-17 22:16 UTC
[Eventmachine-talk] Performance with lots of small send_data
On Jan 17, 2008 10:54 AM, Kirk Haines <wyhaines at gmail.com> wrote:

> Backstory: the current client for Analogger was written with regular
> Ruby socket programming, because I wanted it to be usable without an
> EM loop.
>
> Works great. But I thought that a client implemented with EM might be
> faster for programs that are already using an EM event loop anyway.
>
> So, I built it and benchmarked it.
>
> To my surprise, while messages will queue to be sent at an insane rate
> of speed, they drained at a much slower rate -- about 36k to 50k per
> second, which is about 40%-60% of the speed of the pure Ruby,
> regular-sockets client.
>
> This, I thought, was quite strange. So I did some digging.
>
> It seems that EM could be more efficient when dealing with lots and
> lots of small sends.
>
> By changing the client so that it coalesces the messages itself and
> sends everything once a second, my message queuing rate dropped to
> 300k/second, but my throughput went up to 190k/second.
>
> So, that's good. 190k messages per second is really fast. It's a
> valid workaround for the problem.
>
> But it raises the question -- what is the bottleneck? It seems like
> the EM side should be able to handle queuing of a bunch of short
> messages more efficiently than I can in the Ruby code.
>
> Since you are porting to C from C++, should I bother tinkering with
> this on the C++ side at all?

It would be great if you could tinker with it in C++, but a couple of questions first. Are you using epoll, select or kqueue? How big on average are the small sends?

The output handler actually does some coalescing of small writes, but evidently not enough. It's also pretty well known that small sends put a lot of pressure on the kernel.

In the port I'm doing, I actually gave the output issue a lot of thought, but I punted for two reasons. First, I didn't have much of a feel for whether the current approach is a problem (so your question is timely!), and second, it's going to have to get implemented at least three different ways (Unix with writev, Unix without writev, and IOCP), and those will have different optimizations with regard to the small-send problem.

This code is shaping up fairly well. Maybe Analogger should be the first guinea pig.
Kirk Haines
2008-Jan-18 07:55 UTC
[Eventmachine-talk] Performance with lots of small send_data
On Jan 17, 2008 11:16 PM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:

> It would be great if you could tinker with it in C++, but a couple of
> questions first.
> Are you using epoll, select or kqueue? How big on average are the small
> sends?

I tested with both select and epoll. The 36k/second rate I mentioned was with select, and the 50k/second rate was with epoll.

By coalescing the small writes myself it jumps, for the smallest messages that I tested (about 30 bytes), to 190k/second with epoll, and about 185k/second with select.

30 bytes, and about 130 bytes -- two separate test message sizes.

I tested with 500000 of each, over many runs.

> The output handler actually does some coalescing of small writes, but
> evidently not enough. It's also pretty well known that small sends put
> a lot of pressure on the kernel.
>
> In the port I'm doing, I actually gave the output issue a lot of
> thought, but I punted for two reasons. First, I didn't have much of a
> feel for whether the current approach is a problem (so your question is
> timely!), and second, it's going to have to get implemented at least
> three different ways (Unix with writev, Unix without writev, and IOCP),
> and those will have different optimizations with regard to the
> small-send problem.
>
> This code is shaping up fairly well. Maybe Analogger should be the
> first guinea pig.

I have a fairly complete set of tests that exercise both the server and the two clients, with benchmarking, so it's pretty easy to drop something into my test Ruby instance and then test.

Kirk Haines
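The shape of the benchmark described above is roughly the following. This is a hedged sketch, not the actual test harness; the handler, port, and framing are assumptions, while EventMachine.epoll and EventMachine.kqueue are the usual toggles for choosing the selector before the reactor starts:

    require 'eventmachine'
    require 'benchmark'

    # Hypothetical client handler; the real harness and Analogger framing differ.
    module BenchClient
      def unbind
        EventMachine.stop
      end
    end

    MESSAGE = 'x' * 30          # also run with ~130-byte payloads
    COUNT   = 500_000

    EventMachine.epoll          # comment out to fall back to select(2)
    # EventMachine.kqueue       # on BSD / OS X

    elapsed = Benchmark.realtime do
      EventMachine.run do
        conn = EventMachine.connect('127.0.0.1', 6766, BenchClient)
        COUNT.times { conn.send_data("#{MESSAGE}\n") }
        conn.close_connection_after_writing   # stop once the queue drains
      end
    end

    puts "#{(COUNT / elapsed).round} messages/second"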
Francis Cianfrocca
2008-Jan-18 18:26 UTC
[Eventmachine-talk] Performance with lots of small send_data
On Jan 18, 2008 10:55 AM, Kirk Haines <wyhaines at gmail.com> wrote:

> On Jan 17, 2008 11:16 PM, Francis Cianfrocca <garbagecat10 at gmail.com>
> wrote:
> >
> > It would be great if you could tinker with it in C++, but a couple of
> > questions first.
> > Are you using epoll, select or kqueue? How big on average are the small
> > sends?
>
> I tested with both select and epoll. The 36k/second rate I mentioned
> was with select, and the 50k/second rate was with epoll.
>
> By coalescing the small writes myself it jumps, for the smallest
> messages that I tested (about 30 bytes), to 190k/second with epoll,
> and about 185k/second with select.
>
> 30 bytes, and about 130 bytes -- two separate test message sizes.
>
> I tested with 500000 of each, over many runs.

And all of this difference is attributable to cutting down the number of calls to send(2)? Wow.
Kirk Haines
2008-Jan-18 20:25 UTC
[Eventmachine-talk] Performance with lots of small send_data
On Jan 18, 2008 7:26 PM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:

> And all of this difference is attributable to cutting down the number of
> calls to send(2)?

Calls to send_data, yeah. Now, granted, it's a substantial reduction in calls. I was testing with 500000 messages.

The first API just called send_data for each one. The second API stuffs each one into a string buffer, and then once a second (scheduled with a periodic timer) the buffer is pushed in a single call to send_data.

The first way, I could queue messages at ridiculous rates of speed -- half a second to queue half a million short messages, give or take. But the backlog drained slowly.

The other way, the net speed for queuing messages was a lot lower -- 300k or so per second. But the total throughput was much higher.

And that's all I know at this point. :)

Kirk
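In sketch form, the second API amounts to something like this. It is a rough illustration, not the actual Analogger client; the module name, port, and framing are assumptions:

    require 'eventmachine'

    # Hypothetical coalescing client; the real Analogger client and its
    # wire framing differ.
    module CoalescingLoggerClient
      def post_init
        @buffer = ''
        # Once a second, flush whatever has accumulated in one send_data call.
        @timer = EventMachine.add_periodic_timer(1) { flush }
      end

      # "Queuing" a message is just a cheap string append.
      def log(severity, message)
        @buffer << "#{severity}:#{message}\n"
      end

      def flush
        return if @buffer.empty?
        send_data(@buffer)
        @buffer = ''
      end

      def unbind
        EventMachine.cancel_timer(@timer)
      end
    end

    EventMachine.run do
      conn = EventMachine.connect('127.0.0.1', 6766, CoalescingLoggerClient)
      500_000.times { |n| conn.log(:info, "message #{n}") }
    end

The append is cheap, so queuing slows only modestly, while the once-a-second flush turns half a million small socket writes into a handful of large ones.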