Santos, Jose Renato G
2009-Feb-17 03:23 UTC
[Xen-devel] [PATCH] Netchannel2 optimizations [2/4]
This applies to the latest netchannel2 tree. This patch uses the new packet message flag created in the previous patch to request an event only every N fragments. N needs to be less than the maximum number of fragments that we can send or we may get stuck. The default in this patch is 192 fragments, while the maximum number of fragments that we can send is 256.

There is a small issue with this code. If we have a single UDP stream and the maximum TX socket buffer size allowed by the kernel in the sender guest is not large enough to cover N fragments (192 for now), communication may stall until some other stream sends packets in either the TX or RX direction. This should not be an issue with TCP, since we will always have ACKs being received, which will cause events to be generated. We will need to fix this sometime soon, but it is an unlikely enough scenario in practice that we may let the code into the netchannel2 tree for now, especially because the code is still experimental. But Steven has the final word on that.

A possible fix for this issue is to set the event request flag when we send a packet and the sender socket buffer is full. I just did not have the time to look into the Linux socket buffer code to figure out how to do that, but it should not be difficult once we understand the code.

Renato
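[For reference, a minimal sketch in plain C of the every-N-fragments event-request logic described above. The names used here (nc2_tx_state, frags_since_event, MAX_FRAGS_NO_EVENT, nc2_should_request_event) are illustrative only and are not the actual netchannel2 identifiers.]

/* Sketch only: decide, per packet, whether to set the event-request flag
 * so the receiver is notified at least once every MAX_FRAGS_NO_EVENT
 * fragments.  All names here are illustrative. */

#define MAX_FRAGS_NO_EVENT 192   /* must stay below the 256-fragment limit */

struct nc2_tx_state {
        unsigned int frags_since_event;   /* fragments sent since the last
                                           * packet that requested an event */
};

/* Returns non-zero if the packet carrying nr_frags fragments should have
 * the event-request flag set. */
static int nc2_should_request_event(struct nc2_tx_state *tx,
                                    unsigned int nr_frags)
{
        tx->frags_since_event += nr_frags;
        if (tx->frags_since_event >= MAX_FRAGS_NO_EVENT) {
                tx->frags_since_event = 0;
                return 1;
        }
        return 0;
}

[Keeping the threshold below the 256-fragment ring limit guarantees that an event request is always issued before the sender runs out of fragment slots.]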
Steven Smith
2009-Feb-17 17:56 UTC
[Xen-devel] Re: [PATCH] Netchannel2 optimizations [2/4]
> This patch uses the new packet message flag created in the previous
> patch to request an event only every N fragments. N needs to be less
> than the maximum number of fragments that we can send or we may get
> stuck. The default in this patch is 192 fragments, while the maximum
> number of fragments that we can send is 256.
>
> There is a small issue with this code. If we have a single UDP
> stream and the maximum TX socket buffer size allowed by the kernel
> in the sender guest is not large enough to cover N fragments (192
> for now), communication may stall until some other stream sends
> packets in either the TX or RX direction. This should not be an
> issue with TCP, since we will always have ACKs being received,
> which will cause events to be generated. We will need to fix this
> sometime soon, but it is an unlikely enough scenario in practice
> that we may let the code into the netchannel2 tree for now,
> especially because the code is still experimental. But Steven has
> the final word on that.
I've applied the patch, along with the others in the series. As you say, this isn't really good enough for a final solution, as it stands, but it'll do for now.

> A possible fix for this issue is to set the event request flag when
> we send a packet and the sender socket buffer is full. I just did
> not have the time to look into the Linux socket buffer code to
> figure out how to do that, but it should not be difficult once we
> understand the code.
I'm not convinced by this fix. It'll certainly solve the particular case of a UDP blast, but I'd be worried that there might be some other buffering somewhere, e.g. in the queueing discipline or somewhere in iptables. Fixing any particular instance probably wouldn't be very tricky, but it'd be hard to be confident you'd got all of them, and it just sounds like a bit of a rat hole of complicated and hard-to-reproduce bugs.

Since this is likely to be a rare case, I'd almost be happy just using e.g. a 1Hz ticker to catch things when they look like they've gone south. Performance will suck, but this should be a very rare workload, so that's not too much of a problem.

Does that sound plausible?

Steven.
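[As a rough illustration of the 1Hz ticker idea, a sketch along these lines, using the old-style Linux timer API of that era; nc2_ring, nc2_messages_outstanding() and nc2_kick_remote() are placeholder names, not the real netchannel2 interfaces.]

#include <linux/timer.h>
#include <linux/jiffies.h>

struct nc2_ring;                                     /* opaque here */
extern int nc2_messages_outstanding(struct nc2_ring *ring);
extern void nc2_kick_remote(struct nc2_ring *ring);  /* event channel notify */

static struct timer_list nc2_watchdog;

/* Fires once a second; if messages have been sitting on the ring without
 * an event being requested, prod the other end so it makes progress. */
static void nc2_watchdog_fn(unsigned long data)
{
        struct nc2_ring *ring = (struct nc2_ring *)data;

        if (nc2_messages_outstanding(ring))
                nc2_kick_remote(ring);

        mod_timer(&nc2_watchdog, jiffies + HZ);      /* re-arm: 1Hz */
}

static void nc2_start_watchdog(struct nc2_ring *ring)
{
        setup_timer(&nc2_watchdog, nc2_watchdog_fn, (unsigned long)ring);
        mod_timer(&nc2_watchdog, jiffies + HZ);
}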
Santos, Jose Renato G
2009-Feb-17 18:16 UTC
[Xen-devel] RE: [PATCH] Netchannel2 optimizations [2/4]
> -----Original Message-----
> From: Steven Smith [mailto:steven.smith@citrix.com]
> Sent: Tuesday, February 17, 2009 9:56 AM
> To: Santos, Jose Renato G
> Cc: xen-devel@lists.xensource.com; Steven Smith
> Subject: Re: [PATCH] Netchannel2 optimizations [2/4]
>
> > This patch uses the new packet message flag created in the previous
> > patch to request an event only every N fragments. N needs to be less
> > than the maximum number of fragments that we can send or we may get
> > stuck. The default in this patch is 192 fragments, while the maximum
> > number of fragments that we can send is 256.
> >
> > There is a small issue with this code. If we have a single UDP stream
> > and the maximum TX socket buffer size allowed by the kernel in the
> > sender guest is not large enough to cover N fragments (192 for now),
> > communication may stall until some other stream sends packets in
> > either the TX or RX direction. This should not be an issue with TCP,
> > since we will always have ACKs being received, which will cause
> > events to be generated. We will need to fix this sometime soon, but
> > it is an unlikely enough scenario in practice that we may let the
> > code into the netchannel2 tree for now, especially because the code
> > is still experimental. But Steven has the final word on that.
> I've applied the patch, along with the others in the series.
> As you say, this isn't really good enough for a final
> solution, as it stands, but it'll do for now.
>
> > A possible fix for this issue is to set the event request flag when
> > we send a packet and the sender socket buffer is full. I just did not
> > have the time to look into the Linux socket buffer code to figure out
> > how to do that, but it should not be difficult once we understand the
> > code.
> I'm not convinced by this fix. It'll certainly solve the
> particular case of a UDP blast, but I'd be worried that there
> might be some other buffering somewhere, e.g. in the queueing
> discipline or somewhere in iptables. Fixing any particular
> instance probably wouldn't be very tricky, but it'd be hard
> to be confident you'd got all of them, and it just sounds
> like a bit of a rat hole of complicated and hard-to-reproduce bugs.
>
> Since this is likely to be a rare case, I'd almost be happy
> just using e.g. a 1Hz ticker to catch things when they look
> like they've gone south. Performance will suck, but this
> should be a very rare workload, so that's not too much of a problem.
>
> Does that sound plausible?
>
Yes, a low frequency periodic timer is a good idea. We could also make the number of fragments that generate an event a configurable parameter that could be adjusted (right now it is a constant). That way a sysadmin would have the option of configuring it with a value compatible with the default socket buffer. What about combining the timer with a configurable parameter?

Renato

> Steven.
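[If the threshold were exposed as a module parameter, a sketch could look like the following. module_param() is the standard Linux mechanism; the parameter name max_count_frags_no_event matches the one mentioned later in the thread, but whether and how the real code exposes it is an assumption.]

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Fragments to send before requesting an event; must remain below the
 * 256-fragment ring limit.  Permission 0644 lets an administrator change
 * it at runtime via /sys/module/<module>/parameters/. */
static unsigned int max_count_frags_no_event = 192;
module_param(max_count_frags_no_event, uint, 0644);
MODULE_PARM_DESC(max_count_frags_no_event,
                 "Fragments sent between event requests (< ring size)");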
Steven Smith
2009-Feb-18 12:35 UTC
[Xen-devel] Re: [PATCH] Netchannel2 optimizations [2/4]
> > > A possible fix for this issue is to set the event request flag when
> > > we send a packet and the sender socket buffer is full. I just did not
> > > have the time to look into the Linux socket buffer code to figure out
> > > how to do that, but it should not be difficult once we understand the
> > > code.
> > I'm not convinced by this fix. It'll certainly solve the
> > particular case of a UDP blast, but I'd be worried that there
> > might be some other buffering somewhere, e.g. in the queueing
> > discipline or somewhere in iptables. Fixing any particular
> > instance probably wouldn't be very tricky, but it'd be hard
> > to be confident you'd got all of them, and it just sounds
> > like a bit of a rat hole of complicated and hard-to-reproduce bugs.
> >
> > Since this is likely to be a rare case, I'd almost be happy
> > just using e.g. a 1Hz ticker to catch things when they look
> > like they've gone south. Performance will suck, but this
> > should be a very rare workload, so that's not too much of a problem.
> >
> > Does that sound plausible?
> Yes, a low frequency periodic timer is a good idea.
Okay, there's now a 1Hz ticker which just goes and prods the ring if there are any messages outstanding. As expected, performance is dire if you're relying on it to actually force packets out (~180 packets a second), but it does avoid the deadlock.

I've also added a (very stupid) adaptation scheme which tries to adjust the max_count_frags_no_event parameter to avoid hitting the deadlock too often in the first place. It seems to do broadly the right thing for both UDP floods and TCP stream tests, but it probably wouldn't be very hard to come up with some workload for which it falls over.

> We could also make the number of fragments that generate an event
> a configurable parameter that could be adjusted (right now it is
> a constant). That way a sysadmin would have the option of
> configuring it with a value compatible with the default socket
> buffer. What about combining the timer with a configurable
> parameter?
I guess it wouldn't hurt to make this stuff configurable, although I think you may be overestimating the average sysadmin if you think they're going to know the default socket buffer size (hell, *I* don't know the default socket buffer size).

Steven.
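[The adaptation scheme itself is not described in detail in the thread; one plausible shape for it, purely as an illustrative guess and not necessarily what the tree actually does, is to back the threshold off whenever the watchdog has to rescue the ring and let it creep back up on normal progress.]

#define FRAGS_NO_EVENT_MIN   16
#define FRAGS_NO_EVENT_MAX  192

static unsigned int max_count_frags_no_event = FRAGS_NO_EVENT_MAX;

/* The 1Hz ticker had to force progress: be much less lazy about
 * requesting events. */
static void nc2_adapt_on_stall(void)
{
        max_count_frags_no_event /= 2;
        if (max_count_frags_no_event < FRAGS_NO_EVENT_MIN)
                max_count_frags_no_event = FRAGS_NO_EVENT_MIN;
}

/* A batch completed without the ticker intervening: slowly relax back
 * towards the default so the common case stays cheap. */
static void nc2_adapt_on_progress(void)
{
        if (max_count_frags_no_event < FRAGS_NO_EVENT_MAX)
                max_count_frags_no_event++;
}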
Santos, Jose Renato G
2009-Feb-19 07:32 UTC
[Xen-devel] RE: [PATCH] Netchannel2 optimizations [2/4]
>
> I've also added a (very stupid) adaptation scheme which tries
> to adjust the max_count_frags_no_event parameter to avoid
> hitting the deadlock too often in the first place. It seems
> to do broadly the right thing for both UDP floods and TCP
> stream tests, but it probably wouldn't be very hard to come
> up with some workload for which it falls over.
>
OK, I will test how this works on 10gig NICs when I have some time. I am currently doing some tests on Intel 10gig ixgbe NICs and I am seeing some behaviour that I cannot explain (without this adaptation patch). Netperf is not able to saturate the link, and at the same time neither the guest nor dom0 can saturate the CPU either (I made sure the client is not the bottleneck either). So some other factor is limiting throughput. (I disabled the netchannel2 rate limiter, but this did not seem to have any effect either.) I will spend some time looking into that.

Regards

Renato

> > We could also make the number of fragments that generate an event
> > a configurable parameter that could be adjusted (right now it is
> > a constant). That way a sysadmin would have the option of
> > configuring it with a value compatible with the default socket
> > buffer. What about combining the timer with a configurable
> > parameter?
> I guess it wouldn't hurt to make this stuff configurable,
> although I think you may be overestimating the average
> sysadmin if you think they're going to know the default
> socket buffer size (hell, *I* don't know the default socket
> buffer size).
>
> Steven.
Steven Smith
2009-Feb-20 09:58 UTC
[Xen-devel] Re: [PATCH] Netchannel2 optimizations [2/4]
> > I've also added a (very stupid) adaptation scheme which tries
> > to adjust the max_count_frags_no_event parameter to avoid
> > hitting the deadlock too often in the first place. It seems
> > to do broadly the right thing for both UDP floods and TCP
> > stream tests, but it probably wouldn't be very hard to come
> > up with some workload for which it falls over.
> OK, I will test how this works on 10gig NICs when I have some
> time. I am currently doing some tests on Intel 10gig ixgbe NICs
> and I am seeing some behaviour that I cannot explain (without this
> adaptation patch). Netperf is not able to saturate the link, and
> at the same time neither the guest nor dom0 can saturate the
> CPU either (I made sure the client is not the bottleneck
> either). So some other factor is limiting throughput. (I disabled
> the netchannel2 rate limiter, but this did not seem to have any
> effect either.) I will spend some time looking into that.
Is it possible that we're seeing some kind of semi-synchronous bouncing between the domU and dom0? Something like this:

-- DomU sends some messages to dom0, wakes it up, and then goes to sleep.
-- Dom0 wakes up, processes the messages, sends the responses, wakes the domU, and then goes to sleep.
-- Repeat.

So that both domains are spending significant time just waiting for the other one to do something, and neither can saturate their CPU.

That should be fairly obvious in a xentrace trace if you run it while you're observing the bad behaviour.

If that is the problem, there are a couple of easy-ish things we could do which might help a bit:

-- Re-arrange the tasklet a bit so that it sends outgoing messages before checking for incoming ones. The risk is that processing an incoming message is likely to generate further outgoing ones, so we risk splitting the messages into two flights.

-- Arrange to kick after N messages, even if we still have more messages to send, so that the domain which is receiving the messages runs in parallel with the sending one (a rough sketch follows at the end of this message).

Both approaches would risk sending more batches of messages, and hence more event channel notifications, trips through the Xen scheduler, etc., and hence would only ever increase the number of cycles per packet, but if they stop CPUs going idle then they might increase the actual throughput.

Ideally, we'd only do this kind of thing if the receiving domain is idle, but figuring that out from the transmitting domain in an efficient way sounds tricky. You could imagine some kind of scoreboard showing which domains are running, maintained by Xen and readable by all domains, but I really don't think we want to go down that route.

Steven.
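[For the second option, a rough sketch of what "kick after N messages" could look like in the transmit path; every identifier here is a placeholder rather than the real netchannel2 API.]

#define NC2_KICK_BATCH 32   /* notify the peer every 32 messages */

struct nc2_ring;
extern int nc2_have_pending_message(struct nc2_ring *ring);
extern void nc2_send_one_message(struct nc2_ring *ring);
extern void nc2_kick_remote(struct nc2_ring *ring);   /* event channel notify */

/* Flush pending messages, notifying the receiver every NC2_KICK_BATCH
 * messages instead of only once at the end, so it can start processing
 * the early part of the flight while we are still producing the rest. */
static void nc2_send_pending(struct nc2_ring *ring)
{
        unsigned int sent = 0;

        while (nc2_have_pending_message(ring)) {
                nc2_send_one_message(ring);
                if (++sent % NC2_KICK_BATCH == 0)
                        nc2_kick_remote(ring);
        }
        if (sent % NC2_KICK_BATCH)
                nc2_kick_remote(ring);   /* final partial batch */
}

[The trade-off described above is visible here: each extra nc2_kick_remote() is an extra event channel notification, so cycles per packet can only go up even if overall throughput improves.]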
Santos, Jose Renato G
2009-Feb-20 15:39 UTC
[Xen-devel] RE: [PATCH] Netchannel2 optimizations [2/4]
> -----Original Message-----
> From: Steven Smith [mailto:steven.smith@citrix.com]
> Sent: Friday, February 20, 2009 1:58 AM
> To: Santos, Jose Renato G
> Cc: Steven Smith; xen-devel@lists.xensource.com
> Subject: Re: [PATCH] Netchannel2 optimizations [2/4]
>
> > > I've also added a (very stupid) adaptation scheme which tries to
> > > adjust the max_count_frags_no_event parameter to avoid hitting the
> > > deadlock too often in the first place. It seems to do broadly the
> > > right thing for both UDP floods and TCP stream tests, but it
> > > probably wouldn't be very hard to come up with some workload for
> > > which it falls over.
> > OK, I will test how this works on 10gig NICs when I have some
> > time. I am currently doing some tests on Intel 10gig ixgbe NICs
> > and I am seeing some behaviour that I cannot explain (without this
> > adaptation patch). Netperf is not able to saturate the link, and
> > at the same time neither the guest nor dom0 can saturate the
> > CPU either (I made sure the client is not the bottleneck
> > either). So some other factor is limiting throughput. (I disabled
> > the netchannel2 rate limiter, but this did not seem to have any
> > effect either.) I will spend some time looking into that.
> Is it possible that we're seeing some kind of
> semi-synchronous bouncing between the domU and dom0?
> Something like this:
>
> -- DomU sends some messages to dom0, wakes it up, and then goes to
> sleep.
> -- Dom0 wakes up, processes the messages, sends the responses, wakes
> the domU, and then goes to sleep.
> -- Repeat.
>
> So that both domains are spending significant time just
> waiting for the other one to do something, and neither can
> saturate their CPU.
Yes, that is what I thought as well. I still need to do a careful xentrace analysis though.

> That should be fairly obvious in a xentrace trace if you run
> it while you're observing the bad behaviour.
>
> If that is the problem, there are a couple of easy-ish things
> we could do which might help a bit:
>
> -- Re-arrange the tasklet a bit so that it sends outgoing messages
> before checking for incoming ones. The risk is that processing an
> incoming message is likely to generate further outgoing ones, so we
> risk splitting the messages into two flights.
>
> -- Arrange to kick after N messages, even if we still have more
> messages to send, so that the domain which is receiving the
> messages runs in parallel with the sending one.
>
I tried limiting the messages in each run and some other things, but it did not help. I can try doing TX before RX, but I think there is something else going on and I will need to spend more time analysing it. I will postpone this until after the Xen Summit, as I will be busy in the next few days writing my slides.

Thanks for the suggestions

Renato

> Both approaches would risk sending more batches of messages,
> and hence more event channel notifications, trips through the
> Xen scheduler, etc., and hence would only ever increase the
> number of cycles per packet, but if they stop CPUs going idle
> then they might increase the actual throughput.
>
> Ideally, we'd only do this kind of thing if the receiving
> domain is idle, but figuring that out from the transmitting
> domain in an efficient way sounds tricky. You could imagine
> some kind of scoreboard showing which domains are running,
> maintained by Xen and readable by all domains, but I really
> don't think we want to go down that route.
>
> Steven.