Daniel P. Berrange
2006-Aug-28 18:02 UTC
[Xen-devel] xenconsoled CPU denial of service problem
I was examining what would happen if I ballooned a guest down to 15 MB in size, and besides the obvious (kernel kills everything in sight :-), I found a non-trivial CPU denial of service problem with xenconsoled.

When I ballooned the guest down, the DomU kernel went mental, printing out

  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  .....

over & over again, as fast as it could - using 100% of the CPU time allocated to that DomU by the scheduler. This isn't a problem in itself, because with sensible scheduler settings the impact on other DomUs would be minimal. The problem is that at the same time, xenconsoled consumes 100% of the CPU time available to Domain-0 trying to read the console data generated by the DomU OS. It processes the console data from the DomU even though there are no (xm console) client sessions with the slave end of the pseudo-TTY open.

Now the particular ballooning scenario I used falls under the "don't do that then" category, but more generally this is still a problem. Any DomU OS can inflict a CPU denial of service attack on Domain-0 (and thus other DomUs) simply by generating a huge quantity of data on its (serial) console - either accidentally, or maliciously.

I've been looking at the xenconsoled source to see if there's any way this can be resolved. With no xm console clients attached, xenconsoled is spinning:

  select(10, [5 6 7 8 9], [6], NULL, NULL) = 1 (in [5])
  read(5, "\24\0\0\0", 4) = 4
  mremap(0x2aaaaaac0000, 1052672, 1052672, MREMAP_MAYMOVE) = 0x2aaaaaac0000
  ioctl(5, EVIOCGKEYCODE, 0x7fffc89a1f50) = 0
  mremap(0x2aaaaaac0000, 1052672, 1052672, MREMAP_MAYMOVE) = 0x2aaaaaac0000
  write(5, "\24\0\0\0", 4) = 4

which corresponds to handle_ring_read() in tools/console/daemon/io.c.

What I'd like to do is figure out a way to avoid processing the ring buffer if there is no client connected to the /dev/pts/NNN slave tty associated with the domain. I tried to detect the presence of a client by adding the master TTY filehandle to the writefds set in select(), but it appears the master end of a tty is signalled writable even if the other end is disconnected.

Does anyone know of an alternative approach to detecting whether the fd for the master end of a pseudo-TTY has its slave end open / active? Without being able to detect this I don't see any good way to avoid the DoS attack in the general case - the only other option would be to start dropping data above a certain rate, but this isn't really very desirable because there are (debug) scenarios in which you really do want the ability to capture all data.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
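For reference, a minimal standalone test (not xenconsoled code; posix_openpt() and friends are used purely for illustration) of the detection attempt described above. It demonstrates the behaviour reported: select() flags the pty master writable even though nothing has ever opened the slave end, so writefds-based detection of an attached client does not work.

#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/select.h>

int main(void)
{
    /* Create a pty master; deliberately never open the slave side. */
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pty setup");
        return 1;
    }
    printf("slave is %s (nothing has opened it)\n", ptsname(master));

    fd_set writefds;
    FD_ZERO(&writefds);
    FD_SET(master, &writefds);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

    int ret = select(master + 1, NULL, &writefds, NULL, &tv);
    /* Expected (and unhelpful) result: ret == 1, i.e. the master is
       reported writable despite the slave never having been opened. */
    printf("select() returned %d\n", ret);
    return 0;
}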
Keir Fraser
2006-Aug-28 20:57 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 28/8/06 7:02 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Does anyone know of an alternative approach to detecting whether the fd
> for the master end of a pseudo-TTY has its slave end open / active?
> Without being able to detect this I don't see any good way to avoid the DoS
> attack in the general case - the only other option would be to start dropping
> data above a certain rate, but this isn't really very desirable because
> there are (debug) scenarios in which you really do want the ability to
> capture all data.

The protocol has flow control. If we rate-limited xenconsoled's consumption of data from each domU ring, we would limit resource consumption in dom0 and not lose any data (since the domU will simply buffer it internally).

 -- Keir
Daniel P. Berrange
2006-Aug-29 15:59 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On Mon, Aug 28, 2006 at 09:57:22PM +0100, Keir Fraser wrote:
> On 28/8/06 7:02 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>
> > Does anyone know of an alternative approach to detecting whether the fd
> > for the master end of a pseudo-TTY has its slave end open / active?
> > Without being able to detect this I don't see any good way to avoid the DoS
> > attack in the general case - the only other option would be to start dropping
> > data above a certain rate, but this isn't really very desirable because
> > there are (debug) scenarios in which you really do want the ability to
> > capture all data.
>
> The protocol has flow control. If we rate-limited xenconsoled's consumption
> of data from each domU ring, we would limit resource consumption in dom0 and
> not lose any data (since the domU will simply buffer it internally).

Ah, that's very handy. Looking at the xenconsoled code I think I've worked out how to slow things down - although I'm not entirely sure I'm correctly activating the throttling. Basically I'm delaying the calls to xc_evtchn_notify & xc_evtchn_unmask after consuming data - is this the correct approach?

I'm attaching a patch which implements this scheme. In the buffer_append method we keep a cumulative count of how much data we've consumed. Every time it exceeds N bytes, we calculate the period over which those N bytes were received. If this period is less than a threshold (i.e. fast), then I insert a small delay before notifying & unmasking the event channel. The patch sets:

 - data size 5 kb
 - period 200 ms
 - delay 200 ms

i.e. if we receive 5 kb of data in less than 200 ms, we delay for 200 ms before allowing more. These constants are #define'd for easy tuning; the current values are fairly conservative / low data rate. Any thoughts on what an appropriate data rate to allow from a DomU is?

Final question - in the handle_ring_read() method, is the port returned by xc_evtchn_pending guaranteed to be the same as the port we already have a reference to in the 'struct domain' local_port field? If so I can remove the saved reference 'limited_port' I added in 'struct buffer'.

For testing purposes just pick a DomU and run

  while /bin/true ; do echo t > /proc/sysrq-trigger ; done

Ordinarily this would cause Dom-0/xenconsoled to consume 100% CPU. With this patch applied there is negligible CPU time consumed by Dom-0/xenconsoled. (If you increase the delay to 2,000 ms you can visually see that no data is being lost as a result of the delays.)

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Dan.
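For illustration, a rough sketch of the throttling logic described above. This is not the attached patch; the struct and constant names are assumptions. It shows the accounting that would run after buffer_append() has pulled data off the ring and before the event channel is notified/unmasked.

/* Illustrative sketch of the byte-rate throttle described above (names are
   assumptions, not from the attached patch).  Count bytes consumed from the
   ring; if more than RATE_LIMIT_BYTES arrive within RATE_LIMIT_PERIOD_MS,
   sleep for RATE_LIMIT_DELAY_MS before letting the guest produce more. */
#include <sys/time.h>
#include <unistd.h>

#define RATE_LIMIT_BYTES      (5 * 1024)  /* 5 kb measurement window       */
#define RATE_LIMIT_PERIOD_MS  200         /* window considered "too fast"  */
#define RATE_LIMIT_DELAY_MS   200         /* penalty delay when exceeded   */

struct throttle {
        size_t    bytes;     /* bytes consumed since the last mark */
        long long mark_ms;   /* time of the last mark              */
};

static long long now_ms(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (tv.tv_sec * 1000LL) + (tv.tv_usec / 1000);
}

/* Called after buffer_append() has consumed 'consumed' bytes, and before
   xc_evtchn_notify()/xc_evtchn_unmask() allow the guest to send more. */
static void throttle_account(struct throttle *t, size_t consumed)
{
        t->bytes += consumed;
        if (t->bytes < RATE_LIMIT_BYTES)
                return;

        if (now_ms() - t->mark_ms < RATE_LIMIT_PERIOD_MS)
                usleep(RATE_LIMIT_DELAY_MS * 1000);  /* too fast: stall the guest */

        t->bytes = 0;
        t->mark_ms = now_ms();
}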
Keir Fraser
2006-Aug-30 18:13 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Ah, that's very handy. Looking at the xenconsoled code I think I've worked
> out how to slow things down - although I'm not entirely sure I'm correctly
> activating the throttling. Basically I'm delaying the calls to
> xc_evtchn_notify & xc_evtchn_unmask after consuming data - is this the
> correct approach?

Yes, the essence of your approach is exactly right. Delaying the unmask is the critical thing -- there's not really any need to delay the notification (in fact I don't think you should bother). But by delaying the unmask you avoid any possibility of a DoS attack on that interdomain link -- without this the domU could potentially hose the dom0 with lots of pointless event-channel notifications. This is theoretically a problem on every one of our split drivers (xenstore, blk, net), so it'd be good to come up with the most general possible solution here that we can apply to every backend.

> The patch sets:
>  - data size 5 kb
>  - period 200 ms
>  - delay 200 ms

A few comments:

 * I think the 'delay' parameter is not really useful. Think of this simply as a simple credit-based scheduler that replenishes credit every 'period'. So in this case you get 5kB every 200ms. If the domU offers more data in a period, it must wait until its credit is replenished at the end of the current 200ms period.

 * I'm not sure bytes of data is the right thing to limit here. The main thing that hoses domain0 is the repeated rescheduling of the console daemon, and that is fundamentally caused by event-channel notifications. So it might make sense to rate limit the number of times we read the event-channel port from xc_evtchn_pending -- e.g., no more than 10 times a second (should be plenty). This has a few advantages: 1. Looking just at data transferred doesn't stop a malicious domain from hosing you with no-op event-channel notifications; 2. This is a fundamental metric that can be measured and rate-limited on all backend interfaces, so perhaps we can come up with some common library of code that we apply to all backends/daemons.

It may turn out we need to rate limit on data *as well*, if it turns out that sinking many kilobytes of data a second is prohibitively expensive, but I doubt this will happen. For a start, the strict limiting of notifications will encourage data to get queued up and improve batching of console data, and batches of data should be processed quite efficiently. This same argument extends to other backends (e.g., batching of requests in xenstored, netback, blkback, etc).

 -- Keir
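To illustrate the idea, a minimal sketch of the credit-based limiter described above, written as the sort of reusable helper a common backend library might offer. The names and constants are illustrative only; the actual patch later in this thread open-codes the same logic directly in xenconsoled.

/* Sketch of a per-period, credit-based event limiter (illustrative names).
   Each period the credit is replenished; once it is used up the caller
   should leave the event channel masked until the period rolls over. */
#include <sys/time.h>
#include <stdbool.h>

struct rate_limit {
        int       credit;      /* events still allowed in this period     */
        int       allowance;   /* events permitted per period             */
        long long period_ms;   /* length of each period in milliseconds   */
        long long period_end;  /* absolute time the current period expires */
};

static long long now_ms(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (tv.tv_sec * 1000LL) + (tv.tv_usec / 1000);
}

/* Returns true if one more event-channel notification may be serviced now;
   false means "stay masked until the period rolls over". */
static bool rate_limit_charge(struct rate_limit *rl)
{
        long long now = now_ms();

        if (now >= rl->period_end) {        /* new period: replenish credit */
                rl->period_end = now + rl->period_ms;
                rl->credit = rl->allowance;
        }
        if (rl->credit == 0)
                return false;               /* allowance exhausted          */
        rl->credit--;
        return true;
}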
Keir Fraser
2006-Aug-30 18:13 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Final question - in the handle_ring_read() method, is the port returned
> by xc_evtchn_pending guaranteed to be the same as the port we already have a
> reference to in the 'struct domain' local_port field? If so I can remove
> the saved reference 'limited_port' I added in 'struct buffer'.

Yes, that is guaranteed.

 -- Keir
Daniel P. Berrange
2006-Sep-11 14:19 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
> On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>
> > The patch sets:
> >  - data size 5 kb
> >  - period 200 ms
> >  - delay 200 ms
>
> A few comments:
> * I think the 'delay' parameter is not really useful. Think of this simply
>   as a simple credit-based scheduler that replenishes credit every 'period'.
>   So in this case you get 5kB every 200ms. If the domU offers more data in a
>   period, it must wait until its credit is replenished at the end of the
>   current 200ms period.
> * I'm not sure bytes of data is the right thing to limit here. The main
>   thing that hoses domain0 is the repeated rescheduling of the console daemon,
>   and that is fundamentally caused by event-channel notifications. So it might
>   make sense to rate limit the number of times we read the event-channel port
>   from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
>   plenty). This has a few advantages: 1. Looking just at data transferred
>   doesn't stop a malicious domain from hosing you with no-op event-channel
>   notifications; 2. This is a fundamental metric that can be measured and
>   rate-limited on all backend interfaces, so perhaps we can come up with some
>   common library of code that we apply to all backends/daemons.

I've re-worked the patch based on this principle of "n events allowed in each time-slice", setting n=30 & the time-slice = 200ms. The code was actually much simpler than my previous patch, so it's definitely a winning strategy. Testing by running

  while /bin/true ; do echo t > /proc/sysrq-trigger; done

..in one of the guest VMs on a 2.2 GHz Opteron shows no significant CPU utilization attributed to xenconsoled. I've not examined whether this code can be put into a common library - it was simple enough to integrate in the xenconsoled event loop.

> It may turn out we need to rate limit on data *as well*, if it turns out
> that sinking many kilobytes of data a second is prohibitively expensive, but
> I doubt this will happen. For a start, the strict limiting of notifications
> will encourage data to get queued up and improve batching of console data,
> and batches of data should be processed quite efficiently. This same
> argument extends to other backends (e.g., batching of requests in xenstored,
> netback, blkback, etc).

Based on initial testing it doesn't look like the data rate itself was causing any significant overhead, once the event channel port reads were limited.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Dan.
Daniel P. Berrange
2006-Oct-04 13:50 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
Hi Keir,

Any feedback on this rate limiting patch for DomU consoles?

Regards,
Dan.

On Mon, Sep 11, 2006 at 03:19:05PM +0100, Daniel P. Berrange wrote:
> On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
> > [...]
> > * I'm not sure bytes of data is the right thing to limit here. The main
> >   thing that hoses domain0 is the repeated rescheduling of the console daemon,
> >   and that is fundamentally caused by event-channel notifications. So it might
> >   make sense to rate limit the number of times we read the event-channel port
> >   from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
> >   plenty). [...]
>
> I've re-worked the patch based on this principle of "n events allowed
> in each time-slice", setting n=30 & the time-slice = 200ms. The code
> was actually much simpler than my previous patch, so it's definitely a
> winning strategy. Testing by running
>
>   while /bin/true ; do echo t > /proc/sysrq-trigger; done
>
> ..in one of the guest VMs on a 2.2 GHz Opteron shows no significant CPU
> utilization attributed to xenconsoled. I've not examined whether this code
> can be put into a common library - it was simple enough to integrate in
> the xenconsoled event loop.
>
> > It may turn out we need to rate limit on data *as well*, if it turns out
> > that sinking many kilobytes of data a second is prohibitively expensive, but
> > I doubt this will happen. [...]
>
> Based on initial testing it doesn't look like the data rate itself was causing
> any significant overhead, once the event channel port reads were limited.
>
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
>
> Regards,
> Dan.
>
> diff -r bfd00b317815 tools/console/daemon/io.c
> --- a/tools/console/daemon/io.c	Mon Sep 11 01:55:03 2006 +0100
> +++ b/tools/console/daemon/io.c	Mon Sep 11 10:09:32 2006 -0400
> @@ -37,12 +37,18 @@
>  #include <termios.h>
>  #include <stdarg.h>
>  #include <sys/mman.h>
> +#include <sys/time.h>
>
>  #define MAX(a, b) (((a) > (b)) ? (a) : (b))
>  #define MIN(a, b) (((a) < (b)) ? (a) : (b))
>
>  /* Each 10 bits takes ~ 3 digits, plus one, plus one for nul terminator. */
>  #define MAX_STRLEN(x) ((sizeof(x) * CHAR_BIT + CHAR_BIT-1) / 10 * 3 + 2)
> +
> +/* How many events are allowed in each time period */
> +#define RATE_LIMIT_ALLOWANCE 30
> +/* Duration of each time period in ms */
> +#define RATE_LIMIT_PERIOD 200
>
>  struct buffer
>  {
> @@ -65,6 +71,8 @@ struct domain
>  	evtchn_port_t local_port;
>  	int xce_handle;
>  	struct xencons_interface *interface;
> +	int event_count;
> +	long long next_period;
>  };
>
>  static struct domain *dom_head;
> @@ -306,6 +314,13 @@ static struct domain *create_domain(int
>  {
>  	struct domain *dom;
>  	char *s;
> +	struct timeval tv;
> +
> +	if (gettimeofday(&tv, NULL) < 0) {
> +		dolog(LOG_ERR, "Cannot get time of day %s:%s:L%d",
> +		      __FILE__, __FUNCTION__, __LINE__);
> +		return NULL;
> +	}
>
>  	dom = (struct domain *)malloc(sizeof(struct domain));
>  	if (dom == NULL) {
> @@ -330,6 +345,8 @@ static struct domain *create_domain(int
>  	dom->buffer.size = 0;
>  	dom->buffer.capacity = 0;
>  	dom->buffer.max_capacity = 0;
> +	dom->event_count = 0;
> +	dom->next_period = (tv.tv_sec * 1000) + (tv.tv_usec / 1000) + RATE_LIMIT_PERIOD;
>  	dom->next = NULL;
>
>  	dom->ring_ref = -1;
> @@ -512,9 +529,12 @@ static void handle_ring_read(struct doma
>  	if ((port = xc_evtchn_pending(dom->xce_handle)) == -1)
>  		return;
>
> +	dom->event_count++;
> +
>  	buffer_append(dom);
>
> -	(void)xc_evtchn_unmask(dom->xce_handle, port);
> +	if (dom->event_count < RATE_LIMIT_ALLOWANCE)
> +		(void)xc_evtchn_unmask(dom->xce_handle, port);
>  }
>
>  static void handle_xs(void)
> @@ -549,6 +569,9 @@ void handle_io(void)
>  	do {
>  		struct domain *d, *n;
>  		int max_fd = -1;
> +		struct timeval timeout;
> +		struct timeval tv;
> +		long long now, next_timeout = 0;
>
>  		FD_ZERO(&readfds);
>  		FD_ZERO(&writefds);
> @@ -556,8 +579,33 @@
>  		FD_SET(xs_fileno(xs), &readfds);
>  		max_fd = MAX(xs_fileno(xs), max_fd);
>
> +		if (gettimeofday(&tv, NULL) < 0)
> +			return;
> +		now = (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
> +
> +		/* Re-calculate any event counter allowances & unblock
> +		   domains with new allowance */
>  		for (d = dom_head; d; d = d->next) {
> -			if (d->xce_handle != -1) {
> +			/* Add 5ms of fuzz since select() often returns
> +			   a couple of ms sooner than requested. Without
> +			   the fuzz we typically do an extra spin in select()
> +			   with a 1/2 ms timeout every other iteration */
> +			if ((now+5) > d->next_period) {
> +				d->next_period = now + RATE_LIMIT_PERIOD;
> +				if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
> +					(void)xc_evtchn_unmask(d->xce_handle, d->local_port);
> +				}
> +				d->event_count = 0;
> +			}
> +		}
> +
> +		for (d = dom_head; d; d = d->next) {
> +			if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
> +				/* Determine if we're going to be the next time slice to expire */
> +				if (!next_timeout ||
> +				    d->next_period < next_timeout)
> +					next_timeout = d->next_period;
> +			} else if (d->xce_handle != -1) {
>  				int evtchn_fd = xc_evtchn_fd(d->xce_handle);
>  				FD_SET(evtchn_fd, &readfds);
>  				max_fd = MAX(evtchn_fd, max_fd);
> @@ -573,16 +621,29 @@ void handle_io(void)
>  			}
>  		}
>
> -		ret = select(max_fd + 1, &readfds, &writefds, 0, NULL);
> +		/* If any domain has been rate limited, we need to work
> +		   out what timeout to supply to select */
> +		if (next_timeout) {
> +			long long duration = (next_timeout - now);
> +			/* Shouldn't happen, but sanity check force greater than 0 */
> +			if (duration <= 0)
> +				duration = 1;
> +			timeout.tv_sec = duration / 1000;
> +			timeout.tv_usec = (duration - (timeout.tv_sec * 1000)) * 1000;
> +		}
> +
> +		ret = select(max_fd + 1, &readfds, &writefds, 0, next_timeout ? &timeout : NULL);
>
>  		if (FD_ISSET(xs_fileno(xs), &readfds))
>  			handle_xs();
>
>  		for (d = dom_head; d; d = n) {
>  			n = d->next;
> -			if (d->xce_handle != -1 &&
> -			    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
> -				handle_ring_read(d);
> +			if (d->event_count < RATE_LIMIT_ALLOWANCE) {
> +				if (d->xce_handle != -1 &&
> +				    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
> +					handle_ring_read(d);
> +			}
>
>  			if (d->tty_fd != -1 && FD_ISSET(d->tty_fd, &readfds))
>  				handle_tty_read(d);
Keir Fraser
2006-Oct-04 14:19 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 4/10/06 14:50, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Hi Keir,
>
> Any feedback on this rate limiting patch for DomU consoles?

I'll look more closely when we re-open development. I think we'll get it in fairly quickly when that happens.

 -- Keir
Anthony Liguori
2006-Oct-04 16:49 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Considering that today in Xen we have a default buffer size, it seems considerably easier to me to just get rid of xenconsoled completely and expand the domU-kernel ring queue to be the actual size of what we're buffering today.

This eliminates all of these problems and gets rid of a dom0 daemon. Plus, the domU gets taxed for the buffer memory instead of dom0.

We would then change xenconsole to read the buffer directly.

How does this sound for 3.0.4?

Regards,

Anthony Liguori

Daniel P. Berrange wrote:
> Hi Keir,
>
> Any feedback on this rate limiting patch for DomU consoles?
>
> Regards,
> Dan.
> [...]
Daniel P. Berrange
2006-Oct-04 17:19 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On Wed, Oct 04, 2006 at 11:49:56AM -0500, Anthony Liguori wrote:
> Considering that today in Xen we have a default buffer size, it seems
> considerably easier to me to just get rid of xenconsoled completely and
> expand the domU-kernel ring queue to be the actual size of what we're
> buffering today.
>
> This eliminates all of these problems and gets rid of a dom0 daemon.
> Plus, the domU gets taxed for the buffer memory instead of dom0.
>
> We would then change xenconsole to read the buffer directly.

It's very useful to be able to expose the data as a pseudo-TTY, as it lets people use the standard toolset for dealing with DomU log data. E.g. virt-manager can just connect up a VTE terminal widget straight to the TTY for a terminal UI. Or tools like ttywatch can log the data to file, or network, etc. Or minicom for a standard text-based interactive client, etc. Forcing everything to use the custom xenconsole client program would be a step backward.

Dan.
Keir Fraser
2006-Oct-04 17:42 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On 4/10/06 18:19, "Daniel P. Berrange" <berrange@redhat.com> wrote:

>> Considering that today in Xen we have a default buffer size, it seems
>> considerably easier to me to just get rid of xenconsoled completely and
>> expand the domU-kernel ring queue to be the actual size of what we're
>> buffering today.
>>
>> This eliminates all of these problems and gets rid of a dom0 daemon.
>> Plus, the domU gets taxed for the buffer memory instead of dom0.
>>
>> We would then change xenconsole to read the buffer directly.
>
> It's very useful to be able to expose the data as a pseudo-TTY, as
> it lets people use the standard toolset for dealing with DomU log data.
> E.g. virt-manager can just connect up a VTE terminal widget straight
> to the TTY for a terminal UI. Or tools like ttywatch can log the
> data to file, or network, etc. Or minicom for a standard text-based
> interactive client, etc. Forcing everything to use the custom
> xenconsole client program would be a step backward.

There's also the issue of backward compatibility with guests with a small inter-domain ring. And this doesn't solve the fundamental problem of dom0 getting slammed with lots of event-channel notifications. We need rate limiting anyhow, on all our inter-domain comms channels.

 -- Keir
Anthony Liguori
2006-Oct-04 17:52 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Daniel P. Berrange wrote:
> On Wed, Oct 04, 2006 at 11:49:56AM -0500, Anthony Liguori wrote:
>
>> Considering that today in Xen we have a default buffer size, it seems
>> considerably easier to me to just get rid of xenconsoled completely and
>> expand the domU-kernel ring queue to be the actual size of what we're
>> buffering today.
>>
>> This eliminates all of these problems and gets rid of a dom0 daemon.
>> Plus, the domU gets taxed for the buffer memory instead of dom0.
>>
>> We would then change xenconsole to read the buffer directly.
>
> It's very useful to be able to expose the data as a pseudo-TTY, as
> it lets people use the standard toolset for dealing with DomU log data.
> E.g. virt-manager can just connect up a VTE terminal widget straight
> to the TTY for a terminal UI. Or tools like ttywatch can log the
> data to file, or network, etc. Or minicom for a standard text-based
> interactive client, etc. Forcing everything to use the custom
> xenconsole client program would be a step backward.

Xenconsole could still spit out on a PTY. You don't necessarily need a daemon though (you could launch a xenconsole for each domain that was started). That also gives you a bit more choice in how you expose the console (you could have a xenconsole that spits out via TCP).

Furthermore, we don't have to break guest compatibility. Older guests just have a very small buffer.

It doesn't solve the event channel storming... Maybe that's something to do in the hypervisor?

Regards,

Anthony Liguori
Daniel P. Berrange
2006-Oct-04 18:10 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On Wed, Oct 04, 2006 at 12:52:52PM -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> > [...]
>
> Xenconsole could still spit out on a PTY. You don't necessarily need a
> daemon though (you could launch a xenconsole for each domain that was
> started).

The xenconsole would still need the rate-limiting, and once you're launching one xenconsole per domain, where's the gain over the single xenconsoled process?

> That also gives you a bit more choice in how you expose the console (you
> could have a xenconsole that spits out via TCP).

Given a TTY, there are already tools which can do this & more. So I don't see any point in writing such functionality again for Xen. If using HVM domains, one would already typically be exposing a serial console from the guest via a pseudo-TTY, so doing all PV console stuff via a TTY gives parity in the management toolset.

Regards,
Dan.
Anthony Liguori
2006-Oct-04 18:19 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Daniel P. Berrange wrote:
>> Xenconsole could still spit out on a PTY. You don't necessarily need a
>> daemon though (you could launch a xenconsole for each domain that was
>> started).
>
> The xenconsole would still need the rate-limiting, and once you're launching
> one xenconsole per domain, where's the gain over the single xenconsoled
> process?

When the pty is writable, try to read information from the ring queue. Only let the pty become readable when there is data in the queue ready to be read.

You only need to listen for event channel notifications when the queue is empty (so you cannot be stormed). Likewise, you don't have to listen for event channel notifications when the queue is full and you have data to write.

The only time you could be DoSed is when someone is attached to the PTY and actively reading/writing to it. I would then argue that it's not xenconsole's responsibility to rate limit. It's whoever is reading or writing.

>> That also gives you a bit more choice in how you expose the console (you
>> could have a xenconsole that spits out via TCP).
>
> Given a TTY, there are already tools which can do this & more. So I don't see
> any point in writing such functionality again for Xen. If using HVM domains,
> one would already typically be exposing a serial console from the guest via
> a pseudo-TTY, so doing all PV console stuff via a TTY gives parity in the
> management toolset.

I'm happy exposing a pty for the console. I made those same arguments myself when we first did it :-) I'm simply suggesting that we move the buffering to domU.

Regards,

Anthony Liguori
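A minimal sketch of the select() policy described above, assuming a per-domain console structure with a local byte buffer; the struct and field names are illustrative, not from the xenconsole sources. The point is that the event-channel fd is only watched while the local buffer is empty, and the pty is only watched for writability while there is buffered data to push, so neither side can force a busy loop.

/* Illustrative only: decide which fds to watch for one domain's console.
   While buffered data is pending, the event channel is left out of the fd
   set, so a guest spewing notifications cannot force repeated wakeups. */
#include <stddef.h>
#include <sys/select.h>

struct console {
        int    evtchn_fd;   /* fd returned by xc_evtchn_fd()             */
        int    tty_fd;      /* master end of the pseudo-TTY              */
        char   buf[4096];   /* data read from the ring, pending for tty  */
        size_t buf_used;
};

static int build_fdsets(struct console *con, fd_set *readfds, fd_set *writefds)
{
        int max_fd = -1;

        FD_ZERO(readfds);
        FD_ZERO(writefds);

        if (con->buf_used == 0) {
                /* Empty buffer: the only event worth waking for is new
                   ring data, signalled via the event channel. */
                FD_SET(con->evtchn_fd, readfds);
                if (con->evtchn_fd > max_fd)
                        max_fd = con->evtchn_fd;
        } else {
                /* Buffered data: wake only when the pty can accept it. */
                FD_SET(con->tty_fd, writefds);
                if (con->tty_fd > max_fd)
                        max_fd = con->tty_fd;
        }

        return max_fd;
}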