Daniel P. Berrange
2006-Aug-28 18:02 UTC
[Xen-devel] xenconsoled CPU denial of service problem
I was examining what would happen if I ballooned a guest down to 15 MB in size, and besides the obvious (kernel kills everything in sight :-), I found a non-trivial CPU denial of service problem with xenconsoled.

When I ballooned the guest down, the DomU kernel went mental, printing out

  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  Out of Memory: Kill process 2 (migration/0) score 0 and children.
  .....

over & over again, as fast as it could - using 100% of the CPU time allocated to that DomU by the scheduler. This isn't a problem in itself, because with sensible scheduler settings the impact on other DomUs would be minimal. The problem is that at the same time, xenconsoled consumes 100% of the CPU time available to Domain-0 trying to read the console data generated by the DomU OS. It processes the console data from the DomU even though there are no (xm console) client sessions with the slave end of the pseudo-TTY open.

Now the particular ballooning scenario I used falls under the "don't do that then" category, but more generally this is still a problem. Any DomU OS can inflict a CPU denial of service attack on Domain-0 (and thus other DomUs) simply by generating a huge quantity of data on its (serial) console - either accidentally, or maliciously.

I've been looking at the xenconsoled source to see if there's any way this can be resolved. With no xm console clients attached, xenconsoled is spinning:

  select(10, [5 6 7 8 9], [6], NULL, NULL) = 1 (in [5])
  read(5, "\24\0\0\0", 4) = 4
  mremap(0x2aaaaaac0000, 1052672, 1052672, MREMAP_MAYMOVE) = 0x2aaaaaac0000
  ioctl(5, EVIOCGKEYCODE, 0x7fffc89a1f50) = 0
  mremap(0x2aaaaaac0000, 1052672, 1052672, MREMAP_MAYMOVE) = 0x2aaaaaac0000
  write(5, "\24\0\0\0", 4) = 4

which corresponds to handle_ring_read() in tools/console/daemon/io.c.

What I'd like to do is figure out a way to avoid processing the ring buffer if there is no client connected to the /dev/pts/NNN slave tty associated with the domain. I tried to detect the presence of a client by adding the master TTY filehandle to the writefds set in select(), but it appears the master end of a tty is signalled writable even if the other end is disconnected.

Does anyone know of an alternative approach to detecting whether the fd for the master end of a pseudo-TTY has its slave end open / active? Without being able to detect this I don't see any good way to avoid the DoS attack in the general case - the only other option would be to start dropping data above a certain rate, but this isn't really very desirable because there are (debug) scenarios in which you really do want the ability to capture all data.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
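For reference, a minimal standalone test (not xenconsoled code; posix_openpt() and friends are used purely for illustration) of the detection attempt described above. It demonstrates the behaviour reported: select() flags the pty master writable even though nothing has ever opened the slave end, so writefds-based detection of an attached client does not work.

#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/select.h>

int main(void)
{
    /* Create a pty master; deliberately never open the slave side. */
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pty setup");
        return 1;
    }
    printf("slave is %s (nothing has opened it)\n", ptsname(master));

    fd_set writefds;
    FD_ZERO(&writefds);
    FD_SET(master, &writefds);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

    int ret = select(master + 1, NULL, &writefds, NULL, &tv);
    /* Expected (and unhelpful) result: ret == 1, i.e. the master is
       reported writable despite the slave never having been opened. */
    printf("select() returned %d\n", ret);
    return 0;
}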
Keir Fraser
2006-Aug-28 20:57 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 28/8/06 7:02 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Does anyone know of an alternative approach to detecting whether the fd
> for the master end of a pseudo-TTY has its slave end open / active?
> Without being able to detect this I don't see any good way to avoid the DoS
> attack in the general case - the only other option would be to start dropping
> data above a certain rate, but this isn't really very desirable because
> there are (debug) scenarios in which you really do want the ability to
> capture all data.

The protocol has flow control. If we rate-limited xenconsoled's consumption of data from each domU ring, we would limit resource consumption in dom0 and not lose any data (since the domU will simply buffer it internally).

 -- Keir
Daniel P. Berrange
2006-Aug-29 15:59 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On Mon, Aug 28, 2006 at 09:57:22PM +0100, Keir Fraser wrote:
> On 28/8/06 7:02 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>
> > Does anyone know of an alternative approach to detecting whether the fd
> > for the master end of a pseudo-TTY has its slave end open / active?
> > Without being able to detect this I don't see any good way to avoid the DoS
> > attack in the general case - the only other option would be to start dropping
> > data above a certain rate, but this isn't really very desirable because
> > there are (debug) scenarios in which you really do want the ability to
> > capture all data.
>
> The protocol has flow control. If we rate-limited xenconsoled's consumption
> of data from each domU ring, we would limit resource consumption in dom0 and
> not lose any data (since the domU will simply buffer it internally).

Ah, that's very handy. Looking at the xenconsoled code I think I've worked out how to slow things down - although I'm not entirely sure I'm correctly activating the throttling. Basically I'm delaying the calls to xc_evtchn_notify & xc_evtchn_unmask after consuming data - is this the correct approach?

I'm attaching a patch which implements this scheme. In the buffer_append method we keep a cumulative count of how much data we've consumed. Every time it exceeds N bytes, we calculate the period over which those N bytes were received. If this period is less than a threshold (i.e. fast), then I insert a small delay before notifying & unmasking the event channel. The patch sets:

 - data size 5 kb
 - period 200 ms
 - delay 200 ms

i.e. if we receive 5 kb of data in less than 200 ms, we delay for 200 ms before allowing more. These constants are #define'd for easy tuning; the current values are fairly conservative / low data rate. Any thoughts on what an appropriate data rate to allow from a DomU is?

Final question - in the handle_ring_read() method, is the port returned by xc_evtchn_pending guaranteed to be the same as the port we already have a reference to in the 'struct domain' local_port field? If so I can remove the saved reference 'limited_port' I added in 'struct buffer'.

For testing purposes just pick a DomU and run

  while /bin/true ; do echo t > /proc/sysrq-trigger ; done

Ordinarily this would cause Dom-0/xenconsoled to consume 100% CPU. With this patch applied there is negligible CPU time consumed by Dom-0/xenconsoled. (If you increase the delay to 2,000 ms you can visually see that no data is being lost as a result of the delays.)

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Dan.
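For illustration, a rough sketch of the throttling logic described above. This is not the attached patch; the struct and constant names are assumptions. It shows the accounting that would run after buffer_append() has pulled data off the ring and before the event channel is notified/unmasked.

/* Illustrative sketch of the byte-rate throttle described above (names are
   assumptions, not from the attached patch).  Count bytes consumed from the
   ring; if more than RATE_LIMIT_BYTES arrive within RATE_LIMIT_PERIOD_MS,
   sleep for RATE_LIMIT_DELAY_MS before letting the guest produce more. */
#include <sys/time.h>
#include <unistd.h>

#define RATE_LIMIT_BYTES      (5 * 1024)  /* 5 kb measurement window       */
#define RATE_LIMIT_PERIOD_MS  200         /* window considered "too fast"  */
#define RATE_LIMIT_DELAY_MS   200         /* penalty delay when exceeded   */

struct throttle {
        size_t    bytes;     /* bytes consumed since the last mark */
        long long mark_ms;   /* time of the last mark              */
};

static long long now_ms(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (tv.tv_sec * 1000LL) + (tv.tv_usec / 1000);
}

/* Called after buffer_append() has consumed 'consumed' bytes, and before
   xc_evtchn_notify()/xc_evtchn_unmask() allow the guest to send more. */
static void throttle_account(struct throttle *t, size_t consumed)
{
        t->bytes += consumed;
        if (t->bytes < RATE_LIMIT_BYTES)
                return;

        if (now_ms() - t->mark_ms < RATE_LIMIT_PERIOD_MS)
                usleep(RATE_LIMIT_DELAY_MS * 1000);  /* too fast: stall the guest */

        t->bytes = 0;
        t->mark_ms = now_ms();
}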
Keir Fraser
2006-Aug-30 18:13 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Ah, that's very handy. Looking at the xenconsoled code I think I've worked
> out how to slow things down - although I'm not entirely sure I'm correctly
> activating the throttling. Basically I'm delaying the calls to
> xc_evtchn_notify & xc_evtchn_unmask after consuming data - is this the
> correct approach?

Yes, the essence of your approach is exactly right. Delaying the unmask is the critical thing -- there's not really any need to delay the notification (in fact I don't think you should bother). But by delaying the unmask you avoid any possibility of a DoS attack on that interdomain link -- without this the domU could potentially hose the dom0 with lots of pointless event-channel notifications. This is theoretically a problem on every one of our split drivers (xenstore, blk, net), so it'd be good to come up with the most general possible solution here that we can apply to every backend.

> The patch sets:
>  - data size 5 kb
>  - period 200 ms
>  - delay 200 ms

A few comments:

 * I think the 'delay' parameter is not really useful. Think of this simply as a simple credit-based scheduler that replenishes credit every 'period'. So in this case you get 5kB every 200ms. If the domU offers more data in a period, it must wait until its credit is replenished at the end of the current 200ms period.

 * I'm not sure bytes of data is the right thing to limit here. The main thing that hoses domain0 is the repeated rescheduling of the console daemon, and that is fundamentally caused by event-channel notifications. So it might make sense to rate limit the number of times we read the event-channel port from xc_evtchn_pending -- e.g., no more than 10 times a second (should be plenty). This has a few advantages: 1. Looking just at data transferred doesn't stop a malicious domain from hosing you with no-op event-channel notifications; 2. This is a fundamental metric that can be measured and rate-limited on all backend interfaces, so perhaps we can come up with some common library of code that we apply to all backends/daemons.

It may turn out we need to rate limit on data *as well*, if it turns out that sinking many kilobytes of data a second is prohibitively expensive, but I doubt this will happen. For a start, the strict limiting of notifications will encourage data to get queued up and improve batching of console data, and batches of data should be processed quite efficiently. This same argument extends to other backends (e.g., batching of requests in xenstored, netback, blkback, etc).

 -- Keir
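To illustrate the idea, a minimal sketch of the credit-based limiter described above, written as the sort of reusable helper a common backend library might offer. The names and constants are illustrative only; the actual patch later in this thread open-codes the same logic directly in xenconsoled.

/* Sketch of a per-period, credit-based event limiter (illustrative names).
   Each period the credit is replenished; once it is used up the caller
   should leave the event channel masked until the period rolls over. */
#include <sys/time.h>
#include <stdbool.h>

struct rate_limit {
        int       credit;      /* events still allowed in this period     */
        int       allowance;   /* events permitted per period             */
        long long period_ms;   /* length of each period in milliseconds   */
        long long period_end;  /* absolute time the current period expires */
};

static long long now_ms(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (tv.tv_sec * 1000LL) + (tv.tv_usec / 1000);
}

/* Returns true if one more event-channel notification may be serviced now;
   false means "stay masked until the period rolls over". */
static bool rate_limit_charge(struct rate_limit *rl)
{
        long long now = now_ms();

        if (now >= rl->period_end) {        /* new period: replenish credit */
                rl->period_end = now + rl->period_ms;
                rl->credit = rl->allowance;
        }
        if (rl->credit == 0)
                return false;               /* allowance exhausted          */
        rl->credit--;
        return true;
}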
Keir Fraser
2006-Aug-30 18:13 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Final question - in the handle_ring_read() method, is the port returned
> by xc_evtchn_pending guaranteed to be the same as the port we already have a
> reference to in the 'struct domain' local_port field? If so I can remove
> the saved reference 'limited_port' I added in 'struct buffer'.

Yes, that is guaranteed.

 -- Keir
Daniel P. Berrange
2006-Sep-11 14:19 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
> On 29/8/06 4:59 pm, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>
> > The patch sets:
> >  - data size 5 kb
> >  - period 200 ms
> >  - delay 200 ms
>
> A few comments:
> * I think the 'delay' parameter is not really useful. Think of this simply
>   as a simple credit-based scheduler that replenishes credit every 'period'.
>   So in this case you get 5kB every 200ms. If the domU offers more data in a
>   period, it must wait until its credit is replenished at the end of the
>   current 200ms period.
> * I'm not sure bytes of data is the right thing to limit here. The main
>   thing that hoses domain0 is the repeated rescheduling of the console daemon,
>   and that is fundamentally caused by event-channel notifications. So it might
>   make sense to rate limit the number of times we read the event-channel port
>   from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
>   plenty). This has a few advantages: 1. Looking just at data transferred
>   doesn't stop a malicious domain from hosing you with no-op event-channel
>   notifications; 2. This is a fundamental metric that can be measured and
>   rate-limited on all backend interfaces, so perhaps we can come up with some
>   common library of code that we apply to all backends/daemons.

I've re-worked the patch based on this principle of "n events allowed in each time-slice", setting n=30 & the time-slice = 200ms. The code was actually much simpler than my previous patch, so it's definitely a winning strategy. Testing by running

  while /bin/true ; do echo t > /proc/sysrq-trigger; done

..in one of the guest VMs on a 2.2 GHz Opteron shows no significant CPU utilization attributed to xenconsoled. I've not examined whether this code can be put into a common library - it was simple enough to integrate in the xenconsoled event loop.

> It may turn out we need to rate limit on data *as well*, if it turns out
> that sinking many kilobytes of data a second is prohibitively expensive, but
> I doubt this will happen. For a start, the strict limiting of notifications
> will encourage data to get queued up and improve batching of console data,
> and batches of data should be processed quite efficiently. This same
> argument extends to other backends (e.g., batching of requests in xenstored,
> netback, blkback, etc).

Based on initial testing it doesn't look like the data rate itself was causing any significant overhead, once the event channel port reads were limited.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Dan.
Daniel P. Berrange
2006-Oct-04 13:50 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
Hi Keir,

Any feedback on this rate limiting patch for DomU consoles?

Regards,
Dan.

On Mon, Sep 11, 2006 at 03:19:05PM +0100, Daniel P. Berrange wrote:
> On Wed, Aug 30, 2006 at 07:13:30PM +0100, Keir Fraser wrote:
> > [...]
> > * I'm not sure bytes of data is the right thing to limit here. The main
> >   thing that hoses domain0 is the repeated rescheduling of the console daemon,
> >   and that is fundamentally caused by event-channel notifications. So it might
> >   make sense to rate limit the number of times we read the event-channel port
> >   from xc_evtchn_pending -- e.g., no more than 10 times a second (should be
> >   plenty). [...]
>
> I've re-worked the patch based on this principle of "n events allowed
> in each time-slice", setting n=30 & the time-slice = 200ms. The code
> was actually much simpler than my previous patch, so it's definitely a
> winning strategy. Testing by running
>
>   while /bin/true ; do echo t > /proc/sysrq-trigger; done
>
> ..in one of the guest VMs on a 2.2 GHz Opteron shows no significant CPU
> utilization attributed to xenconsoled. I've not examined whether this code
> can be put into a common library - it was simple enough to integrate in
> the xenconsoled event loop.
>
> > It may turn out we need to rate limit on data *as well*, if it turns out
> > that sinking many kilobytes of data a second is prohibitively expensive, but
> > I doubt this will happen. [...]
>
> Based on initial testing it doesn't look like the data rate itself was causing
> any significant overhead, once the event channel port reads were limited.
>
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
>
> Regards,
> Dan.
>
> diff -r bfd00b317815 tools/console/daemon/io.c
> --- a/tools/console/daemon/io.c	Mon Sep 11 01:55:03 2006 +0100
> +++ b/tools/console/daemon/io.c	Mon Sep 11 10:09:32 2006 -0400
> @@ -37,12 +37,18 @@
>  #include <termios.h>
>  #include <stdarg.h>
>  #include <sys/mman.h>
> +#include <sys/time.h>
>
>  #define MAX(a, b) (((a) > (b)) ? (a) : (b))
>  #define MIN(a, b) (((a) < (b)) ? (a) : (b))
>
>  /* Each 10 bits takes ~ 3 digits, plus one, plus one for nul terminator. */
>  #define MAX_STRLEN(x) ((sizeof(x) * CHAR_BIT + CHAR_BIT-1) / 10 * 3 + 2)
> +
> +/* How many events are allowed in each time period */
> +#define RATE_LIMIT_ALLOWANCE 30
> +/* Duration of each time period in ms */
> +#define RATE_LIMIT_PERIOD 200
>
>  struct buffer
>  {
> @@ -65,6 +71,8 @@ struct domain
>  	evtchn_port_t local_port;
>  	int xce_handle;
>  	struct xencons_interface *interface;
> +	int event_count;
> +	long long next_period;
>  };
>
>  static struct domain *dom_head;
> @@ -306,6 +314,13 @@ static struct domain *create_domain(int
>  {
>  	struct domain *dom;
>  	char *s;
> +	struct timeval tv;
> +
> +	if (gettimeofday(&tv, NULL) < 0) {
> +		dolog(LOG_ERR, "Cannot get time of day %s:%s:L%d",
> +		      __FILE__, __FUNCTION__, __LINE__);
> +		return NULL;
> +	}
>
>  	dom = (struct domain *)malloc(sizeof(struct domain));
>  	if (dom == NULL) {
> @@ -330,6 +345,8 @@ static struct domain *create_domain(int
>  	dom->buffer.size = 0;
>  	dom->buffer.capacity = 0;
>  	dom->buffer.max_capacity = 0;
> +	dom->event_count = 0;
> +	dom->next_period = (tv.tv_sec * 1000) + (tv.tv_usec / 1000) + RATE_LIMIT_PERIOD;
>  	dom->next = NULL;
>
>  	dom->ring_ref = -1;
> @@ -512,9 +529,12 @@ static void handle_ring_read(struct doma
>  	if ((port = xc_evtchn_pending(dom->xce_handle)) == -1)
>  		return;
>
> +	dom->event_count++;
> +
>  	buffer_append(dom);
>
> -	(void)xc_evtchn_unmask(dom->xce_handle, port);
> +	if (dom->event_count < RATE_LIMIT_ALLOWANCE)
> +		(void)xc_evtchn_unmask(dom->xce_handle, port);
>  }
>
>  static void handle_xs(void)
> @@ -549,6 +569,9 @@ void handle_io(void)
>  	do {
>  		struct domain *d, *n;
>  		int max_fd = -1;
> +		struct timeval timeout;
> +		struct timeval tv;
> +		long long now, next_timeout = 0;
>
>  		FD_ZERO(&readfds);
>  		FD_ZERO(&writefds);
> @@ -556,8 +579,33 @@
>  		FD_SET(xs_fileno(xs), &readfds);
>  		max_fd = MAX(xs_fileno(xs), max_fd);
>
> +		if (gettimeofday(&tv, NULL) < 0)
> +			return;
> +		now = (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
> +
> +		/* Re-calculate any event counter allowances & unblock
> +		   domains with new allowance */
>  		for (d = dom_head; d; d = d->next) {
> -			if (d->xce_handle != -1) {
> +			/* Add 5ms of fuzz since select() often returns
> +			   a couple of ms sooner than requested. Without
> +			   the fuzz we typically do an extra spin in select()
> +			   with a 1/2 ms timeout every other iteration */
> +			if ((now+5) > d->next_period) {
> +				d->next_period = now + RATE_LIMIT_PERIOD;
> +				if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
> +					(void)xc_evtchn_unmask(d->xce_handle, d->local_port);
> +				}
> +				d->event_count = 0;
> +			}
> +		}
> +
> +		for (d = dom_head; d; d = d->next) {
> +			if (d->event_count >= RATE_LIMIT_ALLOWANCE) {
> +				/* Determine if we're going to be the next time slice to expire */
> +				if (!next_timeout ||
> +				    d->next_period < next_timeout)
> +					next_timeout = d->next_period;
> +			} else if (d->xce_handle != -1) {
>  				int evtchn_fd = xc_evtchn_fd(d->xce_handle);
>  				FD_SET(evtchn_fd, &readfds);
>  				max_fd = MAX(evtchn_fd, max_fd);
> @@ -573,16 +621,29 @@ void handle_io(void)
>  			}
>  		}
>
> -		ret = select(max_fd + 1, &readfds, &writefds, 0, NULL);
> +		/* If any domain has been rate limited, we need to work
> +		   out what timeout to supply to select */
> +		if (next_timeout) {
> +			long long duration = (next_timeout - now);
> +			/* Shouldn't happen, but sanity check force greater than 0 */
> +			if (duration <= 0)
> +				duration = 1;
> +			timeout.tv_sec = duration / 1000;
> +			timeout.tv_usec = (duration - (timeout.tv_sec * 1000)) * 1000;
> +		}
> +
> +		ret = select(max_fd + 1, &readfds, &writefds, 0, next_timeout ? &timeout : NULL);
>
>  		if (FD_ISSET(xs_fileno(xs), &readfds))
>  			handle_xs();
>
>  		for (d = dom_head; d; d = n) {
>  			n = d->next;
> -			if (d->xce_handle != -1 &&
> -			    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
> -				handle_ring_read(d);
> +			if (d->event_count < RATE_LIMIT_ALLOWANCE) {
> +				if (d->xce_handle != -1 &&
> +				    FD_ISSET(xc_evtchn_fd(d->xce_handle), &readfds))
> +					handle_ring_read(d);
> +			}
>
>  			if (d->tty_fd != -1 && FD_ISSET(d->tty_fd, &readfds))
>  				handle_tty_read(d);
Keir Fraser
2006-Oct-04 14:19 UTC
Re: [Xen-devel] xenconsoled CPU denial of service problem
On 4/10/06 14:50, "Daniel P. Berrange" <berrange@redhat.com> wrote:

> Hi Keir,
>
> Any feedback on this rate limiting patch for DomU consoles?

I'll look more closely when we re-open development. I think we'll get it in fairly quickly when that happens.

 -- Keir
Anthony Liguori
2006-Oct-04 16:49 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Considering that today in Xen we have a default buffer size, it seems considerably easier to me to just get rid of xenconsoled completely and expand the domU-kernel ring queue to be the actual size of what we're buffering today.

This eliminates all of these problems and gets rid of a dom0 daemon. Plus, the domU gets taxed for the buffer memory instead of dom0.

We would then change xenconsole to read the buffer directly.

How does this sound for 3.0.4?

Regards,

Anthony Liguori

Daniel P. Berrange wrote:
> Hi Keir,
>
> Any feedback on this rate limiting patch for DomU consoles?
>
> Regards,
> Dan.
> [...]
Daniel P. Berrange
2006-Oct-04 17:19 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On Wed, Oct 04, 2006 at 11:49:56AM -0500, Anthony Liguori wrote:
> Considering that today in Xen we have a default buffer size, it seems
> considerably easier to me to just get rid of xenconsoled completely and
> expand the domU-kernel ring queue to be the actual size of what we're
> buffering today.
>
> This eliminates all of these problems and gets rid of a dom0 daemon.
> Plus, the domU gets taxed for the buffer memory instead of dom0.
>
> We would then change xenconsole to read the buffer directly.

It's very useful to be able to expose the data as a pseudo-TTY, as it lets people use the standard toolset for dealing with DomU log data. E.g. virt-manager can just connect up a VTE terminal widget straight to the TTY for a terminal UI. Or tools like ttywatch can log the data to file, or network, etc. Or minicom for a standard text-based interactive client, etc. Forcing everything to use the custom xenconsole client program would be a step backward.

Dan.
Keir Fraser
2006-Oct-04 17:42 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On 4/10/06 18:19, "Daniel P. Berrange" <berrange@redhat.com> wrote:

>> Considering that today in Xen we have a default buffer size, it seems
>> considerably easier to me to just get rid of xenconsoled completely and
>> expand the domU-kernel ring queue to be the actual size of what we're
>> buffering today.
>>
>> This eliminates all of these problems and gets rid of a dom0 daemon.
>> Plus, the domU gets taxed for the buffer memory instead of dom0.
>>
>> We would then change xenconsole to read the buffer directly.
>
> It's very useful to be able to expose the data as a pseudo-TTY, as
> it lets people use the standard toolset for dealing with DomU log data.
> E.g. virt-manager can just connect up a VTE terminal widget straight
> to the TTY for a terminal UI. Or tools like ttywatch can log the
> data to file, or network, etc. Or minicom for a standard text-based
> interactive client, etc. Forcing everything to use the custom
> xenconsole client program would be a step backward.

There's also the issue of backward compatibility with guests with a small inter-domain ring. And this doesn't solve the fundamental problem of dom0 getting slammed with lots of event-channel notifications. We need rate limiting anyhow, on all our inter-domain comms channels.

 -- Keir
Anthony Liguori
2006-Oct-04 17:52 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Daniel P. Berrange wrote:
> On Wed, Oct 04, 2006 at 11:49:56AM -0500, Anthony Liguori wrote:
>
>> Considering that today in Xen we have a default buffer size, it seems
>> considerably easier to me to just get rid of xenconsoled completely and
>> expand the domU-kernel ring queue to be the actual size of what we're
>> buffering today.
>>
>> This eliminates all of these problems and gets rid of a dom0 daemon.
>> Plus, the domU gets taxed for the buffer memory instead of dom0.
>>
>> We would then change xenconsole to read the buffer directly.
>
> It's very useful to be able to expose the data as a pseudo-TTY, as
> it lets people use the standard toolset for dealing with DomU log data.
> E.g. virt-manager can just connect up a VTE terminal widget straight
> to the TTY for a terminal UI. Or tools like ttywatch can log the
> data to file, or network, etc. Or minicom for a standard text-based
> interactive client, etc. Forcing everything to use the custom
> xenconsole client program would be a step backward.

Xenconsole could still spit out on a PTY. You don't necessarily need a daemon though (you could launch a xenconsole for each domain that was started). That also gives you a bit more choice in how you expose the console (you could have a xenconsole that spits out via TCP).

Furthermore, we don't have to break guest compatibility. Older guests just have a very small buffer.

It doesn't solve the event channel storming... Maybe that's something to do in the hypervisor?

Regards,

Anthony Liguori
Daniel P. Berrange
2006-Oct-04 18:10 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
On Wed, Oct 04, 2006 at 12:52:52PM -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> > [...]
>
> Xenconsole could still spit out on a PTY. You don't necessarily need a
> daemon though (you could launch a xenconsole for each domain that was
> started).

The xenconsole would still need the rate-limiting, and once you're launching one xenconsole per domain, where's the gain over the single xenconsoled process?

> That also gives you a bit more choice in how you expose the console (you
> could have a xenconsole that spits out via TCP).

Given a TTY, there are already tools which can do this & more. So I don't see any point in writing such functionality again for Xen. If using HVM domains, one would already typically be exposing a serial console from the guest via a pseudo-TTY, so doing all PV console stuff via a TTY gives parity in the management toolset.

Regards,
Dan.
Anthony Liguori
2006-Oct-04 18:19 UTC
[Xen-devel] Re: xenconsoled CPU denial of service problem
Daniel P. Berrange wrote:
>> Xenconsole could still spit out on a PTY. You don't necessarily need a
>> daemon though (you could launch a xenconsole for each domain that was
>> started).
>
> The xenconsole would still need the rate-limiting, and once you're launching
> one xenconsole per domain, where's the gain over the single xenconsoled
> process?

When the pty is writable, try to read information from the ring queue. Only let the pty become readable when there is data in the queue ready to be read.

You only need to listen for event channel notifications when the queue is empty (so you cannot be stormed). Likewise, you don't have to listen for event channel notifications when the queue is full and you have data to write.

The only time you could be DoSed is when someone is attached to the PTY and actively reading/writing to it. I would then argue that it's not xenconsole's responsibility to rate limit. It's whoever is reading or writing.

>> That also gives you a bit more choice in how you expose the console (you
>> could have a xenconsole that spits out via TCP).
>
> Given a TTY, there are already tools which can do this & more. So I don't see
> any point in writing such functionality again for Xen. If using HVM domains,
> one would already typically be exposing a serial console from the guest via
> a pseudo-TTY, so doing all PV console stuff via a TTY gives parity in the
> management toolset.

I'm happy exposing a pty for the console. I made those same arguments myself when we first did it :-) I'm simply suggesting that we move the buffering to domU.

Regards,

Anthony Liguori
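A minimal sketch of the select() policy described above, assuming a per-domain console structure with a local byte buffer; the struct and field names are illustrative, not from the xenconsole sources. The point is that the event-channel fd is only watched while the local buffer is empty, and the pty is only watched for writability while there is buffered data to push, so neither side can force a busy loop.

/* Illustrative only: decide which fds to watch for one domain's console.
   While buffered data is pending, the event channel is left out of the fd
   set, so a guest spewing notifications cannot force repeated wakeups. */
#include <stddef.h>
#include <sys/select.h>

struct console {
        int    evtchn_fd;   /* fd returned by xc_evtchn_fd()             */
        int    tty_fd;      /* master end of the pseudo-TTY              */
        char   buf[4096];   /* data read from the ring, pending for tty  */
        size_t buf_used;
};

static int build_fdsets(struct console *con, fd_set *readfds, fd_set *writefds)
{
        int max_fd = -1;

        FD_ZERO(readfds);
        FD_ZERO(writefds);

        if (con->buf_used == 0) {
                /* Empty buffer: the only event worth waking for is new
                   ring data, signalled via the event channel. */
                FD_SET(con->evtchn_fd, readfds);
                if (con->evtchn_fd > max_fd)
                        max_fd = con->evtchn_fd;
        } else {
                /* Buffered data: wake only when the pty can accept it. */
                FD_SET(con->tty_fd, writefds);
                if (con->tty_fd > max_fd)
                        max_fd = con->tty_fd;
        }

        return max_fd;
}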