Stefano Stabellini
2012-Jan-27 18:20 UTC
[PATCH v3 0/6] prevent Qemu from waking up needlessly
Hi all,
this small patch series prevents Qemu from waking up needlessly on Xen several times a second in order to check some timers.

The first patch stops Qemu from emulating the PIT on Xen; the second patch disables the rtc_clock entirely. The third patch makes use of a new mechanism to receive buffered io event notifications from Xen, so that Qemu doesn't need to check the buffered io page for data 10 times a second for the entire life of the VM. The fourth patch fixes win32_rearm_timer and mm_rearm_timer, which risk an overflow if INT64_MAX is passed as delta. The fifth patch changes qemu_next_alarm_deadline to check the expire time of a clock only if the clock is enabled. Finally, the last patch increases the default select timeout to 1h: nothing should rely on the select timeout being 1 sec, so we might as well increase it to 1h.

Changes in v3:
- added a new patch to fix win32_rearm_timer and mm_rearm_timer, which risk an overflow if INT64_MAX is passed as delta.

Shortlog and diffstat follow:

Stefano Stabellini (6):
      xen: do not initialize the interval timer emulator
      xen: disable rtc_clock
      xen: introduce an event channel for buffered io event notifications
      timers: the rearm function should be able to handle delta = INT64_MAX
      qemu_next_alarm_deadline: check the expire time of a clock only if it is enabled
      qemu_calculate_timeout: increase minimum timeout to 1h

 hw/pc.c      |    7 +++++--
 qemu-timer.c |   30 ++++++++++++++++++------------
 xen-all.c    |   43 +++++++++++++++++++++++++++++++++++++------
 3 files changed, 60 insertions(+), 20 deletions(-)

A git tree is available here:

git://xenbits.xen.org/people/sstabellini/qemu-dm.git timers-3

Cheers,

Stefano
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 1/6] xen: do not initialize the interval timer emulator
PIT is emulated by the hypervisor so we don't need to emulate it in Qemu: this patch prevents Qemu from waking up needlessly at PIT_FREQ on Xen. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- hw/pc.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 85304cf..7a7ce98 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -43,6 +43,7 @@ #include "ui/qemu-spice.h" #include "memory.h" #include "exec-memory.h" +#include "arch_init.h" /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -1130,7 +1131,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, DriveInfo *fd[MAX_FD]; qemu_irq rtc_irq = NULL; qemu_irq *a20_line; - ISADevice *i8042, *port92, *vmmouse, *pit; + ISADevice *i8042, *port92, *vmmouse, *pit = NULL; qemu_irq *cpu_exit_irq; register_ioport_write(0x80, 1, 1, ioport80_write, NULL); @@ -1151,7 +1152,9 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, qemu_register_boot_set(pc_boot_set, *rtc_state); - pit = pit_init(isa_bus, 0x40, 0); + if (!xen_available()) { + pit = pit_init(isa_bus, 0x40, 0); + } pcspk_init(pit); for(i = 0; i < MAX_SERIAL_PORTS; i++) { -- 1.7.2.5
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 2/6] xen: disable rtc_clock
rtc_clock is only used by the RTC emulator (mc146818rtc.c), however Xen has its own RTC emulator in the hypervisor so we can disable it. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- xen-all.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/xen-all.c b/xen-all.c index d1fc597..bf183f7 100644 --- a/xen-all.c +++ b/xen-all.c @@ -513,6 +513,7 @@ void xen_vcpu_init(void) qemu_register_reset(xen_reset_vcpu, first_cpu); xen_reset_vcpu(first_cpu); } + qemu_clock_enable(rtc_clock, false); } /* get the ioreq packets from share mem */ -- 1.7.2.5
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 3/6] xen: introduce an event channel for buffered io event notifications
Use the newly introduced HVM_PARAM_BUFIOREQ_EVTCHN to receive notifications for buffered io events. After the first notification is received leave the event channel masked and setup a timer to process the rest of the batch. Once we have completed processing the batch, unmask the event channel and delete the timer. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- xen-all.c | 42 ++++++++++++++++++++++++++++++++++++------ 1 files changed, 36 insertions(+), 6 deletions(-) diff --git a/xen-all.c b/xen-all.c index bf183f7..bfec4c1 100644 --- a/xen-all.c +++ b/xen-all.c @@ -77,6 +77,8 @@ typedef struct XenIOState { QEMUTimer *buffered_io_timer; /* the evtchn port for polling the notification, */ evtchn_port_t *ioreq_local_port; + /* evtchn local port for buffered io */ + evtchn_port_t bufioreq_local_port; /* the evtchn fd for polling */ XenEvtchn xce_handle; /* which vcpu we are serving */ @@ -545,6 +547,12 @@ static ioreq_t *cpu_get_ioreq(XenIOState *state) evtchn_port_t port; port = xc_evtchn_pending(state->xce_handle); + if (port == state->bufioreq_local_port) { + qemu_mod_timer(state->buffered_io_timer, + BUFFER_IO_MAX_DELAY + qemu_get_clock_ms(rt_clock)); + return NULL; + } + if (port != -1) { for (i = 0; i < smp_cpus; i++) { if (state->ioreq_local_port[i] == port) { @@ -693,16 +701,18 @@ static void handle_ioreq(ioreq_t *req) } } -static void handle_buffered_iopage(XenIOState *state) +static int handle_buffered_iopage(XenIOState *state) { buf_ioreq_t *buf_req = NULL; ioreq_t req; int qw; if (!state->buffered_io_page) { - return; + return 0; } + memset(&req, 0x00, sizeof(req)); + while (state->buffered_io_page->read_pointer != state->buffered_io_page->write_pointer) { buf_req = &state->buffered_io_page->buf_ioreq[ state->buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM]; @@ -727,15 +737,21 @@ static void handle_buffered_iopage(XenIOState *state) xen_mb(); state->buffered_io_page->read_pointer += qw ? 
2 : 1; } + + return req.count; } static void handle_buffered_io(void *opaque) { XenIOState *state = opaque; - handle_buffered_iopage(state); - qemu_mod_timer(state->buffered_io_timer, - BUFFER_IO_MAX_DELAY + qemu_get_clock_ms(rt_clock)); + if (handle_buffered_iopage(state)) { + qemu_mod_timer(state->buffered_io_timer, + BUFFER_IO_MAX_DELAY + qemu_get_clock_ms(rt_clock)); + } else { + qemu_del_timer(state->buffered_io_timer); + xc_evtchn_unmask(state->xce_handle, state->bufioreq_local_port); + } } static void cpu_handle_ioreq(void *opaque) @@ -865,7 +881,6 @@ static void xen_main_loop_prepare(XenIOState *state) state->buffered_io_timer = qemu_new_timer_ms(rt_clock, handle_buffered_io, state); - qemu_mod_timer(state->buffered_io_timer, qemu_get_clock_ms(rt_clock)); if (evtchn_fd != -1) { qemu_set_fd_handler(evtchn_fd, cpu_handle_ioreq, NULL, state); @@ -917,6 +932,7 @@ int xen_hvm_init(void) { int i, rc; unsigned long ioreq_pfn; + unsigned long bufioreq_evtchn; XenIOState *state; state = g_malloc0(sizeof (XenIOState)); @@ -966,6 +982,20 @@ int xen_hvm_init(void) state->ioreq_local_port[i] = rc; } + rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN, + &bufioreq_evtchn); + if (rc < 0) { + fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n"); + return -1; + } + rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid, + (uint32_t)bufioreq_evtchn); + if (rc == -1) { + fprintf(stderr, "bind interdomain ioctl error %d\n", errno); + return -1; + } + state->bufioreq_local_port = rc; + /* Init RAM management */ xen_map_cache_init(); xen_ram_init(ram_size); -- 1.7.2.5
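For readers skimming the flattened diff above, the control flow the patch sets up is roughly the following (a sketch using the same identifiers as xen-all.c, abridged from the diff rather than compilable on its own):

    /* In cpu_get_ioreq(): a notification arrived on the buffered-io event
     * channel.  Leave the port masked and drain the ring from a timer. */
    if (port == state->bufioreq_local_port) {
        qemu_mod_timer(state->buffered_io_timer,
                       BUFFER_IO_MAX_DELAY + qemu_get_clock_ms(rt_clock));
        return NULL;
    }

    /* Timer callback: handle_buffered_iopage() now returns non-zero while
     * the shared page still holds requests. */
    static void handle_buffered_io(void *opaque)
    {
        XenIOState *state = opaque;

        if (handle_buffered_iopage(state)) {
            /* more requests pending: keep polling this batch */
            qemu_mod_timer(state->buffered_io_timer,
                           BUFFER_IO_MAX_DELAY + qemu_get_clock_ms(rt_clock));
        } else {
            /* batch drained: stop the timer and re-enable notifications */
            qemu_del_timer(state->buffered_io_timer);
            xc_evtchn_unmask(state->xce_handle, state->bufioreq_local_port);
        }
    }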
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 4/6] timers: the rearm function should be able to handle delta = INT64_MAX
Fix win32_rearm_timer and mm_rearm_timer: they should be able to handle INT64_MAX as a delta parameter without overflowing. Also, the next deadline in ms should be calculated rounding down rather than up (see unix_rearm_timer and dynticks_rearm_timer). Finally ChangeTimerQueueTimer takes an unsigned long and timeSetEvent takes an unsigned int as delta, so cast the ms delta to the appropriate unsigned integer. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- qemu-timer.c | 18 +++++++++++++----- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index cd026c6..a9ba0eb 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -725,13 +725,17 @@ static void mm_stop_timer(struct qemu_alarm_timer *t) static void mm_rearm_timer(struct qemu_alarm_timer *t, int64_t delta) { - int nearest_delta_ms = (delta + 999999) / 1000000; + int64_t nearest_delta_ms = delta / 1000000; if (nearest_delta_ms < 1) { nearest_delta_ms = 1; } + /* UINT_MAX can be 32 bit */ + if (nearest_delta_ms > UINT_MAX) { + nearest_delta_ms = UINT_MAX; + } timeKillEvent(mm_timer); - mm_timer = timeSetEvent(nearest_delta_ms, + mm_timer = timeSetEvent((unsigned int) nearest_delta_ms, mm_period, mm_alarm_handler, (DWORD_PTR)t, @@ -786,16 +790,20 @@ static void win32_rearm_timer(struct qemu_alarm_timer *t, int64_t nearest_delta_ns) { HANDLE hTimer = t->timer; - int nearest_delta_ms; + int64_t nearest_delta_ms; BOOLEAN success; - nearest_delta_ms = (nearest_delta_ns + 999999) / 1000000; + nearest_delta_ms = nearest_delta_ns / 1000000; if (nearest_delta_ms < 1) { nearest_delta_ms = 1; } + /* ULONG_MAX can be 32 bit */ + if (nearest_delta_ms > ULONG_MAX) { + nearest_delta_ms = ULONG_MAX; + } success = ChangeTimerQueueTimer(NULL, hTimer, - nearest_delta_ms, + (unsigned long) nearest_delta_ms, 3600000); if (!success) { -- 1.7.2.5
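To make the overflow concrete: with delta = INT64_MAX, the old rounding-up expression (delta + 999999) wraps before the division, and assigning the result to a plain int truncates it further. A minimal sketch of the corrected conversion (the helper name is made up for illustration; the constants and clamping mirror the patch):

    #include <limits.h>
    #include <stdint.h>

    /* Convert a nanosecond delta into a value a 32-bit Windows timer API can
     * take: round down instead of up, then clamp to [1, UINT_MAX]. */
    static unsigned int rearm_delta_ms(int64_t delta_ns)
    {
        int64_t ms = delta_ns / 1000000;

        if (ms < 1) {
            ms = 1;
        }
        /* UINT_MAX can be 32 bit */
        if (ms > UINT_MAX) {
            ms = UINT_MAX;
        }
        return (unsigned int)ms;
    }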
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 5/6] qemu_next_alarm_deadline: check the expire time of a clock only if it is enabled
Also delta in qemu_next_alarm_deadline is a 64 bit value so set the default to INT64_MAX instead of INT32_MAX. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- qemu-timer.c | 10 ++++------ 1 files changed, 4 insertions(+), 6 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index a9ba0eb..8c8bbc3 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -106,23 +106,21 @@ static inline int alarm_has_dynticks(struct qemu_alarm_timer *t) static int64_t qemu_next_alarm_deadline(void) { - int64_t delta; + int64_t delta = INT64_MAX; int64_t rtdelta; - if (!use_icount && vm_clock->active_timers) { + if (!use_icount && vm_clock->enabled && vm_clock->active_timers) { delta = vm_clock->active_timers->expire_time - qemu_get_clock_ns(vm_clock); - } else { - delta = INT32_MAX; } - if (host_clock->active_timers) { + if (host_clock->enabled && host_clock->active_timers) { int64_t hdelta = host_clock->active_timers->expire_time - qemu_get_clock_ns(host_clock); if (hdelta < delta) { delta = hdelta; } } - if (rt_clock->active_timers) { + if (rt_clock->enabled && rt_clock->active_timers) { rtdelta = (rt_clock->active_timers->expire_time - qemu_get_clock_ns(rt_clock)); if (rtdelta < delta) { -- 1.7.2.5
Stefano Stabellini
2012-Jan-27 18:21 UTC
[PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
There is no reason why the minimum timeout should be 1sec, it could easily be 1h and we would save lots of cpu cycles. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- qemu-timer.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index 8c8bbc3..3207e40 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -852,6 +852,6 @@ fail: int qemu_calculate_timeout(void) { - return 1000; + return 1000*60*60; } -- 1.7.2.5
Jan Kiszka
2012-Jan-27 19:09 UTC
Re: [PATCH v3 1/6] xen: do not initialize the interval timer emulator
On 2012-01-27 19:21, Stefano Stabellini wrote: > PIT is emulated by the hypervisor so we don't need to emulate it in Qemu: > this patch prevents Qemu from waking up needlessly at PIT_FREQ on Xen. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > --- > hw/pc.c | 7 +++++-- > 1 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/hw/pc.c b/hw/pc.c > index 85304cf..7a7ce98 100644 > --- a/hw/pc.c > +++ b/hw/pc.c > @@ -43,6 +43,7 @@ > #include "ui/qemu-spice.h" > #include "memory.h" > #include "exec-memory.h" > +#include "arch_init.h" > > /* output Bochs bios info messages */ > //#define DEBUG_BIOS > @@ -1130,7 +1131,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, > DriveInfo *fd[MAX_FD]; > qemu_irq rtc_irq = NULL; > qemu_irq *a20_line; > - ISADevice *i8042, *port92, *vmmouse, *pit; > + ISADevice *i8042, *port92, *vmmouse, *pit = NULL; > qemu_irq *cpu_exit_irq; > > register_ioport_write(0x80, 1, 1, ioport80_write, NULL); > @@ -1151,7 +1152,9 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, > > qemu_register_boot_set(pc_boot_set, *rtc_state); > > - pit = pit_init(isa_bus, 0x40, 0); > + if (!xen_available()) { > + pit = pit_init(isa_bus, 0x40, 0); > + } > pcspk_init(pit); > > for(i = 0; i < MAX_SERIAL_PORTS; i++) { Thus a guest accessing port 0x61 will be able to crash qemu because pit is NULL? Or do you emulate that port in the kernel? If not, you likely want to move pcspk_init() under the same umbrella. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
On 01/27/2012 07:21 PM, Stefano Stabellini wrote:> rtc_clock is only used by the RTC emulator (mc146818rtc.c), however Xen > has its own RTC emulator in the hypervisor so we can disable it. > > Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com> > --- > xen-all.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/xen-all.c b/xen-all.c > index d1fc597..bf183f7 100644 > --- a/xen-all.c > +++ b/xen-all.c > @@ -513,6 +513,7 @@ void xen_vcpu_init(void) > qemu_register_reset(xen_reset_vcpu, first_cpu); > xen_reset_vcpu(first_cpu); > } > + qemu_clock_enable(rtc_clock, false); > } > > /* get the ioreq packets from share mem */I explained why this is wrong. Paolo
Stefano Stabellini
2012-Jan-30 11:39 UTC
Re: [PATCH v3 1/6] xen: do not initialize the interval timer emulator
On Fri, 27 Jan 2012, Jan Kiszka wrote: > On 2012-01-27 19:21, Stefano Stabellini wrote: > > PIT is emulated by the hypervisor so we don't need to emulate it in Qemu: > > this patch prevents Qemu from waking up needlessly at PIT_FREQ on Xen. > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > --- > > hw/pc.c | 7 +++++-- > > 1 files changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/hw/pc.c b/hw/pc.c > > index 85304cf..7a7ce98 100644 > > --- a/hw/pc.c > > +++ b/hw/pc.c > > @@ -43,6 +43,7 @@ > > #include "ui/qemu-spice.h" > > #include "memory.h" > > #include "exec-memory.h" > > +#include "arch_init.h" > > > > /* output Bochs bios info messages */ > > //#define DEBUG_BIOS > > @@ -1130,7 +1131,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, > > DriveInfo *fd[MAX_FD]; > > qemu_irq rtc_irq = NULL; > > qemu_irq *a20_line; > > - ISADevice *i8042, *port92, *vmmouse, *pit; > > + ISADevice *i8042, *port92, *vmmouse, *pit = NULL; > > qemu_irq *cpu_exit_irq; > > > > register_ioport_write(0x80, 1, 1, ioport80_write, NULL); > > @@ -1151,7 +1152,9 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, > > > > qemu_register_boot_set(pc_boot_set, *rtc_state); > > > > - pit = pit_init(isa_bus, 0x40, 0); > > + if (!xen_available()) { > > + pit = pit_init(isa_bus, 0x40, 0); > > + } > > pcspk_init(pit); > > > > for(i = 0; i < MAX_SERIAL_PORTS; i++) { > > Thus a guest accessing port 0x61 will be able to crash qemu because > pit is NULL? Or do you emulate that port in the kernel? If not, you > likely want to move pcspk_init() under the same umbrella. We already emulate both pit and port 0x61 in xen so a guest won't be able to crash qemu that easily :) But now that you make me think about it, it makes sense to move pcspk_init under the same if, like you suggested. Thanks, Stefano
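For reference, folding pcspk_init() into the same guard, as discussed above, would look roughly like this (a sketch of the follow-up change being suggested, not a posted patch):

    if (!xen_available()) {
        pit = pit_init(isa_bus, 0x40, 0);
        pcspk_init(pit);
    }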
On Fri, 27 Jan 2012, Paolo Bonzini wrote: > On 01/27/2012 07:21 PM, Stefano Stabellini wrote: > > rtc_clock is only used by the RTC emulator (mc146818rtc.c), however Xen > > has its own RTC emulator in the hypervisor so we can disable it. > > > > Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com> > > --- > > xen-all.c | 1 + > > 1 files changed, 1 insertions(+), 0 deletions(-) > > > > diff --git a/xen-all.c b/xen-all.c > > index d1fc597..bf183f7 100644 > > --- a/xen-all.c > > +++ b/xen-all.c > > @@ -513,6 +513,7 @@ void xen_vcpu_init(void) > > qemu_register_reset(xen_reset_vcpu, first_cpu); > > xen_reset_vcpu(first_cpu); > > } > > + qemu_clock_enable(rtc_clock, false); > > } > > > > /* get the ioreq packets from share mem */ > > I explained why this is wrong. Thanks for reminding me, I lost your previous email as I wasn't CC'ed...
Jan Kiszka
2012-Jan-30 15:13 UTC
Re: [PATCH v3 1/6] xen: do not initialize the interval timer emulator
On 2012-01-30 12:39, Stefano Stabellini wrote: > On Fri, 27 Jan 2012, Jan Kiszka wrote: >> On 2012-01-27 19:21, Stefano Stabellini wrote: >>> PIT is emulated by the hypervisor so we don't need to emulate it in Qemu: >>> this patch prevents Qemu from waking up needlessly at PIT_FREQ on Xen. >>> >>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> >>> --- >>> hw/pc.c | 7 +++++-- >>> 1 files changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/pc.c b/hw/pc.c >>> index 85304cf..7a7ce98 100644 >>> --- a/hw/pc.c >>> +++ b/hw/pc.c >>> @@ -43,6 +43,7 @@ >>> #include "ui/qemu-spice.h" >>> #include "memory.h" >>> #include "exec-memory.h" >>> +#include "arch_init.h" >>> >>> /* output Bochs bios info messages */ >>> //#define DEBUG_BIOS >>> @@ -1130,7 +1131,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, >>> DriveInfo *fd[MAX_FD]; >>> qemu_irq rtc_irq = NULL; >>> qemu_irq *a20_line; >>> - ISADevice *i8042, *port92, *vmmouse, *pit; >>> + ISADevice *i8042, *port92, *vmmouse, *pit = NULL; >>> qemu_irq *cpu_exit_irq; >>> >>> register_ioport_write(0x80, 1, 1, ioport80_write, NULL); >>> @@ -1151,7 +1152,9 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, >>> >>> qemu_register_boot_set(pc_boot_set, *rtc_state); >>> >>> - pit = pit_init(isa_bus, 0x40, 0); >>> + if (!xen_available()) { >>> + pit = pit_init(isa_bus, 0x40, 0); >>> + } >>> pcspk_init(pit); >>> >>> for(i = 0; i < MAX_SERIAL_PORTS; i++) { >> >> Thus a guest accessing port 0x61 will be able to crash qemu because >> pit is NULL? Or do you emulate that port in the kernel? If not, you >> likely want to move pcspk_init() under the same umbrella. > > We already emulate both pit and port 0x61 in xen so a guest won't be > able to crash qemu that easily :) Which, btw, most likely breaks sound output via the speaker. We used to fake 0x61 in the kernel as well, but now we properly emulate it in user space again (well, upcoming qemu patches will, qemu-kvm is broken in this regard). > But now that you make me think about it, it makes sense to move > pcspk_init under the same if, like you suggested. Provided there is no use for user space, this would be consistent. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Paul Brook
2012-Feb-10 00:26 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
> There is no reason why the minimum timeout should be 1sec, it could > easily be 1h and we would save lots of cpu cycles. No. The reason we have this is because there are bits of code that rely on polling. IIRC slirp and the floppy DMA engine were the main culprits. qemu_calculate_timeout is an ugly hack to poll at least once a second, allowing the guest to make forward progress when we miss an event. If you think you've fixed all those polling places then you should remove the timeout altogether and block indefinitely. A 1h timeout is almost certainly not the right answer. Paul
Paolo Bonzini
2012-Feb-10 08:03 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On 02/10/2012 01:26 AM, Paul Brook wrote: > The reason we have this is because there are bits of code that rely on > polling. IIRC slirp and the floppy DMA engine were the main culprits. > qemu_calculate_timeout is an ugly hack to poll at least once a second, > allowing the guest to make forward progress when we miss an event. At least the floppy DMA engine is fine with it, it uses idle bottom halves (which are a hack and could be replaced by timers, but that's not relevant now). Slirp's timeouts indeed require polling. if (time_fasttimo && ((curtime - time_fasttimo) >= 2)) { tcp_fasttimo(slirp); time_fasttimo = 0; } if (do_slowtimo && ((curtime - last_slowtimo) >= 499)) { ip_slowtimo(slirp); tcp_slowtimo(slirp); last_slowtimo = curtime; } Paolo
Paul Brook
2012-Feb-10 09:52 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
> On 02/10/2012 01:26 AM, Paul Brook wrote: > > The reason we have this is because there are bits of code that rely on > > polling. IIRC slirp and the floppy DMA engine were the main culprits. > > qemu_calculate_timeout is an ugly hack to poll at least once a second, > > allowing the guest to make forward progress when we miss an event. > > At least the floppy DMA engine is fine with it, it uses idle bottom > halves (which are a hack and could be replaced by timers, but that's not > relevant now). I thought idle bottom halves were one of the things that made this timeout necessary. How else are they going to get run? Paul
Paolo Bonzini
2012-Feb-10 10:45 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On 02/10/2012 10:52 AM, Paul Brook wrote: >> > At least the floppy DMA engine is fine with it, it uses idle bottom >> > halves (which are a hack and could be replaced by timers, but that's not >> > relevant now). > I thought idle bottom halves were one of the things that made this timeout > necessary. How else are they going to get run? The timeout is reduced to 10 ms when an idle bottom half is scheduled. See qemu_bh_update_timeout in async.c. Paolo
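For context, the mechanism referred to here works roughly as follows; this is a sketch of qemu_bh_update_timeout() as it looked in async.c at the time, so details may differ from any particular tree:

    void qemu_bh_update_timeout(int *timeout)
    {
        QEMUBH *bh;

        for (bh = first_bh; bh; bh = bh->next) {
            if (!bh->deleted && bh->scheduled) {
                if (bh->idle) {
                    /* an idle bottom half only caps the wait at 10 ms */
                    *timeout = MIN(10, *timeout);
                } else {
                    /* a non-idle bottom half must run as soon as possible */
                    *timeout = 0;
                    break;
                }
            }
        }
    }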
Paul Brook
2012-Feb-10 11:09 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
> >> > At least the floppy DMA engine is fine with it, it uses idle bottom > >> > halves (which are a hack and could be replaced by timers, but that's > >> > not relevant now). > > > > I thought idle bottom halves were one of the things that made this timeout > > necessary. How else are they going to get run? > > The timeout is reduced to 10 ms when an idle bottom half is scheduled. > See qemu_bh_update_timeout in async.c. Ah, I see. Idle BH are indeed a nasty hack that should be removed, but not directly relevant to this 1s timeout. I don't think this changes my overall conclusion: Either we need this timeout to poll below the user-thinks-qemu-died threshold, or we should be blocking indefinitely. Paul
Paolo Bonzini
2012-Feb-10 11:18 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On 02/10/2012 12:19 PM, Stefano Stabellini wrote: > I think you are right and the right thing to do would be blocking > indefinitely. > However if slirp doesn't support it, we could have a timeout of 1000 if > CONFIG_SLIRP, otherwise block indefinitely. You could add a similar hack to qemu_bh_update_timeout for slirp_update_timeout. Paolo
Stefano Stabellini
2012-Feb-10 11:19 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On Fri, 10 Feb 2012, Paul Brook wrote: > > >> > At least the floppy DMA engine is fine with it, it uses idle bottom > > >> > halves (which are a hack and could be replaced by timers, but that's > > >> > not relevant now). > > > > > > I thought idle bottom halves were one of the things that made this timeout > > > necessary. How else are they going to get run? > > > > The timeout is reduced to 10 ms when an idle bottom half is scheduled. > > See qemu_bh_update_timeout in async.c. > > Ah, I see. Idle BH are indeed a nasty hack that should be removed, but not > directly relevant to this 1s timeout. > > I don't think this changes my overall conclusion: Either we need this timeout > to poll below the user-thinks-qemu-died threshold, or we should be blocking > indefinitely. I think you are right and the right thing to do would be blocking indefinitely. However if slirp doesn't support it, we could have a timeout of 1000 if CONFIG_SLIRP, otherwise block indefinitely.
Jan Kiszka
2012-Feb-10 11:32 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On 2012-02-10 12:18, Paolo Bonzini wrote: > On 02/10/2012 12:19 PM, Stefano Stabellini wrote: >> I think you are right and the right thing to do would be blocking >> indefinitely. >> However if slirp doesn't support it, we could have a timeout of 1000 if >> CONFIG_SLIRP, otherwise block indefinitely. > > You could add a similar hack to qemu_bh_update_timeout for > slirp_update_timeout. Real solutions would be preferred, but I know that the code is hairy. In any case, please no CONFIG_SLIRP code forks. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Stefano Stabellini
2012-Feb-10 17:07 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On Fri, 10 Feb 2012, Jan Kiszka wrote:> On 2012-02-10 12:18, Paolo Bonzini wrote: > > On 02/10/2012 12:19 PM, Stefano Stabellini wrote: > >> I think you are right and the right thing to do would be blocking > >> indefinitely. > >> However if slirp doesn''t support it, we could have a timeout of 1000 if > >> CONFIG_SLIRP, otherwise block indefinitely. > > > > You could add a similar hack to qemu_bh_update_timeout for > > slirp_update_timeout. > > Real solutions would be preferred, but I know that the code is hairy. In > any case, please no CONFIG_SLIRP code forks.I tried to follow your suggestions, what do you guys think about the approach of the patch below? --- main_loop_wait: block indefinitely - remove qemu_calculate_timeout; - explicitly size timeout to uint32_t; - introduce slirp_update_timeout; - pass NULL as timeout argument to select in case timeout is the maximum value; Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> diff --git a/async.c b/async.c index 332d511..ecdaf15 100644 --- a/async.c +++ b/async.c @@ -120,7 +120,7 @@ void qemu_bh_delete(QEMUBH *bh) bh->deleted = 1; } -void qemu_bh_update_timeout(int *timeout) +void qemu_bh_update_timeout(uint32_t *timeout) { QEMUBH *bh; diff --git a/main-loop.c b/main-loop.c index 692381c..4a105e9 100644 --- a/main-loop.c +++ b/main-loop.c @@ -366,7 +366,7 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque) } } -static void os_host_main_loop_wait(int *timeout) +static void os_host_main_loop_wait(uint32_t *timeout) { int ret, ret2, i; PollingEntry *pe; @@ -410,7 +410,7 @@ static void os_host_main_loop_wait(int *timeout) *timeout = 0; } #else -static inline void os_host_main_loop_wait(int *timeout) +static inline void os_host_main_loop_wait(uint32_t *timeout) { } #endif @@ -419,21 +419,17 @@ int main_loop_wait(int nonblocking) { fd_set rfds, wfds, xfds; int ret, nfds; - struct timeval tv; - int timeout; + struct timeval tv, *tvarg = NULL; + uint32_t timeout = UINT32_MAX; if (nonblocking) { timeout = 0; } else { - timeout = qemu_calculate_timeout(); qemu_bh_update_timeout(&timeout); } os_host_main_loop_wait(&timeout); - tv.tv_sec = timeout / 1000; - tv.tv_usec = (timeout % 1000) * 1000; - /* poll any events */ /* XXX: separate device handlers from system ones */ nfds = -1; @@ -442,16 +438,23 @@ int main_loop_wait(int nonblocking) FD_ZERO(&xfds); #ifdef CONFIG_SLIRP + slirp_update_timeout(&timeout); slirp_select_fill(&nfds, &rfds, &wfds, &xfds); #endif qemu_iohandler_fill(&nfds, &rfds, &wfds, &xfds); glib_select_fill(&nfds, &rfds, &wfds, &xfds, &tv); + if (timeout < UINT32_MAX) { + tvarg = &tv; + tv.tv_sec = timeout / 1000; + tv.tv_usec = (timeout % 1000) * 1000; + } + if (timeout > 0) { qemu_mutex_unlock_iothread(); } - ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv); + ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg); if (timeout > 0) { qemu_mutex_lock_iothread(); diff --git a/main-loop.h b/main-loop.h index f971013..22c0dc9 100644 --- a/main-loop.h +++ b/main-loop.h @@ -352,6 +352,6 @@ void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc void qemu_bh_schedule_idle(QEMUBH *bh); int qemu_bh_poll(void); -void qemu_bh_update_timeout(int *timeout); +void qemu_bh_update_timeout(uint32_t *timeout); #endif diff --git a/qemu-timer.c b/qemu-timer.c index de20852..3e1ce08 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -821,8 +821,3 @@ fail: return err; } -int qemu_calculate_timeout(void) -{ - return 1000; -} - diff --git a/qemu-timer.h b/qemu-timer.h index de17f3b..f1386d5 
100644 --- a/qemu-timer.h +++ b/qemu-timer.h @@ -62,7 +62,6 @@ uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts); void qemu_run_all_timers(void); int qemu_alarm_pending(void); void configure_alarms(char const *opt); -int qemu_calculate_timeout(void); void init_clocks(void); int init_timer_alarm(void); diff --git a/slirp/libslirp.h b/slirp/libslirp.h index 890fd86..6af7534 100644 --- a/slirp/libslirp.h +++ b/slirp/libslirp.h @@ -42,4 +42,13 @@ void slirp_socket_recv(Slirp *slirp, struct in_addr guest_addr, size_t slirp_socket_can_recv(Slirp *slirp, struct in_addr guest_addr, int guest_port); +#ifdef CONFIG_SLIRP +static inline void slirp_update_timeout(uint32_t *timeout) +{ + *timeout = MIN(1000, *timeout); +} +#else +static inline void slirp_update_timeout(uint32_t *timeout) { } +#endif + #endif
Paul Brook
2012-Feb-10 23:34 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
> +#ifdef CONFIG_SLIRP > +static inline void slirp_update_timeout(uint32_t *timeout) > +{ > + *timeout = MIN(1000, *timeout); > +} > +#else > +static inline void slirp_update_timeout(uint32_t *timeout) { } > +#endif Shouldn't we be testing whether slirp is actually in use? I doubt many people go to the effort of rebuilding without SLIRP support. Paul
Stefano Stabellini
2012-Feb-13 11:52 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On Fri, 10 Feb 2012, Paul Brook wrote:> > +#ifdef CONFIG_SLIRP > > +static inline void slirp_update_timeout(uint32_t *timeout) > > +{ > > + *timeout = MIN(1000, *timeout); > > +} > > +#else > > +static inline void slirp_update_timeout(uint32_t *timeout) { } > > +#endif > > Shouldn''t we be testing whether slirp is actually in use? I doubt many people > go to the effort of rebuilding without SLIRP support.Yes, you are right. Also considering that we are only calling slirp_update_timeout if CONFIG_SLIRP is defined, there is no need for the !CONFIG_SLIRP dummy version of the function. --- commit 3a89477edc7e551c93b016940d2fdad9ebc22a84 Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Date: Mon Feb 13 11:25:03 2012 +0000 main_loop_wait: block indefinitely - remove qemu_calculate_timeout; - explicitly size timeout to uint32_t; - introduce slirp_update_timeout; - pass NULL as timeout argument to select in case timeout is the maximum value; Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> diff --git a/async.c b/async.c index 332d511..ecdaf15 100644 --- a/async.c +++ b/async.c @@ -120,7 +120,7 @@ void qemu_bh_delete(QEMUBH *bh) bh->deleted = 1; } -void qemu_bh_update_timeout(int *timeout) +void qemu_bh_update_timeout(uint32_t *timeout) { QEMUBH *bh; diff --git a/main-loop.c b/main-loop.c index 692381c..4a105e9 100644 --- a/main-loop.c +++ b/main-loop.c @@ -366,7 +366,7 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque) } } -static void os_host_main_loop_wait(int *timeout) +static void os_host_main_loop_wait(uint32_t *timeout) { int ret, ret2, i; PollingEntry *pe; @@ -410,7 +410,7 @@ static void os_host_main_loop_wait(int *timeout) *timeout = 0; } #else -static inline void os_host_main_loop_wait(int *timeout) +static inline void os_host_main_loop_wait(uint32_t *timeout) { } #endif @@ -419,21 +419,17 @@ int main_loop_wait(int nonblocking) { fd_set rfds, wfds, xfds; int ret, nfds; - struct timeval tv; - int timeout; + struct timeval tv, *tvarg = NULL; + uint32_t timeout = UINT32_MAX; if (nonblocking) { timeout = 0; } else { - timeout = qemu_calculate_timeout(); qemu_bh_update_timeout(&timeout); } os_host_main_loop_wait(&timeout); - tv.tv_sec = timeout / 1000; - tv.tv_usec = (timeout % 1000) * 1000; - /* poll any events */ /* XXX: separate device handlers from system ones */ nfds = -1; @@ -442,16 +438,23 @@ int main_loop_wait(int nonblocking) FD_ZERO(&xfds); #ifdef CONFIG_SLIRP + slirp_update_timeout(&timeout); slirp_select_fill(&nfds, &rfds, &wfds, &xfds); #endif qemu_iohandler_fill(&nfds, &rfds, &wfds, &xfds); glib_select_fill(&nfds, &rfds, &wfds, &xfds, &tv); + if (timeout < UINT32_MAX) { + tvarg = &tv; + tv.tv_sec = timeout / 1000; + tv.tv_usec = (timeout % 1000) * 1000; + } + if (timeout > 0) { qemu_mutex_unlock_iothread(); } - ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv); + ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg); if (timeout > 0) { qemu_mutex_lock_iothread(); diff --git a/main-loop.h b/main-loop.h index f971013..22c0dc9 100644 --- a/main-loop.h +++ b/main-loop.h @@ -352,6 +352,6 @@ void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc void qemu_bh_schedule_idle(QEMUBH *bh); int qemu_bh_poll(void); -void qemu_bh_update_timeout(int *timeout); +void qemu_bh_update_timeout(uint32_t *timeout); #endif diff --git a/qemu-timer.c b/qemu-timer.c index de20852..3e1ce08 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -821,8 +821,3 @@ fail: return err; } -int qemu_calculate_timeout(void) -{ - return 1000; 
-} - diff --git a/qemu-timer.h b/qemu-timer.h index de17f3b..f1386d5 100644 --- a/qemu-timer.h +++ b/qemu-timer.h @@ -62,7 +62,6 @@ uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts); void qemu_run_all_timers(void); int qemu_alarm_pending(void); void configure_alarms(char const *opt); -int qemu_calculate_timeout(void); void init_clocks(void); int init_timer_alarm(void); diff --git a/qemu-tool.c b/qemu-tool.c index 6b69668..76830b7 100644 --- a/qemu-tool.c +++ b/qemu-tool.c @@ -90,6 +90,10 @@ static void __attribute__((constructor)) init_main_loop(void) qemu_clock_enable(vm_clock, false); } +void slirp_update_timeout(uint32_t *timeout) +{ +} + void slirp_select_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set *xfds) { diff --git a/slirp/libslirp.h b/slirp/libslirp.h index 890fd86..77527ad 100644 --- a/slirp/libslirp.h +++ b/slirp/libslirp.h @@ -15,6 +15,7 @@ Slirp *slirp_init(int restricted, struct in_addr vnetwork, struct in_addr vnameserver, void *opaque); void slirp_cleanup(Slirp *slirp); +void slirp_update_timeout(uint32_t *timeout); void slirp_select_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set *xfds); diff --git a/slirp/slirp.c b/slirp/slirp.c index 19d69eb..418f8d0 100644 --- a/slirp/slirp.c +++ b/slirp/slirp.c @@ -255,6 +255,13 @@ void slirp_cleanup(Slirp *slirp) #define CONN_CANFRCV(so) (((so)->so_state & (SS_FCANTRCVMORE|SS_ISFCONNECTED)) == SS_ISFCONNECTED) #define UPD_NFDS(x) if (nfds < (x)) nfds = (x) +void slirp_update_timeout(uint32_t *timeout) +{ + if (!QTAILQ_EMPTY(&slirp_instances)) { + *timeout = MIN(1000, *timeout); + } +} + void slirp_select_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set *xfds) {
Paul Brook
2012-Feb-14 10:52 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
> Yes, you are right. Also considering that we are only calling > slirp_update_timeout if CONFIG_SLIRP is defined, there is no need for > the !CONFIG_SLIRP dummy version of the function. Looks reasonable to me. Unfortunately I can't remember which combination of headless VM and timer configs caused hangs when this was originally added. If anyone has a setup that suffered timeout-related hangs last time we made this change, please retest with this patch. Otherwise I guess we apply the patch and hope we didn't miss anything. > Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > Date: Mon Feb 13 11:25:03 2012 +0000 > > main_loop_wait: block indefinitely > > - remove qemu_calculate_timeout; > > - explicitly size timeout to uint32_t; > > - introduce slirp_update_timeout; > > - pass NULL as timeout argument to select in case timeout is the > maximum value; > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Paul Brook <paul@codesourcery.com>
Stefano Stabellini
2012-Feb-14 11:55 UTC
Re: [PATCH v3 6/6] qemu_calculate_timeout: increase minimum timeout to 1h
On Tue, 14 Feb 2012, Paul Brook wrote: > > Yes, you are right. Also considering that we are only calling > > slirp_update_timeout if CONFIG_SLIRP is defined, there is no need for > > the !CONFIG_SLIRP dummy version of the function. > > Looks reasonable to me. Unfortunately I can't remember which combination of > headless VM and timer configs caused hangs when this was originally added. > > If anyone has a setup that suffered timeout-related hangs last time we made > this change, please retest with this patch. Otherwise I guess we apply the > patch and hope we didn't miss anything. OK, thanks. I'll resend the patch as part of the series and ask for testing. > > Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > Date: Mon Feb 13 11:25:03 2012 +0000 > > > > main_loop_wait: block indefinitely > > > > - remove qemu_calculate_timeout; > > > > - explicitly size timeout to uint32_t; > > > > - introduce slirp_update_timeout; > > > > - pass NULL as timeout argument to select in case timeout is the > > maximum value; > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > Acked-by: Paul Brook <paul@codesourcery.com> >