Daniel P. Berrange
2017-Feb-15 09:40 UTC
Re: [libvirt-users] high memory guest issues - virsh start and QEMU_JOB_WAIT_TIME
On Wed, Feb 15, 2017 at 10:27:46AM +0100, Michal Privoznik wrote:
> On 02/15/2017 03:43 AM, Blair Bethwaite wrote:
> > On 15 February 2017 at 00:57, Daniel P. Berrange <berrange@redhat.com> wrote:
> >> What is the actual error you're getting during startup?
> >
> > # virsh -d0 start instance-0000037c
> > start: domain(optdata): instance-0000037c
> > start: found option <domain>: instance-0000037c
> > start: <domain> trying as domain NAME
> > error: Failed to start domain instance-0000037c
> > error: monitor socket did not show up: No such file or directory
> >
> > Full libvirtd debug log at
> > https://gist.github.com/bmb/08fbb6b6136c758d027e90ff139d5701
> >
> > On 15 February 2017 at 00:47, Michal Privoznik <mprivozn@redhat.com> wrote:
> >> I don't think I understand this. Who is running the other job? I mean,
> >> I'd expect qemu to fail to create the socket and thus hit the 30s
> >> timeout in qemuMonitorOpenUnix().
> >
> > Yes, you're right. I just blindly started looking for 30s constants in
> > the code and that one seemed the most obvious, but I had not yet tried
> > to trace it all the way back to the domain start job or checked the
> > debug logs, sorry. Looking a bit more carefully, I see the real issue
> > is in src/qemu/qemu_monitor.c:
> >
> > 321 static int
> > 322 qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
> > 323 {
> > 324     struct sockaddr_un addr;
> > 325     int monfd;
> > 326     int timeout = 30; /* In seconds */
> >
> > Is this safe to increase? Is there any reason to keep it at 30s, given
> > that (from what I'm seeing on a fast 2-socket Haswell system)
> > hugepage-backed guests with more than ~160GB of memory will not be
> > able to start in that time?
>
> I recall a similar discussion took place in the past, but I just cannot
> find it now. I think the problem was that the kernel is zeroing the
> pages on huge page allocation. Anyway, this timeout used to be 3
> seconds; only in fe89b687a0 was it changed to 30 seconds.
>
> We can increase the limit, but that would solve just this case until
> somebody tries to assign even more RAM to their domain. What if we
> instead made this configurable? Have yet another variable living inside
> qemu.conf that defaults to 30 and specifies how long libvirt should
> wait for the qemu monitor to show up?
>
> But frankly, on one hand I like this approach, and on the other I
> dislike it at the same time - we have just too many variables in
> qemu.conf, because that's our answer to problems like these: we don't
> know, so we offload the setting to the sysadmin.

Honestly, it is well overdue for us to come up with an improvement to
QEMU that lets us start QEMU & open the monitor in a race-free manner.
The obvious answer is to allow us to pass down a pre-opened UNIX
listener socket FD to QEMU. We can then connect() immediately with no
race, and simply await the QMP greeting with no timeout, safely getting
EOF if QEMU fails to start.

Regards,
Daniel

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
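[If Michal's suggestion were taken up, the knob would presumably sit
alongside the other settings in /etc/libvirt/qemu.conf. The name and
default below are purely hypothetical - no such setting existed at the
time of this thread:

    # Hypothetical: how long (in seconds) to wait for the QEMU monitor
    # socket to appear before giving up on domain startup.
    #monitor_socket_timeout = 30

Daniel's alternative avoids any timeout at all. Here is a minimal,
self-contained sketch of that handoff in plain POSIX C - not libvirt or
QEMU code; the forked child merely stands in for a QEMU that had grown
support for inheriting a listening monitor FD, and the greeting string
is a stand-in for the real QMP banner:

    /* race-free-monitor.c: sketch of the pre-opened listener FD idea. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <sys/wait.h>

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        const char *path = "/tmp/demo-monitor.sock";
        char buf[128];
        int lfd, cfd;
        ssize_t n;

        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        unlink(path);

        /* Parent owns the listener before the child exists, so there
         * is no window in which the socket path is missing. */
        lfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (lfd < 0 ||
            bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(lfd, 1) < 0) {
            perror("listener");
            return 1;
        }

        if (fork() == 0) {
            /* Child: accept on the inherited FD, pretend to spend a
             * long time allocating guest RAM, then send a greeting. */
            int conn = accept(lfd, NULL, NULL);
            sleep(2);
            if (conn >= 0)
                write(conn, "{\"QMP\": {}}\n", 12);
            _exit(0);
        }

        /* Parent: connect() succeeds immediately because the kernel
         * queues it in the listen backlog; read() then blocks with no
         * arbitrary timeout until the greeting arrives, or returns 0
         * (EOF) if the child dies first. */
        cfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (cfd < 0 ||
            connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        n = read(cfd, buf, sizeof(buf) - 1);
        if (n > 0)
            printf("greeting: %.*s", (int)n, buf);
        else
            fprintf(stderr, "EOF: child failed to start\n");

        close(cfd);
        close(lfd);
        wait(NULL);
        return 0;
    }

Because the parent does bind() and listen() itself before spawning the
child, its connect() is queued even if the child takes minutes to touch
every hugepage, and a dead child yields a clean EOF instead of a
guessed-at timeout - the same pattern systemd socket activation uses.]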
Blair Bethwaite
2017-Feb-15 13:03 UTC
Re: [libvirt-users] high memory guest issues - virsh start and QEMU_JOB_WAIT_TIME
On 15 February 2017 at 20:40, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Wed, Feb 15, 2017 at 10:27:46AM +0100, Michal Privoznik wrote:
> [... earlier context snipped; quoted in full above ...]
>
> Honestly, it is well overdue for us to come up with an improvement to
> QEMU that lets us start QEMU & open the monitor in a race-free manner.
> The obvious answer is to allow us to pass down a pre-opened UNIX
> listener socket FD to QEMU. We can then connect() immediately with no
> race, and simply await the QMP greeting with no timeout, safely getting
> EOF if QEMU fails to start.

Wish I could volunteer to work on that, but I'm afraid my day job now
has me thinking about building a custom package to work around this for
the moment, or even attempting to find the right hexedit against the
existing shared object o_0... probably a line of thinking I should
squash now. Would it be helpful to have this registered as a customer
RFE with Red Hat?

Cheers,
~Blairo
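[For what it's worth, the custom package Blair alludes to would amount
to rebuilding libvirt with a one-line change to the function quoted
earlier. A sketch only - the value 300 is chosen arbitrarily for
illustration, not derived from any guest size:

    --- a/src/qemu/qemu_monitor.c
    +++ b/src/qemu/qemu_monitor.c
    @@ static int qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
         struct sockaddr_un addr;
         int monfd;
    -    int timeout = 30; /* In seconds */
    +    int timeout = 300; /* In seconds */

Any fixed constant inherits Michal's objection, though: it only moves
the cliff, and a still-larger guest will eventually hit it again.]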
Daniel P. Berrange
2017-Feb-15 13:06 UTC
Re: [libvirt-users] high memory guest issues - virsh start and QEMU_JOB_WAIT_TIME
On Thu, Feb 16, 2017 at 12:03:28AM +1100, Blair Bethwaite wrote:
> On 15 February 2017 at 20:40, Daniel P. Berrange <berrange@redhat.com> wrote:
> [... earlier context snipped; quoted in full above ...]
>
> Wish I could volunteer to work on that, but I'm afraid my day job now
> has me thinking about building a custom package to work around this
> for the moment, or even attempting to find the right hexedit against
> the existing shared object o_0... probably a line of thinking I should
> squash now. Would it be helpful to have this registered as a customer
> RFE with Red Hat?

By all means file a bug report about this against RHEL if that's what
you're using. It'll help track & prioritize the issue for future
updates.

Regards,
Daniel

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|