Michael Warnecke
2014-Aug-25 20:56 UTC
[libvirt-users] Progressively worsening unresponsiveness
Hello all,

I'm new to the list, and hoping someone can point me in the right direction.

I've got an Ubuntu 14 dom0 with what I think are excellent specs: 12 cores, hyperthreaded, VT, 64GB RAM, gobs of disk, etc., all to run 8 virtual machines (at the moment).

My problem is that each of the guests (Windows, Ubuntu, FreeBSD) starts out working fine. After several minutes, the guest becomes unresponsive for a moment, then comes back. As time goes on, this happens more frequently and for longer, until after about a day the guest is completely useless and needs a shutdown and start.

The dom0 is NEVER unresponsive.

I've tried to track this down through every means I have available, and have come up empty-handed. Here is some of what I've done so far; please let me know what other information would help to debug this:

1. I've tried the guest OS processor set to default, and set to copied from host. Neither made any noticeable difference.
2. I've switched all disks and network to VirtIO. Big improvement over IDE, but the unresponsiveness persists.
3. Linux guests have the following in dmesg:
     hrtimer: interrupt took 10109276 ns
     [sched_delayed] sched: RT throttling activated
4. I found nothing suspicious in the dom0's dmesg.
5. Unresponsiveness does not correlate with disk usage.
6. Host uses 4 disks in software RAID-5 (yes, I know I'm bad, but there are reasons) using BTRFS.
7. Guest disks are all raw.

Please let me know if there is some other useful information, or if you have an idea where I should look next.

Thanks!
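For scale, the hrtimer line in item 3 reports a duration in nanoseconds, so 10109276 ns is roughly 10 ms spent in a single timer interrupt. The following is a minimal sketch, not a diagnostic tool, that pulls such durations out of dmesg text and converts them to milliseconds. The function names and the `DMESG_SAMPLE` constant are made up for illustration; the sample text is the two lines quoted in the post.

```python
import re

# The two guest dmesg lines quoted in the post (used here as sample input).
DMESG_SAMPLE = """\
hrtimer: interrupt took 10109276 ns
[sched_delayed] sched: RT throttling activated
"""

# Matches the kernel message and captures the duration in nanoseconds.
HRTIMER_RE = re.compile(r"hrtimer: interrupt took (\d+) ns")


def hrtimer_stalls_ms(dmesg_text):
    """Return every reported hrtimer interrupt duration, in milliseconds."""
    return [int(m.group(1)) / 1_000_000 for m in HRTIMER_RE.finditer(dmesg_text)]


def rt_throttled(dmesg_text):
    """Return True if the RT throttling message appears in the text."""
    return "RT throttling activated" in dmesg_text


if __name__ == "__main__":
    for ms in hrtimer_stalls_ms(DMESG_SAMPLE):
        print(f"hrtimer interrupt took {ms:.2f} ms")
    print("RT throttling seen:", rt_throttled(DMESG_SAMPLE))
```

Feeding it the output of `dmesg` from a guest over time would show whether the reported durations grow as the guest degrades, which is the pattern described above.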
Martin Kletzander
2014-Aug-26 04:59 UTC
Re: [libvirt-users] Progressively worsening unresponsiveness
On Mon, Aug 25, 2014 at 02:56:57PM -0600, Michael Warnecke wrote:
> Please let me know if there is some other useful information, or if you
> have an idea where I should look next.

I would suggest looking for a bottleneck on the host and then on the guests as well. I like using atop for this, for example. virt-top can show you how much each machine eats up. Check the memory and processor usage. What are the settings (CPU, MEM, disks) for the machines?

It still might be just a minor issue, for example that everything starts swapping, and then of course you're gone.

Martin
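One way to rule out the swapping case on a Linux host, besides watching atop, is to read the swap counters from /proc/meminfo. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the figures in `SAMPLE_MEMINFO` are invented for illustration, not taken from the machine in this thread.

```python
import os

# Invented sample in /proc/meminfo format (values are NOT from this thread's host).
SAMPLE_MEMINFO = """\
MemTotal:       65934212 kB
MemFree:        50331648 kB
SwapTotal:       8388604 kB
SwapFree:        8388556 kB
"""


def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB: total swap minus free swap."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    # Use the live file on Linux; fall back to the sample elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
    else:
        text = SAMPLE_MEMINFO
    info = parse_meminfo(text)
    print("swap used:", swap_used_kb(info), "kB")
    print("memory free:", info["MemFree"], "kB")
```

A swap figure that stays near zero while the guests stall would point away from host swapping as the cause.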
Michael Warnecke
2014-Aug-27 14:40 UTC
Re: [libvirt-users] Progressively worsening unresponsiveness
atop is pretty great.

Guest machines are mostly 1 processor, 2GB RAM, running some old 16-bit apps. Guests were only using 15% of their RAM as shown by Task Manager in Windows, and very little swap. The host had 48GB of RAM free and 48k of swap in use, so it certainly isn't swapping. atop only showed disk going into red when starting all domains at the same time.

I did have one guest configured to use the wrong CPU. It slipped past me when moving it from the old host to the new one. I will post a reply if that seems to have corrected all the problems.

Thanks for the help!

On Mon, Aug 25, 2014 at 10:59 PM, Martin Kletzander <mkletzan@redhat.com> wrote:
> Check the memory and processor usage. What are the settings (CPU, MEM,
> disks) for the machines?
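As a rough sanity check on those numbers, the allocation can be compared against the host's capacity. The sketch below assumes every one of the 8 guests has exactly 1 vCPU and 2GB of RAM (the post only says "mostly") and that "12 cores, hyperthreaded" means 24 hardware threads; the constant and function names are hypothetical.

```python
# Host figures from the first post; the thread count is an assumption
# (12 cores with two hardware threads each).
HOST_THREADS = 12 * 2
HOST_RAM_GB = 64

# Guest figures: 8 guests, assumed uniform at 1 vCPU and 2 GB each.
GUESTS = 8
VCPUS_PER_GUEST = 1
RAM_PER_GUEST_GB = 2


def commit_ratio(allocated, available):
    """Fraction of a host resource handed out to guests (above 1.0 is overcommit)."""
    return allocated / available


vcpu_ratio = commit_ratio(GUESTS * VCPUS_PER_GUEST, HOST_THREADS)
ram_ratio = commit_ratio(GUESTS * RAM_PER_GUEST_GB, HOST_RAM_GB)

if __name__ == "__main__":
    print(f"vCPU commit ratio: {vcpu_ratio:.2f}")
    print(f"RAM commit ratio:  {ram_ratio:.2f}")
    print("RAM left for host (GB):", HOST_RAM_GB - GUESTS * RAM_PER_GUEST_GB)
```

Under these assumptions both ratios are well below 1.0, and the 48GB left over is roughly in line with the free memory reported above, so plain CPU or memory overcommit looks unlikely to explain the stalls.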