Hi all,
I am trying to diagnose a lagging issue on one of my domains.
Right now I have 3 DomU running + Dom0 and lagging is really bad. Even Dom0 responsiveness is really low (sometimes it takes minutes to authenticate over SSH).
I fiddled with sched-credit without success:

Name        ID  Weight  Cap
Domain-0     0    6000    0
dom1        12     256    0
dom2        11     256    0
dom3         2     256    0

The bad side is that dom1 is Windows and lagging is causing the kernel/drivers to crash.
The machine is a single Xeon 1260L (4 Cores, 8 threads) with 20GB of RAM.
Any hints?
Thanks
-e

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
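For context on the weights above: credit-scheduler weights are relative, so a domain with twice the weight of another is entitled to roughly twice the CPU time when the host is contended. The sketch below only does that arithmetic for the posted values. It is a simplified illustration, not the scheduler's actual accounting: it assumes every domain is busy at the same time and it ignores caps and vCPU counts. The function name `cpu_shares` is made up for this example.

```python
def cpu_shares(weights):
    """Return each domain's fraction of CPU time under full contention.

    Assumption: time is split strictly in proportion to weight, which is
    a simplification of what the credit scheduler does.
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights as posted in the sched-credit listing.
weights = {"Domain-0": 6000, "dom1": 256, "dom2": 256, "dom3": 256}

for name, share in cpu_shares(weights).items():
    print(f"{name}: {share:.1%}")
```

With these numbers Dom0 is already entitled to the large majority of CPU time under contention, so if Dom0 is still slow, raising its weight further is unlikely to be what fixes it.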
Hi Enzo,

2012/1/18 Enzo Lombardi <enzinol@gmail.com>:
> Hi all,
> I am trying to diagnose a lagging issue on one of my domains.
> Right now I have 3 DomU running + Dom0 and lagging is really bad. Even Dom0
> responsiveness is really low (sometimes it takes minutes to authenticate
> over SSH).
> I fiddled with sched-credit without success:

What are the highest consuming processes in dom0's top output?
Can you also run sar -dp 30 2 and post the output? (That takes two 30-second samples, so it runs for 60s.)
Also look at xentop (sort by cpu%).

Florian
Quoting Enzo Lombardi <enzinol@gmail.com>:
> Hi all,
> I am trying to diagnose a lagging issue on one of my domains.
> Right now I have 3 DomU running + Dom0 and lagging is really bad. Even Dom0
> responsiveness is really low (sometimes it takes minutes to authenticate
> over SSH).
> I fiddled with sched-credit without success:
> ID Weight
> Domain-0 0 6000 0
> dom1 12 256 0
> dom2 11 256 0
> dom3 2 256 0
>
> The bad side is that dom1 is Windows and lagging is causing the
> kernel/drivers to crash.
> The machine is a single Xeon 1260L (4 Cores, 8 threads) with 20GB of RAM.
> Any hints?
> Thanks
> -e

SSH lags are caused not only by Xen time management but also by poorly constructed (or non-existent) DNS for the server and the guest OS.

Which version of Windows? I had a Windows 2008 system that was not responsive enough for the users (even though I knew the hardware was working well). It turned out that a hotfix from Microsoft was required to clear up the non-responsive Windows 2008 server.

Ken Cobler
On Wed, Jan 18, 2012 at 09:12:02AM -0800, Enzo Lombardi wrote:
> Hi all,
> I am trying to diagnose a lagging issue on one of my domains.
> Right now I have 3 DomU running + Dom0 and lagging is really bad. Even Dom0
> responsiveness is really low (sometimes it takes minutes to authenticate
> over SSH).

My experience? 90% of the time when something is that slow, it's disk I/O that is the problem, not CPU. (As someone else pointed out, DNS problems can cause big problems with network operations, which accounts for most of the other 10%, but it sounds like more than just network stuff is slow.)

Run 'top' or 'sar' on the dom0 and look for IOwait. If I'm right, then if you run top in one window while you try to ssh in from another, the IOwait (I think in top, it says %wa) will be really high, like above 70%, while the SSH is being slow.

When that happens to me, the first thing I look for is a guest that is swapping a lot. The next thing I look for is a failing disk. Especially if you use consumer-grade disk rather than 'enterprise sata', when a disk starts failing, it slows way down. The 'enterprise sata' stuff tends to fail outright before that, which helps immensely.
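The IOwait check described above can also be done without top, by sampling the aggregate "cpu" line of /proc/stat twice and comparing the counters. What follows is a minimal sketch under some assumptions: dom0 is Linux, the fifth counter on that line is iowait (as in the documented /proc/stat layout), and the helper names, the 5-second interval, and the default of 12 samples are all made up for illustration. The 70% threshold is the figure mentioned in the message.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU ticks spent in iowait between two samples.

    Counter order: user nice system idle iowait irq softirq steal ...
    Only the first eight counters are summed, because the guest counters
    that may follow are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_ticks(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def watch_iowait(interval=5.0, samples=12, threshold=70.0):
    """Print the iowait percentage per interval and flag high readings."""
    previous = read_cpu_ticks()
    for _ in range(samples):
        time.sleep(interval)
        current = read_cpu_ticks()
        percent = iowait_percent(previous, current)
        flag = "  <-- high" if percent >= threshold else ""
        print(f"iowait: {percent:5.1f}%{flag}")
        previous = current
```

Calling watch_iowait() in one terminal on dom0 while attempting the slow SSH login from another gives the same kind of reading as watching %wa in top.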
Thank you all, I will report back when I find out what is going on.
-e

On Wed, Jan 18, 2012 at 12:57 PM, Luke S. Crawford <lsc@prgmr.com> wrote:
> On Wed, Jan 18, 2012 at 09:12:02AM -0800, Enzo Lombardi wrote:
> > Hi all,
> > I am trying to diagnose a lagging issue on one of my domains.
> > Right now I have 3 DomU running + Dom0 and lagging is really bad. Even Dom0
> > responsiveness is really low (sometimes it takes minutes to authenticate
> > over SSH).
>
> My experience? 90% of the time when something is that slow, it's disk
> I/O that is the problem, not CPU. (as someone else pointed out
> DNS problems can cause big problems with network operations, which
> accounts for most of the other 10%, but it sounds like more than just
> network stuff is slow)
>
> Run 'top' or 'sar' on the dom0 and look for IOwait. If I'm right,
> if you run top in one window while you try to ssh in from another, while
> the SSH is being slow, the IOwait (I think in top, it says %wa) will
> be really high, like above 70%.
>
> When that happens to me? the first thing I look for is a guest that is
> swapping a lot. The next thing I look for is a failing disk. Especially
> if you use consumer-grade disk rather than 'enterprise sata' when a
> disk starts failing, it slows waay down. The 'enterprise sata' stuff
> tends to fail outright before that, which helps immensely.
Ok, one more data point: this happened yesterday.
Basically, one of the processes in the Windows DomU started hogging the CPU. All other domains were fine, but that specific DomU became unresponsive, as used to happen on single-core machines.
I have a Xeon 1260L (4 cores, 2 threads per core) and I assigned the DomU 8 VCPUs. I see them all in Task Manager, but I couldn't "use" them.
After I killed the CPU-hogging process (Chrome), everything started to work flawlessly.

What are the best practices? Is it ok to assign all VCPUs to a DomU? Is this a possible bug in the Xen scheduler, or should I take VCPUs into account in the sched-credit weights?
I couldn't find any good documentation about it.

Thanks in advance.
-e

On Thu, Jan 19, 2012 at 4:03 PM, Enzo Lombardi <enzinol@gmail.com> wrote:
> Thank you all, I will report back when I find out what is going on.
> -e
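One way to put a number on the VCPU question is to compare the total VCPUs handed out with the hardware threads available. The sketch below does only that arithmetic. The 8 VCPUs for dom1 and the 8 hardware threads come from the thread; the counts for Domain-0, dom2 and dom3 are hypothetical placeholders, since the thread does not state them, and the function name is made up for this example.

```python
def vcpu_oversubscription(vcpus_by_domain, physical_threads):
    """Ratio of assigned vCPUs to hardware threads (1.0 means fully booked)."""
    if physical_threads <= 0:
        raise ValueError("physical_threads must be positive")
    return sum(vcpus_by_domain.values()) / physical_threads


# dom1's 8 vCPUs are from the thread; the other counts are hypothetical.
vcpus = {"Domain-0": 8, "dom1": 8, "dom2": 2, "dom3": 2}

ratio = vcpu_oversubscription(vcpus, physical_threads=8)
print(f"vCPUs per hardware thread: {ratio:.2f}")
```

A ratio above 1.0 only means the VCPUs have to share hardware threads whenever enough of them are busy at once. It does not by itself point to a scheduler bug or explain the hang described above.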