Hi,

I have the issue of my virtual machines becoming extremely slow when even
only one of them is creating a lot of I/O. Is there a way to prioritize
disk access? I can't seem to find any.

The Xen host in question is:

- Quad core Xeon X3430 @ 2.40GHz
- 3Ware 9650SE RAID6 array, Seagate 2 TB disks
- Xen 4.0.1-5.8 on Debian 6 (upgrade planned)
- 15 DomU's
- All VMs have noop as disk scheduler (cat /sys/block/xvda2/queue/scheduler)
- VMs are prioritized with 'xm sched-credit', but that doesn't help the disk much
- Dom-0 has significantly more credits (10000) because it needs to service I/Os
- Dom-0 doesn't do anything else
- All virtual disks are logical volumes, exposed to the VM through xen-blkfront

So, what can I do to improve disk performance or priority?

Regards,

Wiebe
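P.S. For completeness, the credit weights were set along these lines
("Domain-0" and the guest name here are only examples):

  xm sched-credit -d Domain-0 -w 10000   # Dom-0 gets by far the largest weight
  xm sched-credit -d some-domu -w 256    # guests get ordinary weights (256 is the default)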
> -----Original Message-----
> From: xen-users-bounces@lists.xen.org [mailto:xen-users-bounces@lists.xen.org] On Behalf Of Wiebe Cazemier
> Sent: 10 June 2013 09:11
> To: xen-users@lists.xen.org
> Subject: [Xen-users] Disk starvation between DomU's
>
> I have the issue of my virtual machines becoming extremely slow when even
> only one of them is creating a lot of I/O. Is there a way to prioritize
> disk access? I can't seem to find any.
>
> [...]
>
> So, what can I do to improve disk performance or priority?

Can you try using ionice to set the disk priority of the corresponding
tapdisk/qemu process?
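Something along these lines; the grep pattern and the chosen class/priority
are just examples, and which process actually serves the disk depends on
your backend:

  # find the tapdisk/qemu process serving the noisy domU's disk
  ps ax | grep -E 'tapdisk|qemu'
  # drop it to the lowest best-effort priority (class 2, prio 7)
  ionice -c 2 -n 7 -p <pid>

Note that the I/O classes are only honoured by the CFQ scheduler on the
device that does the actual I/O.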
On Mon, 2013-06-10 at 08:24 +0000, Thanos Makatos wrote:
> > So, what can I do to improve disk performance or priority?
>
> Can you try using ionice to set the disk priority of the corresponding
> tapdisk/qemu process?

....Or if using blkback the relevant kernel thread.

Ian.
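P.S. A rough sketch, assuming blkback and a domain ID of 9 purely as an
example (the kernel threads are named after the domain ID and virtual
device):

  ps ax | grep blkback        # threads show up as e.g. [blkback.9.xvda2]
  ionice -c 2 -n 0 -p <pid>   # adjust the I/O priority of that thread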
----- Original Message -----
> From: "Ian Campbell" <Ian.Campbell@citrix.com>
> To: "Thanos Makatos" <thanos.makatos@citrix.com>
> Cc: "Wiebe Cazemier" <wiebe@halfgaar.net>, xen-users@lists.xen.org
> Sent: Tuesday, 11 June, 2013 12:41:39 PM
> Subject: Re: [Xen-users] Disk starvation between DomU's
>
> ....Or if using blkback the relevant kernel thread.
>
> Ian.

That's what I ended up doing. After first giving a certain DomU "best
effort, 0", I have now put it in the real-time class, with prio 3. I can't
say I notice any 'real-time' performance; it still hangs occasionally.

Additionally, when I do the following on the virtual machine in question:

dd if=/dev/zero of=dummy bs=1M

I hardly see any disk activity on the Dom0 with iostat. I see the blkback
process popping up occasionally with a few kB/s, but I would expect tens of
MB per second. The file 'dummy' grows to several GB in a short while, so it
does write.

Why don't I see the traffic showing up in iostat on the Dom0?
> -----Original Message-----
> From: Wiebe Cazemier [mailto:wiebe@halfgaar.net]
> Sent: 17 June 2013 16:24
> To: Ian Campbell
> Cc: Thanos Makatos; xen-users@lists.xen.org
> Subject: Re: [Xen-users] Disk starvation between DomU's
>
> That's what I ended up doing. After first giving a certain DomU "best
> effort, 0", I have now put it in the real-time class, with prio 3. I
> can't say I notice any 'real-time' performance; it still hangs
> occasionally.

I'm not sure whether this will work. AFAIK the actual I/O is performed by
tapdisk/qemu, so could you experiment with that instead? Also, keep in mind
that there is CPU processing in the data path, so have a look at the dom0
CPU usage when executing the I/O test.

> Additionally, when I do the following on the virtual machine in question:
>
> dd if=/dev/zero of=dummy bs=1M
>
> I hardly see any disk activity on the Dom0 with iostat. I see the blkback
> process popping up occasionally with a few kB/s, but I would expect tens
> of MB per second. The file 'dummy' grows to several GB in a short while,
> so it does write.
>
> Why don't I see the traffic showing up in iostat on the Dom0?

This is inexplicable. Either you've found a bug, or there's something wrong
in the I/O test. Could you post more details? (E.g. total I/O performed,
domU memory size, dom0 memory size, average CPU usage, etc.)

What's the array's I/O scheduler? I think since it's a RAID controller the
"suggested" value is noop. If your backend is tapdisk, then CFQ *might* do
the trick, since each domU is served by a different tapdisk process (it may
be the same with qemu).
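To check it on dom0, and switch it if you want to experiment (/dev/sda is
just a placeholder for whatever device node the 3ware array appears as):

  cat /sys/block/sda/queue/scheduler          # the active scheduler is shown in brackets
  echo noop > /sys/block/sda/queue/scheduler  # takes effect immediately, not persistent across reboots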
----- Original Message -----
> From: "Thanos Makatos" <thanos.makatos@citrix.com>
> To: "Wiebe Cazemier" <wiebe@halfgaar.net>, "Ian Campbell" <Ian.Campbell@citrix.com>
> Cc: xen-users@lists.xen.org
> Sent: Wednesday, 19 June, 2013 10:53:52 AM
> Subject: RE: [Xen-users] Disk starvation between DomU's
>
> I'm not sure whether this will work. AFAIK the actual I/O is performed by
> tapdisk/qemu, so could you experiment with that instead? Also, keep in
> mind that there is CPU processing in the data path, so have a look at the
> dom0 CPU usage when executing the I/O test.

Tapdisk? I use the phy backend, with the DomU being on a logical volume. I
don't even have processes with tap or qemu in their name. As for the CPU
usage: see below.

> > Why don't I see the traffic showing up in iostat on the Dom0?
>
> This is inexplicable. Either you've found a bug, or there's something
> wrong in the I/O test. Could you post more details? (E.g. total I/O
> performed, domU memory size, dom0 memory size, average CPU usage, etc.)

I have a DomU with ID 9 in "xm list", and the processes "[blkback.9.xvda2]"
and "[blkback.9.xvda1]" have RT/3 priority.

The DomU has 2 GB of RAM, no swap, 800 MB free (without cache). The Dom0
has 512 MB of RAM (288 MB free without cache, 30 MB in use on swap); its
memory is limited with a boot parameter.

When I do this on the DomU:

dd if=/dev/zero of=bla2.img bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 14.6285 s, 71.7 MB/s

I see [blkback.9.xvda2] popping up at the top of "iotop" on the Dom0,
hovering between 50 and 300 kB/s, nowhere near the 70 MB/s. There is hardly
any other process performing I/O. "iostat 2" does show a high blocks/s
count for its logical volume, dm-4.

The Dom0 uses about 30% CPU according to "xm top" while dd'ing. It has 4
cores available.

> What's the array's I/O scheduler? I think since it's a RAID controller
> the "suggested" value is noop. If your backend is tapdisk, then CFQ
> *might* do the trick, since each domU is served by a different tapdisk
> process (it may be the same with qemu).

The host has a 3Ware RAID6 array. Dom0 has CFQ, all DomU's have noop. Are
you saying that when using a hardware RAID, the Dom0 should use noop as
well?

Specs:
Debian 6
Linux 2.6.32-5-xen-amd64
xen-hypervisor-4.0-amd64: 4.0.1-5.8
CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
16 GB RAM
> The DomU has 2 GB of RAM, no swap, 800 MB free (without cache). The Dom0
> has 512 MB of RAM (288 MB free without cache, 30 MB in use on swap); its
> memory is limited with a boot parameter.
>
> When I do this on the DomU:
>
> dd if=/dev/zero of=bla2.img bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 14.6285 s, 71.7 MB/s

You're generating 1 GB of I/O while the domU's memory is 2 GB, so the
entire workload fits in the domU buffer cache. If you wait a bit longer
after the dd has finished you should see the 1 GB of I/O traffic in iostat;
you can make this happen sooner by executing "sync" right after the dd.

I'd suggest increasing the number of blocks written (at least 2x the RAM
size) and/or using oflag=direct in dd.

> > What's the array's I/O scheduler? I think since it's a RAID controller
> > the "suggested" value is noop. If your backend is tapdisk, then CFQ
> > *might* do the trick, since each domU is served by a different tapdisk
> > process (it may be the same with qemu).
>
> The host has a 3Ware RAID6 array. Dom0 has CFQ, all DomU's have noop. Are
> you saying that when using a hardware RAID, the Dom0 should use noop as
> well?

The RAID6 array should be present in dom0 as a block device, and IIUC on
top of it you've created logical volumes (one per domU), is this correct?
AFAIK the RAID controller provides sufficient scheduling, so it's usually
suggested to use noop; however, you could experiment with setting the I/O
scheduler of the RAID6 array block device to CFQ. I'm not sure what the I/O
scheduler of /dev/xvd* should be in each domU, but I suspect it's
irrelevant to the issue you're facing.
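To be concrete about the dd suggestion above (the count is just an example,
sized at roughly twice the domU's 2 GB of RAM):

  # bypass the domU page cache entirely
  dd if=/dev/zero of=bla2.img bs=1M count=4096 oflag=direct
  # or keep the cached write, but flush it before looking at iostat on dom0
  dd if=/dev/zero of=bla2.img bs=1M count=4096 && sync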
----- Original Message -----
> From: "Thanos Makatos" <thanos.makatos@citrix.com>
> To: "Wiebe Cazemier" <wiebe@halfgaar.net>
> Cc: xen-users@lists.xen.org
> Sent: Wednesday, 19 June, 2013 4:28:47 PM
> Subject: RE: [Xen-users] Disk starvation between DomU's
>
> You're generating 1 GB of I/O while the domU's memory is 2 GB, so the
> entire workload fits in the domU buffer cache. If you wait a bit longer
> after the dd has finished you should see the 1 GB of I/O traffic in
> iostat; you can make this happen sooner by executing "sync" right after
> the dd.
>
> I'd suggest increasing the number of blocks written (at least 2x the RAM
> size) and/or using oflag=direct in dd.

Hmm. That didn't make a difference, but something else did: I was looking
at the read column... A stupid mistake, but not as stupid as you might
think. My eye was drawn to the changing figures, and now I see that the
write I/O is always 0 according to iotop. When I do "iotop -oa"
(accumulate, leave out non-active processes), all the blkback processes
that appear accumulate 0 bytes written. I don't understand that...

> The RAID6 array should be present in dom0 as a block device, and IIUC on
> top of it you've created logical volumes (one per domU), is this correct?
> AFAIK the RAID controller provides sufficient scheduling, so it's usually
> suggested to use noop; however, you could experiment with setting the I/O
> scheduler of the RAID6 array block device to CFQ. I'm not sure what the
> I/O scheduler of /dev/xvd* should be in each domU, but I suspect it's
> irrelevant to the issue you're facing.

That's correct. Currently, the RAID array is on CFQ. It would seem weird to
me to change that to noop: the RAID controller might schedule, but it can't
receive instructions from the OS about what should have priority. I'll look
into it, though.

I do know that the recommended DomU scheduler is noop. It's also the
default for all my machines without configuring it. I guess they know
they're virtual.
> Hmm. That didn't make a difference, but something else did: I was looking
> at the read column... A stupid mistake, but not as stupid as you might
> think. My eye was drawn to the changing figures, and now I see that the
> write I/O is always 0 according to iotop. When I do "iotop -oa"
> (accumulate, leave out non-active processes), all the blkback processes
> that appear accumulate 0 bytes written. I don't understand that...

Can you try "iostat -x 1" instead of iotop?

> That's correct. Currently, the RAID array is on CFQ. It would seem weird
> to me to change that to noop: the RAID controller might schedule, but it
> can't receive instructions from the OS about what should have priority.
> I'll look into it, though.

The rule of thumb is to let the RAID controller do the scheduling;
otherwise the two schedulers may end up "competing" with each other. Of
course, this depends on the RAID controller, the I/O workload, etc., so it
may make no difference in your particular case.

> I do know that the recommended DomU scheduler is noop. It's also the
> default for all my machines without configuring it. I guess they know
> they're virtual.

Not necessarily: using CFQ inside a VM would still make sense if you want
to enforce I/O fairness among the applications running inside it, although
this could potentially lead to weird interactions with the OS's/controller's
I/O scheduler.
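For the iostat suggestion above, plain one-second extended samples should
be enough; watch the write columns and %util for the domU's logical volume
(dm-4 in your case) and for the array device underneath it:

  iostat -x 1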