Hi guys,

Firstly, please CC me in to any replies as I'm not a subscriber these days.

I've been trying to debug a problem with Xen 4.2.1 where I am unable to
achieve more than ~50Mb/sec sustained sequential write to a disk. The
DomU is configured as such:

name = "zeus.vm"
memory = 1024
vcpus = 2
cpus = "1-3"
disk = [ 'phy:/dev/RAID1/zeus.vm,xvda,w',
         'phy:/dev/vg_raid6/fileshare,xvdb,w' ]
vif = [ "mac=02:16:36:35:35:09, bridge=br203, vifname=vm.zeus.203",
        "mac=10:16:36:35:35:09, bridge=br10, vifname=vm.zeus.10" ]
bootloader = "pygrub"

on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'

I have tested the underlying LVM config by mounting
/dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a
benchmark:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xenhost.lan.crc. 2G   667  96 186976  21 80430  14   956  95 290591  26 373.7   8
Latency             26416us     212ms     168ms   35494us   35989us   83759us
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 14901  32 +++++ +++ 19672  39 15307  34 +++++ +++ 18158  37
Latency             17838us     141us     298us     365us     133us     296us

~186Mb/sec write, ~290Mb/sec read. Awesome.

I then started a single DomU which gets passed /dev/vg_raid6/fileshare
through as xvdb. It is then mounted in /mnt/fileshare/. I then ran
bonnie++ again in the DomU:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   658  96 50618   8 42398  10  1138  99 267568  30 494.9  11
Latency             22959us     226ms     311ms   14617us   41816us   72814us
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 21749  59 +++++ +++ 31089  73 23283  64 +++++ +++ 31114  75
Latency             18989us     164us     928us     480us      26us      87us

~50Mb/sec write, ~267Mb/sec read. Not so awesome.

/dev/vg_raid6/fileshare exists as an LV on /dev/md2:

# lvdisplay vg_raid6/fileshare
  --- Logical volume ---
  LV Path                /dev/vg_raid6/fileshare
  LV Name                fileshare
  VG Name                vg_raid6
  LV UUID                cwC0yK-Xr56-WB5v-10bw-3AZT-pYj0-piWett
  LV Write Access        read/write
  LV Creation host, time xenhost.lan.crc.id.au, 2013-02-18 20:59:40 +1100
  LV Status              available
  # open                 1
  LV Size                2.50 TiB
  Current LE             655360
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:5

md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
      3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

Here's a quick output of 'xm info' - although its full VM load is running
now:

# xm info
host                   : xenhost.lan.crc.id.au
release                : 3.7.9-1.el6xen.x86_64
version                : #1 SMP Mon Feb 18 14:46:35 EST 2013
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3303
hw_caps                : bfebfbff:28100800:00000000:00003f40:179ae3bf:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 8116
free_memory            : 1346
free_cpus              : 0
xen_major              : 4
xen_minor              : 2
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
cc_compiler            : gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
cc_compile_by          : mockbuild
cc_compile_domain      : crc.id.au
cc_compile_date        : Sat Feb 16 19:16:38 EST 2013
xend_config_format     : 4

In a nutshell, does anyone know *why* I am only able to get ~50Mb/sec
sequential writes to the DomU? It certainly isn't a problem getting
normal speeds to the LV while mounted in the Dom0.

All OS are Scientific Linux 6.3. The Dom0 runs packages from my
kernel-xen repo (http://au1.mirror.crc.id.au/repo/el6/x86_64/). The DomU
is completely stock packages.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
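
A note on units for anyone comparing the two runs above: bonnie++ reports block throughput in K/sec, so the "Mb/sec" figures in the post are megabytes, not megabits. The sketch below converts the block write and read columns and shows how much of the Dom0 figure the DomU reaches. It assumes bonnie++'s K means 1024 bytes; the dictionary and function names are made up for illustration.

```python
# Block write/read figures (K/sec) copied from the two bonnie++ runs above.
DOM0 = {"write": 186976, "read": 290591}
DOMU = {"write": 50618, "read": 267568}


def k_to_mib(k_per_sec: int) -> float:
    """Convert bonnie++ K/sec to MiB/s, assuming 1 K = 1024 bytes."""
    return k_per_sec / 1024


def domu_fraction(kind: str) -> float:
    """Fraction of the Dom0 throughput that the DomU achieves."""
    return DOMU[kind] / DOM0[kind]


if __name__ == "__main__":
    for kind in ("write", "read"):
        print(
            f"{kind}: Dom0 {k_to_mib(DOM0[kind]):.1f} MiB/s, "
            f"DomU {k_to_mib(DOMU[kind]):.1f} MiB/s, "
            f"DomU/Dom0 = {domu_fraction(kind):.0%}"
        )
```

Reads survive the trip through the PV block path almost intact, while writes drop to roughly a quarter, which is why the rest of the thread concentrates on the write path.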
On 20/02/13 03:10, Steven Haigh wrote:
> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
> achieve more than ~50Mb/sec sustained sequential write to a disk. The
> DomU is configured as such:

Since you mention 4.2.1 explicitly, is this a performance regression
from previous versions? (4.2.0 or the 4.1 branch)

> [...]
> ~50Mb/sec write, ~267Mb/sec read. Not so awesome.

We are currently working on improving the speed of the PV block drivers.
I will look into this difference between the read and write speeds, but
I would guess it is due to the size of the request/ring.
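
The guess about request/ring size can be made concrete with some arithmetic. The sketch below assumes the classic blkif limits of that era (11 segments of one 4 KiB page per request, 32 request slots on a single-page ring); those numbers are not stated anywhere in this thread. The RAID6 geometry is taken from the mdstat output in the original post. Whether this is the actual cause of the slow writes is not established here, and Dom0 may merge adjacent requests before they reach md, so treat it as an upper bound on a single guest request rather than a diagnosis.

```python
import math

# Assumed classic blkif limits (an assumption, not taken from the thread).
PAGE_KIB = 4                # matches xen_pagesize 4096 in the xm info output
SEGMENTS_PER_REQUEST = 11   # assumed per-request segment limit
RING_REQUESTS = 32          # assumed request slots on a single-page ring

# RAID6 geometry from the mdstat output: 4 members, 128k chunk.
RAID_MEMBERS = 4
PARITY_MEMBERS = 2          # RAID6 uses two parity chunks per stripe
CHUNK_KIB = 128

# Largest single request the guest can issue, and the most data in flight.
max_request_kib = PAGE_KIB * SEGMENTS_PER_REQUEST
max_in_flight_kib = max_request_kib * RING_REQUESTS

# Data carried by one full stripe (parity chunks excluded).
full_stripe_kib = (RAID_MEMBERS - PARITY_MEMBERS) * CHUNK_KIB

# Guest requests needed to cover one full stripe if nothing is merged.
requests_per_stripe = math.ceil(full_stripe_kib / max_request_kib)

if __name__ == "__main__":
    print(f"max request:        {max_request_kib} KiB")
    print(f"max in flight:      {max_in_flight_kib} KiB")
    print(f"full RAID6 stripe:  {full_stripe_kib} KiB")
    print(f"requests/stripe:    {requests_per_stripe}")
```

Under these assumptions a single guest request is smaller than one 128k chunk, so a full-stripe write only happens if several requests are merged on the Dom0 side.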
On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> Since you mention 4.2.1 explicitly, is this a performance regression
> from previous versions? (4.2.0 or the 4.1 branch)

This is actually a very good question. I've reinstalled my older
packages of Xen 4.1.3 back on the system. I rebooted into that
hypervisor, started the single DomU again, and ran bonnie++ again on
the DomU:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   658  97 54893   9 40845  10  1056  97 280453  33 561.2  13
Latency             27145us     426ms     257ms   31900us   24701us     222ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 19281  52 +++++ +++ +++++ +++ 24435  66 +++++ +++ +++++ +++
Latency             22860us     182us     706us   14803us      28us     300us

Still around 50Mb/sec - so this doesn't seem to be a regression, but
something else?

> We are currently working on improving the speed of pv block drivers, I
> will look into this difference between the read/write speed, but I would
> guess this is due to the size of the request/ring.

I would assume this would be in the DomU kernel?

-- 
Steven Haigh
On 20/02/2013 7:49 PM, Steven Haigh wrote:
> Still around 50Mb/sec - so this doesn't seem to be a regression, but
> something else?

I've actually done a bit of thinking about this... A recent thread on
the linux-raid kernel mailing list about Xen and DomU throughput made me
revisit my setup. I know I used to be able to saturate GigE both ways
(send and receive) to the samba share served by this DomU. This would
mean I'd get at least 90-100Mbyte/sec. Exactly which config and
kernel/Xen versions I was running at that point I cannot say.

As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
size. This seemed to make rebuild/resync speeds way worse - so I
reverted to 128Kb chunk size.

The benchmarks I am getting from the Dom0 are about what I'd expect - but
I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
through of the LV.

From my known config where I could saturate the GigE connection, I have
changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
kernels - currently 3.7.9.

My build of Xen 4.2.1 also has all of the recent security advisories
patched as well. Although it is interesting to note that downgrading to
Xen 4.1.2 made no difference to write speeds.

-- 
Steven Haigh
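
To put the GigE remark in numbers, the sketch below compares the measured block write rates with the raw line rate of gigabit Ethernet. It is only a rough sanity check: it ignores Ethernet, TCP and SMB overhead, assumes bonnie++'s K is 1024 bytes, and the variable names are illustrative.

```python
GIGE_BITS_PER_S = 1_000_000_000


def k_per_sec_to_mb(k_per_sec: int) -> float:
    """Convert bonnie++ K/sec to decimal MB/s, assuming 1 K = 1024 bytes."""
    return k_per_sec * 1024 / 1_000_000


# Raw gigabit line rate in MB/s, before any protocol overhead.
gige_mb = GIGE_BITS_PER_S / 8 / 1_000_000

# Block write figures from the bonnie++ runs in the original post.
dom0_write_mb = k_per_sec_to_mb(186976)
domu_write_mb = k_per_sec_to_mb(50618)

if __name__ == "__main__":
    print(f"GigE line rate: {gige_mb:.0f} MB/s")
    print(f"Dom0 write:     {dom0_write_mb:.1f} MB/s")
    print(f"DomU write:     {domu_write_mb:.1f} MB/s "
          f"({domu_write_mb / gige_mb:.0%} of line rate)")
```

On these figures the Dom0 could keep a gigabit link busy on writes, while the DomU falls well short of the 90-100Mbyte/sec mentioned above.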
>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> My build of Xen 4.2.1 also has all of the recent security advisories
> patched as well. Although it is interesting to note that downgrading to
> Xen 4.1.2 made no difference to write speeds.

Not surprising at all, considering that the hypervisor is only a passive
library for all PV I/O purposes. You're likely hunting for a kernel side
regression (and hence the mentioning of the hypervisor version as
the main factor in the subject is probably misleading).

Jan
On 20/02/13 10:12, Jan Beulich wrote:
> Not surprising at all, considering that the hypervisor is only a passive
> library for all PV I/O purposes. You're likely hunting for a kernel side
> regression (and hence the mentioning of the hypervisor version as
> the main factor in the subject is probably misleading).

Further to this, do try to verify if your disk driver has changed
recently to use >0 order page allocations for DMA. If it has, then
speed will be much slower as there will now be the swiotlb cpu-copy
overhead.

~Andrew
On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> Further to this, do try to verify if your disk driver has changed
> recently to use >0 order page allocations for DMA. If it has, then
> speed will be much slower as there will now be the swiotlb cpu-copy
> overhead.

Any hints on how to do this? ;)

The kernel modules in use for my SATA drives are ahci and sata_mv.
There are 6 drives in total on the system.

sda + sdb = RAID1
sd[c-f] = RAID6

sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
sde, sdf are on the sata_mv 4x PCIe controller.

-- 
Steven Haigh
On 20/02/13 11:08, Steven Haigh wrote:
> Any hints on how to do this? ;)
>
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.

Sadly that is a hard question to answer, and is driver specific. I can't
suggest an easy way other than digging into the source.

~Andrew
On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.
>
> sda + sdb = RAID1
> sd[c-f] = RAID6
>
> sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> sde, sdf are on the sata_mv 4x PCIe controller.

Can you try using only the disks on the ahci controller?

sata_mv is known to be buggy and problematic..
I'm not sure if that's the case here, but if you're able to easily
try using only ahci, it's worth a shot.

-- Pasi
On 20/02/2013 8:49 PM, Steven Haigh wrote:
> [...]
> My build of Xen 4.2.1 also has all of the recent security advisories
> patched as well. Although it is interesting to note that downgrading to
> Xen 4.1.2 made no difference to write speeds.

Just wondering if there is any further news or tests that I might be
able to do on this?

-- 
Steven Haigh
On 08/03/13 09:54, Steven Haigh wrote:
> Just wondering if there is any further news or tests that I might be
> able to do on this?

I have been working on speed improvements for blkfront/blkback, and
submitted the first RFC series of patches last week, which can be found
at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
WIP, so if you want to test them please be aware there might be hidden
bugs. I've also pushed them to a branch in my git repo:

git://xenbits.xen.org/people/royger/linux.git xen-block-indirect

You will need to recompile both the Dom0/DomU kernels (if they are not
the same) if you want to test them.
On 8/03/2013 8:43 PM, Roger Pau Monné wrote:
> You will need to recompile both the Dom0/DomU kernels (if they are not
> the same) if you want to test them.

Hmm - how will this react with using say, a stock kernel in the DomU (ie
EL6.3 kernel) but these changes in the Dom0?

-- 
Steven Haigh
On 08/03/13 10:46, Steven Haigh wrote:
> Hmm - how will this react with using say, a stock kernel in the DomU (ie
> EL6.3 kernel) but these changes in the Dom0?

It should work, but you won't be able to see much performance
improvement (if any at all). Anyway, I've just referred to this series
for testing, but this should not be used on anything that's supposed to
be stable/production.
On Wed, Feb 20, 2013 at 03:18:46PM +0200, Pasi Kärkkäinen wrote:
> On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> > On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> > >Further to this, do try to verify if your disk driver has changed
> > >recently to use >0 order page allocations for DMA. If it has, then
> > >speed will be much slower as there will now be the swiotlb cpu-copy
> > >overhead.
> >
> > Any hints on how to do this? ;)

They are fine. They use the SG DMA API:

konrad@phenom:~/linux/drivers/ata$ grep "dma_map" *
libata-core.c:  n_elem = dma_map_sg(ap->dev, qc->sg, qc->n_elem, qc->dma_dir);
On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
> Just wondering if there is any further news or tests that I might be
> able to do on this?

So usually with a problem like this the approach is to peel back the
layers and find out which of them is at fault. You have a stacked block
system - LVM on top of RAID6 on top of block devices.

To figure out who is interfering with the speeds I would recommend
you fault one of the RAID6 disks (so take it out of the RAID6). Pass
it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
whether the write performance is different.

This is how I do it:

[/dev/xvdXXX]
rw=write
direct=1
size=4g
ioengine=libaio
iodepth=32

Then progress up the stack. Try sticking the disk back in RAID6
and doing it on the RAID6. Then on the LVM and so on.
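
The layer-by-layer procedure can be sketched as a small generator that emits one fio job per layer, reusing the parameters from the job above. This is only an illustration: the layer names and device paths are placeholders, and an explicit `filename=` option is used here instead of relying on the section name. Note that a direct sequential write overwrites whatever is on the target, so it should only be pointed at a disk or volume whose contents are expendable.

```python
# Placeholder targets, one per layer of the stack; substitute real devices.
# WARNING: rw=write with direct=1 overwrites the data on the target device.
LAYERS = [
    ("raw-disk", "/dev/sdX"),
    ("raid6", "/dev/mdX"),
    ("lvm", "/dev/VG/LV"),
]


def fio_job(name: str, device: str) -> str:
    """Return one fio job section for a sequential direct write test."""
    options = {
        "filename": device,   # explicit target instead of the section name
        "rw": "write",
        "direct": 1,
        "size": "4g",
        "ioengine": "libaio",
        "iodepth": 32,
    }
    lines = [f"[{name}]"] + [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print each job separately so the layers are tested one at a time,
    # first from Dom0 and then from the DomU, and the results compared.
    for name, device in LAYERS:
        print(fio_job(name, device))
```

Each printed section can be saved to its own job file and run with fio in Dom0 and again in the DomU (with the device path adjusted to the xvd name the guest sees); the first layer where the two results diverge is the one to look at.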
On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote:> On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote: >> On 20/02/2013 8:49 PM, Steven Haigh wrote: >>> On 20/02/2013 7:49 PM, Steven Haigh wrote: >>>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote: >>>>> On 20/02/13 03:10, Steven Haigh wrote: >>>>>> Hi guys, >>>>>> >>>>>> Firstly, please CC me in to any replies as I''m not a subscriber these >>>>>> days. >>>>>> >>>>>> I''ve been trying to debug a problem with Xen 4.2.1 where I am unable to >>>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The >>>>>> DomU is configured as such: >>>>> >>>>> Since you mention 4.2.1 explicitly, is this a performance regression >>>> >from previous versions? (4.2.0 or the 4.1 branch) >>>> >>>> This is actually a very good question. I''ve reinstalled my older >>>> packages of Xen 4.1.3 back on the system. Rebooting into the new >>>> hypervisor, then starting the single DomU again. Ran bonnie++ again on >>>> the DomU: >>>> >>>> Still around 50Mb/sec - so this doesn''t seem to be a regression, but >>>> something else? >>> >>> I''ve actually done a bit of thinking about this... A recent thread on >>> linux-raid kernel mailing list about Xen and DomU throughput made me >>> revisit my setup. I know I used to be able to saturate GigE both ways >>> (send and receive) to the samba share served by this DomU. This would >>> mean I''d get at least 90-100Mbyte/sec. What exact config and kernel/xen >>> versions this was as this point in time I cannot say. >>> >>> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk >>> size. This seemed to make rebuild/resync speeds way worse - so I >>> reverted to 128Kb chunk size. >>> >>> The benchmarks I am getting from the Dom0 is about what I''d expect - but >>> I wouldn''t expect to lose 130Mb/sec write speed to the phy:/ pass >>> through of the LV. 
>>> >>> From my known config where I could saturate the GigE connection, I have >>> changed from kernel 2.6.32 (Jeremy''s git repo) to the latest vanilla >>> kernels - currently 3.7.9. >>> >>> My build of Xen 4.2.1 also has all of the recent security advisories >>> patched as well. Although it is interesting to note that downgrading to >>> Xen 4.1.2 made no difference to write speeds. >>> >> >> Just wondering if there is any further news or tests that I might be >> able to do on this? > > So usually the problem like this is to unpeel the layers and find out > which of them is at fault. You have a stacked block system - LVM on > top of RAID6 on top of block devices. > > To figure out who is interferring with the speeds I would recommend > you fault one of the RAID6 disks (so take it out of the RAID6). Pass > it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then > run ''fio''. Run ''fio'' as well in dom0 on the /dev/sdX and check > whether the write performance is different. > > This is how I how do it: > > [/dev/xvdXXX] > rw=write > direct=1 > size=4g > ioengine=libaio > iodepth=32 > > Then progress up the stack. Try sticking the disk back in RAID6 > and doing it on the RAID6. Then on the LVM and so on.I did try to peel it back a single layer at a time. My test was simply using the same XFS filesystem in the Dom0 instead of the DomU. I tested the underlying LVM config by mounting /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a benchmark: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP xenhost.lan.crc. 2G 667 96 186976 21 80430 14 956 95 290591 26 373.7 8 Latency 26416us 212ms 168ms 35494us 35989us 83759us Version 1.96 ------Sequential Create------ --------Random Create-------- xenhost.lan.crc.id. 
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 14901 32 +++++ +++ 19672 39 15307 34 +++++ +++ 18158 37 Latency 17838us 141us 298us 365us 133us 296us ~186Mb/sec write, ~290Mb/sec read. Awesome. I then started a single DomU which gets passed /dev/vg_raid6/fileshare through as xvdb. It is then mounted in /mnt/fileshare/. I then ran bonnie++ again in the DomU: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP zeus.crc.id.au 2G 658 96 50618 8 42398 10 1138 99 267568 30 494.9 11 Latency 22959us 226ms 311ms 14617us 41816us 72814us Version 1.96 ------Sequential Create------ --------Random Create-------- zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 21749 59 +++++ +++ 31089 73 23283 64 +++++ +++ 31114 75 Latency 18989us 164us 928us 480us 26us 87us ~50Mb/sec write, ~267Mb/sec read. Not so awesome. As such, the filesystem, RAID6, etc are completely unchanged. The only change is the access method Dom0 vs DomU. -- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 Fax: (03) 8338 0299 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
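A quick way to read the two bonnie++ runs above side by side is to divide the Dom0 figure by the DomU figure for each block test. This is only a minimal sketch: the numbers are the K/sec values copied from the output above, and the helper names (`slowdown`, `compare`) are made up for illustration.

```python
# Block-level figures (K/sec) copied from the two bonnie++ runs above.
DOM0 = {"write": 186976, "rewrite": 80430, "read": 290591}
DOMU = {"write": 50618, "rewrite": 42398, "read": 267568}


def slowdown(dom0_kps: int, domu_kps: int) -> float:
    """Return the Dom0 figure divided by the DomU figure."""
    return dom0_kps / domu_kps


def compare(dom0: dict, domu: dict) -> dict:
    """Map each test name to its Dom0/DomU ratio."""
    return {name: slowdown(dom0[name], domu[name]) for name in dom0}


if __name__ == "__main__":
    for name, factor in compare(DOM0, DOMU).items():
        print(f"{name:8s} Dom0 is {factor:.2f}x the DomU figure")
```

On these numbers the sequential write ratio is far larger than the read ratio, which fits the observation that only the write path is affected.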
On Sat, Mar 09, 2013 at 09:30:54AM +1100, Steven Haigh wrote:> On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote: > >On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote: > >>On 20/02/2013 8:49 PM, Steven Haigh wrote: > >>>On 20/02/2013 7:49 PM, Steven Haigh wrote: > >>>>On 20/02/2013 7:26 PM, Roger Pau Monné wrote: > >>>>>On 20/02/13 03:10, Steven Haigh wrote: > >>>>>>Hi guys, > >>>>>> > >>>>>>Firstly, please CC me in to any replies as I''m not a subscriber these > >>>>>>days. > >>>>>> > >>>>>>I''ve been trying to debug a problem with Xen 4.2.1 where I am unable to > >>>>>>achieve more than ~50Mb/sec sustained sequential write to a disk. The > >>>>>>DomU is configured as such: > >>>>> > >>>>>Since you mention 4.2.1 explicitly, is this a performance regression > >>>>>from previous versions? (4.2.0 or the 4.1 branch) > >>>> > >>>>This is actually a very good question. I''ve reinstalled my older > >>>>packages of Xen 4.1.3 back on the system. Rebooting into the new > >>>>hypervisor, then starting the single DomU again. Ran bonnie++ again on > >>>>the DomU: > >>>> > >>>>Still around 50Mb/sec - so this doesn''t seem to be a regression, but > >>>>something else? > >>> > >>>I''ve actually done a bit of thinking about this... A recent thread on > >>>linux-raid kernel mailing list about Xen and DomU throughput made me > >>>revisit my setup. I know I used to be able to saturate GigE both ways > >>>(send and receive) to the samba share served by this DomU. This would > >>>mean I''d get at least 90-100Mbyte/sec. What exact config and kernel/xen > >>>versions this was as this point in time I cannot say. > >>> > >>>As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk > >>>size. This seemed to make rebuild/resync speeds way worse - so I > >>>reverted to 128Kb chunk size. > >>> > >>>The benchmarks I am getting from the Dom0 is about what I''d expect - but > >>>I wouldn''t expect to lose 130Mb/sec write speed to the phy:/ pass > >>>through of the LV. 
> >>> > >>> From my known config where I could saturate the GigE connection, I have > >>>changed from kernel 2.6.32 (Jeremy''s git repo) to the latest vanilla > >>>kernels - currently 3.7.9. > >>> > >>>My build of Xen 4.2.1 also has all of the recent security advisories > >>>patched as well. Although it is interesting to note that downgrading to > >>>Xen 4.1.2 made no difference to write speeds. > >>> > >> > >>Just wondering if there is any further news or tests that I might be > >>able to do on this? > > > >So usually the problem like this is to unpeel the layers and find out > >which of them is at fault. You have a stacked block system - LVM on > >top of RAID6 on top of block devices. > > > >To figure out who is interferring with the speeds I would recommend > >you fault one of the RAID6 disks (so take it out of the RAID6). Pass > >it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then > >run ''fio''. Run ''fio'' as well in dom0 on the /dev/sdX and check > >whether the write performance is different. > > > >This is how I how do it: > > > >[/dev/xvdXXX] > >rw=write > >direct=1 > >size=4g > >ioengine=libaio > >iodepth=32 > > > >Then progress up the stack. Try sticking the disk back in RAID6 > >and doing it on the RAID6. Then on the LVM and so on. > > I did try to peel it back a single layer at a time. My test was > simply using the same XFS filesystem in the Dom0 instead of the > DomU.Right, you are using a filesystem. That is another layer :-) And depending on what version of QEMU you have you might be using QEMU as the block PV backend instead of the kernel one. There were versions of QEMU that had highly inferior performance. Hence I was thinking of just using a raw disk to test that.> > I tested the underlying LVM config by mounting > /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as > a benchmark:So still filesystem. Fio can do it on a block level. What does ''xenstore-ls'' show you and ''losetup -a''? 
I am really curious as to whether the file you are providing to the guest as a disk is being handled via 'loop' or via 'QEMU'.
On 12/03/2013 12:30 AM, Konrad Rzeszutek Wilk wrote:> On Sat, Mar 09, 2013 at 09:30:54AM +1100, Steven Haigh wrote: >> On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote: >>> On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote: >>>> On 20/02/2013 8:49 PM, Steven Haigh wrote: >>>>> On 20/02/2013 7:49 PM, Steven Haigh wrote: >>>>>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote: >>>>>>> On 20/02/13 03:10, Steven Haigh wrote: >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> Firstly, please CC me in to any replies as I''m not a subscriber these >>>>>>>> days. >>>>>>>> >>>>>>>> I''ve been trying to debug a problem with Xen 4.2.1 where I am unable to >>>>>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The >>>>>>>> DomU is configured as such: >>>>>>> >>>>>>> Since you mention 4.2.1 explicitly, is this a performance regression >>>>>> >from previous versions? (4.2.0 or the 4.1 branch) >>>>>> >>>>>> This is actually a very good question. I''ve reinstalled my older >>>>>> packages of Xen 4.1.3 back on the system. Rebooting into the new >>>>>> hypervisor, then starting the single DomU again. Ran bonnie++ again on >>>>>> the DomU: >>>>>> >>>>>> Still around 50Mb/sec - so this doesn''t seem to be a regression, but >>>>>> something else? >>>>> >>>>> I''ve actually done a bit of thinking about this... A recent thread on >>>>> linux-raid kernel mailing list about Xen and DomU throughput made me >>>>> revisit my setup. I know I used to be able to saturate GigE both ways >>>>> (send and receive) to the samba share served by this DomU. This would >>>>> mean I''d get at least 90-100Mbyte/sec. What exact config and kernel/xen >>>>> versions this was as this point in time I cannot say. >>>>> >>>>> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk >>>>> size. This seemed to make rebuild/resync speeds way worse - so I >>>>> reverted to 128Kb chunk size. 
>>>>> >>>>> The benchmarks I am getting from the Dom0 is about what I''d expect - but >>>>> I wouldn''t expect to lose 130Mb/sec write speed to the phy:/ pass >>>>> through of the LV. >>>>> >>>>> From my known config where I could saturate the GigE connection, I have >>>>> changed from kernel 2.6.32 (Jeremy''s git repo) to the latest vanilla >>>>> kernels - currently 3.7.9. >>>>> >>>>> My build of Xen 4.2.1 also has all of the recent security advisories >>>>> patched as well. Although it is interesting to note that downgrading to >>>>> Xen 4.1.2 made no difference to write speeds. >>>>> >>>> >>>> Just wondering if there is any further news or tests that I might be >>>> able to do on this? >>> >>> So usually the problem like this is to unpeel the layers and find out >>> which of them is at fault. You have a stacked block system - LVM on >>> top of RAID6 on top of block devices. >>> >>> To figure out who is interferring with the speeds I would recommend >>> you fault one of the RAID6 disks (so take it out of the RAID6). Pass >>> it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then >>> run ''fio''. Run ''fio'' as well in dom0 on the /dev/sdX and check >>> whether the write performance is different. >>> >>> This is how I how do it: >>> >>> [/dev/xvdXXX] >>> rw=write >>> direct=1 >>> size=4g >>> ioengine=libaio >>> iodepth=32 >>> >>> Then progress up the stack. Try sticking the disk back in RAID6 >>> and doing it on the RAID6. Then on the LVM and so on. >> >> I did try to peel it back a single layer at a time. My test was >> simply using the same XFS filesystem in the Dom0 instead of the >> DomU. > > Right, you are using a filesystem. That is another layer :-) > > And depending on what version of QEMU you have you might be using > QEMU as the block PV backend instead of the kernel one. There > were versions of QEMU that had highly inferior performance. > > Hence I was thinking of just using a raw disk to test that. 
> >> >> I tested the underlying LVM config by mounting >> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as >> a benchmark: > > > So still filesystem. Fio can do it on a block level. > > What does ''xenstore-ls'' show you and ''losetup -a''? I am really > curious as to where that file you are providing to the guest as > disk is being handled via ''loop'' or via ''QEMU''. >I''ve picked out what I believe is the most relevant from xenstore-ls that belongs to the DomU in question: 1 = "" 51712 = "" domain = "zeus.vm" frontend = "/local/domain/1/device/vbd/51712" uuid = "3aa72be1-0e83-1ee2-a346-8ccef71e9d34" bootable = "1" dev = "xvda" state = "4" params = "/dev/RAID1/zeus.vm" mode = "w" online = "1" frontend-id = "1" type = "phy" physical-device = "fd:6" hotplug-status = "connected" feature-flush-cache = "1" feature-discard = "0" feature-barrier = "1" feature-persistent = "1" sectors = "135397376" info = "0" sector-size = "512" 51728 = "" domain = "zeus.vm" frontend = "/local/domain/1/device/vbd/51728" uuid = "28375672-321c-0e33-4549-d64ee4daadec" bootable = "0" dev = "xvdb" state = "4" params = "/dev/vg_raid6/fileshare" mode = "w" online = "1" frontend-id = "1" type = "phy" physical-device = "fd:5" hotplug-status = "connected" feature-flush-cache = "1" feature-discard = "0" feature-barrier = "1" feature-persistent = "1" sectors = "5368709120" info = "0" sector-size = "512" losetup -a returns nothing. -- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 Fax: (03) 8338 0299 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
> >So still filesystem. Fio can do it on a block level.
> >
> >What does 'xenstore-ls' show you and 'losetup -a'? I am really
> >curious as to where that file you are providing to the guest as
> >disk is being handled via 'loop' or via 'QEMU'.
> >
>
> I've picked out what I believe is the most relevant from xenstore-ls
> that belongs to the DomU in question:

Great.

.. snip..
> params = "/dev/vg_raid6/fileshare"
> mode = "w"
> online = "1"
> frontend-id = "1"
> type = "phy"
> physical-device = "fd:5"
> hotplug-status = "connected"
> feature-flush-cache = "1"
> feature-discard = "0"
> feature-barrier = "1"
> feature-persistent = "1"
> sectors = "5368709120"
> info = "0"
> sector-size = "512"

OK, so the flow of data from the guest is:

bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.

Any way you can restructure this to be:

fio -> xen-blkfront -> xen-blkback -> one disk from the raid.

to see if the issue is between "LVM -> RAID6" or the "bonnie++ -> FS" part?
Is the cpu load quite high when you do these writes?

What are the RAID6 disks you have? How many?
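To make the layer-peeling argument concrete, the three I/O paths discussed in this thread can be written down as lists and compared. This is only an illustrative sketch: the layer names come from the message above, while the function names are hypothetical.

```python
# Each path is listed from the benchmark at the top down to the hardware.
DOMU_PATH = ["bonnie++", "FS", "xen-blkfront", "xen-blkback", "LVM", "RAID6", "disks"]
DOM0_PATH = ["bonnie++", "FS", "LVM", "RAID6", "disks"]
PROPOSED_PATH = ["fio", "xen-blkfront", "xen-blkback", "one disk"]


def layers_only_in(path: list, other: list) -> list:
    """Return the layers of `path` that do not appear in `other`, in order."""
    return [layer for layer in path if layer not in other]


def shared_layers(path: list, other: list) -> list:
    """Return the layers of `path` that also appear in `other`, in order."""
    return [layer for layer in path if layer in other]


if __name__ == "__main__":
    print("DomU vs Dom0 differ by:", layers_only_in(DOMU_PATH, DOM0_PATH))
    print("Proposed test keeps:   ", shared_layers(DOMU_PATH, PROPOSED_PATH))
```

The Dom0 comparison removes only the blkfront/blkback pair, whereas the proposed fio test keeps that pair and removes everything else, so the two tests answer different questions.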
On 13/03/13 00:04, Konrad Rzeszutek Wilk wrote:
>>> So still filesystem. Fio can do it on a block level.
>>>
>>> What does 'xenstore-ls' show you and 'losetup -a'? I am really
>>> curious as to where that file you are providing to the guest as
>>> disk is being handled via 'loop' or via 'QEMU'.
>>>
>>
>> I've picked out what I believe is the most relevant from xenstore-ls
>> that belongs to the DomU in question:
>
> Great.
> .. snip..
>> params = "/dev/vg_raid6/fileshare"
>> mode = "w"
>> online = "1"
>> frontend-id = "1"
>> type = "phy"
>> physical-device = "fd:5"
>> hotplug-status = "connected"
>> feature-flush-cache = "1"
>> feature-discard = "0"
>> feature-barrier = "1"
>> feature-persistent = "1"
>> sectors = "5368709120"
>> info = "0"
>> sector-size = "512"
>
> OK, so the flow of data from the guest is:
> bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.
>
> Any way you can restructure this to be:
>
> fio -> xen-blkfront -> xen-blkback -> one disk from the raid.
>
>
> to see if the issue is between "LVM -> RAID6" or the "bonnie++ -> FS" part?
> Is the cpu load quite high when you do these writes?

Maybe I'm missing something, but running this directly from the Dom0 would give a result of:

bonnie++ -> FS -> LVM -> RAID6

These figures were well over 200Mb/sec read and well over 100Mb/sec write.

This only takes out the xen-blkfront and xen-blkback - which I thought was the aim? Or is the point of this to make sure that we can replicate it with a single disk and that it isn't some weird interaction between blkfront/blkback and the LVM/RAID6?

CPU Usage doesn't seem to be a limiting factor. I certainly don't see massive loads for writing.

>
> What are the RAID6 disks you have? How many?

The RAID6 is made up of 4 x 2Tb 7200RPM Seagate SATA drives...
Model Family:     Seagate SV35
Device Model:     ST2000VX000-9YW164
Serial Number:    Z1E10QQJ
LU WWN Device Id: 5 000c50 04dd3a1f1
Firmware Version: CV13
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical

Then in /proc/mdstat:

md2 : active raid6 sdd[4] sdc[0] sdf[5] sde[1]
      3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

I decided to use whole disks so that I don't run into alignment issues.

The VG is using 4Mb extents, so that should be fine too:

# vgdisplay vg_raid6
  --- Volume group ---
  VG Name               vg_raid6
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  7
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               5
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               3.64 TiB
  PE Size               4.00 MiB
  Total PE              953863
  Alloc PE / Size       688640 / 2.63 TiB
  Free  PE / Size       265223 / 1.01 TiB
  VG UUID               md7G8X-F2mT-JBQa-f5qm-TN4O-kOqs-KWHGR1

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
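The geometry quoted above can be sanity-checked with a little arithmetic. The sketch below assumes the usual RAID6 layout in which two chunks per stripe hold parity, and takes the disk count, chunk size and extent size from the mdstat and vgdisplay output above. The function names are illustrative only, and the check says nothing about where the LVM data area starts on the PV, which is not shown in this thread.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4


def raid6_full_stripe_bytes(n_disks: int, chunk_bytes: int) -> int:
    """Data bytes in one full stripe: two chunks per stripe are parity."""
    if n_disks < 4:
        raise ValueError("RAID6 needs at least 4 member disks")
    return (n_disks - 2) * chunk_bytes


def extent_is_stripe_multiple(extent_bytes: int, stripe_bytes: int) -> bool:
    """True if an LVM extent covers a whole number of full stripes."""
    return extent_bytes % stripe_bytes == 0


if __name__ == "__main__":
    stripe = raid6_full_stripe_bytes(n_disks=4, chunk_bytes=128 * KIB)
    print("full stripe:", stripe // KIB, "KiB")
    print("4 MiB extent is a stripe multiple:", extent_is_stripe_multiple(4 * MIB, stripe))
    print("VG size from PE count:", round(953863 * 4 * MIB / TIB, 2), "TiB")
```

With 4 disks and a 128k chunk, a full stripe carries 256 KiB of data, and a 4 MiB extent is an exact multiple of that.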
On 24/03/13 18:12, Steven Haigh wrote:
> In fact, I just thought of something else.... I have an eSATA caddy that
> connects to the same SATA controller. With this, I can slot any SATA
> drive into it - and I should easily be able to pass this to any DomU.
>
> I'll throw in a 1Tb SATA drive do that I don't have to break the
> existing RAID6 array - as the testing on this drive can be destructive
> testing - as otherwise the drive is blank.

Disk info:
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    9VP3BE9W
LU WWN Device Id: 5 000c50 01a238fd0
Firmware Version: CC49
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical

Results...

Dom0 (host machine):
# dd if=/dev/zero of=/dev/sdi bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6909 s, 127 MB/s

Created an ext4 filesystem on /dev/sdi1...
# mkfs.ext4 -j /dev/sdi1

Run bonnie++ on the filesystem:
# mount /dev/sdi1 /mnt/esata
# cd /mnt/esata/
# bonnie++ -u 0:0
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xenhost.lan.crc. 2G   433  95 119107  22 36723   7   960  95 145026  12 191.9   4
Latency             33231us   39824us     211ms   31466us   17459us    5073ms
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               285us     642us     315us     217us     349us     127us

We get ~145Mb/sec block read, ~119Mb/sec block write.

Now, lets pass the whole device through to a DomU.
# xm block-attach zeus.vm phy:/dev/sdi xvdc w

From the DomU now:

Firstly, the same dd as above:
# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6708 s, 128 MB/s

Create the ext4 filesystem again:
# mkfs.ext4 -j /dev/xvdc1

Run bonnie++ on the DomU:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   387  99 121891  24 47759  14   992  98 141103  17 248.9   7
Latency             40518us     126ms     152ms   47174us   30061us     250ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               174us     839us    2249us     113us      42us     185us

Interesting. We're at almost full speed in the DomU. 121Mb/sec write, 141Mb/sec read.

So my wonder is now... Why when put in a RAID6 do we have a 180Mb/sec+ from the Dom0, but only 50Mb/sec from the DomU of the same filesystem...

Any further testing that may indicate something?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
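The two raw-device dd runs above can be compared by recomputing the rate from the byte count and elapsed time that dd prints. A minimal sketch, assuming dd's "MB/s" means decimal megabytes (consistent with it reporting 4294967296 bytes as 4.3 GB); the function name is illustrative.

```python
def dd_rate_mb_s(total_bytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes per second, the unit dd prints as 'MB/s'."""
    return total_bytes / seconds / 1_000_000


# bs=1M count=4096, as in both dd invocations above.
TOTAL_BYTES = 4096 * 1024 * 1024

dom0_rate = dd_rate_mb_s(TOTAL_BYTES, 33.6909)  # /dev/sdi from the Dom0
domu_rate = dd_rate_mb_s(TOTAL_BYTES, 33.6708)  # /dev/xvdc from the DomU

if __name__ == "__main__":
    print(f"Dom0: {dom0_rate:.2f} MB/s, DomU: {domu_rate:.2f} MB/s")
    print(f"relative difference: {abs(domu_rate - dom0_rate) / dom0_rate:.4%}")
```

The two rates differ by well under one percent, so for large direct writes to a single disk the blkfront/blkback path shows no measurable cost in this test.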
On 24/03/13 20:10, Steven Haigh wrote:
> So my wonder is now... Why when put in a RAID6 do we have a 180Mb/sec+
> from the Dom0, but only 50Mb/sec from the DomU of the same filesystem...

I should actually clarify that this is 180Mb/sec write speed from the Dom0 and 50Mb/sec from the DomU. Really not quite sure why this is. The filesystem in question is XFS - the tests in my previous post were on ext4.

>
> Any further testing that may indicate something?
>

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
So, based on my tests yesterday, I decided to break the RAID6 and pull a drive out of it to test directly on the 2Tb drives in question.

The array in question:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
      3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

# mdadm /dev/md2 --fail /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md2
# mdadm /dev/md2 --remove /dev/sdf
mdadm: hot removed /dev/sdf from /dev/md2

So, all tests are to be done on /dev/sdf.
Model Family:     Seagate SV35
Device Model:     ST2000VX000-9YW164
Serial Number:    Z1E17C3X
LU WWN Device Id: 5 000c50 04e1bc6f0
Firmware Version: CV13
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical

From the Dom0:
# dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s

Create a single partition on the drive, and format it with ext4:
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x98d8baaf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048  3907029167  1953513560   83  Linux

Command (m for help): w

# mkfs.ext4 -j /dev/sdf1
......
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Mount it on the Dom0:
# mount /dev/sdf1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114  17 296.4   6
Latency             70971us     190ms     221ms   40369us   17657us     164ms

So from the Dom0: 133Mb/sec write, 209Mb/sec read.

Now, I'll attach the full disk to a DomU:
# xm block-attach zeus.vm phy:/dev/sdf xvdc w

And we'll test from the DomU.

# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s

Partition the same as in the Dom0 and create an ext4 filesystem on it:

I notice something interesting here. In the Dom0, the device is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

In the DomU, it is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Not sure if this could be related - but continuing testing:
   Device Boot      Start         End      Blocks   Id  System
/dev/xvdc1           2048  3907029167  1953513560   83  Linux

# mkfs.ext4 -j /dev/xvdc1
....
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# mount /dev/xvdc1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407  23 313.4   9
Latency             34615us     130ms     128ms   33316us   74401us     130ms

So still... 116Mb/sec write, 176Mb/sec read to the physical device from the DomU. More than acceptable.

It leaves me to wonder.... Could there be something in the Dom0 seeing the drives as 4096 byte sectors, but the DomU seeing it as 512 byte sectors cause an issue?
-- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 Fax: (03) 8338 0299
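One way to reason about the 512 versus 4096 byte question is to count how many physical sectors a given write overlaps. The sketch below is illustrative only: the function names are made up, and whether the guest actually issues such unaligned writes is not established anywhere in this thread.

```python
def is_aligned(start_sector: int, logical_bytes: int, physical_bytes: int) -> bool:
    """True if a partition's first byte falls on a physical-sector boundary."""
    return (start_sector * logical_bytes) % physical_bytes == 0


def physical_sectors_touched(offset_bytes: int, length_bytes: int, physical_bytes: int) -> int:
    """Number of physical sectors that a write of length_bytes at offset_bytes overlaps."""
    first = offset_bytes // physical_bytes
    last = (offset_bytes + length_bytes - 1) // physical_bytes
    return last - first + 1


if __name__ == "__main__":
    # Both /dev/sdf1 and /dev/xvdc1 start at sector 2048 with 512-byte logical sectors.
    print("partition aligned to 4096:", is_aligned(2048, 512, 4096))
    # A 4 KiB write that starts 512 bytes into a physical sector spans two of them.
    print("sectors touched:", physical_sectors_touched(512, 4096, 4096))
```

The partition start is aligned in both views. A guest that believes the physical sector size is 512 bytes could still issue writes that cover only part of a 4096-byte physical sector, which the drive would have to handle internally.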
On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:> So, based on my tests yesterday, I decided to break the RAID6 and > pull a drive out of it to test directly on the 2Tb drives in > question. > > The array in question: > # cat /proc/mdstat > Personalities : [raid1] [raid6] [raid5] [raid4] > md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5] > 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 > [4/4] [UUUU] > > # mdadm /dev/md2 --fail /dev/sdf > mdadm: set /dev/sdf faulty in /dev/md2 > # mdadm /dev/md2 --remove /dev/sdf > mdadm: hot removed /dev/sdf from /dev/md2 > > So, all tests are to be done on /dev/sdf. > Model Family: Seagate SV35 > Device Model: ST2000VX000-9YW164 > Serial Number: Z1E17C3X > LU WWN Device Id: 5 000c50 04e1bc6f0 > Firmware Version: CV13 > User Capacity: 2,000,398,934,016 bytes [2.00 TB] > Sector Sizes: 512 bytes logical, 4096 bytes physical > > From the Dom0: > # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s > > Create a single partition on the drive, and format it with ext4: > Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes > 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors > Units = sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 4096 bytes > I/O size (minimum/optimal): 4096 bytes / 4096 bytes > Disk identifier: 0x98d8baaf > > Device Boot Start End Blocks Id System > /dev/sdf1 2048 3907029167 1953513560 83 Linux > > Command (m for help): w > > # mkfs.ext4 -j /dev/sdf1 > ...... > Writing inode tables: done > Creating journal (32768 blocks): done > Writing superblocks and filesystem accounting information: done > > Mount it on the Dom0: > # mount /dev/sdf1 /mnt/esata/ > # cd /mnt/esata/ > # bonnie++ -d . -u 0:0 > .... 
> Version 1.96 ------Sequential Output------ --Sequential > Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- > --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec > %CP /sec %CP > xenhost.lan.crc. 2G 425 94 133607 24 60544 12 973 95 209114 > 17 296.4 6 > Latency 70971us 190ms 221ms 40369us 17657us > 164ms > > So from the Dom0: 133Mb/sec write, 209Mb/sec read. > > Now, I''ll attach the full disk to a DomU: > # xm block-attach zeus.vm phy:/dev/sdf xvdc w > > And we''ll test from the DomU. > > # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s > > Partition the same as in the Dom0 and create an ext4 filesystem on it: > > I notice something interesting here. In the Dom0, the device is seen as: > Units = sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 4096 bytes > I/O size (minimum/optimal): 4096 bytes / 4096 bytes > > In the DomU, it is seen as: > Units = sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 512 bytes > I/O size (minimum/optimal): 512 bytes / 512 bytes > > Not sure if this could be related - but continuing testing: > Device Boot Start End Blocks Id System > /dev/xvdc1 2048 3907029167 1953513560 83 Linux > > # mkfs.ext4 -j /dev/xvdc1 > .... > Allocating group tables: done > Writing inode tables: done > Creating journal (32768 blocks): done > Writing superblocks and filesystem accounting information: done > > # mount /dev/xvdc1 /mnt/esata/ > # cd /mnt/esata/ > # bonnie++ -d . -u 0:0 > .... 
> Version 1.96 ------Sequential Output------ --Sequential > Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- > --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec > %CP /sec %CP > zeus.crc.id.au 2G 396 99 116530 23 50451 15 1035 99 176407 > 23 313.4 9 > Latency 34615us 130ms 128ms 33316us 74401us > 130ms > > So still... 116Mb/sec write, 176Mb/sec read to the physical device > from the DomU. More than acceptable. > > It leaves me to wonder.... Could there be something in the Dom0 > seeing the drives as 4096 byte sectors, but the DomU seeing it as > 512 byte sectors cause an issue?There is certain overhead in it. I still have this in my mailbox so I am not sure whether this issue got ever resolved? I know that the indirect patches in Xen blkback and xen blkfront are meant to resolve some of these issues - by being able to carry a bigger payload. Did you ever try v3.11 kernel in both dom0 and domU? Thanks.> > -- > Steven Haigh > > Email: netwiz@crc.id.au > Web: https://www.crc.id.au > Phone: (03) 9001 6090 - 0412 935 897 > Fax: (03) 8338 0299
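The "bigger payload" remark can be put into numbers. The sketch below rests on two assumptions that are not stated in this thread: that a classic blkif ring request carries at most 11 page-sized segments, and that the indirect-descriptor support defaults to 32 segments per request. The page size is the 4096 reported by 'xm info', and the stripe size is that of the 4-disk, 128k-chunk RAID6 described earlier. Whether small requests are really what slows the RAID6 write path is a hypothesis, not something the thread confirms.

```python
import math

PAGE_BYTES = 4096        # xen_pagesize from 'xm info'
CLASSIC_SEGMENTS = 11    # assumption: per-request segment limit without indirect descriptors
INDIRECT_SEGMENTS = 32   # assumption: default segment count with indirect descriptors
FULL_STRIPE_BYTES = 2 * 128 * 1024  # 2 data disks x 128k chunk


def max_request_bytes(segments: int, page_bytes: int = PAGE_BYTES) -> int:
    """Largest payload one ring request can carry with the given segment count."""
    return segments * page_bytes


def requests_per_stripe(stripe_bytes: int, segments: int) -> int:
    """Minimum number of ring requests needed to cover one full stripe."""
    return math.ceil(stripe_bytes / max_request_bytes(segments))


if __name__ == "__main__":
    for label, segs in (("classic", CLASSIC_SEGMENTS), ("indirect", INDIRECT_SEGMENTS)):
        print(label, max_request_bytes(segs) // 1024, "KiB per request,",
              requests_per_stripe(FULL_STRIPE_BYTES, segs), "requests per full stripe")
```

Under these assumptions no single classic request covers a full stripe, so the backend would have to merge several requests before the array sees a full-stripe write.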
On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote: >> So, based on my tests yesterday, I decided to break the RAID6 and >> pull a drive out of it to test directly on the 2Tb drives in >> question. >> >> The array in question: >> # cat /proc/mdstat >> Personalities : [raid1] [raid6] [raid5] [raid4] >> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5] >> 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 >> [4/4] [UUUU] >> >> # mdadm /dev/md2 --fail /dev/sdf >> mdadm: set /dev/sdf faulty in /dev/md2 >> # mdadm /dev/md2 --remove /dev/sdf >> mdadm: hot removed /dev/sdf from /dev/md2 >> >> So, all tests are to be done on /dev/sdf. >> Model Family: Seagate SV35 >> Device Model: ST2000VX000-9YW164 >> Serial Number: Z1E17C3X >> LU WWN Device Id: 5 000c50 04e1bc6f0 >> Firmware Version: CV13 >> User Capacity: 2,000,398,934,016 bytes [2.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> >> From the Dom0: >> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct >> 4096+0 records in >> 4096+0 records out >> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s >> >> Create a single partition on the drive, and format it with ext4: >> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes >> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors >> Units = sectors of 1 * 512 = 512 bytes >> Sector size (logical/physical): 512 bytes / 4096 bytes >> I/O size (minimum/optimal): 4096 bytes / 4096 bytes >> Disk identifier: 0x98d8baaf >> >> Device Boot Start End Blocks Id System >> /dev/sdf1 2048 3907029167 1953513560 83 Linux >> >> Command (m for help): w >> >> # mkfs.ext4 -j /dev/sdf1 >> ...... >> Writing inode tables: done >> Creating journal (32768 blocks): done >> Writing superblocks and filesystem accounting information: done >> >> Mount it on the Dom0: >> # mount /dev/sdf1 /mnt/esata/ >> # cd /mnt/esata/ >> # bonnie++ -d . -u 0:0 >> .... 
>> Version 1.96 ------Sequential Output------ --Sequential >> Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- >> --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec >> %CP /sec %CP >> xenhost.lan.crc. 2G 425 94 133607 24 60544 12 973 95 209114 >> 17 296.4 6 >> Latency 70971us 190ms 221ms 40369us 17657us >> 164ms >> >> So from the Dom0: 133Mb/sec write, 209Mb/sec read. >> >> Now, I''ll attach the full disk to a DomU: >> # xm block-attach zeus.vm phy:/dev/sdf xvdc w >> >> And we''ll test from the DomU. >> >> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct >> 4096+0 records in >> 4096+0 records out >> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s >> >> Partition the same as in the Dom0 and create an ext4 filesystem on it: >> >> I notice something interesting here. In the Dom0, the device is seen as: >> Units = sectors of 1 * 512 = 512 bytes >> Sector size (logical/physical): 512 bytes / 4096 bytes >> I/O size (minimum/optimal): 4096 bytes / 4096 bytes >> >> In the DomU, it is seen as: >> Units = sectors of 1 * 512 = 512 bytes >> Sector size (logical/physical): 512 bytes / 512 bytes >> I/O size (minimum/optimal): 512 bytes / 512 bytes >> >> Not sure if this could be related - but continuing testing: >> Device Boot Start End Blocks Id System >> /dev/xvdc1 2048 3907029167 1953513560 83 Linux >> >> # mkfs.ext4 -j /dev/xvdc1 >> .... >> Allocating group tables: done >> Writing inode tables: done >> Creating journal (32768 blocks): done >> Writing superblocks and filesystem accounting information: done >> >> # mount /dev/xvdc1 /mnt/esata/ >> # cd /mnt/esata/ >> # bonnie++ -d . -u 0:0 >> .... 
>> Version 1.96 ------Sequential Output------ --Sequential >> Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- >> --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec >> %CP /sec %CP >> zeus.crc.id.au 2G 396 99 116530 23 50451 15 1035 99 176407 >> 23 313.4 9 >> Latency 34615us 130ms 128ms 33316us 74401us >> 130ms >> >> So still... 116Mb/sec write, 176Mb/sec read to the physical device >> from the DomU. More than acceptable. >> >> It leaves me to wonder.... Could there be something in the Dom0 >> seeing the drives as 4096 byte sectors, but the DomU seeing it as >> 512 byte sectors cause an issue? > > There is certain overhead in it. I still have this in my mailbox > so I am not sure whether this issue got ever resolved? I know that the > indirect patches in Xen blkback and xen blkfront are meant to resolve > some of these issues - by being able to carry a bigger payload. > > Did you ever try v3.11 kernel in both dom0 and domU? Thanks.Hi Konrad, I don''t believe I ever fixed it - however I haven''t tried kernel 3.11 in Dom0 OR DomU... I''ll keep this in my inbox and try to build a 3.11 kernel for both in the near future for testing... -- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 Fax: (03) 8338 0299
On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>> So, based on my tests yesterday, I decided to break the RAID6 and
>> pull a drive out of it to test directly on the 2Tb drives in
>> question.
>>
>> The array in question:
>> # cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
>>
>> # mdadm /dev/md2 --fail /dev/sdf
>> mdadm: set /dev/sdf faulty in /dev/md2
>> # mdadm /dev/md2 --remove /dev/sdf
>> mdadm: hot removed /dev/sdf from /dev/md2
>>
>> So, all tests are to be done on /dev/sdf.
>> Model Family: Seagate SV35
>> Device Model: ST2000VX000-9YW164
>> Serial Number: Z1E17C3X
>> LU WWN Device Id: 5 000c50 04e1bc6f0
>> Firmware Version: CV13
>> User Capacity: 2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>
>> From the Dom0:
>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>
>> Create a single partition on the drive, and format it with ext4:
>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x98d8baaf
>>
>>     Device Boot      Start         End      Blocks   Id  System
>> /dev/sdf1             2048  3907029167  1953513560   83  Linux
>>
>> Command (m for help): w
>>
>> # mkfs.ext4 -j /dev/sdf1
>> ......
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> Mount it on the Dom0:
>> # mount /dev/sdf1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114  17 296.4   6
>> Latency             70971us     190ms     221ms   40369us   17657us     164ms
>>
>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>
>> Now, I'll attach the full disk to a DomU:
>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>
>> And we'll test from the DomU.
>>
>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>
>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>
>> I notice something interesting here. In the Dom0, the device is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>
>> In the DomU, it is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>
>> Not sure if this could be related - but continuing testing:
>>     Device Boot      Start         End      Blocks   Id  System
>> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
>>
>> # mkfs.ext4 -j /dev/xvdc1
>> ....
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> # mount /dev/xvdc1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407  23 313.4   9
>> Latency             34615us     130ms     128ms   33316us   74401us     130ms
>>
>> So still... 116Mb/sec write, 176Mb/sec read to the physical device
>> from the DomU. More than acceptable.
>>
>> It leaves me to wonder.... Could there be something in the Dom0
>> seeing the drives as 4096 byte sectors, but the DomU seeing it as
>> 512 byte sectors cause an issue?
>
> There is certain overhead in it. I still have this in my mailbox
> so I am not sure whether this issue got ever resolved? I know that the
> indirect patches in Xen blkback and xen blkfront are meant to resolve
> some of these issues - by being able to carry a bigger payload.
>
> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.

Ok, so I finally got around to building kernel 3.11 RPMs today for testing. I upgraded both the Dom0 and DomU to the same kernel:

DomU:
# dmesg | grep blkfront
blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;

Looks good.

Transfer tests using bonnie++ as per before:
# bonnie -d . -u 0:0
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30 492.3  13
Latency             27305us     124ms     158ms   34222us   16865us     374ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++ 18389  31
Latency             17775us     154us     180us   16008us      38us      58us

There still seems to be a massive discrepancy between Dom0 and DomU write speeds. Interestingly, sequential block reads are nearly 300MB/sec, yet sequential writes were only ~58MB/sec.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
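To put the gap in numbers, here is a small sketch that takes the bonnie++ block-write and block-read figures quoted in this thread and prints the DomU/Dom0 ratio. The dictionary and helper names are made up for illustration, the K/sec to MB/sec conversion simply divides by 1000 as the posts do, and it assumes the Dom0 figure from the first message is still representative after the kernel upgrade, which was not re-measured.

```python
# K/sec figures copied from the bonnie++ output in this thread
# ("Sequential Output / Block" and "Sequential Input / Block" columns).
DOM0_LV = {"write": 186976, "read": 290591}   # Dom0, LV on RAID6, first message
DOMU_311 = {"write": 58250, "read": 295757}   # DomU, kernel 3.11, default settings


def to_mb_per_sec(k_per_sec):
    """Rough conversion used in the thread: K/sec divided by 1000."""
    return k_per_sec / 1000.0


def domu_vs_dom0(domu, dom0):
    """Return DomU throughput as a fraction of Dom0 throughput, per direction."""
    return {key: domu[key] / dom0[key] for key in dom0}


if __name__ == "__main__":
    ratios = domu_vs_dom0(DOMU_311, DOM0_LV)
    for key in ("write", "read"):
        print(f"{key:5s}: DomU {to_mb_per_sec(DOMU_311[key]):6.1f} MB/sec, "
              f"Dom0 {to_mb_per_sec(DOM0_LV[key]):6.1f} MB/sec, "
              f"ratio {ratios[key]:.2f}")
```

On these figures reads are at parity while writes sit at roughly a third of the Dom0 rate, which is why the rest of the thread concentrates on the write path.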
On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> > On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
> >> So, based on my tests yesterday, I decided to break the RAID6 and
> >> pull a drive out of it to test directly on the 2Tb drives in
> >> question.
[...]
> > Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>
> Ok, so I finally got around to building kernel 3.11 RPMs today for
> testing. I upgraded both the Dom0 and DomU to the same kernel:

Woohoo!

>
> DomU:
> # dmesg | grep blkfront
> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
>
> Looks good.
>
> Transfer tests using bonnie++ as per before:
> # bonnie -d . -u 0:0
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30 492.3  13
> Latency             27305us     124ms     158ms   34222us   16865us     374ms
[...]
>
> Still seems to be a massive discrepancy between Dom0 and DomU write
> speeds. Interesting is that sequential block reads are nearly 300MB/sec,
> yet sequential writes were only ~58MB/sec.

OK, so the other thing that people were pointing out is that you can use the xen-blkfront.max parameter. By default it is 32, but try 8. Or 64. Or 256.

The indirect descriptor allows us to put more I/Os on the ring - and I am hoping that will:
 a) solve your problem
 b) not solve your problem, but demonstrate that the issue is not with the ring, but with something else making your writes slower.

Hmm, are you by any chance using O_DIRECT when running bonnie++ in dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is done to not use the dom0 page cache - otherwise you end up with a double buffer where the writes are insane speed - but with absolutely no safety.

If you want to try disabling that (so no O_DIRECT), I would do this little change:

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index bf4b9d2..823b629 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		break;
 	case BLKIF_OP_WRITE:
 		blkif->st_wr_req++;
-		operation = WRITE_ODIRECT;
+		operation = WRITE;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
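The O_DIRECT question matters because a Dom0 benchmark that goes through the page cache is not measuring the same thing as DomU writes that bypass it. Below is a minimal user-space sketch, roughly what dd's oflag=direct does, for timing the same sequential write both ways. The function name and sizes are hypothetical, it is not how xen-blkback issues I/O, and it quietly falls back to buffered writes where the filesystem refuses O_DIRECT (tmpfs, for example).

```python
import mmap
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size=1 << 20, direct=False):
    """Write zeros to path in block_size chunks, then fsync.

    Returns (elapsed_seconds, used_direct). total_bytes should be a multiple
    of block_size. Falls back to buffered I/O if O_DIRECT is unavailable.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = None
    used_direct = False
    if direct and hasattr(os, "O_DIRECT"):
        try:
            fd = os.open(path, flags | os.O_DIRECT, 0o600)
            used_direct = True
        except OSError:
            fd = None  # this filesystem rejects O_DIRECT
    if fd is None:
        fd = os.open(path, flags, 0o600)
    # An anonymous mmap is zero-filled and page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, block_size)
    start = time.perf_counter()
    try:
        written = 0
        while written < total_bytes:
            written += os.write(fd, buf)
        os.fsync(fd)
    finally:
        os.close(fd)
        buf.close()
    return time.perf_counter() - start, used_direct


if __name__ == "__main__":
    size = 64 << 20  # 64 MiB keeps the demo short; a real test needs far more
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        for want_direct in (False, True):
            secs, used = timed_write(target, size, direct=want_direct)
            print(f"direct={used}: {size / secs / 1e6:.1f} MB/sec")
```

For a meaningful comparison the target path would need to sit on the filesystem under test rather than in the default temporary directory, and the write size should comfortably exceed RAM so the buffered run cannot hide in cache.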
On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
[...]
>> Still seems to be a massive discrepancy between Dom0 and DomU write
>> speeds. Interesting is that sequential block reads are nearly 300MB/sec,
>> yet sequential writes were only ~58MB/sec.
>
> OK, so the other thing that people were pointing out that is you
> can use xen-blkfront.max parameter. By default it is 32, but try 8.
> Or 64. Or 256.

Ahh - interesting.

I used the following:
Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto console=hvc0 xen-blkfront.max=X

8:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784  27 496.5  10
Latency             24374us     199ms     117ms   30855us   38008us   85175us

16:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735  25 505.6  10
Latency             24412us     187ms     183ms   23661us   53850us     232ms

32:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154  24 498.2  12
Latency             24264us     199ms   81362us   33144us   22526us     237ms

64:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128  27 493.7  13

128:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   702  97 107638  14 70158  15  1045  97 255596  24 491.0  12
Latency             27279us   17553us     134ms   29771us   38392us   65761us

256:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   689  91 102554  14 67337  15  1012  97 262475  24 484.4  12
Latency             20642us     104ms     189ms   36624us   45286us   80023us

So, as a nice summary (sequential block writes):
8: 50MB/sec
16: 58MB/sec
32: 57MB/sec
64: 88MB/sec
128: 107MB/sec
256: 102MB/sec

So, maybe it's coincidence, maybe it isn't - but the best (factoring in margin of error) seems to be 128 - which happens to match the 128k chunk size of the underlying RAID6 array on the Dom0.

# cat /proc/mdstat
md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
      3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

> The indirect descriptor allows us to put more I/Os on the ring - and
> I am hoping that will:
> a) solve your problem

Well, it looks like this solves the issue - at least, increasing the max almost doubles the write speed, with no change to read speeds (within margin of error).

> b) not solve your problem, but demonstrate that the issue is not with
> the ring, but with something else making your writes slower.
>
> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
> dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is
> done to not use the dom0 page cache - otherwise you end up with
> a double buffer where the writes are insane speed - but with absolutly
> no safety.
>
> If you want to try disabling that (so no O_DIRECT), I would do this
> little change:
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index bf4b9d2..823b629 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		break;
>  	case BLKIF_OP_WRITE:
>  		blkif->st_wr_req++;
> -		operation = WRITE_ODIRECT;
> +		operation = WRITE;
>  		break;
>  	case BLKIF_OP_WRITE_BARRIER:
>  		drain = true;

With the above results, is this still useful?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
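One possible way to read these results, offered as an assumption rather than something established in the thread: if each indirect segment carries one 4 KiB page (the xen_pagesize reported by xm info), then xen-blkfront.max caps the size of a single request, and that cap can be compared with the full-stripe width of a 4-disk RAID6 with 128k chunks (two data chunks per stripe). The helper names below are hypothetical; the K/sec values are the block-write figures from the runs in this message.

```python
PAGE_SIZE = 4096  # assumption: one segment carries one 4 KiB page

# Block-write results (K/sec) from the bonnie++ runs, keyed by xen-blkfront.max.
WRITE_KPS = {8: 50906, 16: 58078, 32: 57416, 64: 88447, 128: 107638, 256: 102554}


def max_request_kib(segments, page_size=PAGE_SIZE):
    """Largest single request, in KiB, for a given segment limit."""
    return segments * page_size // 1024


def full_stripe_kib(chunk_kib, disks, parity_disks):
    """Data carried by one full stripe: chunk size times the number of data disks."""
    return chunk_kib * (disks - parity_disks)


if __name__ == "__main__":
    stripe = full_stripe_kib(chunk_kib=128, disks=4, parity_disks=2)
    baseline = WRITE_KPS[32]  # 32 is the default quoted in the thread
    for segments in sorted(WRITE_KPS):
        req = max_request_kib(segments)
        print(f"max={segments:3d}  request<={req:4d} KiB  "
              f"stripes/request={req / stripe:5.3f}  "
              f"write={WRITE_KPS[segments]:6d} K/sec  "
              f"vs default x{WRITE_KPS[segments] / baseline:.2f}")
```

Under that assumption the default of 32 tops out at 128 KiB per request, half a stripe, while 64 reaches exactly one 256 KiB stripe and 128 reaches two. That would be consistent with the jump in write speed from 64 upward, and would make the numeric match between "128" and "128k" a coincidence of units, but the thread itself does not confirm the mechanism.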
Steven Haigh <netwiz@crc.id.au> wrote:>On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote: >> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote: >>> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote: >>>> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote: >>>>> So, based on my tests yesterday, I decided to break the RAID6 and >>>>> pull a drive out of it to test directly on the 2Tb drives in >>>>> question. >>>>> >>>>> The array in question: >>>>> # cat /proc/mdstat >>>>> Personalities : [raid1] [raid6] [raid5] [raid4] >>>>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5] >>>>> 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 >>>>> [4/4] [UUUU] >>>>> >>>>> # mdadm /dev/md2 --fail /dev/sdf >>>>> mdadm: set /dev/sdf faulty in /dev/md2 >>>>> # mdadm /dev/md2 --remove /dev/sdf >>>>> mdadm: hot removed /dev/sdf from /dev/md2 >>>>> >>>>> So, all tests are to be done on /dev/sdf. >>>>> Model Family: Seagate SV35 >>>>> Device Model: ST2000VX000-9YW164 >>>>> Serial Number: Z1E17C3X >>>>> LU WWN Device Id: 5 000c50 04e1bc6f0 >>>>> Firmware Version: CV13 >>>>> User Capacity: 2,000,398,934,016 bytes [2.00 TB] >>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical >>>>> >>>>> From the Dom0: >>>>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct >>>>> 4096+0 records in >>>>> 4096+0 records out >>>>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s >>>>> >>>>> Create a single partition on the drive, and format it with ext4: >>>>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes >>>>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 >sectors >>>>> Units = sectors of 1 * 512 = 512 bytes >>>>> Sector size (logical/physical): 512 bytes / 4096 bytes >>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes >>>>> Disk identifier: 0x98d8baaf >>>>> >>>>> Device Boot Start End Blocks Id System >>>>> /dev/sdf1 2048 3907029167 1953513560 83 Linux >>>>> >>>>> Command (m for help): w >>>>> >>>>> # mkfs.ext4 -j /dev/sdf1 >>>>> ...... 
>>>>> Writing inode tables: done >>>>> Creating journal (32768 blocks): done >>>>> Writing superblocks and filesystem accounting information: done >>>>> >>>>> Mount it on the Dom0: >>>>> # mount /dev/sdf1 /mnt/esata/ >>>>> # cd /mnt/esata/ >>>>> # bonnie++ -d . -u 0:0 >>>>> .... >>>>> Version 1.96 ------Sequential Output------ --Sequential >>>>> Input- --Random- >>>>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- >>>>> --Block-- --Seeks-- >>>>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec >>>>> %CP /sec %CP >>>>> xenhost.lan.crc. 2G 425 94 133607 24 60544 12 973 95 >209114 >>>>> 17 296.4 6 >>>>> Latency 70971us 190ms 221ms 40369us >17657us >>>>> 164ms >>>>> >>>>> So from the Dom0: 133Mb/sec write, 209Mb/sec read. >>>>> >>>>> Now, I''ll attach the full disk to a DomU: >>>>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w >>>>> >>>>> And we''ll test from the DomU. >>>>> >>>>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct >>>>> 4096+0 records in >>>>> 4096+0 records out >>>>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s >>>>> >>>>> Partition the same as in the Dom0 and create an ext4 filesystem on >it: >>>>> >>>>> I notice something interesting here. In the Dom0, the device is >seen as: >>>>> Units = sectors of 1 * 512 = 512 bytes >>>>> Sector size (logical/physical): 512 bytes / 4096 bytes >>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes >>>>> >>>>> In the DomU, it is seen as: >>>>> Units = sectors of 1 * 512 = 512 bytes >>>>> Sector size (logical/physical): 512 bytes / 512 bytes >>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes >>>>> >>>>> Not sure if this could be related - but continuing testing: >>>>> Device Boot Start End Blocks Id System >>>>> /dev/xvdc1 2048 3907029167 1953513560 83 Linux >>>>> >>>>> # mkfs.ext4 -j /dev/xvdc1 >>>>> .... 
>>>>> Allocating group tables: done >>>>> Writing inode tables: done >>>>> Creating journal (32768 blocks): done >>>>> Writing superblocks and filesystem accounting information: done >>>>> >>>>> # mount /dev/xvdc1 /mnt/esata/ >>>>> # cd /mnt/esata/ >>>>> # bonnie++ -d . -u 0:0 >>>>> .... >>>>> Version 1.96 ------Sequential Output------ --Sequential >>>>> Input- --Random- >>>>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- >>>>> --Block-- --Seeks-- >>>>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec >>>>> %CP /sec %CP >>>>> zeus.crc.id.au 2G 396 99 116530 23 50451 15 1035 99 >176407 >>>>> 23 313.4 9 >>>>> Latency 34615us 130ms 128ms 33316us >74401us >>>>> 130ms >>>>> >>>>> So still... 116Mb/sec write, 176Mb/sec read to the physical device >>>>> from the DomU. More than acceptable. >>>>> >>>>> It leaves me to wonder.... Could there be something in the Dom0 >>>>> seeing the drives as 4096 byte sectors, but the DomU seeing it as >>>>> 512 byte sectors cause an issue? >>>> >>>> There is certain overhead in it. I still have this in my mailbox >>>> so I am not sure whether this issue got ever resolved? I know that >the >>>> indirect patches in Xen blkback and xen blkfront are meant to >resolve >>>> some of these issues - by being able to carry a bigger payload. >>>> >>>> Did you ever try v3.11 kernel in both dom0 and domU? Thanks. >>> >>> Ok, so I finally got around to building kernel 3.11 RPMs today for >>> testing. I upgraded both the Dom0 and DomU to the same kernel: >> >> Woohoo! >>> >>> DomU: >>> # dmesg | grep blkfront >>> blkfront: xvda: flush diskcache: enabled; persistent grants: >enabled; >>> indirect descriptors: enabled; >>> blkfront: xvdb: flush diskcache: enabled; persistent grants: >enabled; >>> indirect descriptors: enabled; >>> >>> Looks good. >>> >>> Transfer tests using bonnie++ as per before: >>> # bonnie -d . 
-u 0:0 >>> Version 1.96 ------Sequential Output------ --Sequential >Input- >>> --Random- >>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- >--Block-- >>> --Seeks-- >>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec >%CP >>> /sec %CP >>> zeus.crc.id.au 2G 603 92 58250 9 62248 14 886 99 295757 >30 >>> 492.3 13 >>> Latency 27305us 124ms 158ms 34222us 16865us >>> 374ms >>> Version 1.96 ------Sequential Create------ --------Random >>> Create-------- >>> zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- >--Read--- >>> -Delete-- >>> files /sec %CP /sec %CP /sec %CP /sec %CP /sec >%CP >>> /sec %CP >>> 16 10048 22 +++++ +++ 17849 29 11109 25 +++++ >+++ >>> 18389 31 >>> Latency 17775us 154us 180us 16008us 38us >>> 58us >>> >>> Still seems to be a massive discrepancy between Dom0 and DomU write >>> speeds. Interesting is that sequential block reads are nearly >300MB/sec, >>> yet sequential writes were only ~58MB/sec. >> >> OK, so the other thing that people were pointing out that is you >> can use xen-blkfront.max parameter. By default it is 32, but try 8. >> Or 64. Or 256. > >Ahh - interesting. 
>
>I used the following:
>Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM
>LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
>crashkernel=auto console=hvc0 xen-blkfront.max=X
>
>8:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784 27 496.5  10
>Latency               24374us     199ms     117ms   30855us   38008us   85175us
>
>16:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735 25 505.6  10
>Latency               24412us     187ms     183ms   23661us   53850us     232ms
>
>32:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154 24 498.2  12
>Latency               24264us     199ms   81362us   33144us   22526us     237ms
>
>64:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128 27 493.7  13
>
>128:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   702  97 107638 14 70158  15  1045  97 255596 24 491.0  12
>Latency               27279us   17553us     134ms   29771us   38392us   65761us
>
>256:
>Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>zeus.crc.id.au   2G   689  91 102554 14 67337  15  1012  97 262475 24 484.4  12
>Latency               20642us     104ms     189ms   36624us   45286us   80023us
>
>So, as a nice summary (sequential block write):
>8: 50MB/sec
>16: 58MB/sec
>32: 57MB/sec
>64: 88MB/sec
>128: 107MB/sec
>256: 102MB/sec
>
>So, maybe it's coincidence, maybe it isn't - but the best (factoring in
>margin of error) seems to be 128 - which happens to be the chunk size of
>the underlying RAID6 array on the Dom0.
>
># cat /proc/mdstat
>md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
>      3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
>
>> The indirect descriptor allows us to put more I/Os on the ring - and
>> I am hoping that will:
>> a) solve your problem
>
>Well, it looks like this solves the issue - at least increasing the max
>causes almost double the write speed - and no change to read speeds
>(within margin of error).
>
>> b) not solve your problem, but demonstrate that the issue is not with
>> the ring, but with something else making your writes slower.
>>
>> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
>> dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is
>> done to not use the dom0 page cache - otherwise you end up with
>> a double buffer where the writes are insane speed - but with absolutely
>> no safety.
>>
>> If you want to try disabling that (so no O_DIRECT), I would do this
>> little change:
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index bf4b9d2..823b629 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>>                 break;
>>         case BLKIF_OP_WRITE:
>>                 blkif->st_wr_req++;
>> -               operation = WRITE_ODIRECT;
>> +               operation = WRITE;
>>                 break;
>>         case BLKIF_OP_WRITE_BARRIER:
>>                 drain = true;
>
>With the above results, is this still useful?

No. There is no need. Awesome that this fixed it. Roger had mentioned
that he had seen similar behavior.

We should probably do a patch that interrogates the backend for the
optimal segment size and informs the frontend, so it can set it.
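
For anyone wanting to compare the runs side by side, here is a minimal sketch in Python. It assumes the single bonnie++ runs quoted above are representative, and it takes the Dom0 baseline of 186976 K/sec from the first Dom0 run in this thread. The names DOMU_WRITE_KPS, DOM0_WRITE_KPS, summarize and best_setting are made up for illustration and do not come from any Xen or bonnie++ tooling.

```python
# Sequential block-write throughput (K/sec) reported by bonnie++ in the DomU
# for each xen-blkfront.max value tried in this thread.
DOMU_WRITE_KPS = {
    8: 50906,
    16: 58078,
    32: 57416,
    64: 88447,
    128: 107638,
    256: 102554,
}

# Sequential block-write figure (K/sec) from the Dom0 run on the same LV.
DOM0_WRITE_KPS = 186976


def summarize(results, baseline):
    """Return (max, K/sec, fraction of baseline) rows, sorted by max value."""
    return [(m, kps, kps / baseline) for m, kps in sorted(results.items())]


def best_setting(results):
    """Return the max value that gave the highest write throughput."""
    return max(results, key=results.get)


if __name__ == "__main__":
    for m, kps, frac in summarize(DOMU_WRITE_KPS, DOM0_WRITE_KPS):
        print(f"max={m:<4d} {kps:>7d} K/sec  {frac:6.1%} of Dom0")
    print("best:", best_setting(DOMU_WRITE_KPS))
```

Since each setting was only measured once, the gap between 128 and 256 is probably within the margin of error; repeating each run a few times would be needed before reading much into it.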