Luke S Crawford
2008-May-28 09:58 UTC
[Xen-users] I/O performance problem using LVM mirrors to back phy: devices
So, we just moved to some much faster hardware: Intel Q6600 CPU, 8GB unbuffered ECC, ICH7 SATA (2x 1TB disks). We were irritated and puzzled to find that the new setup had really, really slow I/O. The odd thing is that performance is fine if you just mount the LV directly from the dom0... but if you xm block-attach it to the dom0 and then mount it, you get 1/10th the speed. We are running CentOS 5.1, kernel 2.6.18-53.1.14.el5.

Full writeup:

We noted that the performance of mirrored logical volumes accessed through xenblk was about 1/10th that of non-mirrored LVs, or of LVs mirrored with the --corelog option. Mirrored LVs performed fine when accessed normally within the dom0, but performance dropped when accessed via xm block-attach. This was, to our minds, ridiculous.

First, we created two logical volumes in the volume group "test": one with mirroring and a mirror log, and one with the --corelog option.

# lvcreate -m 1 -L 2G -n test_mirror test
# lvcreate -m 1 --corelog -L 2G -n test_core test

Then we made filesystems and mounted them:

# mke2fs -j /dev/test/test_mirror
# mke2fs -j /dev/test/test_core
# mkdir -p /mnt/test/mirror
# mkdir -p /mnt/test/core
# mount /dev/test/test_mirror /mnt/test/mirror
# mount /dev/test/test_core /mnt/test/core

Next we started oprofile, instructing it to count BUS_IO_WAIT events:

# opcontrol --start --event=BUS_IO_WAIT:500:0xc0 \
    --xen=/usr/lib/debug/boot/xen-syms-2.6.18-53.1.14.el5.debug \
    --vmlinux=/usr/lib/debug/lib/modules/2.6.18-53.1.14.el5xen/vmlinux \
    --separate=all

Then we ran bonnie on each device in sequence, stopping oprofile and saving the output each time:

# bonnie++ -d /mnt/test/mirror
# opcontrol --stop
# opcontrol --save=mirrorlog
# opcontrol --reset

The LV with the corelog displayed negligible iowait, as expected. However, the other one experienced quite a bit:

# opreport -t 1 --symbols session:iowait_mirror
warning: /ahci could not be found.
CPU: Core 2, speed 2400.08 MHz (estimated)
Counted BUS_IO_WAIT events (IO requests waiting in the bus queue) with a unit mask of 0xc0 (All cores) count 500
Processes with a thread ID of 0
Processes with a thread ID of 463
Processes with a thread ID of 14185
samples  %        samples  %        samples  %        app name                           symbol name
32       91.4286  15       93.7500  0        0        xen-syms-2.6.18-53.1.14.el5.debug  pit_read_counter
1         2.8571  0        0        0        0        ahci                               (no symbols)
1         2.8571  0        0        0        0        vmlinux                            bio_put
1         2.8571  0        0        0        0        vmlinux                            hypercall_page

From this, it seemed clear that the culprit was in the pit_read_counter function. Any ideas on where to take it from here?

Credit to Chris Takemura <chris@prgmr.com> for reproducing the problem with oprofile and for the writeup.
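
P.S. For anyone who wants to reproduce the slow path, this is roughly how we attach the LV back into the dom0 and rerun the same benchmark. The frontend device name (xvdb) and mount point are just examples, not necessarily what you'll want on your own box:

# xm block-attach 0 phy:/dev/test/test_mirror xvdb w 0
# mkdir -p /mnt/test/attached
# mount /dev/xvdb /mnt/test/attached
# bonnie++ -d /mnt/test/attached
# umount /mnt/test/attached
# xm block-detach 0 xvdb

Running the same bonnie++ against /mnt/test/mirror (the LV mounted directly in the dom0) gives the comparison point; that path is roughly 10x faster for us than the block-attached one.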