Hey,

I've hit a strange problem with block devices. When reading some files (from a read-only ext3), everything looks good, except that the file content is corrupted! But it may be a coincidence that the "failed" reads don't hit filesystem metadata.
fsck in dom0 on the filesystem image returns no errors.
fsck (with -nf flags) in domU on the device causes the kernel to output "blkfront: flush disk cache: empty write xvdd op failed" and "blkfront: xvdd: barrier or flush: disable", and returns no filesystem errors. From that point on, file reads return the correct content. In most cases, dropping the block cache (echo 3 > /proc/sys/vm/drop_caches) or remounting the device also "fixes" the problem.

On an RW device (with a different size, filesystem and content), the domU kernel complains about EXT4 errors.
I haven't observed such issues on device-mapper backed devices.

It worked on 3.2.7; the problem shows up with 3.3.5 and 3.4 in dom0, regardless of the domU kernel (tried 3.2.7, 3.3.5, 3.4.0).

I suspected feature-flush-cache/feature-barrier, but the problem still occurs after disabling their advertisement in the blkback code.

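The cache-dropping observation above can be checked with a small script. This is only a sketch, assuming a Linux guest and root privileges; the helper names (`md5_of`, `drop_caches`, `checksum_before_and_after_drop`) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import hashlib

# Kernel control file for dropping caches (Linux only, writing needs root).
DROP_CACHES = "/proc/sys/vm/drop_caches"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_caches(control_file=DROP_CACHES):
    """Write "3" to the control file, as 'echo 3 > drop_caches' does."""
    with open(control_file, "w") as f:
        f.write("3\n")


def checksum_before_and_after_drop(path, control_file=DROP_CACHES):
    """Checksum a file, drop the caches, then checksum it again."""
    before = md5_of(path)
    drop_caches(control_file)
    after = md5_of(path)
    return before, after
```

If the two digests returned for a file on the affected device differ, the first read was served from bad cached data. The `control_file` parameter exists only so the function can be exercised without root against a scratch file.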
Some details:
dom0: 3.4.0-1.pvops.qubes.x86_64 (vanilla 3.4 + Konrad's patches for ACPI S3)
domU: 3.3.5-1.pvops.qubes.x86_64 (vanilla 3.3.5 + Konrad's patches for ACPI S3)

domU:
# mount | grep /lib/modules
/dev/xvdd on /usr/lib/modules type ext3 (ro,relatime,errors=continue,barrier=1,data=ordered)
# pwd
/lib/modules/3.3.5-1.pvops.qubes.x86_64/kernel
# md5sum ./sound/usb/snd-usbmidi-lib.ko
fbc0aeb4dd5c0c3b041a5899a15c6566  ./sound/usb/snd-usbmidi-lib.ko
# ls -l ./sound/usb/snd-usbmidi-lib.ko
-rwxr--r-- 1 root root 38248 May 20 14:14 ./sound/usb/snd-usbmidi-lib.ko

dom0:
# mount|grep modules
/var/lib/qubes/vm-kernels/3.3.5/modules.img on /mnt/tmp type ext3 (ro,loop=/dev/loop10)
# pwd
/mnt/tmp/3.3.5-1.pvops.qubes.x86_64/kernel
# md5sum ./sound/usb/snd-usbmidi-lib.ko
9d2d3fedd4a357252e367fa8109c16ed  ./sound/usb/snd-usbmidi-lib.ko
# ls -l ./sound/usb/snd-usbmidi-lib.ko
-rwxr--r-- 1 root root 38248 May 20 14:14 ./sound/usb/snd-usbmidi-lib.ko

And block backend parameters:
# xenstore-ls /local/domain/0/backend/vbd/3/51760
frontend = "/local/domain/3/device/vbd/51760"
params = "/var/lib/qubes/vm-kernels/3.3.5/modules.img"
scripted = "1"
frontend-id = "3"
online = "1"
removable = "0"
bootable = "1"
state = "4"
dev = "xvdd"
type = "file"
mode = "r"
node = "/dev/loop4"
physical-device = "7:4"
hotplug-status = "connected"
feature-flush-cache = "1"
discard-granularity = "4096"
discard-alignment = "0"
discard-secure = "0"
feature-discard = "1"
feature-barrier = "1"
sectors = "409600"
info = "4"
sector-size = "512"

BTW, 3.2.7 advertises feature-flush-cache=0 and feature-barrier=0 on this same device (RO, loop backed). I don't know why, but it seems irrelevant to this issue.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

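To compare the advertised backend features between kernels (for example 3.2.7 against 3.4), the `key = "value"` listing above can be parsed into a dictionary. The sketch below is an illustration only: it assumes the output was captured as text and is flat, as in the listing above; `parse_xenstore_ls`, `feature_flags` and `SAMPLE` are hypothetical names, and nested xenstore output is not handled.

```python
import re

# Matches lines of the form: key = "value"
LINE_RE = re.compile(r'^\s*(\S+)\s*=\s*"(.*)"\s*$')


def parse_xenstore_ls(text):
    """Parse flat 'key = "value"' lines into a dict; other lines are skipped."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def feature_flags(entries):
    """Keep only the feature-* keys advertised by the backend."""
    return {k: v for k, v in entries.items() if k.startswith("feature-")}


# Excerpt of the backend parameters quoted in the message above.
SAMPLE = '''\
type = "file"
mode = "r"
feature-flush-cache = "1"
feature-discard = "1"
feature-barrier = "1"
sector-size = "512"
'''

if __name__ == "__main__":
    for key, value in sorted(feature_flags(parse_xenstore_ls(SAMPLE)).items()):
        print(key, "=", value)
```

Running the same parser on the listing captured under each dom0 kernel and comparing the two dictionaries shows which flags changed.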
On 08.06.2012 15:11, Marek Marczykowski wrote:
(...)

Still the case on 3.4.1 with the patches from Konrad's for-jens-3.5 branch applied.
I've compared the file contents: they differ in (multiples of) 1024 bytes, the same as the filesystem block size, and only when the block wasn't in the dom0 pagecache.
When I flush the VM pagecache (echo 1 > /proc/.../drop_caches) after trying to read some files (actually md5sum -c), but not the dom0 pagecache, the problem vanishes. But if I also clear the dom0 pagecache, the problem returns.

Any clues welcome...

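The block-sized pattern of the differences can be confirmed by comparing a good and a bad copy of a file block by block. The following is a minimal sketch, assuming both copies are available as regular files; `differing_blocks` and `demo` are illustrative names, and the demo uses synthetic data rather than the actual module files.

```python
import os
import tempfile

BLOCK_SIZE = 1024  # filesystem block size reported in this thread


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indices of block_size-sized blocks whose bytes differ."""
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs


def demo():
    """Build two 4-block files that differ only in block 2, then compare."""
    good = bytes(range(256)) * 16        # 4096 bytes = 4 blocks
    bad = bytearray(good)
    bad[2048:3072] = b"\0" * BLOCK_SIZE  # overwrite exactly one block
    with tempfile.TemporaryDirectory() as tmp:
        path_good = os.path.join(tmp, "good.bin")
        path_bad = os.path.join(tmp, "bad.bin")
        with open(path_good, "wb") as f:
            f.write(good)
        with open(path_bad, "wb") as f:
            f.write(bad)
        return differing_blocks(path_good, path_bad)
```

Differences that always cover whole blocks, as in the demo, point at page or block granularity rather than at random bit corruption.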
-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

On 14.06.2012 14:27, Marek Marczykowski wrote:
> (...)
> Any clues welcomed...

Ok, found the reason.
It wasn't blkback's fault: even on bare metal, a loopback-mounted image had the same problem. It was caused by the commit "0fc9d104 radix-tree: use iterators in find_get_pages* functions", which went in somewhere between 3.3 and 3.4. It is already fixed in 3.4.2.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab