Ian Campbell
2011-Aug-25 06:47 UTC
[Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
Hi Konrad,

Does this look at all familiar? There is some more info in the full bug
log at http://bugs.debian.org/637234 . In particular, contrary to the
message below, the user subsequently confirmed that the issue appears to
be Xen specific (doesn't happen on native or vmware) and that it arose
between 2.6.39-2-686-pae and 3.0.0-1-686-pae.

Could it be related to edf6ef59ec7e "xen-blkfront: Introduce
BLKIF_OP_FLUSH_DISKCACHE support"? That looks like the only pertinent
change between 2.6.39 and 3.0.

Gedalya, 2.6.39-2-686-pae could be anything from v2.6.39..v2.6.39.2
please could you confirm which package version you have installed in
case it makes a difference.

Cheers,
Ian.

On Tue, 2011-08-09 at 14:07 -0400, Gedalya wrote:
> Package: linux-2.6
> Version: 3.0.0-1
> Severity: important
>
>
> Hello,
>
> I have a xen host running debian squeeze, amd64, some of the DomU's are
> running wheezy. My mail server is a DomU called "mail", using ext4 for the
> root (and other) FS. A dist-upgrade on "mail" has upgraded the kernel to
> linux-image-3.0.0-1-686-pae, and at this point I started getting I/O errors
> during the boot process, as follows:
>
> -----------
> Starting MySQL database server: mysqld[ 6.453894] end_request: I/O error, dev xvda, sector 4456704
> [ 6.453919] end_request: I/O error, dev xvda, sector 4456704
> [ 6.453964] Aborting journal on device xvda-8.
> [ 6.462873] EXT4-fs error (device xvda): ext4_journal_start_sb:296: Detected aborted journal
> [ 6.462903] EXT4-fs (xvda): Remounting filesystem read-only
> [ 6.463276] journal commit I/O error
> . . . . . . . . . . . . . . failed!
> Starting MTA: exim4.
> Starting IMAP/POP3 mail server: dovecot.
> startpar: service(s) returned failure: mysql ... failed!
> -----------
>
> So I went ahead and installed wheezy on a brand new DomU, and this
> was repeated immediately when booting the machine after the installation
> completed.
>
> -----------
> Starting NFS common utilities: statd[ 3.977392] end_request: I/O error, dev xvda, sector 4456808
> [ 3.977415] end_request: I/O error, dev xvda, sector 4456808
> [ 3.977470] Aborting journal on device xvda-8.
> [ 3.990442] journal commit I/O error
> [ 3.991041] EXT4-fs error (device xvda): ext4_journal_start_sb:296: Detected aborted journal
> [ 3.991126] EXT4-fs (xvda): Remounting filesystem read-only
> failed!
> Cleaning up temporary files....
> Setting up console font and keymap...done.
> startpar: service(s) returned failure: nfs-common ... failed!
> INIT: Entering runlevel: 2
> Using makefile-style concurrent boot in runlevel 2.
> Starting rpcbind daemon...Already running..
> Starting NFS common utilities: statd failed!
> touch: cannot touch `/var/log/dmesg.new': Read-only file system
> chown: cannot access `/var/log/dmesg.new': No such file or directory
> chmod: cannot access `/var/log/dmesg.new': No such file or directory
> ln: creating hard link `/var/log//dmesg.0': Read-only file system
> ... etc. ...
> -----------
>
> Now, it happens this way exactly every _other_ time the machines boot.
> When I reboot after these I/O errors, fsck is run and then the machine
> seems to be actually fine until the next reboot when it all happens
> again.
>
> For me, this is happening on xen DomU's, only when running linux
> 3.0.0-1-686-pae, only when using ext4 for the root FS.
> No problems when booting back to 2.6.39-2-686-pae.
>
> Please let me know what more specific testing needs to be done, if
> necessary I can test more platforms / flavors.
>
> I have observed nothing to suggest this is related to xen, it's just my
> platform here.
>
> -- Package-specific info:
> ** Version:
> Linux version 3.0.0-1-686-pae (Debian 3.0.0-1) (ben@decadent.org.uk) (gcc version 4.5.3 (Debian 4.5.3-3) ) #1 SMP Sun Jul 24 14:27:32 UTC 2011
>
> ** Command line:
> root=UUID=8a1a7bca-b0e2-4714-baf1-b852eab25843 ro quiet
>
> ** Not tainted
>
> ** Kernel log:
> [ 0.016117] PCI: System does not support PCI
> [ 0.016120] PCI: System does not support PCI
> [ 0.016231] Switching to clocksource xen
> [ 0.017739] pnp: PnP ACPI: disabled
> [ 0.017742] PnPBIOS: Disabled
> [ 0.018820] Switched to NOHz mode on CPU #1
> [ 0.018902] Switched to NOHz mode on CPU #0
> [ 0.020460] PCI: max bus depth: 0 pci_try_num: 1
> [ 0.020696] NET: Registered protocol family 2
> [ 0.020967] IP route cache hash table entries: 8192 (order: 3, 32768 bytes)
> [ 0.021437] TCP established hash table entries: 32768 (order: 6, 262144 bytes)
> [ 0.021752] TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
> [ 0.022063] TCP: Hash tables configured (established 32768 bind 32768)
> [ 0.022069] TCP reno registered
> [ 0.022077] UDP hash table entries: 512 (order: 2, 16384 bytes)
> [ 0.022100] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
> [ 0.022469] NET: Registered protocol family 1
> [ 0.022486] PCI: CLS 0 bytes, default 64
> [ 0.022574] Unpacking initramfs...
> [ 0.042069] Freeing initrd memory: 22480k freed
> [ 0.046257] platform rtc_cmos: registered platform RTC device (no PNP device found)
> [ 0.046605] audit: initializing netlink socket (disabled)
> [ 0.046616] type=2000 audit(1312911347.921:1): initialized
> [ 0.056740] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [ 0.057039] VFS: Disk quotas dquot_6.5.2
> [ 0.057099] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> [ 0.057194] msgmni has been set to 999
> [ 0.057354] alg: No test for stdrng (krng)
> [ 0.057382] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
> [ 0.057386] io scheduler noop registered
> [ 0.057388] io scheduler deadline registered
> [ 0.057402] io scheduler cfq registered (default)
> [ 0.057598] isapnp: Scanning for PnP cards...
> [ 0.409558] isapnp: No Plug & Play device found
> [ 0.409873] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [ 0.412773] Linux agpgart interface v0.103
> [ 0.413203] i8042: PNP: No PS/2 controller found. Probing ports directly.
> [ 0.414033] i8042: No controller found
> [ 0.414227] mousedev: PS/2 mouse device common for all mice
> [ 0.454109] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
> [ 0.454143] rtc_cmos: probe of rtc_cmos failed with error -38
> [ 0.454162] cpuidle: using governor ladder
> [ 0.454164] cpuidle: using governor menu
> [ 0.454336] TCP cubic registered
> [ 0.454455] NET: Registered protocol family 10
> [ 0.454980] Mobile IPv6
> [ 0.454983] NET: Registered protocol family 17
> [ 0.454987] Registering the dns_resolver key type
> [ 0.455001] Using IPI No-Shortcut mode
> [ 0.455069] PM: Hibernation image not present or could not be loaded.
> [ 0.455080] registered taskstats version 1
> [ 0.455093] XENBUS: Device with no driver: device/vbd/51712
> [ 0.455095] XENBUS: Device with no driver: device/vbd/51744
> [ 0.455097] XENBUS: Device with no driver: device/vif/0
> [ 0.455099] XENBUS: Device with no driver: device/vif/1
> [ 0.455102] XENBUS: Device with no driver: device/console/0
> [ 0.455114] /build/buildd-linux-2.6_3.0.0-1-i386-ML66CU/linux-2.6-3.0.0/debian/build/source_i386_none/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> [ 0.455175] Initializing network drop monitor service
> [ 0.455438] Freeing unused kernel memory: 404k freed
> [ 0.456030] Write protecting the kernel text: 2768k
> [ 0.456248] Write protecting the kernel read-only data: 1068k
> [ 0.456250] NX-protecting the kernel data: 3376k
> [ 0.490525] udevd[50]: starting version 172
> [ 0.510452] Initialising Xen virtual ethernet driver.
> [ 0.526964] blkfront: xvda: barrier: enabled
> [ 0.528495] xvda:
> [ 0.528633] Setting capacity to 10485760
> [ 0.528637] xvda: detected capacity change from 0 to 5368709120
> [ 0.529412] blkfront: xvdc: barrier: enabled
> [ 0.558774] xvdc: unknown partition table
> [ 0.559489] Setting capacity to 1048576
> [ 0.559502] xvdc: detected capacity change from 0 to 536870912
> [ 0.973128] PM: Starting manual resume from disk
> [ 0.973131] PM: Hibernation image partition 202:32 present
> [ 0.973133] PM: Looking for hibernation image.
> [ 0.973405] PM: Image not found (code -22)
> [ 0.973408] PM: Hibernation image not present or could not be loaded.
> [ 0.983577] EXT4-fs (xvda): INFO: recovery required on readonly filesystem
> [ 0.983581] EXT4-fs (xvda): write access will be enabled during recovery
> [ 1.024513] EXT4-fs warning (device xvda): ext4_clear_journal_err:4155: Filesystem error recorded from previous mount: IO failure
> [ 1.024524] EXT4-fs warning (device xvda): ext4_clear_journal_err:4156: Marking fs in need of filesystem check.
> [ 1.025790] EXT4-fs (xvda): recovery complete
> [ 1.026596] EXT4-fs (xvda): mounted filesystem with ordered data mode. Opts: (null)
> [ 1.928491] udevd[160]: starting version 172
> [ 2.124852] input: PC Speaker as /devices/platform/pcspkr/input/input0
> [ 2.204922] Error: Driver 'pcspkr' is already registered, aborting...
> [ 2.550476] Adding 524284k swap on /dev/xvdc. Priority:-1 extents:1 across:524284k SS
> [ 2.564932] EXT4-fs (xvda): re-mounted. Opts: (null)
> [ 3.156251] blkfront: barrier: empty write xvda op failed
> [ 3.156255] blkfront: xvda: barrier or flush: disabled
> [ 3.185628] EXT4-fs (xvda): re-mounted. Opts: errors=remount-ro
> [ 3.251006] loop: module loaded
> [ 4.326336] RPC: Registered named UNIX socket transport module.
> [ 4.326344] RPC: Registered udp transport module.
> [ 4.326350] RPC: Registered tcp transport module.
> [ 4.326356] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [ 4.361714] FS-Cache: Loaded
> [ 4.382614] FS-Cache: Netfs 'nfs' registered for caching
> [ 4.402479] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> [ 14.460105] eth0: no IPv6 routers present
>
> ** Model information
> not available
>
> ** Loaded modules:
> Module                  Size  Used by
> nfsd                  197933  2
> nfs                   218404  0
> lockd                  61314  2 nfsd,nfs
> fscache                31952  1 nfs
> auth_rpcgss            32183  2 nfsd,nfs
> nfs_acl                12463  2 nfsd,nfs
> sunrpc                139050  6 nfsd,nfs,lockd,auth_rpcgss,nfs_acl
> loop                   17866  0
> evdev                  12995  0
> snd_pcm                53315  0
> snd_timer              22027  1 snd_pcm
> snd                    38562  2 snd_pcm,snd_timer
> soundcore              12992  1 snd
> snd_page_alloc         12899  1 snd_pcm
> pcspkr                 12515  0
> ext4                  274801  1
> mbcache                12898  1 ext4
> jbd2                   56798  1 ext4
> crc16                  12327  1 ext4
> xen_netfront           21670  0
> xen_blkfront           17215  2
>
> ** PCI devices:
>
> ** USB devices:
> not available
>
>
> -- System Information:
> Debian Release: wheezy/sid
> APT prefers testing
> APT policy: (500, 'testing')
> Architecture: i386 (i686)
>
> Kernel: Linux 3.0.0-1-686-pae (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
>
> Versions of packages linux-image-3.0.0-1-686-pae depends on:
> ii debconf [debconf-2.0]         1.5.40  Debian configuration management sy
> ii initramfs-tools [linux-initra 0.99    tools for generating an initramfs
> ii linux-base                    3.3     Linux image base package
> ii module-init-tools             3.16-1  tools for managing Linux kernel mo
>
> Versions of packages linux-image-3.0.0-1-686-pae recommends:
> pn firmware-linux-free           <none>  (no description available)
> ii libc6-i686                    2.13-10 Embedded GNU C Library: Shared lib
>
> Versions of packages linux-image-3.0.0-1-686-pae suggests:
> ii grub-pc                       1.99-9  GRand Unified Bootloader, version
> pn linux-doc-3.0.0               <none>  (no description available)
>
> Versions of packages linux-image-3.0.0-1-686-pae is related to:
> pn firmware-bnx2                 <none>  (no description available)
> pn firmware-bnx2x                <none>  (no description available)
> pn firmware-ipw2x00              <none>  (no description available)
> pn firmware-ivtv                 <none>  (no description available)
> pn firmware-iwlwifi              <none>  (no description available)
> pn firmware-linux                <none>  (no description available)
> pn firmware-linux-nonfree        <none>  (no description available)
> pn firmware-qlogic               <none>  (no description available)
> pn firmware-ralink               <none>  (no description available)
> pn xen-hypervisor                <none>  (no description available)
>
> -- debconf information:
> linux-image-3.0.0-1-686-pae/prerm/removing-running-kernel-3.0.0-1-686-pae: true
> linux-image-3.0.0-1-686-pae/postinst/ignoring-ramdisk:
> linux-image-3.0.0-1-686-pae/postinst/missing-firmware-3.0.0-1-686-pae:
> linux-image-3.0.0-1-686-pae/postinst/depmod-error-initrd-3.0.0-1-686-pae: false
>
>
>
> --
> To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
> with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
> Archive: http://lists.debian.org/20110809180728.2279.11548.reportbug@mail1.gedalya.net
>
>

--
Ian Campbell

In those days he was wiser than he is now -- he used to frequently take my advice.
    -- Winston Churchill

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Gedalya
2011-Aug-25 07:20 UTC
[Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
> Gedalya, 2.6.39-2-686-pae could be anything from v2.6.39..v2.6.39.2
> please could you confirm which package version you have installed in
> case it makes a difference.
>

root@mail:~# uname -a
Linux mail 2.6.39-2-686-pae #1 SMP Tue Jul 5 03:48:49 UTC 2011 i686 GNU/Linux
root@mail:~# dpkg -l | grep linux-image
ii  linux-image-2.6-686-pae       3.0.0+39  Linux for modern PCs (dummy package)
ii  linux-image-2.6.39-2-686-pae  2.6.39-3  Linux 2.6.39 for modern PCs
ii  linux-image-3.0.0-1-686-pae   3.0.0-1   Linux 3.0.0 for modern PCs
ii  linux-image-686-pae           3.0.0+39  Linux for modern PCs (meta-package)
root@mail:~#
Konrad Rzeszutek Wilk
2011-Aug-26 17:53 UTC
Re: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Thu, Aug 25, 2011 at 07:47:08AM +0100, Ian Campbell wrote:
> Hi Konrad,
>
> Does this look at all familiar? There is some more info in the full bug
> log at http://bugs.debian.org/637234 . In particular, contrary to the
> message below, the user subsequently confirmed that the issue appears to
> be Xen specific (doesn't happen on native or vmware) and that it arose
> between 2.6.39-2-686-pae and 3.0.0-1-686-pae.
>
> Could it be related to edf6ef59ec7e "xen-blkfront: Introduce
> BLKIF_OP_FLUSH_DISKCACHE support"? That looks like the only pertinent
> change between 2.6.39 and 3.0.

It shouldn't - from the look of it:

[ 0.529412] blkfront: xvdc: barrier: enabled

it looks as if the 'feature-barrier' is used. Not 'feature-flush-cache' -
otherwise you would have seen a message about that.

But then.. 3.0 (and 2.6.39) don't do barriers anymore. However the backend
seems to do it. And my understanding is that the barrier request is a
superset of a flush request, so it should work. But maybe that is an
incorrect assumption.

One way to make sure that is not the case is to disable barriers in the
guest. Meaning in /etc/fstab have something like this:

/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1

The other question is what version of Dom0 are you running? Is it 2.6.32?
2.6.39?
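As a rough illustration of the workaround suggested above, the sketch below checks whether a single /etc/fstab entry already disables barriers. The function name and the sample lines are hypothetical and used only for illustration; the parsing assumes the usual whitespace-separated fstab fields with the mount options in the fourth field, and treats both barrier=0 and nobarrier as "barriers off" for ext4.

```python
def fstab_disables_barriers(line: str) -> bool:
    """Return True if an fstab entry's options include barrier=0 or nobarrier.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    entry = line.strip()
    # Blank lines and comment lines carry no mount options.
    if not entry or entry.startswith("#"):
        return False
    fields = entry.split()
    # A usable entry needs at least: device, mount point, fs type, options.
    if len(fields) < 4:
        return False
    options = fields[3].split(",")
    return "barrier=0" in options or "nobarrier" in options


# The entry from the message above, and one without the workaround.
with_workaround = "/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1"
without_workaround = "/dev/xvdc /blah ext4 errors=remount-ro 0 1"

print(fstab_disables_barriers(with_workaround))
print(fstab_disables_barriers(without_workaround))
```

Note that this only inspects the configuration text; whether the guest is actually running with barriers off still has to be confirmed from the mounted filesystem and the kernel log.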
Gedalya
2011-Aug-26 22:58 UTC
Re: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
> One way to make sure that is not the case is to disable barriers in the
> guest. Meaning in /etc/fstab have something like this:
>
> /dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1

That seems to fix it. It was remounting as read only either during
the boot process or immediately after, and now it boots up and seems
to stay up. I'll test later with a DomU that actually has things
running.

This also fixes the reboot problem I noted earlier, init 6 now
reboots the DomU rather than destroying it.

>
> The other question is what version of Dom0 are you running? Is it 2.6.32?
> 2.6.39?

squeeze, running linux-image-2.6.32-5-xen-amd64 2.6.32-35
Konrad Rzeszutek Wilk
2011-Aug-29 14:08 UTC
Re: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Fri, Aug 26, 2011 at 06:58:34PM -0400, Gedalya wrote:
>
> >One way to make sure that is not the case is to disable barriers in the
> >guest. Meaning in /etc/fstab have something like this:
> >
> >/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
>
> That seems to fix it. It was remounting as read only either during
> the boot process or immediately after, and now it boots up and seems
> to stay up. I'll test later with a DomU that actually has things
> running.

Yeeey!

>
> This also fixes the reboot problem I noted earlier, init 6 now
> reboots the DomU rather than destroying it.
>
> >
> >The other question is what version of Dom0 are you running? Is it 2.6.32?
> >2.6.39?
> squeeze, running linux-image-2.6.32-5-xen-amd64 2.6.32-35

Oh, I think I know _exactly_ what bug that is:

This git commit:
280802657fb95c52bb5a35d43fea60351883b2af "xen/blkback: When writting barriers set the sector number to zero"
has to be reverted. Specifically:

commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Tue Jul 19 16:44:42 2011 -0700

    Revert "xen/blkback: When writting barriers set the sector number to zero..."

    This reverts commit 280802657fb95c52bb5a35d43fea60351883b2af.

    This patch is reported to cause disk corruption:

    From: "Huang2, Wei" <Wei.Huang2@amd.com>

    We recently found a disk corruption issue with SLES11 SP1 guest.
    Basically the guest disk becomes non-bootable after guest shutdown.
    This is a SLES specific issue as we didn’t see on other Linux and
    Windows VMs. Here is the configuration:

    ===========
    1. Xen: xen-4.1-testing, changeset 23096
    2. Dom0: Jeremy’s latest pvops 6d94b75 (June 1)
    3. VM: SLES 11 SP1, installed as physical machine with raw disk format
    ===========

    Regarding the disk before corruption, “file sles11sp1.img” command
    read: “/root/guests/sles11-sp1/sles11sp1.img: x86 boot sector;
    partition 1: ID=0x82, starthead 1, startsector 63, 4208967 sectors;
    partition 2: ID=0x83, active, starthead 0, startsector 4209030,
    16755795 sectors”. After corruption, it became a data file:
    “/root/guests/sles11-sp1/sles11sp1.img: data”.

and this one added:

25266338a41470a21e9b3974445be09e0640dda7
xen/blkback: don't fail empty barrier requests

    The sector number on empty barrier requests may (will?) be -1, which,
    given that it's being treated as unsigned 64-bit quantity, will almost
    always exceed the actual (virtual) disk's size.

    Inspired by Konrad's "When writting barriers set the sector number to
    zero...".
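The arithmetic behind that last commit message can be sketched as follows. This is only an illustration of why a sector number of -1 fails a range check once it is read as an unsigned 64-bit value; it is not the blkback code or the actual patch. The function names are hypothetical, and the capacity is the xvda size in sectors reported in the kernel log of the original bug report.

```python
U64_MASK = (1 << 64) - 1


def as_u64(value: int) -> int:
    """Reinterpret an integer as an unsigned 64-bit quantity, as C would."""
    return value & U64_MASK


def naive_bounds_check(sector: int, nr_sectors: int, capacity: int) -> bool:
    """Hypothetical check: accept only requests that end inside the disk."""
    return as_u64(sector) + nr_sectors <= capacity


def bounds_check_allowing_empty(sector: int, nr_sectors: int, capacity: int) -> bool:
    """Same check, except a zero-length request has no range to validate."""
    if nr_sectors == 0:
        return True
    return naive_bounds_check(sector, nr_sectors, capacity)


CAPACITY = 10485760  # xvda size in sectors ("Setting capacity to 10485760")

# -1 becomes the largest 64-bit value, far beyond any real disk size.
print(as_u64(-1))
# An empty barrier with sector -1 is rejected by the naive check...
print(naive_bounds_check(-1, 0, CAPACITY))
# ...but passes once empty requests are exempt from the range check.
print(bounds_check_allowing_empty(-1, 0, CAPACITY))
# An ordinary in-range write is accepted either way.
print(naive_bounds_check(4456704, 8, CAPACITY))
```

Under these assumptions, the failure seen in the guest ("blkfront: barrier: empty write xvda op failed") is consistent with the backend rejecting the empty barrier on the range check rather than with a real media error.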
Ben Hutchings
2011-Sep-07 01:51 UTC
Re: Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Mon, 2011-08-29 at 10:08 -0400, Konrad Rzeszutek Wilk wrote:
[...]
> Oh, I think I know _exactly_ what bug that is:
>
> This git commit:
> 280802657fb95c52bb5a35d43fea60351883b2af "xen/blkback: When writting barriers set the sector number to zero"
> has to be reverted. Specifically:
>
> commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Tue Jul 19 16:44:42 2011 -0700
>
>     Revert "xen/blkback: When writting barriers set the sector number to zero..."
[...]
> and this one added:
>
> 25266338a41470a21e9b3974445be09e0640dda7
> xen/blkback: don't fail empty barrier requests
[...]

Which repository are these in?

Ben.
Konrad Rzeszutek Wilk
2011-Sep-07 12:29 UTC
Re: Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Wed, Sep 07, 2011 at 02:51:04AM +0100, Ben Hutchings wrote:
> On Mon, 2011-08-29 at 10:08 -0400, Konrad Rzeszutek Wilk wrote:
> [...]
> > Oh, I think I know _exactly_ what bug that is:
> >
> > This git commit:
> > 280802657fb95c52bb5a35d43fea60351883b2af "xen/blkback: When writting barriers set the sector number to zero"
> > has to be reverted. Specifically:
> >
> > commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
> > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> > Date:   Tue Jul 19 16:44:42 2011 -0700
> >
> >     Revert "xen/blkback: When writting barriers set the sector number to zero..."
> [...]
> > and this one added:
> >
> > 25266338a41470a21e9b3974445be09e0640dda7
> > xen/blkback: don't fail empty barrier requests
> [...]
>
> Which repository are these in?

Jeremy's: git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git

>
> Ben.
>
>