Miguel Sousa Filipe
2008-Jun-04 00:01 UTC
Bug report, OOPs on weird conditions.. device under mdadm raid1 failed...
Hello,

I have a kernel bug report. I must warn that this happened on bad hardware: one disc of an md raid1 started to fail. My dmesg is full of these:

ata1: soft resetting port
ata1.00: configured for UDMA/33
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:20:bf:23:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 16384 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1.00: configured for UDMA/33
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 00 00
sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 9151
ata1: EH complete
sd 0:0:0:0: [sda] 240121728 512-byte hardware sectors (122942 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

The kernel is gentoo-hardened-2.6.23 on x86. After a lot of messages like the ones above, I have this in my dmesg:
------------[ cut here ]------------
Kernel BUG at e1cd57a4 [verbose debug info unavailable]
invalid opcode: 0000 [#1]
Modules linked in: nouveau drm btrfs libcrc32c snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_via82xx snd_ac97_codec snd_mpu401_uart snd_ens1370 snd_rawmidi snd_seq_device snd_pcm snd_timer snd_ak4531_codec snd snd_page_alloc usbhid ehci_hcd nls_cp437 vfat fat nfsd exportfs lockd auth_rpcgss sunrpc xt_multiport xt_TCPMSS xt_tcpudp xt_mac ipt_MASQUERADE xt_state xt_limit xt_tcpmss nf_nat_irc nf_conntrack_irc iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables xfs usb_storage parport_pc parport floppy ac97_bus soundcore via_rhine i2c_viapro i2c_core uhci_hcd usbcore 8139cp 8139too mii bitrev crc32
CPU:    0
EIP:    0060:[<e1cd57a4>]    Not tainted VLI
EFLAGS: 00010282   (2.6.23-hardened-r4 #6)
EIP is at btrfs_commit_transaction+0x554/0x5a0 [btrfs]
eax: fffffffb   ebx: cd7ab6c8   ecx: 00000015   edx: d97ef160
esi: da159200   edi: 00000000   ebp: fffffffb   esp: c8139f34
ds: 0068   es: 0068   fs: 0000   gs: 0000   ss: 0068
Process btrfs/0 (pid: 31392, ti=c8138000 task=d7e0baa0 task.ti=c8138000)
Stack: 00000000 c330e3e4 e1cd4c14 d857159c 00000001 c8139f6c c330e3e4 ceddecc0
       00000005 00000000 d7e0baa0 c042b400 c8139f64 c8139f64 d7b94240 d7b94240
       da159200 00001d4c e1cd57f0 00000000 e1cd58a5 d85715c0 c65520c0 c0428085
Call Trace:
 [<e1cd4c14>] btrfs_defrag_root+0xd4/0xe0 [btrfs]
 [<c042b400>] autoremove_wake_function+0x0/0x50
 [<e1cd57f0>] btrfs_transaction_cleaner+0x0/0xd0 [btrfs]
 [<e1cd58a5>] btrfs_transaction_cleaner+0xb5/0xd0 [btrfs]
 [<c0428085>] run_workqueue+0x65/0xe0
 [<c042b400>] autoremove_wake_function+0x0/0x50
 [<c042889b>] worker_thread+0x9b/0xf0
 [<c042b400>] autoremove_wake_function+0x0/0x50
 [<c0428800>] worker_thread+0x0/0xf0
 [<c042b0c2>] kthread+0x42/0x70
 [<c042b080>] kthread+0x0/0x70
 [<c0404577>] kernel_thread_helper+0x7/0x10
=======================
Code: 00 05 9c 01 00 00 e8 3c 66 97 de 8b 86 bc 00 00 00 05 90 01 00 00 e8 2c 66 97 de 89 d8 e8 e5 ef ff ff e9 1d fc ff ff 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 0f 0b 66 90 eb fc b8 df b5 ce e1 bd 48
EIP: [<e1cd57a4>] btrfs_commit_transaction+0x554/0x5a0 [btrfs] SS:ESP 0068:c8139f34

Some time after all this, other mount points/volumes behind this raid1 also started to fail:

ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: limiting speed to UDMA/33:PIO3
ata1: failed to recover some devices, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata1: EH pending after completion, repeating EH (cnt=4)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1: EH complete
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 240121552
md: super_written gets error=-5, uptodate=0
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 12695824
Device md1, XFS metadata write error block 0x40 in md1
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 130152336
I/O error in filesystem ("md1") meta-data dev md1 block 0x7003ec0 ("xlog_iodone") error 5 buf count 5120
xfs_force_shutdown(md1,0x2) called from line 958 of file fs/xfs/xfs_log.c.  Return address = 0xe0a91f58
Filesystem "md1": Log I/O Error Detected.  Shutting down filesystem: md1
Please umount the filesystem, and rectify the problem(s)
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 240121552
md: super_written gets error=-5, uptodate=0

Should I report this to the md maintainers? I don't think a disc error should be propagated all the way up to the filesystems on raid1...

Kind regards,

-- 
Miguel Sousa Filipe
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
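[Archive note: one detail worth calling out in the oops above, as a reading of the register dump rather than anything stated in the report itself: the saved eax/ebp value fffffffb is the 32-bit two's-complement encoding of -5, i.e. -EIO, consistent with the "error=-5" lines md prints once sda is disabled. A minimal decode, purely illustrative:]

```python
# Interpret a 32-bit register value from an x86 oops as a signed
# kernel return code. Kernel functions report failure as -errno,
# and errno 5 is EIO (from errno-base.h).

def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as two's-complement signed."""
    return value - (1 << 32) if value & 0x80000000 else value

EIO = 5

ret = to_signed32(0xFFFFFFFB)  # eax/ebp from the oops
print(ret)          # -5
print(ret == -EIO)  # True
```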
Chris Mason
2008-Jun-09 13:41 UTC
Re: Bug report, OOPs on weird conditions.. device under mdadm raid1 failed...
On Wed, 2008-06-04 at 01:01 +0100, Miguel Sousa Filipe wrote:
> Hello,
>
> I have a kernel bug report.
> I must warn that this happened on bad hardware: one disc of an md
> raid1 started to fail.
> My dmesg is full of these:
>

Something rather strange is happening here, but once Btrfs oopses it is hard to blame MD for anything that happens. Have you reproduced this at all?

-chris