This is a repost of sorts, and for that I am sorry; I do not think the
subject line of my original post was very clear, and I now have more
data about the problem.
I'm experiencing lots of kernel errors when reading from or writing to
a disk that is part of an mdadm software RAID 5 array. Since first
detecting this problem, I have isolated it to one disk, but I'm not
sure what is causing the errors. I have torn down the array and am now
testing the device directly.
I tested the drive several times using smartctl and the tests came
back OK / PASSED each time, and no kernel errors were logged.
I tested the drive with the "badblocks" command, and there were no
problems reported, and no kernel errors were logged.
As soon as I try to write any substantial data to the device, kernel
errors start pouring in:
# dd if=/dev/zero of=/dev/sdg oflag=sync bs=8M count=100
The dd command does its thing and eventually finishes without error,
but very slowly: most of my drives run this command at 65-75 MB/s,
while sdg completes it at 13 MB/s.
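To put those rates in perspective, here is a rough sketch of the arithmetic. It assumes bs=8M means 8 MiB blocks (as GNU dd defines the M suffix) and that the quoted rates are in MiB/s; est_seconds is a made-up helper name used only for illustration.

```shell
#!/bin/sh
# Total bytes written by: dd bs=8M count=100 (GNU dd: M = 1024*1024 bytes)
total_bytes=$((8 * 1024 * 1024 * 100))

# est_seconds RATE: whole seconds needed to write total_bytes at RATE MiB/s.
# Hypothetical helper; integer division rounds down.
est_seconds() {
    echo $(( total_bytes / ($1 * 1024 * 1024) ))
}

echo "total: $total_bytes bytes"
echo "at 70 MiB/s: about $(est_seconds 70) s"
echo "at 13 MiB/s: about $(est_seconds 13) s"
```

Under those assumptions, the same 800 MiB write that takes roughly 11 seconds on a healthy drive takes about a minute on sdg.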
Apr 18 01:10:00 xenmaster kernel: BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G )
Apr 18 01:10:00 xenmaster kernel:
Apr 18 01:10:00 xenmaster kernel: Call Trace:
Apr 18 01:10:00 xenmaster kernel: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4
Apr 18 01:10:00 xenmaster kernel: [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72
Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39
Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324
Apr 18 01:10:00 xenmaster kernel: [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257
Apr 18 01:10:00 xenmaster kernel: [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81
Apr 18 01:10:00 xenmaster kernel: [<ffffffff802389b8>] blk_done_softirq+0x67/0x75
Apr 18 01:10:00 xenmaster kernel: [<ffffffff80212880>] __do_softirq+0x8d/0x13b
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025fda4>] call_softirq+0x1c/0x278
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026d08e>] do_softirq+0x31/0x98
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026cf09>] do_IRQ+0xec/0xf5
Apr 18 01:10:00 xenmaster kernel: [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c
Apr 18 01:10:00 xenmaster kernel: <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8026ba22>] xen_idle+0x38/0x4a
Apr 18 01:10:00 xenmaster kernel: [<ffffffff8024a803>] cpu_idle+0x97/0xba
Apr 18 01:10:00 xenmaster kernel: [<ffffffff80634b09>] start_kernel+0x21f/0x224
Apr 18 01:10:00 xenmaster kernel: [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb
Apr 18 01:10:00 xenmaster kernel:
[The identical warning and call trace repeat four more times with the same timestamp.]
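Because the same warning keeps repeating, it can be easier to tally warnings by source location than to read each trace. A minimal sketch, assuming the syslog entries are unwrapped (one entry per line, as they would be in the log file rather than as wrapped in this post); count_warnings is a hypothetical helper name, and the log path in the comment is an assumption.

```shell
#!/bin/sh
# count_warnings: read syslog lines on stdin and print how many times each
# "BUG: warning at <file>:<line>/<function>" location occurs.
# Hypothetical helper for illustration.
count_warnings() {
    sed -n 's/.*BUG: warning at \([^ ]*\).*/\1/p' | sort | uniq -c
}

# Example invocation (path is an assumption about where kernel messages land):
# count_warnings < /var/log/messages
```

If every line of output names the same location, such as libata-core.c:4923 here, the traces are all reports of one warning rather than several distinct problems.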
The machine in question is running the 64-bit CentOS 5.3 Xen kernel and is the Dom0:
# uname -a
Linux xenmaster.dimension-x.local 2.6.18-128.1.6.el5xen #1 SMP Wed Apr 1 09:53:14 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
This problem started about five days ago, when I upgraded from CentOS
5.2 to 5.3. I'm not sure whether the upgrade caused it or the timing is
just a coincidence.
Any suggestions?
-Gordon