Gordon McLellan
2009-Apr-17 18:23 UTC
[CentOS] problem with 5.3 upgrade or just bad timing?
I've been experiencing delays access data off my file server since I upgraded to 5.3... either I hosed something, have bad hardware or very unlikely, found a bug. When reading or writing data, the stream to the hdd's stops every 5-10 min and %iowait goes through the roof. I checked the logs and they are filled with this diagnostic data that I can't readily decipher. my setup involves two md raid5 devices, 9+1 sata drives total, five of them are on the internal intel sata (i5000) controller, five of them on an external super micro (marvell) controller. I need to do some more testing to see which of (or is it both?) of the md devices is generating these errors. system is running centos 5.3 64bit: # uname -a Linux xenmaster.dimension-x.local 2.6.18-128.1.6.el5xen #1 SMP Wed Apr 1 09:53:14 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux # cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md1 : active raid5 sdk1[2] sdj1[4] sdi1[3] sdh1[0] sdg1[1] 976783616 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU] md0 : active raid5 sdf1[3] sde1[1] sdd1[4](S) sdc1[0] sdb1[2] 2197715712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] unused devices: <none> here is what dmesg has to say: (some of it looks like repeats but some of it is very slightly different too) BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Call Trace: [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14 [<ffffffff803fed01>] md_thread+0xf8/0x10e [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e [<ffffffff803fec09>] md_thread+0x0/0x10e [<ffffffff80233483>] kthread+0xfe/0x132 [<ffffffff8025fb2c>] child_rip+0xa/0x12 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff80233385>] kthread+0x0/0x132 [<ffffffff8025fb22>] child_rip+0x0/0x12 BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Call Trace: [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14 [<ffffffff803fed01>] md_thread+0xf8/0x10e [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e [<ffffffff803fec09>] md_thread+0x0/0x10e [<ffffffff80233483>] kthread+0xfe/0x132 [<ffffffff8025fb2c>] child_rip+0xa/0x12 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff80233385>] kthread+0x0/0x132 [<ffffffff8025fb22>] child_rip+0x0/0x12 BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Call Trace: [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff8033949e>] cfq_kick_queue+0x0/0x89 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c [<ffffffff80339514>] cfq_kick_queue+0x76/0x89 [<ffffffff8024ee1e>] run_workqueue+0x94/0xe4 [<ffffffff8024b727>] worker_thread+0x0/0x122 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff8024b817>] worker_thread+0xf0/0x122 [<ffffffff80286daf>] default_wake_function+0x0/0xe [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff80233483>] kthread+0xfe/0x132 [<ffffffff8025fb2c>] child_rip+0xa/0x12 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff8026df02>] monotonic_clock+0x35/0x7b [<ffffffff80233385>] kthread+0x0/0x132 [<ffffffff8025fb22>] child_rip+0x0/0x12 BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Call Trace: [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c [<ffffffff8025c62a>] generic_unplug_device+0x22/0x37 [<ffffffff8835c060>] :raid456:unplug_slaves+0x60/0x9b [<ffffffff8836122f>] :raid456:raid5d+0x17c/0x187 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14 [<ffffffff803fed01>] md_thread+0xf8/0x10e [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e [<ffffffff803fec09>] md_thread+0x0/0x10e [<ffffffff80233483>] kthread+0xfe/0x132 [<ffffffff8025fb2c>] child_rip+0xa/0x12 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4 [<ffffffff80233385>] kthread+0x0/0x132 [<ffffffff8025fb22>] child_rip+0x0/0x12 BUG: warning at drivers/ata/libata-core.c:4923/ata_qc_issue() (Tainted: G ) Call Trace: <IRQ> [<ffffffff880b6625>] :libata:ata_qc_issue+0x61/0x4a9 [<ffffffff880bacf3>] :libata:ata_scsi_rw_xlat+0x119/0x188 [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880babda>] :libata:ata_scsi_rw_xlat+0x0/0x188 [<ffffffff880baea2>] :libata:ata_scsi_translate+0x140/0x16d [<ffffffff880735a6>] :scsi_mod:scsi_done+0x0/0x18 [<ffffffff880bda72>] :libata:ata_scsi_queuecmd+0x1b4/0x1d4 [<ffffffff88073c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322 [<ffffffff880790e2>] :scsi_mod:scsi_request_fn+0x2c5/0x39c [<ffffffff8025e2f2>] blk_run_queue+0x41/0x72 [<ffffffff88078016>] :scsi_mod:scsi_next_command+0x2d/0x39 [<ffffffff88078174>] :scsi_mod:scsi_end_request+0xbf/0xcd [<ffffffff880782d0>] :scsi_mod:scsi_io_completion+0x14e/0x324 [<ffffffff880a57cd>] :sd_mod:sd_rw_intr+0x21d/0x257 [<ffffffff88078565>] :scsi_mod:scsi_device_unbusy+0x67/0x81 [<ffffffff802389b8>] blk_done_softirq+0x67/0x75 [<ffffffff80212880>] __do_softirq+0x8d/0x13b [<ffffffff8025fda4>] call_softirq+0x1c/0x278 [<ffffffff8026d08e>] do_softirq+0x31/0x98 [<ffffffff8026cf09>] do_IRQ+0xec/0xf5 [<ffffffff803a6cca>] evtchn_do_upcall+0x13b/0x1fb [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c <EOI> [<ffffffff8026df02>] monotonic_clock+0x35/0x7b [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000 [<ffffffff8026e4e5>] raw_safe_halt+0x84/0xa8 [<ffffffff8026ba22>] xen_idle+0x38/0x4a [<ffffffff8024a803>] cpu_idle+0x97/0xba var/log/messages has the same info, with timestamps - this error repeats itself several times a second for several minutes at a time all suggestions are appreciated! -Gordon