On 03/27/2012 11:59 PM, Ken Smith wrote:
> Hi All, I've been trying to trace the cause of a hang on a 5.6 i386
> system.
>
> After running for almost a year, it hung last week. When I plugged in a
> screen it was blank and the machine was unresponsive to the keyboard. Over
> the network, ssh and other daemons didn't respond, but the machine has two
> network cards and routing from one to the other was still working. So
> the kernel was up and I suspected a dying disk, but smartctl -a revealed
> nothing untoward. I didn't see anything significant in the log files at
> the time; logging had stopped when the machine hung.
>
> The machine rebooted normally, ran for almost a week, and then hung
> again with the same symptoms. I rebooted again; there was nothing untoward
> in the logs and smartctl was still OK. But shortly after I left site this
> was logged in /var/log/messages:
>
>
> Mar 27 16:52:04 cjcsrv kernel: INFO: task hald-addon-stor:2179 blocked for more than 120 seconds.
> Mar 27 16:52:04 cjcsrv kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 27 16:52:04 cjcsrv kernel: hald-addon-st D 00000197 2552 2179 2160 2170 (NOTLB)
> Mar 27 16:52:04 cjcsrv kernel: d895bbbc 00000086 a3374d80 00000197 e89436f0 d895bbbc c084e894 0000000a
> Mar 27 16:52:04 cjcsrv kernel: d891aaa0 a3468fc0 00000197 000f4240 00000000 d891abac c1506800 e43383c0
> Mar 27 16:52:04 cjcsrv kernel: 00000000 00000086 00000000 e7ad2b80 c061f7ca 00000000 c1506844 d895bc0c
> Mar 27 16:52:04 cjcsrv kernel: Call Trace:
> Mar 27 16:52:04 cjcsrv kernel: [<e89436f0>] cdrom_do_pc_continuation+0x0/0x2c [ide_cd]
> Mar 27 16:52:04 cjcsrv kernel: [<c061f7ca>] schedule+0x9c6/0xa4f
> Mar 27 16:52:04 cjcsrv kernel: [<c061f905>] wait_for_completion+0x6b/0x8f
> Mar 27 16:52:04 cjcsrv kernel: [<c041f80f>] default_wake_function+0x0/0xc
> Mar 27 16:52:04 cjcsrv kernel: [<c0575b1b>] ide_do_drive_cmd+0xd7/0xfa
> Mar 27 16:52:04 cjcsrv kernel: [<e894071c>] cdrom_queue_packet_command+0x35/0xbc [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel: [<c0488406>] poll_freewait+0x18/0x4c
> Mar 27 16:52:05 cjcsrv kernel: [<c048874e>] do_sys_poll+0x314/0x339
> Mar 27 16:52:05 cjcsrv kernel: [<e8940c16>] cdrom_check_status+0x52/0x5d [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel: [<c04e29ee>] blk_end_sync_rq+0x0/0x1d
> Mar 27 16:52:05 cjcsrv kernel: [<e8940c3b>] ide_cdrom_check_media_change_real+0x1a/0x34 [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel: [<e88da06e>] media_changed+0x40/0x6e [cdrom]
> Mar 27 16:52:05 cjcsrv kernel: [<c047de20>] check_disk_change+0x13/0x3b
> Mar 27 16:52:05 cjcsrv kernel: [<e88ddfe4>] cdrom_open+0x833/0x876 [cdrom]
> Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel: [<c048c42f>] __d_lookup+0x98/0xdb
> Mar 27 16:52:05 cjcsrv kernel: [<c04c95c3>] avc_has_perm+0x3c/0x46
> Mar 27 16:52:05 cjcsrv kernel: [<c04c9c29>] inode_has_perm+0x54/0x5c
> Mar 27 16:52:05 cjcsrv kernel: [<c04eef8a>] kobject_get+0xf/0x13
> Mar 27 16:52:05 cjcsrv kernel: [<c04e5e51>] get_disk+0x35/0x6e
> Mar 27 16:52:05 cjcsrv kernel: [<c04e5e91>] exact_lock+0x7/0xd
> Mar 27 16:52:05 cjcsrv kernel: [<c056291d>] kobj_lookup+0x10d/0x168
> Mar 27 16:52:05 cjcsrv kernel: [<e8941042>] idecd_open+0x7b/0xa8 [ide_cd]
> Mar 27 16:52:05 cjcsrv kernel: [<c047e448>] do_open+0x89/0x2cc
> Mar 27 16:52:05 cjcsrv kernel: [<c047e7f7>] blkdev_open+0x0/0x44
> Mar 27 16:52:05 cjcsrv kernel: [<c047e813>] blkdev_open+0x1c/0x44
> Mar 27 16:52:05 cjcsrv kernel: [<c0475937>] __dentry_open+0xc7/0x1ab
> Mar 27 16:52:05 cjcsrv kernel: [<c0475a7f>] nameidata_to_filp+0x19/0x28
> Mar 27 16:52:05 cjcsrv kernel: [<c0475ab9>] do_filp_open+0x2b/0x31
> Mar 27 16:52:05 cjcsrv kernel: [<c0475afd>] do_sys_open+0x3e/0xae
> Mar 27 16:52:05 cjcsrv kernel: [<c0475b9a>] sys_open+0x16/0x18
> Mar 27 16:52:05 cjcsrv kernel: [<c0404f4b>] syscall_call+0x7/0xb
> Mar 27 16:52:05 cjcsrv kernel: ======================
> Mar 27 16:52:19 cjcsrv kernel: ide1: reset timed-out, status=0xd0
>
>
> ide1 has a CD drive attached. It is not essential, so it could be
> unplugged. ide0 has the hard disk, hda, attached. Looking back through the
> logs, there is another of these recorded before the previous hang. Any
> clues as to what this is telling me, other than that something crashed?

The process "hald-addon-storage" got stuck while trying to access the CD
drive through the "ide-cd" driver. It was probably polling the drive to
check whether a disc had been inserted. Unplugging the drive should do the
trick. Alternatively, you could try disabling the polling by creating a file
"/etc/hal/fdi/policy/99-custom.fdi" with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.removable" bool="true">
      <remove key="info.addons" type="strlist">hald-addon-storage</remove>
    </match>
  </device>
</deviceinfo>
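
If you would rather script that step, here is a minimal Python 3 sketch that
checks the content is well-formed XML before writing it out. The function name
write_policy and the temporary demo path are my own choices for illustration,
not anything HAL provides. On the server itself the target would be the path
given above and the script would need root. I am also assuming Python 3, which
a 5.6 system probably does not have installed by default, so you may need to
generate the file elsewhere and copy it across.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The policy content from this mail, unchanged apart from indentation.
FDI_CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="storage.removable" bool="true">
      <remove key="info.addons" type="strlist">hald-addon-storage</remove>
    </match>
  </device>
</deviceinfo>
"""


def write_policy(path, content=FDI_CONTENT):
    """Verify that content is well-formed XML, then write it to path.

    write_policy is a hypothetical helper name, not part of HAL.
    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(content.encode("utf-8"))
    if root.tag != "deviceinfo":
        raise ValueError("unexpected root element: %s" % root.tag)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return root


if __name__ == "__main__":
    # Demo only: write into a temporary directory instead of /etc/hal.
    demo_path = os.path.join(tempfile.mkdtemp(), "99-custom.fdi")
    root = write_policy(demo_path)
    print(demo_path, root.find("device/match/remove").text)
```

The well-formedness check only catches XML syntax mistakes such as an unclosed
tag. It does not confirm that hald accepts the policy.
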
After doing so, restart hald or reboot. hald should then no longer poll the
drive.
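
Since you mention an earlier report of the same kind in the logs, it may be
worth checking afterwards whether any new ones appear. A rough Python 3 sketch
for pulling the hung-task reports out of a syslog file follows. The regular
expression is my assumption, based only on the format of the lines you posted,
and find_hung_tasks is a made-up helper name. It expects each log entry on a
single line, as in the file itself rather than as wrapped in mail.

```python
import re
import sys

# Matches the "blocked for more than N seconds" line shown in the posted log.
# The pattern is an assumption derived from that sample, not a kernel spec.
HUNG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)


def find_hung_tasks(lines):
    """Return a list of (timestamp, task, pid) for each hung-task report."""
    hits = []
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("task"), int(m.group("pid"))))
    return hits


if __name__ == "__main__":
    # Defaults to /var/log/messages; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for stamp, task, pid in find_hung_tasks(fh):
            print(stamp, task, pid)
```

If hald-addon-stor stops showing up in that output once the policy is in
place, that would suggest the polling was the trigger.
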
Regards,
Dennis