Hi,

I run an ocfs2/drbd active-active two-node cluster.

ocfs2 version is 1.4.7-1
ocfs2-tools version is 1.4.4
Linux version is RHEL 5.4 (2.6.18-164.el5 x86_64)

One node crashed once with a kernel panic.

What is the cause?

The vmcore analysis follows.

=======================================================

Unable to handle kernel NULL pointer dereference at 0000000000000808 RIP:
 [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
PGD 187e15067 PUD 187e16067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:06:00.0/0000:07:00.0/irq
CPU 1
Modules linked in: mptctl mptbase softdog autofs4 ipmi_devintf ipmi_si
 ipmi_msghandler ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U)
 ocfs2_nodemanager(U) configfs drbd(U) bonding ipv6 xfrm_nalgo crypto_api
 bnx2i(U) libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cnic(U)
 dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core
 button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev
 sr_mod cdrom sg pcspkr serio_raw hpilo bnx2(U) dm_raid45 dm_message
 dm_region_hash dm_log dm_mod dm_mem_cache hpahcisr(PU) ata_piix libata
 shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 21924, comm: res Tainted: P 2.6.18-164.el5 #1
RIP: 0010:[<ffffffff80064ae6>] [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
RSP: 0018:ffff81008b1cfae0 EFLAGS: 00010002
RAX: ffff810187af4040 RBX: 0000000000000000 RCX: ffff8101342b7b80
RDX: ffff81008b1cfb98 RSI: ffff81008b1cfba8 RDI: 0000000000000808
RBP: ffff81008b1cfb98 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810075463090 R11: ffffffff88595b95 R12: ffff81008b1cfba8
R13: ffff81007f070520 R14: 0000000000000001 R15: ffff81008b1cfce8
FS: 0000000000000000(0000) GS:ffff810105d51840(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000808 CR3: 0000000187e14000 CR4: 00000000000006e0
Process res (pid: 21924, threadinfo ffff81008b1ce000, task ffff810187af4040)
Stack: ffffffff8001db30 ffff81007f070520 ffffffff885961f3 ffff810105d39400
 ffffffff88596323 06ff813231393234 ffff810075463018 ffff810075463018
 0000000000000297 ffff81007f070520 ffff810075463028 0000000000000246
Call Trace:
 [<ffffffff8001db30>] sigprocmask+0x28/0xdb
 [<ffffffff885961f3>] :ocfs2:ocfs2_delete_inode+0x0/0x1691
 [<ffffffff88596323>] :ocfs2:ocfs2_delete_inode+0x130/0x1691
 [<ffffffff88581f16>] :ocfs2:ocfs2_drop_lock+0x67a/0x77b
 [<ffffffff8858026a>] :ocfs2:ocfs2_remove_lockres_tracking+0x10/0x45
 [<ffffffff885961f3>] :ocfs2:ocfs2_delete_inode+0x0/0x1691
 [<ffffffff8002f49e>] generic_delete_inode+0xc6/0x143
 [<ffffffff88595c85>] :ocfs2:ocfs2_drop_inode+0xf0/0x161
 [<ffffffff8000d46e>] dput+0xf6/0x114
 [<ffffffff800e9c44>] prune_one_dentry+0x66/0x76
 [<ffffffff8002e958>] prune_dcache+0x10f/0x149
 [<ffffffff8004d66e>] shrink_dcache_parent+0x1c/0xe1
 [<ffffffff80104f8b>] proc_flush_task+0x17c/0x1f6
 [<ffffffff8008fa2c>] sched_exit+0x27/0xb5
 [<ffffffff80018024>] release_task+0x387/0x3cb
 [<ffffffff80015c50>] do_exit+0x865/0x911
 [<ffffffff80049281>] cpuset_exit+0x0/0x88
 [<ffffffff8002b080>] get_signal_to_deliver+0x42c/0x45a
 [<ffffffff8005ae7b>] do_notify_resume+0x9c/0x7af
 [<ffffffff8008b6a2>] deactivate_task+0x28/0x5f
 [<ffffffff80021f3f>] __up_read+0x19/0x7f
 [<ffffffff80066b58>] do_page_fault+0x4fe/0x830
 [<ffffffff800b65b2>] audit_syscall_exit+0x336/0x362
 [<ffffffff8005d32e>] int_signal+0x12/0x17

Code: f0 ff 0f 0f 88 f3 00 00 00 c3 53 48 89 fb e8 33 f5 02 00 f0
RIP [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
 RSP <ffff81008b1cfae0>

crash> bt
PID: 21924 TASK: ffff810187af4040 CPU: 1 COMMAND: "res"
 #0 [ffff81008b1cf840] crash_kexec at ffffffff800ac5b9
 #1 [ffff81008b1cf900] __die at ffffffff80065127
 #2 [ffff81008b1cf940] do_page_fault at ffffffff80066da7
 #3 [ffff81008b1cfa30] error_exit at ffffffff8005dde9
    [exception RIP: _spin_lock_irq+1]
    RIP: ffffffff80064ae6 RSP: ffff81008b1cfae0 RFLAGS: 00010002
    RAX: ffff810187af4040 RBX: 0000000000000000 RCX: ffff8101342b7b80
    RDX: ffff81008b1cfb98 RSI: ffff81008b1cfba8 RDI: 0000000000000808
    RBP: ffff81008b1cfb98 R8: 0000000000000000 R9: 0000000000000000
    R10: ffff810075463090 R11: ffffffff88595b95 R12: ffff81008b1cfba8
    R13: ffff81007f070520 R14: 0000000000000001 R15: ffff81008b1cfce8
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #4 [ffff81008b1cfae0] sigprocmask at ffffffff8001db30
 #5 [ffff81008b1cfb00] ocfs2_delete_inode at ffffffff88596323
 #6 [ffff81008b1cfbf0] generic_delete_inode at ffffffff8002f49e
 #7 [ffff81008b1cfc10] ocfs2_drop_inode at ffffffff88595c85
 #8 [ffff81008b1cfc30] dput at ffffffff8000d46e
 #9 [ffff81008b1cfc50] prune_one_dentry at ffffffff800e9c44
#10 [ffff81008b1cfc70] prune_dcache at ffffffff8002e958
#11 [ffff81008b1cfca0] shrink_dcache_parent at ffffffff8004d66e
#12 [ffff81008b1cfcd0] proc_flush_task at ffffffff80104f8b
#13 [ffff81008b1cfd30] release_task at ffffffff80018024
#14 [ffff81008b1cfd60] do_exit at ffffffff80015c50
#15 [ffff81008b1cfdc0] get_signal_to_deliver at ffffffff8002b080
#16 [ffff81008b1cfe00] do_notify_resume at ffffffff8005ae7b
#17 [ffff81008b1cff50] int_signal at ffffffff8005d32e
    RIP: 0000003e4becced2 RSP: 000000004124afd0 RFLAGS: 00000202
    RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: ffffffffffffffff
    RDX: 0000000000000000 RSI: 000000004124b040 RDI: 0000000000000006
    RBP: 000000004124b0e0 R8: 000000004124b110 R9: 00000000000055a4
    R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
    R13: 000000004124c000 R14: 000000004124b940 R15: 0000000000001000
    ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
crash>
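
As a reading aid for the dump above: the `Oops: 0002` field is the x86 page-fault error code. The following is a minimal sketch of decoding its three lowest bits. The function name `decode_oops_error_code` is hypothetical, and higher bits are deliberately ignored.

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode the three lowest bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "page_present": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "user_mode": bool(code & 0x4),
    }


# The value printed in the dump ("Oops: 0002").
info = decode_oops_error_code(0x0002)
print(info)
```

For 0002 this reads as a kernel-mode write to a page that is not mapped, which fits CR2 and RDI both holding 0000000000000808.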
int sigprocmask(int how, sigset_t *set, sigset_t *oldset)
{
        int error;

        spin_lock_irq(&current->sighand->siglock);  <==== CRASH
        if (oldset)
                *oldset = current->blocked;
        ...
}

current->sighand is NULL. So definitely a race. Generic kernel issue.
Ping your kernel vendor.

On 10/03/2011 07:49 PM, Hideyasu Kojima wrote:
> One node crashed once with a kernel panic.
>
> What is the cause?
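
The faulting address 0x808 is consistent with that reading. With `current->sighand` equal to NULL, `&current->sighand->siglock` evaluates to the offset of `siglock` inside `struct sighand_struct`. The sketch below mirrors that layout with `ctypes`. It assumes the 2.6.18 x86_64 definition (an `atomic_t` count, 64 `k_sigaction` entries of four 8-byte fields each, then the lock). The mock class names are hypothetical and the layout is an assumption, not something read from the vmcore.

```python
import ctypes

NSIG = 64  # assumed value of _NSIG on x86_64


class MockKSigaction(ctypes.Structure):
    # Assumed x86_64 layout: handler, flags, restorer, mask (8 bytes each).
    _fields_ = [
        ("sa_handler", ctypes.c_uint64),
        ("sa_flags", ctypes.c_uint64),
        ("sa_restorer", ctypes.c_uint64),
        ("sa_mask", ctypes.c_uint64),
    ]


class MockSighand(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_int32),         # stands in for atomic_t
        ("_pad", ctypes.c_int32),          # explicit stand-in for alignment padding
        ("action", MockKSigaction * NSIG),
        ("siglock", ctypes.c_uint32),      # stands in for spinlock_t; only its offset matters
    ]


# Offset of siglock from the start of the structure, i.e. the address
# computed for &sighand->siglock when sighand is NULL.
siglock_offset = MockSighand.siglock.offset
print(hex(siglock_offset))
```

Under these assumptions the offset comes out as 0x808 (8 bytes for the count plus padding, then 64 * 32 bytes of actions), the same value reported in CR2 and RDI.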
Thank you for responding.

I think UEK5 is based on the RHEL 5 kernel.
Does the same problem arise with UEK5?

(2011/10/05 1:45), Sunil Mushran wrote:
> current->sighand is NULL. So definitely a race. Generic kernel issue.
> Ping your kernel vendor.

-- 
SCSK
E-Mail: hid.kojima at ms.scsk.jp
TEL 052-951-0398 FAX 052-951-0397
UEK is a different kernel entirely. It is hard to say whether or not you
will hit this with UEK, mainly because the underlying code is different.

On 10/06/2011 10:33 PM, Hideyasu Kojima wrote:
> Thank you for responding.
>
> I think UEK5 is based on the RHEL 5 kernel.
> Does the same problem arise with UEK5?