José Costa
2007-Feb-26 10:08 UTC
[Ocfs2-users] Problems with ocfs2 when rebooting the first node.
Hello,

I'm using 2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp with OCFS2 1.2.4.

If I start node1 and then node2, everything works. If I reboot node1, node2 logs the error below. After that I can't mount on node1 when it comes back up, and on node2 I can't do anything with the ocfs2 mounts or with /sys/kernel/cluster/*.

I have 8 ocfs2 partitions (don't ask why).

Here's the kernel bug:

Feb 26 17:39:42 system2 kernel: (3903,1):dlm_deref_lockres_handler:2353 ERROR: 5400F4D01A9E4561961EFD460CE743B9:M0000000000000000000005e6b4c612: node 0 trying to drop ref but it is already dropped!
Feb 26 17:39:42 system2 kernel: ------------[ cut here ]------------
Feb 26 17:39:42 system2 kernel: kernel BUG at fs/ocfs2/dlm/dlmdebug.c:304!
Feb 26 17:39:42 system2 kernel: invalid opcode: 0000 [#1]
Feb 26 17:39:42 system2 kernel: SMP
Feb 26 17:39:42 system2 kernel: last sysfs file: /devices/pci0000:00/0000:00:05.0/resource
Feb 26 17:39:42 system2 kernel: Modules linked in: ocfs2 af_packet ocfs2_user_heartbeat ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs bonding button battery ac apparmor aamatch_pcre loop dm_mod i2c_piix4 i2c_core ohci_hcd sworks_agp usbcore agpgart e100 mii e1000 shpchp pci_hotplug ide_cd cdrom parport_pc lp parport ext3 jbd edd fan thermal processor i2o_block i2o_core qla2xxx firmware_class scsi_transport_fc sg st aic7xxx scsi_transport_spi serverworks sd_mod scsi_mod ide_disk ide_core
Feb 26 17:39:42 system2 kernel: CPU: 1
Feb 26 17:39:42 system2 kernel: EIP: 0060:[<f9356fe2>] Tainted: G U VLI
Feb 26 17:39:42 system2 kernel: EFLAGS: 00010202 (2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp #1)
Feb 26 17:39:42 system2 kernel: EIP is at __dlm_print_one_lock_resource+0x12/0x729 [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: eax: f70ae401 ebx: 00000000 ecx: 00000000 edx: 00000282
Feb 26 17:39:42 system2 kernel: esi: f70ae460 edi: 0000001f ebp: 00000001 esp: f682de54
Feb 26 17:39:42 system2 kernel: ds: 007b es: 007b ss: 0068
Feb 26 17:39:42 system2 kernel: Process o2net (pid: 3903, threadinfo=f682c000 task=f4a81910)
Feb 26 17:39:42 system2 kernel: Stack: <0>00000000 00000002 f70ae460 c0130f57 f682de64 f682de64 00000005 00000082
Feb 26 17:39:42 system2 kernel: f682de9c f65a9d64 f65a9d54 00000000 f70ae460 0000001f 00000001 c0120a80
Feb 26 17:39:42 system2 kernel: f9374221 f682dea8 f682dea8 f936527b f9374221 00000f3f 00000001 f936e576
Feb 26 17:39:42 system2 kernel: Call Trace:
Feb 26 17:39:42 system2 kernel: [<c0130f57>] autoremove_wake_function+0x0/0x2d
Feb 26 17:39:42 system2 kernel: [<c0120a80>] printk+0x14/0x18
Feb 26 17:39:42 system2 kernel: [<f936527b>] dlm_deref_lockres_handler+0x2a6/0x3df [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: [<f9365285>] dlm_deref_lockres_handler+0x2b0/0x3df [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: [<f92fc792>] o2net_process_message+0x3e7/0x598 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<f92fba1d>] o2net_recv_tcp_msg+0x55/0x60 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<f92fe2e6>] o2net_rx_until_empty+0x64d/0x773 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<c012de26>] run_workqueue+0x78/0xb5
Feb 26 17:39:42 system2 kernel: [<f92fdc99>] o2net_rx_until_empty+0x0/0x773 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<c012e679>] worker_thread+0x0/0x10d
Feb 26 17:39:42 system2 kernel: [<c012e755>] worker_thread+0xdc/0x10d
Feb 26 17:39:42 system2 kernel: [<c011a53d>] default_wake_function+0x0/0xc
Feb 26 17:39:42 system2 kernel: [<c0130e75>] kthread+0x9d/0xc9
Feb 26 17:39:42 system2 kernel: [<c0130dd8>] kthread+0x0/0xc9
Feb 26 17:39:42 system2 kernel: [<c0102005>] kernel_thread_helper+0x5/0xb
Feb 26 17:39:42 system2 kernel: Code: 64 d1 37 f9 0f 85 96 fe ff ff b0 01 86 05 60 d1 37 f9 89 e8 5b 5e 5f 5d c3 55 57 56 53 83 ec 60 89 44 24 08 8a 40 48 84 c0 7e 08 <0f> 0b 30 01 10 f4 36 f9 f6 05 81 de 30 f9 01 75 14 a1 84 de 30
Feb 26 17:39:43 system2 kernel: <5>(6543,1):dlm_get_lock_resource:920 575FC4A619124A3BA677F994DF3B18F2:$RECOVERY: at least one node (0) torecover before lock mastery can begin
Sunil Mushran
2007-Feb-26 17:17 UTC
[Ocfs2-users] Problems with ocfs2 when rebooting the first node.
Check out this bug: http://oss.oracle.com/bugzilla/show_bug.cgi?id=854
Sebastian Reitenbach
2007-Feb-26 22:18 UTC
[Ocfs2-users] Problems with ocfs2 when rebooting the first node.
Hi,

Sunil Mushran <Sunil.Mushran@oracle.com> wrote:
> Check out this bug:
> http://oss.oracle.com/bugzilla/show_bug.cgi?id=854

Thanks a lot!

Sebastian
José Costa
2007-Mar-01 14:57 UTC
[Ocfs2-users] Problems with ocfs2 when rebooting the first node.
Hello,

I'm using 2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp with OCFS2 1.2.4.

If I start node1 and then node2, everything works. If I reboot node1, node2 logs the error below. After that I can't mount on node1 when it comes back up, and on node2 I can't do anything with the ocfs2 mounts or with /sys/kernel/cluster/*.

I have 8 ocfs2 partitions (don't ask why).

Here's the kernel bug:

Feb 26 17:39:42 system2 kernel: (3903,1):dlm_deref_lockres_handler:2353 ERROR: 5400F4D01A9E4561961EFD460CE743B9:M0000000000000000000005e6b4c612: node 0 trying to drop ref but it is already dropped!
Feb 26 17:39:42 system2 kernel: ------------[ cut here ]------------
Feb 26 17:39:42 system2 kernel: kernel BUG at fs/ocfs2/dlm/dlmdebug.c:304!
Feb 26 17:39:42 system2 kernel: invalid opcode: 0000 [#1]
Feb 26 17:39:42 system2 kernel: SMP
Feb 26 17:39:42 system2 kernel: last sysfs file: /devices/pci0000:00/0000:00:05.0/resource
Feb 26 17:39:42 system2 kernel: Modules linked in: ocfs2 af_packet ocfs2_user_heartbeat ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs bonding button battery ac apparmor aamatch_pcre loop dm_mod i2c_piix4 i2c_core ohci_hcd sworks_agp usbcore agpgart e100 mii e1000 shpchp pci_hotplug ide_cd cdrom parport_pc lp parport ext3 jbd edd fan thermal processor i2o_block i2o_core qla2xxx firmware_class scsi_transport_fc sg st aic7xxx scsi_transport_spi serverworks sd_mod scsi_mod ide_disk ide_core
Feb 26 17:39:42 system2 kernel: CPU: 1
Feb 26 17:39:42 system2 kernel: EIP: 0060:[<f9356fe2>] Tainted: G U VLI
Feb 26 17:39:42 system2 kernel: EFLAGS: 00010202 (2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp #1)
Feb 26 17:39:42 system2 kernel: EIP is at __dlm_print_one_lock_resource+0x12/0x729 [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: eax: f70ae401 ebx: 00000000 ecx: 00000000 edx: 00000282
Feb 26 17:39:42 system2 kernel: esi: f70ae460 edi: 0000001f ebp: 00000001 esp: f682de54
Feb 26 17:39:42 system2 kernel: ds: 007b es: 007b ss: 0068
Feb 26 17:39:42 system2 kernel: Process o2net (pid: 3903, threadinfo=f682c000 task=f4a81910)
Feb 26 17:39:42 system2 kernel: Stack: <0>00000000 00000002 f70ae460 c0130f57 f682de64 f682de64 00000005 00000082
Feb 26 17:39:42 system2 kernel: f682de9c f65a9d64 f65a9d54 00000000 f70ae460 0000001f 00000001 c0120a80
Feb 26 17:39:42 system2 kernel: f9374221 f682dea8 f682dea8 f936527b f9374221 00000f3f 00000001 f936e576
Feb 26 17:39:42 system2 kernel: Call Trace:
Feb 26 17:39:42 system2 kernel: [<c0130f57>] autoremove_wake_function+0x0/0x2d
Feb 26 17:39:42 system2 kernel: [<c0120a80>] printk+0x14/0x18
Feb 26 17:39:42 system2 kernel: [<f936527b>] dlm_deref_lockres_handler+0x2a6/0x3df [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: [<f9365285>] dlm_deref_lockres_handler+0x2b0/0x3df [ocfs2_dlm]
Feb 26 17:39:42 system2 kernel: [<f92fc792>] o2net_process_message+0x3e7/0x598 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<f92fba1d>] o2net_recv_tcp_msg+0x55/0x60 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<f92fe2e6>] o2net_rx_until_empty+0x64d/0x773 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<c012de26>] run_workqueue+0x78/0xb5
Feb 26 17:39:42 system2 kernel: [<f92fdc99>] o2net_rx_until_empty+0x0/0x773 [ocfs2_nodemanager]
Feb 26 17:39:42 system2 kernel: [<c012e679>] worker_thread+0x0/0x10d
Feb 26 17:39:42 system2 kernel: [<c012e755>] worker_thread+0xdc/0x10d
Feb 26 17:39:42 system2 kernel: [<c011a53d>] default_wake_function+0x0/0xc
Feb 26 17:39:42 system2 kernel: [<c0130e75>] kthread+0x9d/0xc9
Feb 26 17:39:42 system2 kernel: [<c0130dd8>] kthread+0x0/0xc9
Feb 26 17:39:42 system2 kernel: [<c0102005>] kernel_thread_helper+0x5/0xb
Feb 26 17:39:42 system2 kernel: Code: 64 d1 37 f9 0f 85 96 fe ff ff b0 01 86 05 60 d1 37 f9 89 e8 5b 5e 5f 5d c3 55 57 56 53 83 ec 60 89 44 24 08 8a 40 48 84 c0 7e 08 <0f> 0b 30 01 10 f4 36 f9 f6 05 81 de 30 f9 01 75 14 a1 84 de 30
Feb 26 17:39:43 system2 kernel: <5>(6543,1):dlm_get_lock_resource:920 575FC4A619124A3BA677F994DF3B18F2:$RECOVERY: at least one node (0) torecover before lock mastery can begin
Alexei_Roudnev
2007-Mar-01 17:31 UTC
[Ocfs2-users] Re: Few panics with OCFSv2, SLES9 Sp3, kernel 282
In addition:

The file (inode 842694) is a log file. It was being written in parallel from 2 nodes before the failure. After the system panicked and rebooted (and mounted the FS again), the node tried to append to the log again, and that caused one more failure.

When I remounted the FS on both nodes, everything worked fine again. It looks like a bug in synchronization.

----- Original Message -----
From: "Alexei_Roudnev" <Alexei_Roudnev@exigengroup.com>
To: <ocfs2-users@oss.oracle.com>
Sent: Thursday, March 01, 2007 5:05 PM
Subject: Few panics with OCFSv2, SLES9 Sp3, kernel 282

I saw this a few times until I unmounted the FS on all nodes, ran fsck (which showed nothing), and then mounted it back.

Are there any known errors/bugs that explain this?

Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: (6001,1):ocfs2_extend_file:789 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: (6001,1):ocfs2_extend_file:789 ERROR: Inode 842694 i_size = 1270197, dinode i_size = 1476996, bytes_extended = 0, new_i_size = 1270198
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Kernel BUG at file:789
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: invalid operand: 0000 [1] SMP
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CPU 1
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Pid: 6001, comm: perl Tainted: G U (2.6.5-7.282-smp SLES9_SP3_BRANCH-20060829104040)
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RIP: 0010:[<ffffffffa0356bc4>] <ffffffffa0356bc4>{:ocfs2:ocfs2_extend_file+772}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RSP: 0018:00000100b3f05cd8 EFLAGS: 00010216
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RAX: 000000000000008a RBX: 000001007b4e7000 RCX: 000000000003ffff
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RDX: 0000000000000000 RSI: 00000000000162e2 RDI: 00000000001361b5
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RBP: 0000000000000000 R08: 0000000000000033 R09: 0000000000000006
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: 000001001f6a33d8
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: R13: 000001010b683e80 R14: 000001001f6a33d8 R15: 000001013940e000
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: FS: 0000002a95d3d6e0(0000) GS:ffffffff8057e600(0000) knlGS:0000000000000000
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CR2: 0000000001043928 CR3: 00000000bff04000 CR4: 00000000000006e0
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Process perl (pid: 6001, threadinfo 00000100b3f04000, task 000001001c16d620)
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Stack: 00000000001361b5 0000000000168984 0000000000000000 00000000001361b6
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: 0000000000000000 0000000000000000 00000100b3f05dd0 00000000001361b6
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: 0000000000000216 0000000000000000
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Call Trace:<ffffffffa0364538>{:ocfs2:ocfs2_lock_buffer_inodes+536}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: <ffffffffa03657f5>{:ocfs2:ocfs2_write_lock_maybe_extend+2517}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: <ffffffffa035562e>{:ocfs2:ocfs2_file_write+414} <ffffffff80197d9c>{vfs_fstat+204}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: <ffffffff8018d734>{vfs_write+244} <ffffffff8018d98d>{sys_write+157}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: <ffffffff80110f79>{error_exit+0} <ffffffff801106b4>{system_call+124}
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Code: 0f 0b 7f c0 37 a0 ff ff ff ff 15 03 48 39 7c 24 38 0f 83 d7
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RIP <ffffffffa0356bc4>{:ocfs2:ocfs2_extend_file+772} RSP <00000100b3f05cd8>
Mar 1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: <0>Kernel panic: Oops
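
For readers following the "Few panics" report, the failed check in the two ocfs2_extend_file:789 lines can be replayed with the logged numbers. The following is a minimal Python sketch, not OCFS2 code: it assumes the logged bug expression is an inequality (`!=`) between the in-memory size and the on-disk size minus `bytes_extended`, and the function name `sizes_disagree` is hypothetical.

```python
# Values copied from the "Inode 842694" log line in the report.
inode_i_size = 1270197    # in-memory size, i_size_read(inode)
dinode_i_size = 1476996   # on-disk size, le64_to_cpu(fe->i_size)
bytes_extended = 0
new_i_size = 1270198


def sizes_disagree(in_memory: int, on_disk: int, extended: int) -> bool:
    """Assumed form of the logged bug expression (hypothetical helper).

    Returns True when the in-memory size does not match the on-disk
    size minus the bytes added by the current extend operation.
    """
    return in_memory != on_disk - extended


# How far the in-memory size lags behind the on-disk size.
stale_by = (dinode_i_size - bytes_extended) - inode_i_size

# How many bytes the failing write wanted to add to the in-memory size.
requested_growth = new_i_size - inode_i_size

print("check fires:", sizes_disagree(inode_i_size, dinode_i_size, bytes_extended))
print("in-memory size behind on-disk size by:", stale_by, "bytes")
print("requested growth:", requested_growth, "byte(s)")
```

With the logged values the check fires, and the node's in-memory size is 206799 bytes behind the on-disk size. That would be consistent with the description that the log file was being appended to from 2 nodes, although the log alone does not prove it.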