Hi again,

OS: Ubuntu 8.04 x64
Kernel: Linux n1 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC 2009 x86_64 GNU/Linux
10-node cluster
OCFS2 version: 1.3.9-0ubuntu1

I received this panic on the 5th of October and cannot work out why it has started to happen.
Please can you provide some direction? Let me know if you require any further details or information.

Oct 5 10:21:22 n1 kernel: [1006473.993681] (1387,3):ocfs2_meta_lock_update:1675 ERROR: bug expression: inode->i_generation != le32_to_cpu(fe->i_generation)
Oct 5 10:21:22 n1 kernel: [1006473.993756] (1387,3):ocfs2_meta_lock_update:1675 ERROR: Invalid dinode 3064741 disk generation: 1309441612 inode->i_generation: 1309441501
Oct 5 10:21:22 n1 kernel: [1006473.993865] ------------[ cut here ]------------
Oct 5 10:21:22 n1 kernel: [1006473.993896] kernel BUG at /build/buildd/linux-2.6.24/fs/ocfs2/dlmglue.c:1675!
Oct 5 10:21:22 n1 kernel: [1006473.993949] invalid opcode: 0000 [3] SMP
Oct 5 10:21:22 n1 kernel: [1006473.993982] CPU 3
Oct 5 10:21:22 n1 kernel: [1006473.994008] Modules linked in: ocfs2 crc32c libcrc32c nfsd auth_rpcgss exportfs ipmi_devintf ipmi_si ipmi_msghandler ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter ip_tables x_tables xfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc parport_pc lp parport loop serio_raw psmouse i2c_piix4 i2c_core dcdbas evdev button k8temp shpchp pci_hotplug pcspkr ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_generic pata_acpi usbhid hid ehci_hcd tg3 sata_svw pata_serverworks ohci_hcd libata scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
Oct 5 10:21:22 n1 kernel: [1006473.994445] Pid: 1387, comm: R Tainted: G D 2.6.24-24-server #1
Oct 5 10:21:22 n1 kernel: [1006473.994479] RIP: 0010:[<ffffffff8856c404>] [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
Oct 5 10:21:22 n1 kernel: [1006473.994558] RSP: 0018:ffff8101238f9d58 EFLAGS: 00010296
Oct 5 10:21:22 n1 kernel: [1006473.994590] RAX: 0000000000000093 RBX: ffff8102eaf03000 RCX: 00000000ffffffff
Oct 5 10:21:22 n1 kernel: [1006473.994642] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff8058ffa4
Oct 5 10:21:22 n1 kernel: [1006473.994694] RBP: 0000000100080000 R08: 0000000000000000 R09: 00000000ffffffff
Oct 5 10:21:22 n1 kernel: [1006473.994746] R10: 0000000000000000 R11: 0000000000000000 R12: ffff81012599ee00
Oct 5 10:21:22 n1 kernel: [1006473.994799] R13: ffff81012599ef08 R14: ffff81012599f2b8 R15: ffff81012599ef08
Oct 5 10:21:22 n1 kernel: [1006473.994851] FS: 00002b3802fed670(0000) GS:ffff810418022c80(0000) knlGS:00000000f546bb90
Oct 5 10:21:22 n1 kernel: [1006473.994906] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 5 10:21:22 n1 kernel: [1006473.994938] CR2: 00007f5db5542000 CR3: 0000000167ddf000 CR4: 00000000000006e0
Oct 5 10:21:22 n1 kernel: [1006473.994990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 5 10:21:22 n1 kernel: [1006473.995042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 5 10:21:22 n1 kernel: [1006473.995095] Process R (pid: 1387, threadinfo ffff8101238f8000, task ffff8104110cc000)
Oct 5 10:21:22 n1 kernel: [1006473.995148] Stack: 000000004e0c7e4c ffff81044e0c7ddd ffff8101a3b4d2b8 00000000802c34c0
Oct 5 10:21:22 n1 kernel: [1006473.995212]  0000000000000000 0000000100000000 ffffffff80680c00 00000000804715e2
Oct 5 10:21:22 n1 kernel: [1006473.995272]  0000000100000000 ffff8101238f9e48 ffff810245558b80 ffff81031e358680
Oct 5 10:21:22 n1 kernel: [1006473.995313] Call Trace:
Oct 5 10:21:22 n1 kernel: [1006473.995380] [<ffffffff8857d03f>] :ocfs2:ocfs2_inode_revalidate+0x5f/0x290
Oct 5 10:21:22 n1 kernel: [1006473.995427] [<ffffffff88577fe6>] :ocfs2:ocfs2_getattr+0x56/0x1c0
Oct 5 10:21:22 n1 kernel: [1006473.995470] [vfs_stat_fd+0x46/0x80] vfs_stat_fd+0x46/0x80
Oct 5 10:21:22 n1 kernel: [1006473.995514] [<ffffffff88569634>] :ocfs2:ocfs2_meta_unlock+0x1b4/0x210
Oct 5 10:21:22 n1 kernel: [1006473.995553] [filldir+0x0/0xf0] filldir+0x0/0xf0
Oct 5 10:21:22 n1 kernel: [1006473.995594] [<ffffffff8856799e>] :ocfs2:ocfs2_readdir+0xce/0x230
Oct 5 10:21:22 n1 kernel: [1006473.995631] [sys_newstat+0x27/0x50] sys_newstat+0x27/0x50
Oct 5 10:21:22 n1 kernel: [1006473.995664] [vfs_readdir+0xa5/0xd0] vfs_readdir+0xa5/0xd0
Oct 5 10:21:22 n1 kernel: [1006473.995699] [sys_getdents+0xcf/0xe0] sys_getdents+0xcf/0xe0
Oct 5 10:21:22 n1 kernel: [1006473.997568] [system_call+0x7e/0x83] system_call+0x7e/0x83
Oct 5 10:21:22 n1 kernel: [1006473.997628] Code: 0f 0b eb fe 83 fd fe 0f 84 73 fc ff ff 81 fd 00 fe ff ff 0f
Oct 5 10:21:22 n1 kernel: [1006473.997745] RIP [<ffffffff8856c404>] :ocfs2:ocfs2_meta_lock_full+0x6a4/0xec0
Oct 5 10:21:22 n1 kernel: [1006473.997808] RSP <ffff8101238f9d58>

Thanks
Laurence
Are you exporting this volume via nfs? We fixed a small race (in the nfs
access path) that could lead to this oops.

Laurence Mayer wrote:
> I received this panic on the 5th of October and cannot work out why it
> has started to happen.
>
> Oct 5 10:21:22 n1 kernel: [1006473.993681] (1387,3):ocfs2_meta_lock_update:1675 ERROR: bug expression: inode->i_generation != le32_to_cpu(fe->i_generation)
> Oct 5 10:21:22 n1 kernel: [1006473.993756] (1387,3):ocfs2_meta_lock_update:1675 ERROR: Invalid dinode 3064741 disk generation: 1309441612 inode->i_generation: 1309441501
> Oct 5 10:21:22 n1 kernel: [1006473.993896] kernel BUG at /build/buildd/linux-2.6.24/fs/ocfs2/dlmglue.c:1675!
Yes. We have set up a 10-node cluster, with one of the nodes exporting the
volume via NFS to the workstations.

Could you expand on your answer?

Thanks
Laurence

On Wed, Oct 7, 2009 at 7:12 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> Are you exporting this volume via nfs? We fixed a small race (in the nfs
> access path) that could lead to this oops.
And does the node exporting the volume encounter the oops?

If so, the likeliest candidate would be:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ca497a83e592d64e050c4d04b6dedb8c915f39a

If it is on another node, I am currently unsure whether an NFS export on one
node could cause this to occur on another. Need more coffee.

The problem, in short, is due to how NFS bypasses the normal fs lookup to
access files. It uses the file handle to access the inode directly, bypassing
the locking. Normally that is not a problem. The race window opens if the file
is deleted (on any node in the cluster) and NFS reads that inode without the
lock. In the oops we see that the disk generation is greater than the
in-memory inode generation. That means the inode was deleted and reused. The
fix closes the race window.

Sunil

Laurence Mayer wrote:
> Yes. We have set up a 10-node cluster, with one of the nodes exporting the
> volume via NFS to the workstations.
>
> Could you expand on your answer?
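For reference, here is a minimal sketch of the check Sunil describes above: the "bug expression" from the first oops, where ocfs2_meta_lock_update() compares the in-memory inode generation against the on-disk dinode after the cluster lock is taken. The struct and function names ending in _sketch are hypothetical stand-ins rather than the real ocfs2 definitions; this only illustrates the logic and is not the 2.6.24 source.

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for the kernel structures involved. */
    struct ocfs2_dinode_sketch { uint32_t i_generation; };  /* stored little-endian on disk */
    struct inode_sketch        { uint32_t i_generation; };  /* cached when the inode was first read */

    /* le32_to_cpu() stand-in; a no-op assuming a little-endian host. */
    static uint32_t le32_to_cpu_sketch(uint32_t v) { return v; }

    /* After the cluster lock is taken, the on-disk dinode generation must match
     * the in-memory inode. If another node deleted the file and the inode number
     * was reused while NFS reached it via the file handle (without the lock),
     * the two values diverge and the kernel BUGs, as in the oops above. */
    static void meta_lock_update_check(const struct inode_sketch *inode,
                                       const struct ocfs2_dinode_sketch *fe)
    {
        assert(inode->i_generation == le32_to_cpu_sketch(fe->i_generation));
    }

    int main(void)
    {
        /* The generations reported in the oops: disk 1309441612 vs. in-memory 1309441501. */
        struct ocfs2_dinode_sketch fe = { .i_generation = 1309441612u };
        struct inode_sketch inode    = { .i_generation = 1309441501u };
        meta_lock_update_check(&inode, &fe);  /* trips the assertion, mirroring the kernel BUG */
        return 0;
    }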
Nope, the node that crashed is not the NFS server.

How should I proceed? What do you suggest? Could this happen again?

On Wed, Oct 7, 2009 at 8:16 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> And does the node exporting the volume encounter the oops?
>
> If so, the likeliest candidate would be:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ca497a83e592d64e050c4d04b6dedb8c915f39a
>
> If it is on another node, I am currently unsure whether an NFS export on
> one node could cause this to occur on another.
It could be that the stale inode info was propagated by the NFS node to the
oopsing node via the LVB. But I am not sure about that.

In any event, applying the fix would be a step forward. The fix has been in
mainline for quite some time now.

Laurence Mayer wrote:
> Nope, the node that crashed is not the NFS server.
>
> How should I proceed? What do you suggest? Could this happen again?
Yet another panic today:

Oct 8 12:36:00 n9 kernel: [79230.175890] Unable to handle kernel NULL pointer dereference at 0000000000000258 RIP:
Oct 8 12:36:00 n9 kernel: [79230.175917] [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.176023] PGD 3d08c5067 PUD 331112067 PMD 0
Oct 8 12:36:00 n9 kernel: [79230.176059] Oops: 0000 [1] SMP
Oct 8 12:36:00 n9 kernel: [79230.176091] CPU 3
Oct 8 12:36:00 n9 kernel: [79230.176117] Modules linked in: nfs lockd nfs_acl sunrpc ocfs2 crc32c libcrc32c ipmi_devintf ipmi_si ipmi_msghandler ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs iptable_filter ip_tables x_tables xfs ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi parport_pc lp parport loop i2c_piix4 dcdbas i2c_core psmouse button shpchp pci_hotplug k8temp serio_raw pcspkr evdev ext3 jbd mbcache sr_mod cdrom sg sd_mod pata_serverworks usbhid hid ata_generic tg3 ehci_hcd pata_acpi sata_svw ohci_hcd libata scsi_mod usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
Oct 8 12:36:00 n9 kernel: [79230.176537] Pid: 4915, comm: o2net Not tainted 2.6.24-24-server #1
Oct 8 12:36:00 n9 kernel: [79230.176571] RIP: 0010:[<ffffffff88473a7e>] [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.176636] RSP: 0000:ffff8104119b3ca8 EFLAGS: 00010282
Oct 8 12:36:00 n9 kernel: [79230.176667] RAX: 0000000000000000 RBX: ffff8103def84018 RCX: 0000000000000005
Oct 8 12:36:00 n9 kernel: [79230.176703] RDX: ffff8103def83100 RSI: 0000000000000005 RDI: ffff8103def84018
Oct 8 12:36:00 n9 kernel: [79230.176738] RBP: ffff8103def84400 R08: ffff8103def84400 R09: ffff8103dee43a00
Oct 8 12:36:00 n9 kernel: [79230.176774] R10: 000000000000004e R11: ffffffff8847b580 R12: 0900000000007aa4
Oct 8 12:36:00 n9 kernel: [79230.176809] R13: 0000000000000005 R14: 0000000000000000 R15: 000000000000001f
Oct 8 12:36:00 n9 kernel: [79230.176845] FS: 00002ad989b79670(0000) GS:ffff810416d4ac80(0000) knlGS:00000000f5420b90
Oct 8 12:36:00 n9 kernel: [79230.176899] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 8 12:36:00 n9 kernel: [79230.176931] CR2: 0000000000000258 CR3: 0000000370517000 CR4: 00000000000006e0
Oct 8 12:36:00 n9 kernel: [79230.176966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 8 12:36:00 n9 kernel: [79230.177002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 8 12:36:00 n9 kernel: [79230.177037] Process o2net (pid: 4915, threadinfo ffff8104119b2000, task ffff8104115247f0)
Oct 8 12:36:00 n9 kernel: [79230.177092] Stack: ffffffff8847b5a6 ffff810411440400 00000000161974a2 ffff8104114c1028
Oct 8 12:36:00 n9 kernel: [79230.177155]  0000000000000000 ffff8103def84400 0900000000007aa4 ffff8104114c1018
Oct 8 12:36:00 n9 kernel: [79230.177215]  0000000000000000 000000000000001f ffffffff8840bef4 000000000000012c
Oct 8 12:36:00 n9 kernel: [79230.177256] Call Trace:
Oct 8 12:36:00 n9 kernel: [79230.177312] [<ffffffff8847b5a6>] :ocfs2:ocfs2_blocking_ast+0x26/0x310
Oct 8 12:36:00 n9 kernel: [79230.177366] [ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830] :ocfs2_dlm:dlm_proxy_ast_handler+0x824/0x830
Oct 8 12:36:00 n9 kernel: [79230.177427] [ocfs2_nodemanager:do_gettimeofday+0x2f/0x2fb90] do_gettimeofday+0x2f/0xc0
Oct 8 12:36:00 n9 kernel: [79230.177481] [ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0] :ocfs2_nodemanager:o2net_process_message+0x4cc/0x5b0
Oct 8 12:36:00 n9 kernel: [79230.177540] [__dequeue_entity+0x3d/0x50] __dequeue_entity+0x3d/0x50
Oct 8 12:36:00 n9 kernel: [79230.177580] [ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80] :ocfs2_nodemanager:o2net_recv_tcp_msg+0x65/0x80
Oct 8 12:36:00 n9 kernel: [79230.177643] [ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900] :ocfs2_nodemanager:o2net_rx_until_empty+0x38b/0x900
Oct 8 12:36:00 n9 kernel: [79230.177707] [ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900] :ocfs2_nodemanager:o2net_rx_until_empty+0x0/0x900
Oct 8 12:36:00 n9 kernel: [79230.177765] [run_workqueue+0xcc/0x170] run_workqueue+0xcc/0x170
Oct 8 12:36:00 n9 kernel: [79230.177799] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177832] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177865] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
Oct 8 12:36:00 n9 kernel: [79230.177899] [<ffffffff80254510>] autoremove_wake_function+0x0/0x30
Oct 8 12:36:00 n9 kernel: [79230.177935] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.177969] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Oct 8 12:36:00 n9 kernel: [79230.178001] [kthread+0x4b/0x80] kthread+0x4b/0x80
Oct 8 12:36:00 n9 kernel: [79230.178036] [child_rip+0xa/0x12] child_rip+0xa/0x12
Oct 8 12:36:00 n9 kernel: [79230.178073] [kthread+0x0/0x80] kthread+0x0/0x80
Oct 8 12:36:00 n9 kernel: [79230.178104] [child_rip+0x0/0x12] child_rip+0x0/0x12
Oct 8 12:36:00 n9 kernel: [79230.179993] Code: 48 8b 80 58 02 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 8b 47
Oct 8 12:36:00 n9 kernel: [79230.180111] RIP [<ffffffff88473a7e>] :ocfs2:ocfs2_get_dentry_osb+0xe/0x20
Oct 8 12:36:00 n9 kernel: [79230.180156] RSP <ffff8104119b3ca8>
Oct 8 12:36:00 n9 kernel: [79230.180183] CR2: 0000000000000258
Oct 8 12:36:00 n9 kernel: [79230.180566] ---[ end trace ae9a4fee19ded66d ]---

On Wed, Oct 7, 2009 at 8:31 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> It could be that the stale inode info was propagated by the NFS node to the
> oopsing node via the LVB. But I am not sure about that.
>
> In any event, applying the fix would be a step forward. The fix has been in
> mainline for quite some time now.