Hello,

I had a server crash while using ocfs2 on SLES10 with kernel 2.6.16.60-0.42.8-smp.

I know, you'll tell me to ask Novell for support, but I want to make sure this is not some bug that
also exists in the main-line. The only information I have is in syslog directly before the server
crashed. Maybe you have a hint as to who could make use of this information.

Regards

Georg

Sep 21 09:10:12 host2 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Sep 21 09:10:12 host2 kernel: Kernel BUG at fs/jbd/transaction.c:1114
Sep 21 09:10:12 host2 kernel: invalid opcode: 0000 [1] SMP
Sep 21 09:10:12 host2 kernel: last sysfs file: /devices/pci0000:00/0000:00:03.0/0000:01:00.0/0000:02:0e.0/host0/target0:2:0/0:2:0:0/rev
Sep 21 09:10:12 host2 kernel: CPU 2
Sep 21 09:10:12 host2 kernel: Modules linked in: joydev af_packet ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs nfsd exportfs nfs lockd nfs_acl sunrpc bonding ipv6 apparmor loop sr_mod cdrom usb_storage dm_round_robin usbhid dm_emc dm_multipath dm_mod uhci_hcd ehci_hcd hw_random shpchp usbcore pci_hotplug bnx2 ext3 jbd qla2xxx firmware_class scsi_transport_fc sg megaraid_sas edd processor siimage megaraid_mbox megaraid_mm sd_mod scsi_mod ide_disk ide_core
Sep 21 09:10:12 host2 kernel: Pid: 10412, comm: nfsd Not tainted 2.6.16.60-0.42.8-smp #1
Sep 21 09:10:12 host2 kernel: RIP: 0010:[<ffffffff8810b1e0>] <ffffffff8810b1e0>{:jbd:journal_dirty_metadata+200}
Sep 21 09:10:12 host2 kernel: RSP: 0018:ffff81012283dc18 EFLAGS: 00010292
Sep 21 09:10:12 host2 kernel: RAX: 000000000000006e RBX: ffff8101210fd880 RCX: ffffffff8035b9e8
Sep 21 09:10:12 host2 kernel: RDX: ffffffff8035b9e8 RSI: 0000000000000296 RDI: ffffffff8035b9e0
Sep 21 09:10:12 host2 kernel: RBP: ffff8100aecc4818 R08: ffffffff8035b9e8 R09: ffff810127b21600
Sep 21 09:10:12 host2 kernel: R10: ffff81000103b780 R11: 0000000000000292 R12: ffff81006db33660
Sep 21 09:10:12 host2 kernel: R13: ffff810019491e50 R14: ffff810127cbcc00 R15: ffff8100b5ec2c8c
Sep 21 09:10:12 host2 kernel: FS: 00002b2e914eb6d0(0000) GS:ffff81012bd6d740(0000) knlGS:0000000000000000
Sep 21 09:10:12 host2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 21 09:10:12 host2 kernel: CR2: 00002b483db48000 CR3: 0000000106c87000 CR4: 00000000000006e0
Sep 21 09:10:12 host2 kernel: Process nfsd (pid: 10412, threadinfo ffff81012283c000, task ffff81012b0ae820)
Sep 21 09:10:12 host2 kernel: Stack: ffff8100aecc4818 ffff81006db33660 0000000000000000 0000000000000003
Sep 21 09:10:12 host2 kernel: ffff8101174c1000 ffffffff88463612 0000000000000000 ffff8101174c13b0
Sep 21 09:10:12 host2 kernel: ffff8100bfe5f6a0 ffffffff8844c441
Sep 21 09:10:12 host2 kernel: Call Trace: <ffffffff88463612>{:ocfs2:ocfs2_journal_dirty+106}
Sep 21 09:10:12 host2 kernel: <ffffffff8844c441>{:ocfs2:__ocfs2_add_entry+745} <ffffffff8846a671>{:ocfs2:ocfs2_mknod+1710}
Sep 21 09:10:12 host2 kernel: <ffffffff8846a950>{:ocfs2:ocfs2_mkdir+127} <ffffffff80193435>{vfs_mkdir+346}
Sep 21 09:10:12 host2 kernel: <ffffffff88373ef5>{:nfsd:nfsd_create+753} <ffffffff8837ac63>{:nfsd:nfsd3_proc_mkdir+217}
Sep 21 09:10:12 host2 kernel: <ffffffff8836f0ea>{:nfsd:nfsd_dispatch+216} <ffffffff882e2813>{:sunrpc:svc_process+982}
Sep 21 09:10:12 host2 kernel: <ffffffff802eb816>{__down_read+21} <ffffffff8836f454>{:nfsd:nfsd+0}
Sep 21 09:10:12 host2 kernel: <ffffffff8836f623>{:nfsd:nfsd+463} <ffffffff80137987>{do_exit+2300}
Sep 21 09:10:12 host2 kernel: <ffffffff8010bea6>{child_rip+8} <ffffffff8836f454>{:nfsd:nfsd+0}
Sep 21 09:10:12 host2 kernel: <ffffffff8836f454>{:nfsd:nfsd+0} <ffffffff8010be9e>{child_rip+0}
Sep 21 09:10:12 host2 kernel:

There should have been another message just above the "cut here" line, possibly saying that there were not enough credits, or something about a running or committing transaction.

On 09/21/2010 12:38 AM, Georg Höllrigl wrote:
> Hello,
>
> I had a server crash while using ocfs2 on SLES10 with kernel 2.6.16.60-0.42.8-smp.
>
> I know, you'll tell me to ask Novell for support, but I want to make sure this is not some bug that
> also exists in the main-line. The only information I have is in syslog directly before the server
> crashed. Maybe you have a hint as to who could make use of this information.
>
> Regards
>
> Georg
>
> Sep 21 09:10:12 host2 kernel: ----------- [cut here ] --------- [please bite here ] ---------
> Sep 21 09:10:12 host2 kernel: Kernel BUG at fs/jbd/transaction.c:1114
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
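
Finding that preceding message by hand in a large syslog can be tedious. The following is only a minimal sketch of one way to pull out the lines logged immediately before each "[cut here ]" marker. The function name `lines_before_marker` and its parameters are made up for illustration, and it assumes the log has already been read into a list of strings.

```python
def lines_before_marker(lines, marker="[cut here ]", context=3):
    """Return, for each line containing `marker`, up to `context` lines before it.

    `lines` is a list of log lines in file order. The default marker matches
    the text seen in the oops above; the context size is an arbitrary choice.
    """
    hits = []
    for i, line in enumerate(lines):
        if marker in line:
            # Slice stops at i, so the marker line itself is not included.
            hits.append(lines[max(0, i - context):i])
    return hits
```

To use it on a real log, read the file into a list first (for example with `open(path).read().splitlines()`). The location of the syslog file depends on the system configuration and is not given in this thread.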

One possibility is the following fix, but it is a three-year-old fix. Another possibility is that you are encountering a new issue. Either way, Novell support will be best placed to help you.

=======================================
commit e051fda4fd14fe878e6d2183b3a4640febe9e9a8
Author: Mark Fasheh <mark.fasheh at oracle.com>
Date: Thu Feb 1 11:40:16 2007 -0800

    ocfs2: ocfs2_link() journal credits update

    Commit 592282cf2eaa33409c6511ddd3f3ecaa57daeaaa fixed some missing directory
    c/mtime updates in part by introducing a dinode update in ocfs2_add_entry().
    Unfortunately, ocfs2_link() (which didn't update the directory inode before)
    is now missing a single journal credit. Fix this by doubling the number of
    inode updates expected during hard link creation.

    Signed-off-by: Mark Fasheh <mark.fasheh at oracle.com>
=======================================

On 09/22/2010 12:37 AM, Georg Höllrigl wrote:
> There is a line above:
>
> Sep 21 09:10:12 host2 kernel: Assertion failure in journal_dirty_metadata() at fs/jbd/transaction.c:1114: "handle->h_buffer_credits > 0"
> Sep 21 09:10:12 host2 kernel: ----------- [cut here ] --------- [please bite here ] ---------
>
> On 21.09.2010 22:47, Sunil Mushran wrote:
>> There should have been another message just above the "cut here" line,
>> possibly saying that there were not enough credits, or something about
>> a running or committing transaction.
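
To make the credit accounting in the commit message concrete, here is a toy model in Python. It is not kernel code, only a sketch under stated assumptions: every name in it (`JournalHandle`, `OutOfCredits`, `add_entry`, `link`, `INODE_UPDATE_CREDITS`) is invented for illustration, each newly dirtied metadata block is assumed to cost exactly one credit, and the set of blocks touched by a hard link is simplified to three. It models the ocfs2_link() case from the commit, whereas the trace in this thread goes through ocfs2_mkdir(), so it illustrates the class of bug rather than this specific crash.

```python
INODE_UPDATE_CREDITS = 1  # assumption: one credit per inode block update


class OutOfCredits(RuntimeError):
    """Stands in for the 'handle->h_buffer_credits > 0' assertion failure."""


class JournalHandle:
    """Toy stand-in for a journal handle; it only tracks remaining credits."""

    def __init__(self, credits):
        self.buffer_credits = credits
        self.dirtied = set()

    def dirty_metadata(self, block):
        # Assumption: re-dirtying a block already in this handle is free.
        if block in self.dirtied:
            return
        if self.buffer_credits <= 0:
            raise OutOfCredits("handle->h_buffer_credits > 0")
        self.buffer_credits -= 1
        self.dirtied.add(block)


def add_entry(handle, dir_inode, dir_block):
    """Model of adding a directory entry after the c/mtime change."""
    handle.dirty_metadata(dir_block)  # block holding the new entry
    handle.dirty_metadata(dir_inode)  # directory inode update (the added step)


def link(credits):
    """Model of hard link creation; returns the credits left over."""
    handle = JournalHandle(credits)
    handle.dirty_metadata("target_inode")  # link count update
    add_entry(handle, "dir_inode", "dir_block")
    return handle.buffer_credits
```

In this model, reserving one inode update plus one directory block (`INODE_UPDATE_CREDITS + 1`) runs out on the third dirtied block, while doubling the inode updates (`2 * INODE_UPDATE_CREDITS + 1`) is exactly enough. That is the shape of the one-credit shortfall the commit describes.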