Hi

We have a two-node RAC cluster, which uses ASM for the database storage, but is using OCFS2 to mount a couple of file systems for a) the RMAN backups, and b) Oracle data files, i.e. files read from or written to DBA Directories.

One of the servers in the cluster crashed, and checks revealed the following error messages:

Jun 5 17:00:27 cadbbe2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000bc4
Jun 5 17:00:27 cadbbe2 kernel: printing eip:
Jun 5 17:00:27 cadbbe2 kernel: f9125eb2
Jun 5 17:00:27 cadbbe2 kernel: *pde = 09ac4001
Jun 5 17:00:27 cadbbe2 kernel: Oops: 0002 [#1]
Jun 5 17:00:27 cadbbe2 kernel: SMP
Jun 5 17:00:27 cadbbe2 kernel: Modules linked in: hangcheck_timer parport_pc lp parport oracleasm(U) autofs4 i2c_dev i2c_core ocfs$
Jun 5 17:00:27 cadbbe2 kernel: CPU: 3
Jun 5 17:00:27 cadbbe2 kernel: EIP: 0060:[<f9125eb2>] Tainted: P VLI
Jun 5 17:00:27 cadbbe2 kernel: EFLAGS: 00010246 (2.6.9-42.ELsmp)
Jun 5 17:00:27 cadbbe2 kernel: EIP is at ocfs2_free_suballoc_bits+0x4ca/0x766 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: eax: 00000000 ebx: 00000000 ecx: 0000000b edx: 00000000
Jun 5 17:00:27 cadbbe2 kernel: esi: 00005c28 edi: 000000ff ebp: d828b000 esp: d643cdb0
Jun 5 17:00:27 cadbbe2 kernel: ds: 007b es: 007b ss: 0068
Jun 5 17:00:27 cadbbe2 kernel: Process rm (pid: 27647, threadinfo=d643c000 task=d520f6b0)
Jun 5 17:00:27 cadbbe2 kernel: Stack: 0000000b 00000000 00000000 00000001 f19e9d48 f2fa80c0 f2fa8000 d721489c
Jun 5 17:00:27 cadbbe2 kernel: f301a728 f6e2fa40 f19e9d48 f3dd5880 0000000c 00000000 f3dd5880 f912646f
Jun 5 17:00:27 cadbbe2 kernel: 00005b29 0d478a00 00000000 00000100 0d47e529 f6d60c00 d721489c f301a728
Jun 5 17:00:27 cadbbe2 kernel: Call Trace:
Jun 5 17:00:27 cadbbe2 kernel: [<f912646f>] ocfs2_free_clusters+0x2ad/0x37a [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90f872d>] ocfs2_replay_truncate_records+0x2e4/0x3c7 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90f8b80>] __ocfs2_flush_truncate_log+0x370/0x45b [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90fad8d>] ocfs2_commit_truncate+0x508/0x8a3 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f910f347>] ocfs2_truncate_for_delete+0x255/0x312 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f910fb54>] ocfs2_wipe_inode+0x1aa/0x2ee [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f9110555>] ocfs2_delete_inode+0x2d6/0x450 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f911027f>] ocfs2_delete_inode+0x0/0x450 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<c0171954>] generic_delete_inode+0xa2/0x104
Jun 5 17:00:27 cadbbe2 kernel: [<f9111407>] ocfs2_drop_inode+0xe6/0x12a [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<c0171b30>] iput+0x5f/0x61
Jun 5 17:00:27 cadbbe2 kernel: [<c0168e46>] sys_unlink+0xd7/0x132
Jun 5 17:00:27 cadbbe2 kernel: [<c015bcbc>] fget+0x3b/0x42
Jun 5 17:00:27 cadbbe2 kernel: [<c016add6>] sys_ioctl+0x227/0x269
Jun 5 17:00:27 cadbbe2 kernel: [<c016ae0c>] sys_ioctl+0x25d/0x269
Jun 5 17:00:27 cadbbe2 kernel: [<c02d4703>] syscall_call+0x7/0xb
Jun 5 17:00:27 cadbbe2 kernel: Code: 00 8b 44 24 20 89 14 24 89 4c 24 04 8b 88 d8 fd ff ff 8b 98 dc fd ff ff 8b 54 24 04 8b 04 24 $
Jun 5 17:00:27 cadbbe2 kernel: <0>Fatal exception: panic in 5 seconds

The server was manually rebooted, and the file systems mounted automatically. No issues have been experienced since that date.

The only thing that would have been running at this time was an RMAN deletion of old backups from disk.

Has anybody seen this issue before, or is there any advice on how to troubleshoot it?

This is a production system, so I am limited in the tests that I can actually carry out.

Thanks in advance
Stuart

IMPORTANT NOTICE: This message is intended for the addressee only. The content may be confidential, legally privileged and protected by law. Unauthorised use, copying or disclosure of any of it may be unlawful. If you are not the intended recipient please notify the sender and remove it from your system. Internet emails are not necessarily secure.
Although we have taken steps to ensure this email and attachments are free from any virus, we advise that in keeping with good computing practice you should ensure they are actually virus free. The right to monitor email communications through our network is reserved by us.

Sopra Group Limited (Registered in England, No. 1588948) with Registered Offices at: Middlesex House, Meadway Technology Park, Rutherford Close, Stevenage, Hertfordshire, SG1 2EF. VAT No. 366 9784 84.
Please file a bugzilla and _attach_ this oops trace. Also mention all the version numbers.

On Jun 17, 2009, at 2:30 AM, "McDonald, Stuart" <smcdonald at uk.sopragroup.com> wrote:
> One of the servers in the cluster crashed, and checks revealed the
> following error messages:
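When attaching the oops trace to a bug report, it can help to pull the call-trace frames out of the syslog excerpt in a structured form, for example to list which frames belong to the ocfs2 module. The following is a minimal Python sketch, assuming the i386 2.6.9 log format shown in the original post (`[<addr>] symbol+0xoffset/0xsize [module]`). The names `parse_call_trace`, `Frame` and `FRAME_RE` are hypothetical helpers for illustration, not part of any OCFS2 or Oracle tool.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Matches one call-trace frame, e.g.
#   [<f912646f>] ocfs2_free_clusters+0x2ad/0x37a [ocfs2]
# Assumption: the 2.6.9-era i386 oops format quoted in this thread.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"                                       # return address
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"  # symbol+off/size
    r"(?:\s+\[(?P<module>\w+)\])?"                                         # optional [module]
)


class Frame(NamedTuple):
    addr: int
    symbol: str
    offset: int
    size: int
    module: Optional[str]  # None for core-kernel frames such as iput


def parse_call_trace(log: str) -> list[Frame]:
    """Return every call-trace frame found in a block of kernel log text."""
    frames = []
    for m in FRAME_RE.finditer(log):
        frames.append(
            Frame(
                addr=int(m["addr"], 16),
                symbol=m["symbol"],
                offset=int(m["offset"], 16),
                size=int(m["size"], 16),
                module=m["module"],
            )
        )
    return frames


if __name__ == "__main__":
    trace = (
        "Jun 5 17:00:27 cadbbe2 kernel: [<f912646f>] ocfs2_free_clusters+0x2ad/0x37a [ocfs2]\n"
        "Jun 5 17:00:27 cadbbe2 kernel: [<c0171b30>] iput+0x5f/0x61\n"
    )
    for f in parse_call_trace(trace):
        print(f.symbol, hex(f.offset), f.module)
    # prints:
    # ocfs2_free_clusters 0x2ad ocfs2
    # iput 0x5f None
```

The module group is optional because core-kernel frames (`iput`, `sys_unlink`, and so on) carry no `[module]` suffix. Lines such as `EIP: 0060:[<f9125eb2>] Tainted: P VLI` are skipped, since they have no `symbol+offset/size` after the address.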
Herbert van den Bergh
2009-Jun-17 15:43 UTC
[Ocfs2-users] OCFS2 Caused RAC server to crash
Since you are using Oracle RAC with OCFS2, you can request support for OCFS2 via Metalink using your Oracle CSI.

Thanks,
Herbert.

Sunil Mushran wrote:
> Please file a bugzilla and _attach_ this oops trace. Also mention all
> the version numbers.