thr3ads.net - Ocfs2 users - [Ocfs2-users] OCFS2 Caused RAC server to crash [Jun 2009]

McDonald, Stuart

2009-Jun-17 09:30 UTC

[Ocfs2-users] OCFS2 Caused RAC server to crash

Hi

We have a two-node RAC cluster, which uses ASM for the database storage,
but is using OCFS2 to mount a couple of file systems for a) the RMAN
backups, and b) Oracle Data files, i.e. files read from or written to
DBA Directories.

One of the servers in the cluster crashed, and checks revealed the
following error messages:

Jun 5 17:00:27 cadbbe2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000bc4
Jun 5 17:00:27 cadbbe2 kernel: printing eip:
Jun 5 17:00:27 cadbbe2 kernel: f9125eb2
Jun 5 17:00:27 cadbbe2 kernel: *pde = 09ac4001
Jun 5 17:00:27 cadbbe2 kernel: Oops: 0002 [#1]
Jun 5 17:00:27 cadbbe2 kernel: SMP
Jun 5 17:00:27 cadbbe2 kernel: Modules linked in: hangcheck_timer parport_pc lp parport oracleasm(U) autofs4 i2c_dev i2c_core ocfs$
Jun 5 17:00:27 cadbbe2 kernel: CPU: 3
Jun 5 17:00:27 cadbbe2 kernel: EIP: 0060:[<f9125eb2>] Tainted: P VLI
Jun 5 17:00:27 cadbbe2 kernel: EFLAGS: 00010246 (2.6.9-42.ELsmp)
Jun 5 17:00:27 cadbbe2 kernel: EIP is at ocfs2_free_suballoc_bits+0x4ca/0x766 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: eax: 00000000 ebx: 00000000 ecx: 0000000b edx: 00000000
Jun 5 17:00:27 cadbbe2 kernel: esi: 00005c28 edi: 000000ff ebp: d828b000 esp: d643cdb0
Jun 5 17:00:27 cadbbe2 kernel: ds: 007b es: 007b ss: 0068
Jun 5 17:00:27 cadbbe2 kernel: Process rm (pid: 27647, threadinfo=d643c000 task=d520f6b0)
Jun 5 17:00:27 cadbbe2 kernel: Stack: 0000000b 00000000 00000000 00000001 f19e9d48 f2fa80c0 f2fa8000 d721489c
Jun 5 17:00:27 cadbbe2 kernel: f301a728 f6e2fa40 f19e9d48 f3dd5880 0000000c 00000000 f3dd5880 f912646f
Jun 5 17:00:27 cadbbe2 kernel: 00005b29 0d478a00 00000000 00000100 0d47e529 f6d60c00 d721489c f301a728
Jun 5 17:00:27 cadbbe2 kernel: Call Trace:
Jun 5 17:00:27 cadbbe2 kernel: [<f912646f>] ocfs2_free_clusters+0x2ad/0x37a [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90f872d>] ocfs2_replay_truncate_records+0x2e4/0x3c7 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90f8b80>] __ocfs2_flush_truncate_log+0x370/0x45b [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f90fad8d>] ocfs2_commit_truncate+0x508/0x8a3 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f910f347>] ocfs2_truncate_for_delete+0x255/0x312 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f910fb54>] ocfs2_wipe_inode+0x1aa/0x2ee [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f9110555>] ocfs2_delete_inode+0x2d6/0x450 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<f911027f>] ocfs2_delete_inode+0x0/0x450 [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<c0171954>] generic_delete_inode+0xa2/0x104
Jun 5 17:00:27 cadbbe2 kernel: [<f9111407>] ocfs2_drop_inode+0xe6/0x12a [ocfs2]
Jun 5 17:00:27 cadbbe2 kernel: [<c0171b30>] iput+0x5f/0x61
Jun 5 17:00:27 cadbbe2 kernel: [<c0168e46>] sys_unlink+0xd7/0x132
Jun 5 17:00:27 cadbbe2 kernel: [<c015bcbc>] fget+0x3b/0x42
Jun 5 17:00:27 cadbbe2 kernel: [<c016add6>] sys_ioctl+0x227/0x269
Jun 5 17:00:27 cadbbe2 kernel: [<c016ae0c>] sys_ioctl+0x25d/0x269
Jun 5 17:00:27 cadbbe2 kernel: [<c02d4703>] syscall_call+0x7/0xb
Jun 5 17:00:27 cadbbe2 kernel: Code: 00 8b 44 24 20 89 14 24 89 4c 24 04 8b 88 d8 fd ff ff 8b 98 dc fd ff ff 8b 54 24 04 8b 04 24 $
Jun 5 17:00:27 cadbbe2 kernel: <0>Fatal exception: panic in 5 seconds
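
For anyone reading a trace like the one above, the following is a minimal Python sketch of how its key fields could be pulled out: the faulting address, the page-fault error code after "Oops:", the function named on the "EIP is at" line, and the call chain. The names parse_oops, decode_error_code and FRAME are hypothetical helpers invented for this illustration, not part of any OCFS2 or kernel tooling. The bit meanings assume the standard i386 page-fault error code (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# Lines copied from the trace above, with the syslog prefix dropped and
# wrapped lines re-joined; only a few call-trace frames are kept for brevity.
OOPS = """\
Unable to handle kernel NULL pointer dereference at virtual address 00000bc4
Oops: 0002 [#1]
EIP is at ocfs2_free_suballoc_bits+0x4ca/0x766 [ocfs2]
Call Trace:
[<f912646f>] ocfs2_free_clusters+0x2ad/0x37a [ocfs2]
[<f90f872d>] ocfs2_replay_truncate_records+0x2e4/0x3c7 [ocfs2]
[<c0171954>] generic_delete_inode+0xa2/0x104
[<c0168e46>] sys_unlink+0xd7/0x132
"""

# One call-trace frame: "[<address>] symbol+0xoffset/0xsize [module]".
# The trailing "[module]" is optional (core kernel symbols have none).
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def decode_error_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0 = page not present, 1 = protection fault
        "write": bool(code & 2),         # 0 = read access, 1 = write access
        "user_mode": bool(code & 4),     # 0 = kernel mode, 1 = user mode
    }


def parse_oops(text):
    """Extract the fault address, error bits, EIP symbol and call chain."""
    addr = re.search(r"virtual address ([0-9a-f]+)", text)
    code = re.search(r"Oops: ([0-9a-f]+)", text)
    eip = re.search(r"EIP is at\s+(\w+)\+0x", text)
    return {
        "fault_address": int(addr.group(1), 16),
        "error": decode_error_code(int(code.group(1), 16)),
        "eip_symbol": eip.group(1),
        # (symbol, module) pairs; module is None for core kernel symbols.
        "frames": [(m.group(1), m.group(2)) for m in FRAME.finditer(text)],
    }


if __name__ == "__main__":
    info = parse_oops(OOPS)
    print(hex(info["fault_address"]), info["error"], info["eip_symbol"])
    for symbol, module in info["frames"]:
        print("  %s (%s)" % (symbol, module or "kernel"))
```

For this trace the error code 0002 decodes to a kernel-mode write to a page that is not present, at address 0xbc4. That small offset from zero would be consistent with a field being written through a NULL structure pointer inside ocfs2_free_suballoc_bits, although only the OCFS2 source for the exact module version can confirm which pointer.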

The server was manually rebooted, and the filesystems mounted
automatically. No issues have been experienced since this date.

The only thing that would have been running at this time was an RMAN
deletion of old backups from disk.

Has anybody seen this issue before, or is there any advice on how to
troubleshoot it?

This is a production system, so I am limited in the tests that I can
actually carry out.

Thanks in advance
Stuart



IMPORTANT NOTICE: This message is intended for the addressee only. The content
may be confidential, legally privileged and protected by law. Unauthorised
use, copying or disclosure of any of it may be unlawful. If you are not the
intended recipient please notify the sender and remove it from your system.
Internet emails are not necessarily secure.  Although we have taken steps to
ensure this email and attachments are free from any virus, we advise that in
keeping with good computing practice you should ensure they are actually virus
free. The right to monitor email communications through our network is
reserved by us. 
 
Sopra Group Limited (Registered in England, No. 1588948) with Registered
Offices at: Middlesex House, Meadway Technology Park, Rutherford Close,
Stevenage, Hertfordshire, SG1 2EF.  VAT No. 366 9784 84.


Sunil Mushran

2009-Jun-17 13:10 UTC

[Ocfs2-users] OCFS2 Caused RAC server to crash

Please file a bugzilla and _attach_ this oops trace. Also mention all  
the version numbers.
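
As a rough illustration of gathering those version numbers, here is a minimal Python sketch. It assumes an RPM-based distribution, where `uname -r` reports the running kernel and `rpm -qa` lists installed packages, and it assumes that the kernel version and the installed OCFS2 packages are the versions of interest. The helper names (run, collect_versions, format_report) are hypothetical and exist only for this example.

```python
import subprocess


def run(cmd):
    """Return the stdout of cmd, or a short note if it cannot be started."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        return result.stdout
    except OSError as exc:
        return "<could not run %s: %s>" % (" ".join(cmd), exc)


def collect_versions(runner=run):
    """Collect the running kernel version and installed OCFS2 package names.

    runner is injectable so the function can be exercised without rpm.
    """
    kernel = runner(["uname", "-r"]).strip()
    packages = sorted(
        line.strip()
        for line in runner(["rpm", "-qa"]).splitlines()
        if "ocfs2" in line.lower()
    )
    return {"kernel": kernel, "ocfs2_packages": packages}


def format_report(info):
    """Render the collected versions as plain text for pasting into a bug report."""
    lines = ["Kernel: %s" % info["kernel"], "OCFS2 packages:"]
    lines += ["  %s" % pkg for pkg in info["ocfs2_packages"]] or ["  (none found)"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

The output is plain text so it can be pasted into the bug report alongside the attached oops trace.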



Herbert van den Bergh

2009-Jun-17 15:43 UTC

[Ocfs2-users] OCFS2 Caused RAC server to crash

Since you are using Oracle RAC with OCFS2, you can request support for 
OCFS2 via Metalink using your Oracle CSI. 

Thanks,
Herbert


