We have a two-node OCFS2 cluster running Oracle 10g, and both nodes crashed.

Node 1 panicked while running iostat; the second node crashed with the error message you can see below.

I was hoping a newer version of OCFS2 is available so I could proceed with an upgrade if necessary.

Has anyone seen this problem, and has anybody resolved it?

Node 1 reboots
reboot   system boot  2.6.18-92.el5    Tue Jun 16 09:26   (02:14)

Node 2 reboots
reboot   system boot  2.6.18-92.el5    Tue Jun 16 09:29   (02:10)

Running kernel
Linux uscosprdvrtxdb02 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

OCFS2 version installed
ocfs2console-1.4.1-1.el5
ocfs2-tools-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5

Crash analysis:

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-92.el5/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Tue Jun 16 09:15:26 2009
      UPTIME: 2 days, 02:17:01
LOAD AVERAGE: 0.22, 0.31, 0.21
       TASKS: 570
    NODENAME: uscosprdvrtxdb02
     RELEASE: 2.6.18-92.el5
     VERSION: #1 SMP Tue Apr 29 13:16:15 EDT 2008
     MACHINE: x86_64  (2666 Mhz)
      MEMORY: 11.8 GB
       PANIC: ""
         PID: 28123
     COMMAND: "oracle"
        TASK: ffff8102e25e97e0  [THREAD_INFO: ffff8102cf0ba000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

Kernel messages:

o2net: connection to node uscosprdvrtxdb01 (num 0) at 192.168.5.1:7000 has been idle for 60.0 seconds, shutting it down.
(0,0):o2net_idle_timer:1476 here are some times that might help debug the situation: (tmr 1245143657.942607 now 1245143717.944198 dr 1245143657.942600 adv 1245143657.942608:1245143657.942609 func (5010bc9a:505) 1245128670.144972:1245128670.144981)
o2net: no longer connected to node uscosprdvrtxdb01 (num 0) at 192.168.5.1:7000
(28123,3):dlm_do_master_request:1330 ERROR: unhandled error!
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...mushran/BUILD/ocfs2-1.4.1/fs/ocfs2/dlm/dlmmaster.c:1331
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:05.0/0000:10:00.0/0000:11:01.0/0000:14:00.0/0000:15:00.0/irq
CPU 3
Modules linked in: nfs lockd fscache nfs_acl mptctl mptbase ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) autofs4 hidp l2cap bluetooth ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs sunrpc hp_ilo(U) bonding ipv6 xfrm_nalgo crypto_api emcpdm(PU) emcpgpx(PU) emcpmpx(PU) emcp(PU) dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i5000_edac edac_mc bnx2 sg serio_raw shpchp pcspkr usb_storage lpfc scsi_transport_fc cciss(U) sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 28123, comm: oracle Tainted: P 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff88652f8a>]  [<ffffffff88652f8a>] :ocfs2_dlm:dlm_do_master_request+0x2f1/0x61c
RSP: 0018:ffff8102cf0bba38  EFLAGS: 00010286
RAX: 000000000000003f RBX: 00000000fffffe00 RCX: ffffffff802ec9a8
RDX: ffffffff802ec9a8 RSI: 0000000000000000 RDI: ffffffff802ec9a0
RBP: ffff8101b98d3e40 R08: ffffffff802ec9a8 R09: 0000000000000046
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000000
R13: ffff810316df5c00 R14: ffff810316df5c00 R15: ffff8101bc0625c0
FS:  00002b806dfccc40(0000) GS:ffff81032ff24640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000086229b8 CR3: 00000002cf119000 CR4: 00000000000006e0
Process oracle (pid: 28123, threadinfo ffff8102cf0ba000, task ffff8102e25e97e0)
Stack: 0000000000001f01 3030303030303057 3030303030303030 3435303061323030
       0061316437626364 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 000000008865344a 0000000116df5c00 0000000000000000
Call Trace:
 [<ffffffff88658669>] :ocfs2_dlm:dlm_get_lock_resource+0xa5e/0x1913
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff8865dde5>] :ocfs2_dlm:dlm_wait_for_recovery+0xa1/0x116
 [<ffffffff88650c46>] :ocfs2_dlm:dlmlock+0x731/0x11f9
 [<ffffffff886a5ad0>] :ocfs2:ocfs2_cluster_unlock+0x240/0x2ad
 [<ffffffff80009523>] __d_lookup+0xb0/0xff
 [<ffffffff886a17d8>] :ocfs2:ocfs2_dentry_revalidate+0x111/0x259
 [<ffffffff886a69c1>] :ocfs2:ocfs2_init_mask_waiter+0x24/0x3d
 [<ffffffff8000cb46>] do_lookup+0x65/0x1d4
 [<ffffffff886a7e00>] :ocfs2:ocfs2_cluster_lock+0x354/0x7eb
 [<ffffffff886a9a5c>] :ocfs2:ocfs2_locking_ast+0x0/0x486
 [<ffffffff886acfd2>] :ocfs2:ocfs2_blocking_ast+0x0/0x2c1
 [<ffffffff801458b9>] snprintf+0x44/0x4c
 [<ffffffff886ac242>] :ocfs2:ocfs2_rw_lock+0x10f/0x1d6
 [<ffffffff886b0159>] :ocfs2:ocfs2_file_aio_read+0x128/0x394
 [<ffffffff886a75eb>] :ocfs2:ocfs2_add_lockres_tracking+0x73/0x81
 [<ffffffff8000caa4>] do_sync_read+0xc7/0x104
 [<ffffffff886aedcc>] :ocfs2:ocfs2_init_file_private+0x4d/0x5a
 [<ffffffff8001e35e>] __dentry_open+0x101/0x1dc
 [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80027338>] do_filp_open+0x2a/0x38
 [<ffffffff8000b337>] vfs_read+0xcb/0x171
 [<ffffffff800130a3>] sys_pread64+0x50/0x70
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 de 85 66 88 c2 33 05 48 b8 00 09 00 00 01 00 00 00
RIP  [<ffffffff88652f8a>] :ocfs2_dlm:dlm_do_master_request+0x2f1/0x61c
 RSP <ffff8102cf0bba38>

Saul J. Gabay
Sr. Linux Engineer
IT Infrastructure & Operations
Herbalife International Inc.
310-410-9600 x24341
saulg at herbalife.com
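As a side note on the o2net message above: the idle interval that tripped the timer can be checked directly from the `tmr`/`now` timestamps in the `(0,0):o2net_idle_timer:1476` log line. A minimal sketch (the variable names are mine, not from o2net):

```python
# Timestamps copied from the o2net_idle_timer debug line in the report.
# "tmr" is when the idle timer was last reset; "now" is when it fired.
tmr = 1245143657.942607
now = 1245143717.944198

idle_seconds = now - tmr
print(f"connection idle for {idle_seconds:.1f} seconds")
```

The gap is just over 60 seconds, which matches o2net's 60-second idle timeout and explains why the connection to node 0 was shut down before the DLM master request failed.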
Please file a bugzilla in oss.oracle.com/bugzilla.

Saul Gabay wrote:
> We have a 2 node OCFS2 cluster running Oracle 10g, both nodes crashed.
> [...]

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
I reported this incident as a new BUG #1130.

Please treat this as urgent; it is affecting our production environment repeatedly.

Let me know if more information is needed.

Thank you,
Saul

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Tuesday, June 16, 2009 11:58 AM
To: Saul Gabay
Cc: ocfs2-users at oss.oracle.com; Server Ops_Linux
Subject: Re: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error

Please file a bugzilla in oss.oracle.com/bugzilla.

Saul Gabay wrote:
> We have a 2 node OCFS2 cluster running Oracle 10g, both nodes crashed.
> [...]
The error was seen in the kdump vmcore while browsing the kernel message log; that output was included in the same BUG #1130.

-----Original Message-----
From: Saul Gabay
Sent: Tuesday, June 16, 2009 12:50 PM
To: Sunil Mushran
Cc: ocfs2-users at oss.oracle.com; Server Ops_Linux
Subject: RE: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error

I reported this incident as a new BUG #1130.

Please treat this as urgent; it is affecting our production environment repeatedly.

Let me know if more information is needed.

Thank you,
Saul

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Tuesday, June 16, 2009 11:58 AM
To: Saul Gabay
Cc: ocfs2-users at oss.oracle.com; Server Ops_Linux
Subject: Re: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error

Please file a bugzilla in oss.oracle.com/bugzilla.

Saul Gabay wrote:
> We have a 2 node OCFS2 cluster running Oracle 10g, both nodes crashed.
> [...]
Hello Saul,

Please log a Support Request via Metalink using your Oracle CSI.

Thanks,
Herbert

Saul Gabay wrote:
> I reported this incident as a new BUG #1130.
> [...]
Confirmation

________________________________

The following Service Request has been created:

SR Number: 7561997.993
Priority: 1
SR Submitted Date: 16-Jun-2009 17:38:17 GMT

This SR will be assigned to a support analyst during normal business hours in your country.

________________________________

From: Herbert van den Bergh [mailto:herbert.van.den.bergh at oracle.com]
Sent: Tuesday, June 16, 2009 2:43 PM
To: Saul Gabay
Cc: Sunil Mushran; Server Ops_Linux; ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error

Hello Saul,

Please log a Support Request via Metalink using your Oracle CSI.

Thanks,
Herbert

Saul Gabay wrote:

I reported this incident as a new BUG #1130. Please treat this as urgent; it is affecting our production environment repeatedly. Let me know if more information is needed.

Thank you,
Saul

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Tuesday, June 16, 2009 11:58 AM
To: Saul Gabay
Cc: ocfs2-users at oss.oracle.com; Server Ops_Linux
Subject: Re: [Ocfs2-users] OCFS2 1.4.1 DLM unhandled error

Please file a bugzilla in oss.oracle.com/bugzilla.

Saul Gabay wrote:

We have a 2-node OCFS2 cluster running Oracle 10g; both nodes crashed. Node 1 panicked while running iostat; the second node crashed with the error message you can see below. I was hoping to see a newer version of OCFS2 so I could proceed with the upgrade if necessary. Has anyone seen or resolved this problem?
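For reference, the o2cb cluster timeouts in this era were configured via /etc/sysconfig/o2cb (normally written by `service o2cb configure` and read by the o2cb init script). A sketch of the relevant settings, assuming the 60 s idle timeout implied by the log above; the exact defaults differ between ocfs2-tools releases, so treat the numbers as illustrative rather than prescriptive:

```shell
# /etc/sysconfig/o2cb -- o2cb driver settings (illustrative values, not verbatim defaults)
O2CB_ENABLED=true                # load the o2cb stack at boot
O2CB_BOOTCLUSTER=ocfs2           # cluster name to start
O2CB_HEARTBEAT_THRESHOLD=31      # disk heartbeat: (31 - 1) * 2 s = 60 s before fencing
O2CB_IDLE_TIMEOUT_MS=60000       # network idle timeout; matches the "idle for 60.0 seconds" message
O2CB_KEEPALIVE_DELAY_MS=2000     # keepalive probe interval on the o2net socket
O2CB_RECONNECT_DELAY_MS=2000     # delay before attempting to reconnect
```

Raising the idle and heartbeat timeouts can mask transient interconnect stalls, but it does not address the unhandled-error BUG in dlm_do_master_request itself, which is why the thread points at bugzilla and a Metalink SR.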