Ramappa, Ravi (NSN - IN/Bangalore)
2013-Feb-26 06:07 UTC
[Ocfs2-users] Kernel panic due to ocfs2
Hi,

In a 13-node cluster, the first four nodes went into a kernel panic. /var/log/messages contained messages as below:

Feb 25 22:02:46 prod152 kernel: (o2net,9470,5):dlm_assert_master_handler:1817 ERROR: DIE! Mastery assert from 4, but current owner is 2! (O000000000000000c36706200000000)
Feb 25 22:02:46 prod152 kernel: lockres: O000000000000000c36706200000000, owner=2, state=0
Feb 25 22:02:46 prod152 kernel: last used: 0, refcnt: 3, on purge list: no
Feb 25 22:02:46 prod152 kernel: on dirty list: no, on reco list: no, migrating pending: no
Feb 25 22:02:46 prod152 kernel: inflight locks: 0, asts reserved: 0
Feb 25 22:02:46 prod152 kernel: refmap nodes: [ ], inflight=0
Feb 25 22:02:46 prod152 kernel: granted queue:
Feb 25 22:02:46 prod152 kernel: type=3, conv=-1, node=2, cookie=2:222205, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Feb 25 22:02:46 prod152 kernel: converting queue:
Feb 25 22:02:46 prod152 kernel: blocked queue:
Feb 25 22:02:46 prod152 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Feb 25 22:02:46 prod152 kernel: Kernel BUG at .../build/BUILD/ocfs2-1.4.10/fs/ocfs2/dlm/dlmmaster.c:1819
Feb 25 22:02:46 prod152 kernel: invalid opcode: 0000 [1] SMP
Feb 25 22:02:46 prod152 kernel: last sysfs file: /block/cciss!c0d0/cciss!c0d0p1/stat
Feb 25 22:02:46 prod152 kernel: CPU 5
Feb 26 09:50:27 prod152 syslogd 1.4.1: restart.

The OCFS2 rpm versions used are as below:

[root@prod152 ~]# uname -r
2.6.18-308.1.1.el5
[root@prod152 ~]# rpm -qa | grep ocfs
ocfs2-2.6.18-308.1.1.el5xen-1.4.10-1.el5
ocfs2-tools-devel-1.6.3-2.el5
ocfs2-2.6.18-308.1.1.el5-1.4.10-1.el5
ocfs2-tools-debuginfo-1.6.3-2.el5
ocfs2-tools-1.6.3-2.el5
ocfs2console-1.6.3-2.el5
ocfs2-2.6.18-308.1.1.el5debug-1.4.10-1.el5

[root@prod152 ~]# cat /etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 10.10.10.150
        number = 0
        name = prod150
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.151
        number = 1
        name = prod151
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.152
        number = 2
        name = prod152
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.153
        number = 3
        name = prod153
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.106
        number = 4
        name = prod106
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.107
        number = 5
        name = prod107
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.155
        number = 6
        name = prod155
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.156
        number = 7
        name = prod156
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.157
        number = 8
        name = prod157
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.158
        number = 9
        name = prod158
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.51
        number = 10
        name = prod51
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.52
        number = 11
        name = prod52
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 10.10.10.154
        number = 12
        name = prod154
        cluster = ocfs2
cluster:
        node_count = 13
        name = ocfs2
[root@prod152 ~]#

Is this a known issue? Any issues in the configuration?

Thanks,
Ravi
This is due to a race in lock mastery/purge. I have recently fixed this problem but haven't yet submitted the patch to mainline. Please file a service request with Oracle to get a one-off fix.

On 02/25/2013 10:07 PM, Ramappa, Ravi (NSN - IN/Bangalore) wrote:
> Is this a known issue? Any issues in the configuration?
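For readers wondering what the check at dlmmaster.c:1819 is guarding: the assert-master handler refuses an incoming mastery assert when the local copy of the lock resource still records a different node as its owner, which is exactly what the "DIE! Mastery assert from 4, but current owner is 2!" line shows. The fragment below is only an illustrative sketch of that check and of where a purge/mastery race could leave a stale owner behind; the struct layout, helper name and constant value are simplified assumptions, not the actual ocfs2 1.4.10 source.

/*
 * Illustrative sketch only -- not the real fs/ocfs2/dlm/dlmmaster.c code.
 * Field names, the helper name and the constant below are simplified
 * assumptions used for explanation.
 */
#include <linux/kernel.h>
#include <linux/bug.h>

#define OWNER_UNKNOWN 255	/* placeholder for "no known master" */

struct lockres_sketch {
	const char *name;	/* lock resource name, e.g. O00000...36706200000000 */
	unsigned int owner;	/* node number currently believed to be master */
};

/* Called when node 'asserting_node' claims mastery of 'res'. */
static void assert_master_check(struct lockres_sketch *res,
				unsigned int asserting_node)
{
	if (res->owner == OWNER_UNKNOWN || res->owner == asserting_node) {
		/* Expected case: accept the assert and record the master. */
		res->owner = asserting_node;
		return;
	}

	/*
	 * Unexpected case, as in the log: node 4 asserts mastery while the
	 * local copy still says node 2 owns the resource.  If purging of a
	 * stale lock resource races with an incoming assert, the old owner
	 * has not been dropped yet, this consistency check fails, and the
	 * node takes itself down with BUG().
	 */
	printk(KERN_ERR "DIE! Mastery assert from %u, but current owner is %u! (%s)\n",
	       asserting_node, res->owner, res->name);
	BUG();
}

The one-off fix mentioned above presumably closes that window by serializing purge against incoming asserts, but since the patch was not posted to the list this is only an educated guess.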