Hi all,

my test environment is composed of 2 servers with CentOS 4.4; the nodes export the storage with aoe6-43 + vblade-14.

kernel-2.6.9-42.0.3.EL
ocfs2-tools-1.2.2-1
ocfs2console-1.2.2-1
ocfs2-2.6.9-42.0.3.EL-1.2.3-1

/dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)

Device              FS     Nodes
/dev/etherd/e2.0    ocfs2  ocfs2, becks
/dev/etherd/e3.0    ocfs2  ocfs2, becks

Device              FS     UUID                                  Label
/dev/etherd/e2.0    ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  test1
/dev/etherd/e3.0    ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4

O2CB_HEARTBEAT_THRESHOLD=31

When I run a stress test:

Index 4: took 0 ms to do checking slots
Index 5: took 2 ms to do waiting for write completion
Index 6: took 1995 ms to do msleep
Index 7: took 0 ms to do allocating bios for read
Index 8: took 0 ms to do bio alloc read
Index 9: took 0 ms to do bio add page read
Index 10: took 0 ms to do submit_bio for read
Index 11: took 2 ms to do waiting for read completion
Index 12: took 0 ms to do bio alloc write
Index 13: took 0 ms to do bio add page write
Index 14: took 0 ms to do submit_bio for write
Index 15: took 0 ms to do checking slots
Index 16: took 1 ms to do waiting for write completion
Index 17: took 1996 ms to do msleep
Index 18: took 0 ms to do allocating bios for read
Index 19: took 0 ms to do bio alloc read
Index 20: took 0 ms to do bio add page read
Index 21: took 0 ms to do submit_bio for read
Index 22: took 10001 ms to do waiting for read completion
(3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

<6>o2net: connection to node ocfs2 (num 2) at 10.1.7.107:777 has been idle for 10 seconds, shutting it down
(3,0):o2net_idle_timer:1309 here are some times that might help debug the situation:
(tmr: 1169487957.71650 now 1169487967.69569 dr 1169487962.88883 adv 1169487957.71671:1159487957.71674 func 83bce37b2:505) 1169487901.984644:1169487901.984676)

The kernel panic always occurs on the same node, and the other node keeps responding.

Thanks!
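For completeness, here is a hedged sketch of how AoE targets like e2.0 and e3.0 are typically exported with vblade; the backing devices and NIC name below are placeholders, not taken from the poster's setup:

    # on the server exporting the storage (vbladed is the daemonizing wrapper shipped with vblade)
    vbladed 2 0 eth0 /dev/sdb    # shelf 2, slot 0 -> /dev/etherd/e2.0 on the initiators
    vbladed 3 0 eth0 /dev/sdc    # shelf 3, slot 0 -> /dev/etherd/e3.0 on the initiators

    # on each initiator node, load the aoe driver; the exported targets are
    # discovered when the module comes up
    modprobe aoe
    ls /dev/etherd/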
The problem appears to be that IO is taking more time than the effective O2CB_HEARTBEAT_THRESHOLD; your configured value of "31" doesn't seem to be effective:

Index 6: took 1995 ms to do msleep
Index 17: took 1996 ms to do msleep
Index 22: took 10001 ms to do waiting for read completion

Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify?

Thanks,
--Srini.
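A minimal sketch of the check being asked for, plus how the configured value normally becomes effective, assuming the standard ocfs2-tools 1.2.x layout on a RHEL/CentOS system (paths and the rule of thumb below come from the OCFS2 documentation; adjust to your installation):

    # value the kernel is actually using (run on both nodes)
    cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold

    # value the o2cb init script is configured to load at startup
    grep O2CB_HEARTBEAT_THRESHOLD /etc/sysconfig/o2cb

    # the disk-heartbeat dead time is roughly (threshold - 1) * 2 seconds,
    # so a threshold of 31 corresponds to about 60 seconds

    # a changed threshold only takes effect when the cluster stack is reloaded,
    # e.g. with all OCFS2 volumes unmounted on that node:
    umount /ocfs2 /ocfs2_nfs
    /etc/init.d/o2cb restart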
I can reproduce it every time under heavy IO. I have read this FAQ entry: "I encounter 'Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing' whenever I run a heavy io load", so I have appended the string "elevator=deadline" to the boot line on node "becks"; node "ocfs2" has the default RH IO scheduler.

This is my last panic on node becks:

Index 19: took 0 ms to do bio add page read
Index 20: took 0 ms to do submit_bio for read
Index 21: took 36 ms to do waiting for read completion
Index 22: took 0 ms to do bio alloc write
Index 23: took 0 ms to do bio add page write
Index 0: took 0 ms to do submit_bio for write
Index 1: took 0 ms to do checking slots
Index 2: took 1 ms to do waiting for write completion
Index 3: took 1962 ms to do msleep
Index 4: took 0 ms to do allocating bios for read
Index 5: took 0 ms to do bio alloc read
Index 6: took 0 ms to do bio add page read
Index 7: took 0 ms to do submit_bio for read
Index 8: took 9362 ms to do waiting for read completion
Index 9: took 0 ms to do bio alloc write
Index 10: took 0 ms to do bio add page write
Index 11: took 0 ms to do submit_bio for write
Index 12: took 0 ms to do checking slots
Index 13: took 48665 ms to do waiting for write completion
(3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

Other info:

[root@ocfs2 ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31
[root@becks ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31

[root@ocfs2 ~]# mount -t ocfs2
/dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)

[root@becks ~]# mount -t ocfs2
/dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)

[root@ocfs2 ~]# /etc/init.d/ocfs2 status
Active OCFS2 mountpoints: /ocfs2 /ocfs2_nfs

[root@becks ~]# /etc/init.d/ocfs2 status
Active OCFS2 mountpoints: /ocfs2 /ocfs2_nfs

[root@ocfs2 ~]# mounted.ocfs2 -f
Device              FS     Nodes
/dev/etherd/e2.0    ocfs2  ocfs2, becks
/dev/etherd/e3.0    ocfs2  ocfs2, becks

[root@becks ~]# mounted.ocfs2 -f
Device              FS     Nodes
/dev/etherd/e3.0    ocfs2  ocfs2, becks
/dev/etherd/e2.0    ocfs2  ocfs2, becks

[root@ocfs2 ~]# mounted.ocfs2 -d
Device              FS     UUID                                  Label
/dev/etherd/e2.0    ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  seceti
/dev/etherd/e3.0    ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4

[root@becks ~]# mounted.ocfs2 -d
Device              FS     UUID                                  Label
/dev/etherd/e3.0    ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4
/dev/etherd/e2.0    ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  seceti

I can also panic the nodes by detaching the network cable...

If you have any more debugging questions, feel free to ask me.

Thanks

-----Original Message-----
From: Srinivas Eeda [mailto:srinivas.eeda@oracle.com]
Sent: Monday, 22 January 2007, 18:30
To: Consulente3
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] kernel panic - not syncing
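For reference, a hedged illustration of what the "elevator=deadline" change on becks would look like in /boot/grub/grub.conf on a CentOS 4 box; the root device and initrd names below are placeholders, only the elevator parameter on the kernel line matters:

    title CentOS-4 (2.6.9-42.0.3.EL)
            root (hd0,0)
            kernel /vmlinuz-2.6.9-42.0.3.EL ro root=LABEL=/ elevator=deadline
            initrd /initrd-2.6.9-42.0.3.EL.img

After a reboot, the scheduler in effect can be spot-checked with something like cat /sys/block/<dev>/queue/scheduler, if the kernel exposes it there.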