First of all, make o2cb dependent on iSCSI, so that it starts AFTER it and
STOPS before it. I also recommend making sshd start BEFORE both - that gives
you emergency access to the system if you did anything wrong.
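On SLES that ordering is driven by the LSB headers in the init scripts and by
insserv. A minimal sketch of what I mean is below; I'm assuming the iSCSI
initiator service is called "open-iscsi" on your box, and the header in your
/etc/init.d/o2cb may look different - treat it as an illustration, not the
shipped script:

    ### BEGIN INIT INFO
    # Provides:       o2cb
    # Required-Start: $network sshd open-iscsi
    # Required-Stop:  $network sshd open-iscsi
    ### END INIT INFO

    # after editing the header, let insserv recompute the S*/K* links:
    insserv o2cb

Listing open-iscsi (and sshd) in Required-Start makes o2cb start after them;
listing them in Required-Stop keeps them running until o2cb has stopped.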
Second, iSCSI is very reluctant to shut down.
I would manually remove the iscsi shutdown from the K* files altogether, so
that it never stops (see the sketch below). You are lucky that
your system did not freeze (when I experimented with LVM2 on iSCSI, I ran into
many such scenarios).
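Something along these lines; again the service name "open-iscsi" is an
assumption, so check what your rc?.d directories actually contain first:

    # list the shutdown links for the iSCSI initiator
    ls -l /etc/init.d/rc?.d/K*iscsi*

    # remove them so iSCSI is never stopped on a runlevel change
    rm /etc/init.d/rc?.d/K*iscsi*

Keep in mind that a package update or a later insserv run may recreate those
links, so you may have to repeat this.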
In all other respects, this combination works fine for me (except that I was
not able to make OCFSv2 work stably as document storage on i386 servers).
----- Original Message -----
From: "Steve Feehan" <sfeehan at gmail.com>
To: <ocfs2-users at oss.oracle.com>
Sent: Friday, July 14, 2006 6:40 AM
Subject: [Ocfs2-users] kernel panics on sles 10 rc3
> I've just set up ocfs2 on a shared iSCSI disk (from a NetApp) on SLES
> 10 RC3. Both clients are Xen guests. Perhaps I should direct this
> question to a SUSE list, but I hoped that someone here might be able
> to offer guidance.
>
> The configuration was simple and I had a working setup very quickly.
> Unfortunately each time I reboot one of the nodes it panics during
> shutdown. For example, I've included the shutdown output at the end of
> this mail.
>
> I can often (not always) trigger the panic by doing:
>
> slesvm1:~ # /etc/init.d/o2cb status
> Module "configfs": Loaded
> Filesystem "configfs": Mounted
> Module "ocfs2_nodemanager": Loaded
> Module "ocfs2_dlm": Loaded
> Module "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking cluster ocfs2: Online
> Checking heartbeat: Active
> slesvm1:~ #
> slesvm1:~ # mount | grep ocfs
> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
> /dev/sda1 on /oracle type ocfs2 (rw,_netdev,heartbeat=local)
> slesvm1:~ #
> slesvm1:~ # /etc/init.d/ocfs2 stop
> Stopping Oracle Cluster File System (OCFS2) done
> slesvm1:~ #
> slesvm1:~ # mount | grep ocfs
> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
> (reverse-i-search)`stop': /etc/init.d/ocfs2 stop
> (reverse-i-search)`':
> slesvm1:~ #
> slesvm1:~ # /etc/init.d/o2cb stop
> Cleaning heartbeat on ocfs2: OK
> Stopping cluster ocfs2: OK
> Unloading module "ocfs2": OK
> Unmounting ocfs2_dlmfs filesystem: OK
> Unloading module "ocfs2_dlmfs": OK
> Unmounting configfs filesystem: OK
> Unloading module "configfs": OK
> slesvm1:~ #
> slesvm1:~ #
> slesvm1:~ # Oops: 0000 [#1]
> SMP
> last sysfs file: /block/sda/removable
> Modules linked in: sg sd_mod ipv6 iscsi_tcp libiscsi
> scsi_transport_iscsi scsi_mod apparmor aamatch_pcre loop dm_mod
> reiserfs xenblk xennet
> CPU: 0
> EIP: 0061:[<c0127491>] Not tainted VLI
> EFLAGS: 00210083 (2.6.16.20-0.12-xen #1)
> EIP is at cascade+0x11/0x40
> eax: c1213c80 ebx: d1241d6c ecx: 0000000a edx: c121448c
> esi: c12144dc edi: c1213c80 ebp: 0000000a esp: c0383ec8
> ds: 007b es: 007b ss: 0069
> Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
> Stack: <0>00000000 c1214478 c1213c80 c0383ef8 c0128510 00000000 00000000 00000000
>        44b79c9c 00002b39 00000000 00000000 c0383ef8 c0383ef8 00000001 c036e108
>        c0382000 c03ab180 c01234f5 c03ade40 0000000a 00000000 c0382000 00000001
> Call Trace:
> [<c0128510>] run_timer_softirq+0xb0/0x1c0
> [<c01234f5>] __do_softirq+0x85/0x110
> [<c0123605>] do_softirq+0x85/0x90
> [<c010687c>] do_IRQ+0x3c/0x70
> [<c024d111>] evtchn_do_upcall+0x91/0xb0
> [<c01050e8>] hypervisor_callback+0x2c/0x34
> [<c0102f5d>] xen_idle+0x4d/0xb0
> [<c01030e6>] cpu_idle+0x66/0xe0
> [<c038476f>] start_kernel+0x2ef/0x3a0
> [<c0384210>] unknown_bootoption+0x0/0x270
> Code: 71 14 e8 f3 fd ff ff 8b 0b 39 cb 75 dd 5b 5e c3 8d 76 00 8d bc 27 00 00
> 00 00 55 89 cd 57 89 c7 56 8d 34 ca 53 8b 1e 39 de 74 14 <39> 7b 14 89 da 75
> 19 8b 1b 89 f8 e8 bf fd ff ff 39 de 75 ec 89
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> Does anyone have an idea what the problem might be? Any additional
> information I can provide that might help to track it down?
>
> Thanks in advance for any input.
>
> Steve
>
>
>
> Example shutdown output:
> ------------------------------------------------------------------------------------------
> INIT: Switching to runlevel: 6
> INIT: Sending processes the TERM signal
> Boot logging started on /dev/tty1(/dev/console) at Thu Jul 13 08:28:20 2006
> Master Resource Control: previous runlevel: 5, switching to runlevel: 6
> Shutting down CRON daemon done
> Shutting down auditd done
> Shutting down irqbalance done
> Shutting down cupsd done
> Unloading AppArmor profiles done
> Shutting down ZENworks Management Daemon done
> Shutting down Name Service Cache Daemon done
> Shutting down mail service (Postfix) done
> Saving random seed done
> Umount SMB/ CIFS File Systems done
> Shutting down slpd done
> Shutting down service gdm done
> Shutting down powersaved done
> Stopping Oracle Cluster File System (OCFS2) done
> Cleaning heartbeat on ocfs2: OK
> Stopping cluster ocfs2: OK
> Unloading module "ocfs2": OK
> Unmounting ocfs2_dlmfs filesystem: OK
> Unloading module "ocfs2_dlmfs": OK
> Unmounting configfs filesystem: OK
> Unloading module "configfs": OK
> Shutting down SSH daemon done
> Remove Net File System (NFS) unused
> Shutting down RPC portmap daemon done
> Logging out from iqn.1992-08.com.netapp:sn.84166997: done
> Stopping iSCSI initiator service: done
> Shutting down syslog services done
> Shutting down network interfaces:
> eth0
> eth0 configuration: eth-id-00:16:3e:dc:9b:b8 done
> Shutting down service network . . . . . . . . . . . . . done.
> Shutting down HAL daemon done
> Shutting down D-BUS daemon done
> Shutting down resource manager done
> Running /etc/init.d/halt.local done
> Sending all processes the TERM signal... done
> Sending all processes the KILL signal... done
> Turning off swap done
> Unloading AppArmor profiles done
> done
> Unmounting file systems
> securityfs umounted
> devpts umounted
> debugfs umounted
> sysfs umounted
> /dev/hda2 umounted done
> done
> Shutting down MD Raid done
> Stopping udevd: done
> proc umounted
> Unable to handle kernel paging request at virtual address d13f1d6c
> printing eip:
> c01272c1
> *pde = ma 06093067 pa 009cc067
> *pte = ma 00000000 pa fffff000
> Oops: 0002 [#1]
> SMP
> last sysfs file: /class/net/eth0/address
> Modules linked in: joydev st sr_mod ide_cd cdrom ide_core xfs_quota
> xfs exportfs sg sd_mod xt_pkttype ipt_LOG xt_limit scsi_mod
> ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat
> ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables
> ip6table_filter ip6_tables x_tables ipv6 apparmor aamatch_pcre loop
> dm_mod reiserfs xenblk xennet
> CPU: 0
> EIP: 0061:[<c01272c1>] Not tainted VLI
> EFLAGS: 00010006 (2.6.16.20-0.12-xen #1)
> EIP is at internal_add_timer+0x61/0xa0
> eax: d13f1d6c ebx: c1213c80 ecx: c12144d4 edx: ce09527c
> esi: 036c82ec edi: 036c894b ebp: 00000000 esp: c0383e70
> ds: 007b es: 007b ss: 0069
> Process swapper (pid: 0, threadinfo=c0382000 task=c03265c0)
> Stack: <0>ce09527c c1213c80 c012777d 00000000 ce095080 00000008 c1213c80 ce095080
>        c02734ac 00000001 c02b2879 00000001 c1214de0 c013698d c1214e00 c0382000
>        00000000 000cd1f9 00000000 c1214de4 35147d9a 0000d143 ce095080 00000100
> Call Trace:
> [<c012777d>] __mod_timer+0x8d/0xc0
> [<c02734ac>] sk_reset_timer+0xc/0x20
> [<c02b2879>] tcp_write_timer+0x119/0x650
> [<c013698d>] hrtimer_run_queues+0x4d/0x180
> [<c01285c9>] run_timer_softirq+0x169/0x1c0
> [<c02b2760>] tcp_write_timer+0x0/0x650
> [<c01234f5>] __do_softirq+0x85/0x110
> [<c0123605>] do_softirq+0x85/0x90
> [<c010687c>] do_IRQ+0x3c/0x70
> [<c024d111>] evtchn_do_upcall+0x91/0xb0
> [<c01050e8>] hypervisor_callback+0x2c/0x34
> [<c0102f5d>] xen_idle+0x4d/0xb0
> [<c01030e6>] cpu_idle+0x66/0xe0
> [<c038476f>] start_kernel+0x2ef/0x3a0
> [<c0384210>] unknown_bootoption+0x0/0x270
> Code: c1 e8 11 25 f8 01 00 00 8d 8c 18 0c 0c 00 00 eb 12 85 c9 79 48 89 f0 8d
> 76 00 25 ff 00 00 00 8d 4c c3 0c 8b 41 04 89 0a 89 51 04 <89> 10 8b 1c 24 8b
> 74 24 04 89 42 04 83 c4 08 c3 c1 e8 05 25 f8
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> --
> Steve Feehan
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>