thr3ads.net - Ocfs2 users - [Ocfs2-users] Database won't mount [Feb 2011]


Kristiansen Morten

2011-Feb-09 13:17 UTC

[Ocfs2-users] Database won't mount

Hi,

We're running an Oracle cluster with an Oracle Data Guard cluster. For testing
purposes and to change hostnames, we wanted to reinstall the three Data Guard
nodes. The old installation was/is:
`uname -a`
Linux dbnode1 2.6.18-36.el5 #1 SMP Fri Jul 20 14:26:46 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Installed with ocfs2:
rpm -qa|grep ocfs2
ocfs2-tools-debuginfo-1.2.6-1.el5
ocfs2-tools-1.2.6-1.el5
ocfs2-2.6.18-8.1.8.el5-1.2.6-6.el5
ocfs2console-1.2.6-1.el5
ocfs2-tools-devel-1.2.6-1.el5

The reinstalled nodes were installed with:
`uname -a`
Linux dbnode2 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep ocfs2
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5

Oracle version is 10.2.0.3 on both.
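
The OCFS2 kernel-module RPM carries the kernel release it was built for in its
name, so one quick sanity check is whether the installed module package matches
the running kernel. Below is a minimal sketch, assuming the
`ocfs2-<kernel release>-<module version>` naming seen in the listings above. The
function name `ocfs2_pkg_matches_kernel` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the OCFS2 kernel-module
# package name was built for the given kernel release.
# Assumes the package is named ocfs2-<kernel release>-<module version>.
ocfs2_pkg_matches_kernel() {
    kernel_release=$1   # e.g. the output of `uname -r`
    pkg_name=$2         # e.g. one line from `rpm -qa | grep ocfs2`
    case "$pkg_name" in
        "ocfs2-$kernel_release-"*) return 0 ;;   # name embeds this kernel release
        *) return 1 ;;                           # built for some other kernel
    esac
}
```

For the reinstalled node, `ocfs2_pkg_matches_kernel "2.6.18-238.1.1.el5"
"ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5"` succeeds. On a live system the first
argument would typically be `"$(uname -r)"`.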

The installation seems to be OK. I first reinstalled the two nodes that weren't
running the Data Guard instance. When these two were installed, I tried to start
the Data Guard instance on node2, to get node1 ready for reinstallation. But
when I ran "startup mount", the database wouldn't mount. It was "hanging" until
I ran "shutdown abort" in another session. After a long while of "hanging", I
finally got an ORA-600 [2116] in my alert log, as well as two trace files in the
bdump directory. One of the trace files said:
*****snipp*****
*** 2011-02-07 09:33:55.096
*** SERVICE NAME:() 2011-02-07 09:33:55.096
*** SESSION ID:(2195.1) 2011-02-07 09:33:55.096
Received ORADEBUG command 'dump errorstack 3' from process Unix process
pid: 17868, image:
*** 2011-02-07 09:33:55.096
ksedmp: internal or fatal error
****snipp****


The other tracefile:
*** 2011-02-07 09:23:52.445
*** SERVICE NAME:() 2011-02-07 09:23:52.444
*** SESSION ID:(2185.1) 2011-02-07 09:23:52.444
Waited for detached process: CKPT for 300 seconds:
*** 2011-02-07 09:23:52.445
Dumping diagnostic information for CKPT:
OS pid = 17835
loadavg : 3.01 3.04 2.91
memory info: free memory = 0.00M
swap info:   free = 0.00M alloc = 0.00M total = 0.00M
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S oracle   17835     1  0  75   0 - 1871417 io_get 09:18 ?      00:00:00 ora_ckpt_DBDG02
[Thread debugging using libthread_db enabled]
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc2de9000
0x00002b7f59d5f5b4 in ?? () from /usr/lib64/libaio.so.1


The alert log says:
ALTER DATABASE   MOUNT
Mon Feb  7 09:18:52 2011
This instance was first to mount
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/bdump/DBDG02_ckpt_17835.trc:
Mon Feb  7 09:33:55 2011
Trace dumping is performing id=[cdmp_20110207093355]
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/udump/DBDG02_ora_17868.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
Mon Feb  7 09:33:56 2011
Trace dumping is performing id=[cdmp_20110207093356]
Mon Feb  7 09:43:23 2011
Shutting down instance (abort)
License high water mark = 2
Termination issued to instance processes. Waiting for the processes to exit
Mon Feb  7 09:43:33 2011
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 25297
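
When scanning a longer alert log for entries like the one above, it can help to
pull out just the first ORA-00600 argument, which identifies the internal error.
Below is a small sketch, assuming the alert log line format shown above.
`ora600_first_arg` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper: read alert log text on stdin and print the first
# bracketed argument of every ORA-00600 line, one per line.
ora600_first_arg() {
    # \[\([^]]*\)\] captures the text inside the first [...] after "arguments:"
    sed -n 's/^ORA-00600: internal error code, arguments: \[\([^]]*\)\].*/\1/p'
}
```

It would be used as `ora600_first_arg < alert.log`, where `alert.log` stands in
for the actual alert log file.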


After some investigation, I thought there must be something wrong in the OS or
OCFS2, so we downgraded the kernel to:
Linux dbnode2 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

We then installed the ocfs2 kernel module package for that kernel, and
everything worked OK. So my question is: is there something wrong with the ocfs2
kernel module package ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5, or is it the OS that
fails?

Best regards,
Morten Kristiansen


Kristiansen Morten

2011-Feb-09 13:30 UTC


[Ocfs2-users] Database won't mount

Hi again,
I forgot to say that after I had tried to start the database, I was unable to
umount /disk03, which is one of the two disks holding the control files for the
database. When I shut down CRS, there was still one oracle process left and it
was impossible to kill it; I had to reboot the server. In the `ps -fel` listing,
the process had the value "wait_f" in the WCHAN column and its parent was
process 1.
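
The WCHAN column names the kernel function a sleeping process is waiting in, so
listing it for the oracle background processes shows where they are stuck. Below
is a minimal sketch that filters `ps` output for a command-name pattern. It
assumes the procps column layout of `ps -eo pid,stat,wchan:32,args` (PID, state,
WCHAN, command), and `show_wchan` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -eo pid,stat,wchan:32,args` output on stdin and
# print "PID STATE WCHAN" for each process whose command (4th column) contains
# the given pattern. The header line is skipped.
show_wchan() {
    pattern=$1
    awk -v pat="$pattern" 'NR > 1 && index($4, pat) { print $1, $2, $3 }'
}
```

A typical call would be `ps -eo pid,stat,wchan:32,args | show_wchan ora_`. The
wider `wchan:32` column avoids the truncation seen in the default listing, where
the name is cut off at six characters.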

Best regards,
Morten Kristiansen


Kristiansen Morten

2011-Feb-14 07:35 UTC


[Ocfs2-users] Database won't mount

As I said, we're running Oracle 10.2.0.3 for both the RAC software and the DB
Enterprise Edition software. And I don't think there is any problem running RAC
on ocfs2. We have been running 7 different RACs, all on ocfs2, for several years
with no problems. And after following this mailing list for a year, I don't
think it's that rare. And yes, the database has problems with CKPT with the
control file placed on /disk03. That is where I think the oracle process is
hanging, and that's why I'm not able to unmount /disk03. The question is why
this happens. In my opinion it has something to do with either the OS or ocfs2.
When running on a lower kernel version, there is no such problem. So I think
there must be some kind of bug somewhere.

Best regards,
Morten Kristiansen

From: Michael Austin [mailto:onedbguru at gmail.com]
Sent: 11 February 2011 19:51
To: Kristiansen Morten
Subject: Re: [Ocfs2-users] Database won't mount

It appears the ORA-600 gets generated when CKPT cannot lock/access the
control files.

First, what version of Oracle RAC? That OCFS2 "can" be used as the shared
storage for RAC does not mean it is a good idea. In fact it is STRONGLY
suggested that you use ASM and ACFS. I have seen MANY clusters (10g and 11g)
running on ASM with NO problems, including one at a previous employer at 380TB+
on Solaris, not Linux. Using anything other than ASM seems to be very
problematic. If you don't want to use ASM directly, you can always use ACFS: it
lives within ASM, with ASM acting as the volume manager. With ACFS, you have
"normal" mount points where you can store your data and access it just like a
normal Unix file system, but things like deletes are significantly faster. The
volumes can also be resized online. In one past environment we used ASM for
normal database storage and ACFS volumes for shared items such as FRA
archivelogs and RMAN backups.




