Hi,

We're running an Oracle cluster with Oracle cluster Data Guard. For testing purposes, and to change hostnames, we wanted to reinstall the three Data Guard nodes.

The old installation was/is:

`uname -a`
Linux dbnode1 2.6.18-36.el5 #1 SMP Fri Jul 20 14:26:46 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Installed with ocfs2:

rpm -qa|grep ocfs2
ocfs2-tools-debuginfo-1.2.6-1.el5
ocfs2-tools-1.2.6-1.el5
ocfs2-2.6.18-8.1.8.el5-1.2.6-6.el5
ocfs2console-1.2.6-1.el5
ocfs2-tools-devel-1.2.6-1.el5

The reinstalled nodes were installed with:

`uname -a`
Linux dbnode2 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep ocfs2
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5

Oracle version is 10.2.0.3 on both. The installation seems to be OK.

I first reinstalled the two nodes that weren't running the Data Guard instance. Once these two were installed, I tried to start the Data Guard instance on node2, to get node1 ready for reinstallation. But when I ran "startup mount", the database wouldn't mount. It was "hanging" until I ran "shutdown abort" in another session. After a long while of "hanging", I finally got an ORA-600 [2116] in my alert log, as well as two trace files in the bdump directory.

One of the trace files said:

*****snipp*****
*** 2011-02-07 09:33:55.096
*** SERVICE NAME:() 2011-02-07 09:33:55.096
*** SESSION ID:(2195.1) 2011-02-07 09:33:55.096
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 17868, image:
*** 2011-02-07 09:33:55.096
ksedmp: internal or fatal error
****snipp****

The other trace file:

*** 2011-02-07 09:23:52.445
*** SERVICE NAME:() 2011-02-07 09:23:52.444
*** SESSION ID:(2185.1) 2011-02-07 09:23:52.444
Waited for detached process: CKPT for 300 seconds:
*** 2011-02-07 09:23:52.445
Dumping diagnostic information for CKPT:
OS pid = 17835
loadavg : 3.01 3.04 2.91
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 17835 1 0 75 0 - 1871417 io_get 09:18 ? 00:00:00 ora_ckpt_DBDG02
[Thread debugging using libthread_db enabled]
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc2de9000
0x00002b7f59d5f5b4 in ?? () from /usr/lib64/libaio.so.1

The alert log says:

ALTER DATABASE MOUNT
Mon Feb 7 09:18:52 2011
This instance was first to mount
Mon Feb 7 09:33:55 2011
Errors in file /disk00/admin/DBDG/bdump/DBDG02_ckpt_17835.trc:
Mon Feb 7 09:33:55 2011
Trace dumping is performing id=[cdmp_20110207093355]
Mon Feb 7 09:33:55 2011
Errors in file /disk00/admin/DBDG/udump/DBDG02_ora_17868.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
Mon Feb 7 09:33:56 2011
Trace dumping is performing id=[cdmp_20110207093356]
Mon Feb 7 09:43:23 2011
Shutting down instance (abort)
License high water mark = 2
Termination issued to instance processes. Waiting for the processes to exit
Mon Feb 7 09:43:33 2011
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 25297

After some investigation, I thought there must be something wrong in the OS or OCFS2.
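
For anyone reading a similar alert log, here is a minimal Python sketch that groups the lines under their timestamps and measures how long the mount attempt sat before the first ORA- error appeared. It assumes the 10g-style layout seen in the excerpt above (a timestamp line followed by its message lines); the function names and the `SAMPLE` string are illustrative, and `SAMPLE` is just a shortened copy of that excerpt.

```python
import re
from datetime import datetime

# Timestamp lines in the excerpt look like "Mon Feb 7 09:18:52 2011".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")


def parse_alert_log(text):
    """Return a list of (timestamp, [message lines]) entries."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS_RE.match(line):
            # Collapse repeated spaces so the day field parses uniformly.
            stamp = datetime.strptime(" ".join(line.split()), "%a %b %d %H:%M:%S %Y")
            current = (stamp, [])
            entries.append(current)
        elif current is not None:
            # Message lines belong to the most recent timestamp.
            current[1].append(line)
    return entries


def first_ora_error(entries):
    """Return (timestamp, line) for the first ORA- message, or None."""
    for stamp, lines in entries:
        for line in lines:
            if line.startswith("ORA-"):
                return stamp, line
    return None


# Shortened copy of the alert log excerpt quoted in this message.
SAMPLE = """\
Mon Feb 7 09:18:52 2011
This instance was first to mount
Mon Feb 7 09:33:55 2011
Errors in file /disk00/admin/DBDG/udump/DBDG02_ora_17868.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
Mon Feb 7 09:43:23 2011
Shutting down instance (abort)
"""

entries = parse_alert_log(SAMPLE)
error_time, error_line = first_ora_error(entries)
waited = (error_time - entries[0][0]).total_seconds()
print(error_line)
print("seconds between 'first to mount' and the error:", waited)
```

On this excerpt the gap is roughly 15 minutes. That is close to the 900 in the second ORA-600 argument, but reading that argument as a timeout in seconds is an assumption, not something the log itself states.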

So we downgraded the kernel to:

Linux dbnode2 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

and installed the ocfs2 kernel module for that kernel version. After that, everything worked OK.

So my question is: is there something wrong with the ocfs2 kernel module version ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5, or is it the OS that fails?

Best regards,
Morten Kristiansen
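
Since the ocfs2 kernel module package is built per kernel release, a quick sanity check after a kernel change is to confirm that a module package exists for the kernel being booted. The Python sketch below does that from package names alone. It assumes the naming scheme `ocfs2-<kernel release>-<ocfs2 version>-<package release>`, which is inferred from the package names in this thread; the function names are illustrative. It only checks that the names line up and says nothing about whether a given module build is buggy.

```python
import re

# Assumed naming scheme, inferred from the package names in this thread:
#   ocfs2-<kernel release>-<ocfs2 version>-<package release>
MODULE_RE = re.compile(r"^ocfs2-(\d.*)$")


def module_packages(rpm_names):
    """Map kernel release -> module version for each ocfs2 kernel-module package."""
    found = {}
    for name in rpm_names:
        match = MODULE_RE.match(name)
        if not match:
            continue  # ocfs2-tools, ocfs2console, ... are not kernel modules
        parts = match.group(1).rsplit("-", 2)
        if len(parts) != 3:
            continue  # name does not follow the assumed scheme
        kernel, version, release = parts
        found[kernel] = f"{version}-{release}"
    return found


def has_module_for(rpm_names, kernel_release):
    """True if some installed ocfs2 module package was built for kernel_release."""
    return kernel_release in module_packages(rpm_names)


# Package list of the reinstalled node, as posted above.
REINSTALLED_NODE = [
    "ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5",
    "ocfs2console-1.4.4-1.el5",
    "ocfs2-tools-1.4.4-1.el5",
]

print(module_packages(REINSTALLED_NODE))
print(has_module_for(REINSTALLED_NODE, "2.6.18-238.1.1.el5"))  # kernel as reinstalled
print(has_module_for(REINSTALLED_NODE, "2.6.18-128.el5"))      # kernel after the downgrade
```

On a live node the kernel release could come from `platform.release()` and the package list from the output of `rpm -qa`. The second check comes back false for the downgraded kernel, which is why a module package for 2.6.18-128.el5 had to be installed alongside the downgrade.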
Hi again,

I forgot to say that after I had tried to start the database, I was unable to umount /disk03, which is one of the two disks where the controlfile for the database is stored. When I shut down CRS, there was still one oracle process left and it was impossible to kill it. I had to reboot the server. In the `ps -fel` listing, the process had the value "wait_f" in the WCHAN column and it was owned by process 1.

Best regards,
Morten Kristiansen

From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Kristiansen Morten
Sent: 9 February 2011 14:18
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] Database won't mount
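
When a process will not die and a filesystem will not unmount, it helps to see which kernel wait channel each leftover oracle process is sleeping in. Below is a minimal Python sketch that parses `ps -fel` style output and groups one user's processes by WCHAN. The sample is the header and the CKPT row from the trace file earlier in this thread; the function names are illustrative, and the parser assumes every column except CMD is a single whitespace-free token.

```python
def parse_ps_fel(text):
    """Parse `ps -fel` style output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Split into as many fields as there are columns; CMD keeps its spaces.
        fields = ln.split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def by_wchan(rows, user="oracle"):
    """Group the given user's processes by the kernel function they sleep in."""
    groups = {}
    for row in rows:
        if row["UID"] != user:
            continue
        groups.setdefault(row["WCHAN"], []).append((row["PID"], row["CMD"]))
    return groups


# Header and CKPT row as they appear in the trace file quoted in this thread.
PS_SAMPLE = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 17835 1 0 75 0 - 1871417 io_get 09:18 ? 00:00:00 ora_ckpt_DBDG02
"""

for wchan, procs in by_wchan(parse_ps_fel(PS_SAMPLE)).items():
    print(wchan, procs)
```

On a live system the text could come from `subprocess.run(["ps", "-fel"], capture_output=True, text=True).stdout`. A process that stays in the same wait channel across repeated samples and ignores kill is a reasonable candidate for whatever is holding the mount, though that is a heuristic rather than proof.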
As I said, we're running Oracle 10.2.0.3 for both the RAC software and the DB Enterprise software. And I don't think there is any problem with running RAC on ocfs2. We have been running 7 different RAC clusters, all on ocfs2, for several years with no problems. And after following this mailing list for a year, I don't think it's that rare.

And yes, the database has problems with CKPT and the controlfile placed on /disk03. That is where I think the oracle process is hanging, and that's why I'm not able to unmount /disk03. The question is: why does this happen? In my opinion it has something to do with either the OS or ocfs2. On a lower kernel version there is no such problem. So I think there must be some kind of bug somewhere.

Best regards,
Morten Kristiansen

From: Michael Austin [mailto:onedbguru at gmail.com]
Sent: 11 February 2011 19:51
To: Kristiansen Morten
Subject: Re: [Ocfs2-users] Database won't mount

It appears the ORA-600 gets generated when CKPT could not lock/access the control files.

First, what version of Oracle RAC? Just because OCFS2 "can" be used as the shared storage for RAC does not mean it is a good idea - in fact it is STRONGLY suggested that you use ASM and ACFS. I have seen MANY clusters (10g and 11g) running on ASM with NO problems, with one at a previous employer at 380TB+ on Solaris, not Linux. Using anything other than ASM seems to be very problematic.

If you don't want to use ASM, you can always use ACFS - while it lives within ASM, ASM is the volume manager. With ACFS, you have "normal" mount points where you can store your data and access it just like a normal unix file system - however things like deletes etc. are significantly faster. The volumes can also be dynamically resized on the fly/online. In one environment in the past we used ASM for normal database storage and used ACFS volumes for the shared stuff like FRA, archivelogs and rman backups.