thr3ads.net - Ocfs users - [Ocfs-users] ORA-01207 after SAN maintenance [Nov 2004]

Matt Daniels

2004-Nov-24 08:28 UTC

[Ocfs-users] ORA-01207 after SAN maintenance

We had a situation over the weekend with our production database that we
can't figure out, hoping someone can shed some light.

Specifics:
Oracle 9.2.0.4
OS is Redhat AS2.1
ocfs-2.4.9-e-summit-1.0.12-1
ocfs-tools-1.0.10-1
ocfs-support-1.0.10-1
ocfs-2.4.9-e-enterprise-1.0.12-1

All database, redo, undo, and control files are on ocfs, archived logs are on
ext3.

We shut down the database for san maintenance, but didn't shut down cluster
manager.  The san was disconnected from the server, a tray was added and then
the san was reconnected.  The server and cluster manager remained up during the
maintenance.

When we tried to restart the database, we got an ORA-01207, saying the control
file was older than the datafiles.  Per Oracle support, we recreated the control
file and attempted to bring the db up with the new one.  At this point we
received the following:

Errors in file /opt/oracle/product/9.2.0/admin/ENTPRD/udump/entprd2_ora_22596.trc:
ORA-00600: internal error code, arguments: [kcoapl_blkchk], [5], [393], [6101], [], [], [], []

There's a RAC bug entry for [kcoapl_blkchk], but it was for a 4-node RAC,
ours is only 2 nodes, so Oracle internals support said they didn't think it
applied to our case.  We ended up doing a point-in-time recovery to before the
san maintenance, but moved the datafiles to an ext3 partition for now.

Has anyone seen this before, or have any input as to what happened?  We're
trying to determine if this is a bug, and if we should move back to RAC/ocfs.

Thanks very much,
Matt Daniels
Apps DBA, Priority Healthcare Corp

Jeremy Schneider

2004-Nov-24 09:33 UTC

wow, i'm surprised that cluster manager (oracm) stayed up.  i don't know
all the technical internals of exactly how it works, but i know that it
uses a quorum on shared storage...  i guess it might only use the quorum
for split-brain situations (where the interconnect goes down) but
personally i'd still never yank the shared disk quorum out from under it
without shutting it down!

but i also have to admit that your error message doesn't sound like it
would be related to this.  did you shut down GSD or did you leave that
running too?  the first error (control file older than datafiles)
doesn't make much sense at all...  i think that just means the SCN in
the datafile headers was newer than the SCN recorded in the control
file?  did the DB shut down cleanly according to the alert log?
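
The SCN comparison described above can be sketched in a few lines. This is only an illustration of the check, not how Oracle implements ORA-01207: the function name, file names, and SCN values are all hypothetical. On a mounted instance the two sets of numbers would typically come from CHECKPOINT_CHANGE# in V$DATAFILE (the control file's record) and V$DATAFILE_HEADER (the headers themselves).

```python
from typing import Dict, List


def files_ahead_of_controlfile(
    controlfile_scn: Dict[str, int], header_scn: Dict[str, int]
) -> List[str]:
    """Return datafiles whose header SCN is newer than the control file's record.

    Hypothetical helper: both arguments map a datafile name to a checkpoint SCN.
    A file that the control file does not know about is treated as SCN 0 there,
    so it is reported whenever its header SCN is positive.
    """
    return sorted(
        name
        for name, scn in header_scn.items()
        if scn > controlfile_scn.get(name, 0)
    )


# Made-up values: users01.dbf was checkpointed after the control file was.
controlfile = {"system01.dbf": 1000, "users01.dbf": 1000}
headers = {"system01.dbf": 1000, "users01.dbf": 1042}

stale = files_ahead_of_controlfile(controlfile, headers)
print(stale)
```

Any file in the returned list is in the "control file older than datafile" state; an empty list means the control file is at least as recent as every header it was compared against.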

(FYI, we're running 9.2.0.5 on a 2-node RHEL3 cluster using ocfs  --
it's a backend for 11.5.9  --  and we've been production for almost 3
months without any problems so far...  oh - and we have [separate] ocfs
partitions for archive logs too)

jeremy, dba


Jeram

2004-Nov-25 19:07 UTC

Hi Matt...

Have you seen Note:76434.1 in Metalink?

Rgds/Jeram


Matt Daniels

2004-Nov-25 20:08 UTC

Hi Jeram,

Thanks for the reply.  Yes, we read the note, especially this part:

Bug# 3281882   See [NOTE:3281882.8]
<http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_id=3281882.8&p_database_id=NOT>

      Block corruption / OERI[kcoapl_blkchk] in multinode RAC after multiple reconfigurations

      Fixed: 9.2.0.5, 10.1.0.2

Oracle Internals support, however, determined that this bug didn't apply to
our case since it occurred specifically with a 4-node RAC instance, and ours is
only 2 nodes.

We're still trying to determine the cause for this, as we lost our
production instance and had to do point-in-time recovery.

Interestingly, one of our development instances experienced the exact same
problem.  Its datafiles were stored on the san in a different ocfs partition, but
in the same storage group.  Our other development instance using the same san
storage group, but with datafiles on an ext3 partition, wasn't affected and
came up fine.

Thanks again for the response!

Matt


Matt Daniels

2004-Nov-25 20:18 UTC

Hi Jeremy,

Thanks for the response.  I believe GSD was still running when the san
maintenance was done.  The database shut down cleanly; all the issues arose when
we tried to start it back up.  We've since found out that a development
instance with datafiles on another ocfs partition on the san suffered the exact
same problem as production, while a development instance with its datafiles on an
ext3 partition on the san had no issues at all, and came up cleanly.

This one still has us stumped.  We're working with support to try to
determine the root cause.  Any other thoughts or suggestions are welcome...

-Matt
