Pei Ku
2005-Feb-11 13:32 UTC
[Ocfs-users] OCFS file system used as archived redo destination is corrupted
We started using an OCFS file system about 4 months ago as the shared archived redo destination for our 4-node RAC instances (HP DL380, MSA1000, RH AS 2.1). Last night we started seeing some weird behavior, and my guess is that the inode directory in the file system is getting corrupted. I've always had a bad feeling about OCFS not being very robust at handling constant file creation and deletion (which is what happens when you use it for archived redo logs).

ocfs-2.4.9-e-smp-1.0.12-1 is what we are using in production.

For now, we set up an archived redo destination on a local ext3 FS on each node and made that destination the mandatory one; we changed the OCFS destination to an optional one. The reason we made the OCFS archived redo destination the primary destination a few months ago was that we are planning to migrate to RMAN-based backup (as opposed to the current hot backup scheme); it's easier (required?) to manage RAC archived redo logs with RMAN if the archived redos reside in a shared file system.

Below are some diagnostics:

$ ls -l rdo_1_21810.arc*
-rw-r-----    1 oracle   dba    397312 Feb 10 22:30 rdo_1_21810.arc
-rw-r-----    1 oracle   dba    397312 Feb 10 22:30 rdo_1_21810.arc

(They have the same inode, btw -- I had done an 'ls -li' earlier but the output had rolled off the screen.)

After a while, one of the DBA scripts gzipped the file(s). Now they look like this:

$ ls -liL /export/u10/oraarch/AUCP/rdo_1_21810.arc*
1457510912 -rw-r-----    1 oracle   dba    36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz
1457510912 -rw-r-----    1 oracle   dba    36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz

These two files also have the same inode, but the size is way too small.

Yeah, /export/u10 is pretty hosed...

Pei

-----Original Message-----
From: Pei Ku
Sent: Thu 2/10/2005 11:16 PM
To: IT
Cc: ADS
Subject: possible OCFS /export/u10/ corruption on dbprd*

Ulf,

AUCP had problems creating archive file "/export/u10/oraarch/AUCP/rdo_1_21810.arc". After a few tries, it appeared that it was able to -- except that there are *two* rdo_1_21810.arc files in the directory (by the time you look at it, it/they will probably have been gzipped). We also have a couple of zero-length gzipped redo log files in there (which is not normal).

At least the problem has not brought any of the AUCP instances down. Manoj and I turned on archiving to an ext3 file system on each node for now; archiving to /export/u10/ is still active but has been made optional.

My guess is /export/u10/ is corrupted in some way. I still say OCFS can't take constant file creation/removal.

We are one rev behind (1.0.12 vs 1.0.13 on ocfs.org). No guarantee that 1.0.13 contains the cure...
Pei

-----Original Message-----
From: Oracle [mailto:oracle@dbprd01.autc.com]
Sent: Thu 2/10/2005 10:26 PM
To: DBA; Page DBA; Unix Admin
Cc:
Subject: SL1:dbprd01.autc.com:050210_222600:oalert_mon> Alert Log Errors

SEVER_LVL=1 PROG=oalert_mon
**** oalert_mon.pl: DB=AUCP SID=AUCP1
[Thu Feb 10 22:25:21] ORA-19504: failed to create file "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
[Thu Feb 10 22:25:21] ORA-19504: failed to create file "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
[Thu Feb 10 22:25:21] ORA-27040: skgfrcre: create error, unable to create file
[Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
[Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
[Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
[Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
[Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
[Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
[Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
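A quick way to check for the symptom reported above (the same name and inode listed twice in one directory) is to scan 'ls -li' output for repeated inode numbers. This is only a rough sketch against the archive destination mentioned in this thread; note that ordinary hard links would be flagged the same way.

#!/bin/bash
# Rough sketch: report inode numbers that appear more than once in a
# single directory listing (duplicate directory entries, or legitimate
# hard links).  The directory is the archive destination from this thread.
DIR=/export/u10/oraarch/AUCP

# First field of 'ls -liA' is the inode number; the regex skips the
# "total" header line.
dups=$(ls -liA "$DIR" | awk '$1 ~ /^[0-9]+$/ {print $1}' | sort -n | uniq -d)

for ino in $dups; do
    echo "inode $ino is listed more than once:"
    ls -liA "$DIR" | awk -v ino="$ino" '$1 == ino'
done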
Sunil Mushran
2005-Feb-11 13:52 UTC
[Ocfs-users] OCFS file system used as archived redo destination is corrupted
Looks like the dirnode index is screwed up. The file is showing up twice, but there is only one copy of the file. We had detected a race which could cause this; it was fixed.

Did you start on 1.0.12, or did you run an older version of the module with this device? You may want to look into upgrading to at least 1.0.13. We made some memory allocation changes which were sorely required.

As part of our tests, we simulate the archiver: we run a script on multiple nodes which constantly creates files.
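Sunil's description of the archiver simulation suggests a simple way to put the same kind of load on a shared OCFS mount. Below is a minimal sketch of such a create/delete loop, not the actual Oracle test harness; the directory, file size, and gzip step are assumptions modeled on this thread.

#!/bin/bash
# Minimal create/delete stress loop in the spirit of the archiver
# simulation described above.  Run it concurrently on every node
# against the same shared OCFS mount.
DIR=/export/u10/stress/$(hostname)
mkdir -p "$DIR"

i=0
while true; do
    f="$DIR/fake_arch_${i}.arc"
    # write ~400 KB, roughly the size of the archived redo shown earlier
    dd if=/dev/zero of="$f" bs=1024 count=400 2>/dev/null
    gzip -f "$f"          # mimic the DBA gzip job
    rm -f "${f}.gz"
    i=$(( (i + 1) % 1000 ))
done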
Pei Ku
2005-Feb-11 14:45 UTC
[Ocfs-users] OCFS file system used as archived redo destination is corrupted
This file system was created under 1.0.12.

Does an upgrade from 1.0.12 to 1.0.13 require reformatting the file systems? I don't care about the file system we are using for archived redos (it's pretty screwed up anyway -- it's gonna need a clean wipe). But should I do a full FS pre-upgrade dump and post-upgrade restore for the file system used for storing Oracle datafiles? Of course I'll do a full db backup in any case.

Are you saying the problem I described was a known problem in 1.0.12 and has been fixed in 1.0.13?

Before this problem, our production db had archive_lag_target set to 15 minutes (so that the standby db does not lag too far behind the production db). Since this is a four-node RAC, that means there are at least (60/15)*4 = 16 archived redos generated per hour (16*24 = 384 per day). The fact that this problem only appeared after several months tells me that the OCFS QA process needs to be more thorough and needs to run for a long time in order to catch bugs like this.

Another weird thing: when I do an 'ls /export/u10/oraarch/AUCP', it takes about 15 seconds and CPU usage is around 25+% for that duration. If I run the same command on multiple nodes, the elapsed time might be 30 seconds on each node. It concerns me that a simple command like 'ls' can be that resource intensive and slow. Maybe it's related to the FS corruption...

thanks

Pei
Pei Ku
2005-Feb-11 16:05 UTC
[Ocfs-users] OCFS file system used as archived redo destination is corrupted
Oooh, I forgot to look in /var/log/messages:

Feb 10 22:20:55 dbprd01 kernel: (24943) ERROR: status = -2, Linux/ocfsmain.c, 2195
Feb 10 22:25:21 dbprd01 kernel: (27737) ERROR: status = -2, Linux/ocfsmain.c, 2195

The timestamps correspond to the timestamps of the errors in alert.log. Do these errors look familiar?

Regarding 'colorls' -- we are not using it, but I neglected to mention that the monitoring scripts use 'ls -ltr' in order to get a sorted list of files. So I need more than just the list of filenames -- the stat() calls can't be avoided.

thanks again

Pei

> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran@oracle.com]
> Sent: Friday, February 11, 2005 1:47 PM
> To: Pei Ku
> Cc: ocfs-users@oss.oracle.com
> Subject: Re: [Ocfs-users] OCFS file system used as archived redo
> destination is corrupted
>
> No... the on-disk format is the same. Going from 1.0.12 to 1.0.13 does
> not require reformatting. However, cleaning up the messed-up volume is
> always a good idea.
>
> The problem you described could be caused by a race in the dlm (fixed
> in 1.0.12) or by a memory allocation failure (fixed in 1.0.13). If you
> see any status=-12 (ENOMEM) errors in /var/log/messages, do upgrade
> to 1.0.13.
>
> Do you have colorls enabled? ls does two things: getdents() to get the
> names of the files, followed by stat() on each file. As ocfs v1 does
> sync reads, each stat() involves disk access (strace will show what I
> am talking about). This becomes an issue when there are a lot of files.
>
> So, if you need to quickly get a list of files, disable colorls.
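One way to see the getdents()-versus-stat() cost Sunil describes is to time a names-only listing against a sorted one, and let strace count the syscalls. This is only a sketch against the archive directory from this thread; the '--color=never' flag matters only if colorls is actually enabled.

# Names only: essentially a single getdents() pass, no per-file stat().
time ls -1 -U --color=never /export/u10/oraarch/AUCP > /dev/null

# Sorted by mtime (what the monitoring scripts need): ls must stat()
# every entry, and with OCFS v1's synchronous reads each stat() goes
# to disk.
time ls -ltr /export/u10/oraarch/AUCP > /dev/null

# strace -c prints a syscall summary on stderr; compare the number of
# lstat/stat calls between the two forms of ls.
strace -c ls -ltr /export/u10/oraarch/AUCP > /dev/null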