Robert Blok
2004-Mar-10 14:54 UTC
[Ocfs-users] copy error + control file corruption in ocfs 1.10
Hi Bharat/Philip,

I haven't tested it yet. My systems are running RHAS 2.1. I'll come back to let you know the outcome.

Regards,
Robert.

-----Original Message-----
From: Bharat_Sajnani@Dell.com
To: robert.blok@logicacmg.com
Sent: 3/10/2004 7:09 PM
Subject: RE: [Ocfs-users] copy error + control file corruption in ocfs 1.10

Hi Robert,

Have you had a chance to retest this issue with the new fileutils package that Philip recently posted? Also, are you on RHAS 2.1 or RHEL3?

Thanks,
Bharat

-----Original Message-----
From: ocfs-users-bounces@oss.oracle.com [mailto:ocfs-users-bounces@oss.oracle.com] On Behalf Of Robert Blok
Sent: Wednesday, March 10, 2004 6:15 AM
To: 'ocfs-users@oss.oracle.com'
Subject: [Ocfs-users] copy error + control file corruption in ocfs 1.10

Wim,

Below are two problems I found in testing the newly released ocfs version. Just for your information.

Gr,
Robert.

- Copying of multiple files gives errors

[oracle@prac01 test]$ cp --o_direct -R ./a2/* ./backup/a2/.
cp: writing `./backup/a2/./ccdata.dbf.bck': Invalid argument
cp: writing `./backup/a2/./ccindex.dbf': Invalid argument
cp: writing `./backup/a2/./control02.ctl': Invalid argument
cp: writing `./backup/a2/./system01.dbf': Invalid argument
cp: writing `./backup/a2/./temp01.dbf': Invalid argument
cp: writing `./backup/a2/./undotbs01.dbf': Invalid argument
cp: writing `./backup/a2/./undotbs01.dbf.bck': Invalid argument
cp: writing `./backup/a2/./undotbs02.dbf': Invalid argument
[oracle@prac01 test]$ for file in `ls ./a2/*`
> do
> cp --o_direct -Rp $file ./backup/a2
> done
[oracle@prac01 test]$ for file in `ls ./a1/*`; do cp --o_direct -Rp $file ./backup/a1; done
[oracle@prac01 test]$ for file in `ls ./r1/*`; do cp --o_direct -Rp $file ./backup/r1; done
[oracle@prac01 test]$ for file in `ls ./r2/*`; do cp --o_direct -Rp $file ./backup/r2; done

- Creating a corrupt control file

[oracle@prac01 a1]$ cp /dev/zero control01.ctl
cp: cannot create regular file `control01.ctl': Permission denied

--> both instances are crashed!

[oracle@prac01 a1]$ rm control01.ctl
[oracle@prac01 a1]$ cp ../a2/control02.ctl ./control01.ctl
[oracle@prac01 a1]$ svrctl start database -d test
bash: svrctl: command not found
[oracle@prac01 a1]$ srvctl start database -d test
PRKP-1005 : Failed to start up cluster database test
ORA-00227: corrupt block detected in controlfile: (block 315, # blocks 1)
ORA-00202: controlfile: '/oradata/test/a1/control01.ctl'

This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.

_______________________________________________
Ocfs-users mailing list
Ocfs-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs-users
Robert Blok
2004-Mar-12 02:23 UTC
[Ocfs-users] copy error + control file corruption in ocfs 1.10
Ok Philip,

the copy errors were my fault. I didn't install the latest version of the fileutils package (fileutils-4.1-10.19.i386.rpm).

The second problem is still there:

- Make the first control file corrupt. The database has crashed now.
- Copy the second controlfile over the first controlfile:

cp --o_direct ../a2/control02.ctl ./control01.ctl

- Restart the database:

[oracle@prac01 a1]$ srvctl start database -d test
PRKP-1005 : Failed to start up cluster database test
ORA-00227: corrupt block detected in controlfile: (block 315, # blocks 1)
ORA-00202: controlfile: '/oradata/test/a1/control01.ctl'

Any ideas?

Kind regards,
Robert.

-----Original Message-----
From: Bryce
To: Robert Blok
Sent: 3/10/2004 11:51 PM
Subject: RE: [Ocfs-users] copy error + control file corruption in ocfs 1.10

On Wed, 2004-03-10 at 21:42, Robert Blok wrote:
> Hi Philip,
>
> I've installed the fileutils package. When I tried to install it, I got an
> error that the package was already installed. I've used the --force option
> to install it anyway.

*BUZZT* very wrong. I could understand messages about a newer version being already installed but the SAME? no no,...

rpm -qV fileutils

What version is it? (The -V just checks to make sure it all md5sums correctly.)

You should have this version installed (4.1-10.19)

-rw-r--r-- 1 bryce bryce 666451 Mar 10 10:50 fileutils-4.1-10.19.i386.rpm

(Check we're running a RHAT AS2.1 kernel)

[root@ca-test7-RHAS21 test]# uname -a
Linux ca-test7 2.4.9-e.37enterprise #1 SMP Mon Jan 26 11:20:59 EST 2004 i686 unknown

(Check we're on an FS that supports O_DIRECT)

[root@ca-test7-RHAS21 test]# pwd
/oastdbf1/test
[root@ca-test7-RHAS21 test]# mount | grep oastdbf1
/dev/sdc1 on /oastdbf1 type ocfs (rw)

(Some files I made earlier, in 128K block increments. Remember, O_DIRECT means we need files that are block-size aligned. In this case our OCFS FS is using a block size of 128K. That said, odd-sized files in O_DIRECT are kinda possible in certain circumstances with AS2.1; AS3 changes the playing field slightly, but we're not too worried about that for this issue.)

[root@ca-test7-RHAS21 test]# ls -l 1
total 5765
-rw-r--r-- 1 root root  131072 Mar 10 08:26 1
-rw-r--r-- 1 root root  262144 Mar 10 08:26 2
-rw-r--r-- 1 root root  393216 Mar 10 08:26 3
-rw-r--r-- 1 root root  524288 Mar 10 08:26 4
-rw-r--r-- 1 root root  655360 Mar 10 08:26 5
-rw-r--r-- 1 root root  786432 Mar 10 08:26 6
-rw-r--r-- 1 root root  917504 Mar 10 08:27 7
-rw-r--r-- 1 root root 1048576 Mar 10 08:27 8
-rw-r--r-- 1 root root 1179648 Mar 10 08:27 9

(Somewhere to copy them)

[root@ca-test7-RHAS21 test]# mkdir new

(And copy,..)

[root@ca-test7-RHAS21 test]# ls -l new
total 5765
-rw-r--r-- 1 root root  131072 Mar 10 08:26 1
-rw-r--r-- 1 root root  262144 Mar 10 08:26 2
-rw-r--r-- 1 root root  393216 Mar 10 08:26 3
-rw-r--r-- 1 root root  524288 Mar 10 08:26 4
-rw-r--r-- 1 root root  655360 Mar 10 08:26 5
-rw-r--r-- 1 root root  786432 Mar 10 08:26 6
-rw-r--r-- 1 root root  917504 Mar 10 08:27 7
-rw-r--r-- 1 root root 1048576 Mar 10 08:27 8
-rw-r--r-- 1 root root 1179648 Mar 10 08:27 9

The md5sums all checked out ok as well.

The rpm error that the package was already installed screams to me that you have the wrong package 8/

Phil
=-->

> The copying test went wrong. It copies the first file and then only creates
> the other files, leaving 0 byte files:
> [oracle@prac01 a1]$ cp --o_direct -Rp ../backup/* ./backup
> cp: writing `./backup/a2/undotbs02.dbf': Invalid argument
> cp: writing `./backup/a2/undotbs01.dbf.bck': Invalid argument
> cp: writing `./backup/a2/undotbs01.dbf': Invalid argument
> etcetera...
>
> I haven't tested the control file yet. I'll try it tomorrow afternoon.
>
> Regards,
> Robert.
>
> -----Original Message-----
> From: Bryce
> To: Robert Blok
> Cc: 'ocfs-users@oss.oracle.com'
> Sent: 3/10/2004 6:10 PM
> Subject: Re: [Ocfs-users] copy error + control file corruption in ocfs 1.10
>
> Ok well since the 5.2.0 stuff sucks,...
> I've poked at the AS2.1 fileutils-4.1-10 kit again and rebuilt it with
> the fix.
> NOW UNDERSTAND, I've not done a large amount of testing but it does seem
> to cure the issue.
>
> I've just pushed the kit to the oss server so it'll pick it up in a few
> minutes as the following (it harvests any push I make every 15 minutes
> or so)
>
> binary:
> http://oss.oracle.com/projects/ocfs/files/supported/RedHat/RHAS2.1/i386/fileutils-4.1-10.19.i386.rpm
>
> src:
> http://oss.oracle.com/projects/ocfs/files/source/RHAT/RHAS2.1/fileutils-4.1-10.19.src.rpm
>
> Phil
> =-->
>
> On Wed, 2004-03-10 at 15:26, Bryce wrote:
> > Umm I think you tripped over an oversight on my part.
> > If you grab the modified coreutils, unpack and look at the code in
> > src/copy.c about line 1858, it creates a buffer for data read in to be
> > stored in
> >
> > buf = (char *) alloca (buf_size + sizeof (int));
> >
> > that line should probably have alloca replaced with valloc
> >
> > What I think is happening (I'm caught doing port forwarding stuff atm)
> > is that the read is being made as O_DIRECT into a buffer that is not
> > page aligned. I'll try and get this port forward stuff done asap and try
> > and get back to a solution for you
> >
> > Phil
> > =-->
> >
> > On Wed, 2004-03-10 at 12:15, Robert Blok wrote:
> > > Wim,
> > >
> > > Below are two problems I found in testing the newly released ocfs version.
> > > Just for your information.
> > >
> > > Gr,
> > > Robert.
> > >
> > > - Copying of multiple files gives errors
> > >
> > > [oracle@prac01 test]$ cp --o_direct -R ./a2/* ./backup/a2/.
> > > cp: writing `./backup/a2/./ccdata.dbf.bck': Invalid argument
> > > cp: writing `./backup/a2/./ccindex.dbf': Invalid argument
> > > cp: writing `./backup/a2/./control02.ctl': Invalid argument
> > > cp: writing `./backup/a2/./system01.dbf': Invalid argument
> > > cp: writing `./backup/a2/./temp01.dbf': Invalid argument
> > > cp: writing `./backup/a2/./undotbs01.dbf': Invalid argument
> > > cp: writing `./backup/a2/./undotbs01.dbf.bck': Invalid argument
> > > cp: writing `./backup/a2/./undotbs02.dbf': Invalid argument
> > > [oracle@prac01 test]$ for file in `ls ./a2/*`
> > > > do
> > > > cp --o_direct -Rp $file ./backup/a2
> > > > done
> > > [oracle@prac01 test]$ for file in `ls ./a1/*`; do cp --o_direct -Rp $file ./backup/a1; done
> > > [oracle@prac01 test]$ for file in `ls ./r1/*`; do cp --o_direct -Rp $file ./backup/r1; done
> > > [oracle@prac01 test]$ for file in `ls ./r2/*`; do cp --o_direct -Rp $file ./backup/r2; done
> > >
> > > - Creating a corrupt control file
> > >
> > > [oracle@prac01 a1]$ cp /dev/zero control01.ctl
> > > cp: cannot create regular file `control01.ctl': Permission denied
> > >
> > > --> both instances are crashed!
> > >
> > > [oracle@prac01 a1]$ rm control01.ctl
> > > [oracle@prac01 a1]$ cp ../a2/control02.ctl ./control01.ctl
> > > [oracle@prac01 a1]$ svrctl start database -d test
> > > bash: svrctl: command not found
> > > [oracle@prac01 a1]$ srvctl start database -d test
> > > PRKP-1005 : Failed to start up cluster database test
> > > ORA-00227: corrupt block detected in controlfile: (block 315, # blocks 1)
> > > ORA-00202: controlfile: '/oradata/test/a1/control01.ctl'
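
Phil's two alignment points in this thread (file sizes in whole 128K blocks, and an O_DIRECT read needing a page-aligned buffer) can be illustrated with a minimal Python sketch. This is not the fileutils patch, which is C and swaps alloca for valloc; the function names here are hypothetical, the 128K default is just the block size Phil quotes for his test volume, and the anonymous mmap is a stand-in for a page-aligned allocation.

```python
import ctypes
import mmap

# Block size Phil quotes for his OCFS test volume; an assumption for any other volume.
OCFS_BLOCK_SIZE = 128 * 1024


def is_block_aligned(size, block_size=OCFS_BLOCK_SIZE):
    """Return True if size is a whole number of blocks."""
    return size % block_size == 0


def round_up_to_block(size, block_size=OCFS_BLOCK_SIZE):
    """Return the smallest multiple of block_size that is >= size."""
    return -(-size // block_size) * block_size


def page_aligned_buffer(length):
    """Allocate an anonymous mmap; such mappings start on a page boundary."""
    return mmap.mmap(-1, length)


def buffer_address(buf):
    """Return the address of the first byte of a writable buffer (inspection only)."""
    view = ctypes.c_char.from_buffer(buf)
    return ctypes.addressof(view)


if __name__ == "__main__":
    # Sizes taken from Phil's listing, plus one deliberately odd size.
    for size in (131072, 262144, 1179648, 1000):
        print(size, is_block_aligned(size), round_up_to_block(size))

    buf = page_aligned_buffer(OCFS_BLOCK_SIZE)
    print("page aligned:", buffer_address(buf) % mmap.PAGESIZE == 0)
    buf.close()
```

A buffer taken from the stack or from an ordinary allocator carries no such alignment guarantee, which matches Phil's suspicion about the alloca call in src/copy.c.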
Robert Blok
2004-Mar-17 03:57 UTC
[Ocfs-users] copy error + control file corruption in ocfs 1.10
Wim, Philip,

Below is a log of how I corrupted the control file. Strangely, the controlfile doesn't become 1 byte in size. Actually, I have to abort the copy.

Philip, lsof doesn't show any open files for either controlfile:

[root@prac01 root]# lsof | grep control
[root@prac02 root]# lsof | grep control
[root@prac01 a1]# lsof ./control01.ctl
[root@prac01 a1]# lsof ../a2/control02.ctl
[root@prac02 a1]# lsof ./control01.ctl
[root@prac02 a1]# lsof ../a2/control02.ctl

Kind Regards,
Robert.

> how do you corrupt the first control file ?
> I guess I don't see this happening at all here

[oracle@prac01 test]$ cp -Rp --o_direct ./backup/* .
cp: preserving times for `./a1': Operation not permitted
cp: preserving times for `./a2': Operation not permitted
cp: preserving times for `./r1': Operation not permitted
cp: preserving times for `./r2': Operation not permitted
[oracle@prac01 test]$ srvctl start database -d test
[oracle@prac01 test]$ ps -ef | grep smon
oracle   22393     1  0 10:44 ?        00:00:00 ora_smon_test1
oracle   22583  1511  0 10:44 pts/1    00:00:00 grep smon
[oracle@prac01 test]$ sqlplus '/ as sysdba'

SQL*Plus: Release 9.2.0.4.0 - Production on Wed Mar 17 10:44:59 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
[oracle@prac01 test]$ cat /dev/zero
[oracle@prac01 test]$ ls -al /dev/zero
crw-rw-rw- 1 root root 1, 5 Mar 19 2002 /dev/zero
[oracle@prac01 test]$ cp --o_direct /dev/zero ./a1/control01.ctl
[oracle@prac01 test]$ ls -al ./a1/control01.ctl
-rw-r----- 1 oracle dba 904527872 Mar 10 11:37 ./a1/control01.ctl
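
One check that might help with the ORA-00227 seen after restoring control01.ctl from control02.ctl is to confirm that the two files are byte-identical, in the spirit of the md5sum check Phil ran on his test copies. The following is a minimal Python sketch under stated assumptions: the helper names and the 128K chunk size are hypothetical, nobody in this thread ran it, and it uses ordinary buffered reads rather than O_DIRECT, so it only shows what a buffered reader sees.

```python
import hashlib
import sys

# Read size; chosen to match the 128K block size mentioned in the thread (an assumption).
CHUNK = 128 * 1024


def md5_of(path, chunk=CHUNK):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk=CHUNK):
    """Return the byte offset of the first differing chunk, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return offset  # also covers one file being shorter than the other
            if not a:
                return None  # both files ended without a mismatch
            offset += len(a)


if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = sys.argv[1], sys.argv[2]
    print(md5_of(src), src)
    print(md5_of(dst), dst)
    print("first differing chunk at offset:", first_difference(src, dst))
```

Run with the two control file paths as arguments. A reported offset is a position in the file in bytes; it is not claimed to map onto the block number in the ORA-00227 message.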