Derek Suzuki
2004-Apr-22 11:48 UTC
[Ocfs-users] A couple more minor questions about OCFS and RHEL3
Sort of a followup...

We've been running OCFS in sync mode for a little over a month now, and it has worked reasonably well. Performance is still a bit spotty, but we're told that the next kernel update for RHEL3 should improve the situation. We might eventually move to Polyserve's cluster filesystem for its multipathing capability and potentially better performance, but at least we have a stable, functioning platform for the time being.

My DBA still wants to try async mode with OCFS. We followed your recommendations for using dd to recreate the logfiles so that they are contiguous on disk (if we don't do that, they always seem to come up with two non-contiguous extents) and are doing some testing. We were, however, wondering what kind of symptoms would appear if we were to trigger the non-contiguous aio problem that you have described. We weren't sure if it would result in an immediate failure, or if it would trigger silent corruption that we wouldn't notice until much later.

We are also considering the possibility of using OCFS for our datafiles and raw devices for the redologs. In theory that ought to eliminate the problem entirely. It's easy enough to reconstruct individual logfiles with dd, but I'd be worried that someone might forget to do that while building a new database or restoring a dataset from backups.

Anyway, I wanted to thank you and Sunil again for all of the helpful info you've provided us during our RAC deployment. Circumstances forced us to go live sooner than we would have liked, and we needed all the help we could get to get everything working.

Derek

> -----Original Message-----
> From: Wim Coekaerts [mailto:wim.coekaerts@oracle.com]
> Sent: Saturday, March 06, 2004 10:36 PM
> To: Derek Suzuki
> Cc: 'ocfs-users@oss.oracle.com'
> Subject: Re: [Ocfs-users] A couple more minor questions about OCFS and RHEL3
>
> > Next, I saw a Metalink thread which suggests that async I/O is not
> > supported on OCFS with RHAS 2.1. It doesn't say anything about RHEL3.
> > We've been using async in our testing with no problems so far, and plan to
> > use it in production unless Oracle feels the combination is not yet
> > trustworthy.
>
> well - tough one, it works, but the big issue is that your redo logfiles
> need to be contiguous on disk, otherwise you might have failures, exact
> same goes for rhel3 as rhas21. you can see that by running debugocfs
> eg :
> /ocfs/log/foo1.dbf -> debugocfs -f /log/foo1.dbf /dev/sdXXX
> that will show how many offsets (should only have one) in the extents
> if it's more than 1, dd it over with a very large blocksize and see if
> that ends up being 1 contig file.
>
> if you do that, everything should work, however, there just hasn't been
> enough real testing with aio, need to gather more evidence.
>
> the reason the logfiles are annoying is because of the way aio is
> implemented and how we call it, it cannot handle short io's or non
> contig aio submits.
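The dd step Wim describes above could be scripted roughly as follows, which may help with the worry that someone forgets it after a restore. This is only a sketch under assumptions: `recreate_contiguous` is a hypothetical helper name (not an OCFS or Oracle tool), the 1 MiB default block size is an arbitrary stand-in for "very large", and the file is assumed not to be in use by a running instance. Whether the copy actually ends up as a single extent depends on free space on the volume, so the result should still be checked with debugocfs as shown in the quoted message.

```shell
#!/bin/sh
# Sketch: rewrite a file in large blocks, verify the copy, then swap it in.
# recreate_contiguous is a hypothetical helper, not part of OCFS or Oracle.
# Assumes the file is not open by a running database and that the filesystem
# has room for a second copy of it.
recreate_contiguous() {
    src=$1
    bs=${2:-1048576}          # block size in bytes; 1 MiB is an assumed default
    tmp="$src.contig.$$"      # temporary copy next to the original

    # Copy the file using the requested block size.
    dd if="$src" of="$tmp" bs="$bs" 2>/dev/null || { rm -f "$tmp"; return 1; }

    # Refuse to swap unless the copy is byte-identical to the original.
    cmp -s "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Replace the original with the freshly written copy.
    mv "$tmp" "$src"
}
```

Usage would be something like `recreate_contiguous /ocfs/log/foo1.dbf`, run as the owner of the file. Because `mv` swaps in a new file, ownership and permissions should be checked afterwards.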
Sunil Mushran
2004-Apr-22 13:06 UTC
[Ocfs-users] A couple more minor questions about OCFS and RHEL3
The symptom is a failed write to the logfile, which leads to an immediate db crash. Any I/O errors on the logfiles are considered fatal.

ORA-27091: unable to queue I/O
ORA-27072: File I/O error

Derek Suzuki wrote:

> Sort of a followup...
>
> We've been running OCFS in sync mode for a little over a month now,
> and it has worked reasonably well. Performance is still a bit spotty, but
> we're told that the next kernel update for RHEL3 should improve the
> situation. We might eventually move to Polyserve's cluster filesystem for
> its multipathing capability and potentially better performance, but at least
> we have a stable, functioning platform for the time being.
>
> My DBA still wants to try async mode with OCFS. We followed your
> recommendations for using dd to recreate the logfiles to be contiguous on
> disk (if we don't do that, they always seem to come up with two
> non-contiguous extents) and are doing some testing. We were, however,
> wondering what kind of symptoms would appear if we were to trigger the
> non-contiguous aio problem that you have described. We weren't sure if it
> would result in an immediate failure, or if it would trigger silent
> corruption that we wouldn't notice until much later.
>
> We are also considering the possibility of using OCFS for our
> datafiles and raw devices for the redologs. In theory that ought to
> eliminate the problem entirely. It's easy enough to reconstruct individual
> logfiles with dd, but I'd be worried that someone might forget to do that
> while building a new database or restoring a dataset from backups.
>
> Anyway, I wanted to thank you and Sunil again for all of the helpful
> info you've provided us during our RAC deployment. Circumstances forced us
> to go live sooner than we would have liked, and we needed all the help we
> could get to get everything working.
>
> Derek
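Since the failure Sunil describes is immediate rather than silent, one way to confirm after an async test run that it was not hit is to search the instance's log output for the two error codes he lists. The sketch below assumes the log is a plain text file whose path you pass in; `find_aio_write_errors` is a hypothetical helper name, and the location of the alert log on any given system is not something this thread specifies.

```shell
#!/bin/sh
# Sketch: report lines in a log file that carry the two errors named above.
# find_aio_write_errors is a hypothetical helper; the log path is supplied by
# the caller because its location is installation-specific.
find_aio_write_errors() {
    # -n prefixes each match with its line number; -E enables alternation.
    # Exit status is 0 if at least one match was found, 1 if none.
    grep -n -E 'ORA-(27091|27072)' "$1"
}
```

A call such as `find_aio_write_errors /path/to/alert.log` prints any matching lines. A non-zero exit status with no output means neither error appears in that file.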