Derek Suzuki
2004-Mar-07 00:13 UTC
[Ocfs-users] A couple more minor questions about OCFS and RHEL3
Oracle appears to have Wim chained in the basement, forced to answer mailing list questions at all hours. I do appreciate it. Our cluster has been stable since we installed RAC, but a few minor issues have me concerned.

First, our storage array seems to maintain continuous low-level activity even when the database is shut down. The CPUs spend a modest amount of time in iowait state while this is going on. I figure this might be related to the I/O fencing and inter-node communication features of OCFS, but I want to verify that this is expected.

Next, I saw a Metalink thread which suggests that async I/O is not supported on OCFS with RHAS 2.1. It doesn't say anything about RHEL3. We've been using async in our testing with no problems so far, and plan to use it in production unless Oracle feels the combination is not yet trustworthy.

The last issue is that sometimes we see messages such as the following from dmesg:

(11637) ERROR: status = -16, Common/ocfsgendlm.c, 1220
(11637) ERROR: status = -16, Common/ocfsgendlm.c, 1285
(11637) ERROR: status = -16, Common/ocfsgendlm.c, 1586
(11637) ERROR: status = -16, Common/ocfsgencreate.c, 1027
(11637) ERROR: status = -16, Common/ocfsgencreate.c, 1770
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1220
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1285
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1586
(12717) ERROR: status = -16, Common/ocfsgencreate.c, 1027
(12717) ERROR: status = -16, Common/ocfsgencreate.c, 1770
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1220
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1285
(12717) ERROR: status = -16, Common/ocfsgendlm.c, 1586
(12717) ERROR: status = -16, Common/ocfsgencreate.c, 1027
(12717) ERROR: status = -16, Common/ocfsgencreate.c, 1770

I think these mostly come up around boot time, so maybe they're related to mounting cluster filesystems when the other node is down. The messages do not come continuously, and the systems behave properly, so I'm just trying to make sure that this isn't the sign of some subtle error.

Derek
Wim Coekaerts
2004-Mar-07 00:35 UTC
[Ocfs-users] A couple more minor questions about OCFS and RHEL3
heh...> Our cluster has been stable since we installed RAC, but a few minor issues > have me concerned. First, our storage array seems to maintain continuous > low-level activity even when the database is shut down. The CPUs spend a > modest amount of time in iowait state while this is going on. I figure this > might be related to the I/O fencing and inter-node communication features of > OCFS, but I want to verify that this is expected.ocfs does about 1k wite and 32kb read / second per mounted volume. if nothing goes on, it write a heartbeat and reads everryone elses (32 sectors worth)> Next, I saw a Metalink thread which suggests that async I/O is not > supported on OCFS with RHAS 2.1. It doesn't say anything about RHEL3. > We've been using async in our testing with no problems so far, and plan to > use it in production unless Oracle feels the combination is not yet > trustworthy.well - tough one, it works, but the big issue is that you rredologfile need to be contiguous on disk, otherwise you might have failures, exact same goes for rhel3 as rhas21. you can see that by running debugocfs eg : /ocfs/log/foo1.dbf -> debugocfs -f /log/foo1.dbf /dev/sdXXX that will show how many offsets (should only have one) in the extents if its more than 1, dd it over with a very large blocksize and see if that ends up being 1 contig file. if you do that, everything should work, however, there just hasn't been enough real testing with aio, need to ggather more evidence. the reason the logfiles are annoying is because he way aio is implemented and how we call it, it cannto handle short io's or non contig aio submits.> The last issue is that sometimes we see messages such as the following > from dmesg: > > (11637) ERROR: status = -16, Common/ocfsgendlm.c, 1220 > (11637) ERROR: status = -16, Common/ocfsgendlm.c, 1285 > (11637) ERROR: status = -16, Common/ocfsgendlm.c, 1586 > (11637) ERROR: status = -16, Common/ocfsgencreate.c, 1027 > (11637) ERROR: status = -16, Common/ocfsgencreate.c, 1770 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1220 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1285 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1586 > (12717) ERROR: status = -16, Common/ocfsgencreate.c, 1027 > (12717) ERROR: status = -16, Common/ocfsgencreate.c, 1770 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1220 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1285 > (12717) ERROR: status = -16, Common/ocfsgendlm.c, 1586 > (12717) ERROR: status = -16, Common/ocfsgencreate.c, 1027 > (12717) ERROR: status = -16, Common/ocfsgencreate.c, 1770 > > I think these mostly come up around boot time, so maybe they're related to > mounting cluster filesystems when the other node is down. The messages do > not come continuously, and the systems behave properly, so I'm just trying > to make sure that this isn't the sign of some subtle error.hmm have to look at the code for this , get ebusy, sounds like dlm and trying to get access to a file thats in use you know when things are serious yoy really ought to call support, don't rely on this maillist for production problems ;) mileage may vary ;)