Hello,

I need help with our OCFS2 (1.8.0) filesystem. We have been having problems
with it for a couple of days. When we write to it, it hangs.

The "hanging pattern" is easily reproducible. If I write a 1 GB file to the
filesystem, it does the following:
- writes ~200 MB of data to the disk in 1 second
- freezes for about 10 seconds
- writes ~200 MB of data to the disk in 1 second
- freezes for about 10 seconds
- writes ~200 MB of data to the disk in 1 second
- freezes for about 10 seconds
(and so on)

When the freezes occur:
- other write operations (from other processes) on the same node also freeze
- write operations on other nodes are not affected by the freezes on
  another node

Read operations (on any cluster node, even the one with frozen writes) don't
seem to be affected by the freezes. One sure thing: read operations alone
don't cause the filesystem to freeze.

For info, before the problem began to appear we could sustain 640 MB/s writes
without any freeze.

I tried mounting the filesystem on a single node to rule out issues with
inter-node communication, and the problem was still there.

Filesystem details

* The filesystem is 18 TB and is currently 72% full.
* Mount options are the following:
  rw,nodev,_netdev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,heartbeat=local
* All features: backup-super strict-journal-super sparse extended-slotmap
  inline-data metaecc indexed-dirs refcount discontig-bg unwritten

There is nothing special in the system logs besides application errors caused
by the freezes.

Would a fsck.ocfs2 help? How long would it take for 18 TB?

Is there a flag I can enable in debugfs.ocfs2 to get a better idea of what is
happening and why it is freezing like that?

Any help would be greatly appreciated.

Thanks in advance,

Jeff
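For anyone wanting to reproduce the write/freeze pattern described above, here
is a minimal sketch. It is not the command Jeff used (the thread does not say
how he measured); write_probe is a hypothetical helper that writes a file in
fixed-size chunks with dd and prints how many seconds each chunk took, so a
stall shows up as one chunk taking far longer than the others.

```shell
#!/bin/sh
# write_probe FILE [CHUNKS] [CHUNK_MB]
# Hypothetical helper: writes CHUNKS chunks of CHUNK_MB MiB each to FILE and
# prints the elapsed wall-clock seconds per chunk. conv=fsync forces each
# chunk to disk before dd returns, so the timing includes the flush.
write_probe() {
    target=$1
    chunks=${2:-5}
    chunk_mb=${3:-200}
    i=1
    while [ "$i" -le "$chunks" ]; do
        start=$(date +%s)
        # seek skips the chunks already written; notrunc keeps them in place.
        dd if=/dev/zero of="$target" bs=1M count="$chunk_mb" \
            seek=$(( (i - 1) * chunk_mb )) conv=notrunc,fsync 2>/dev/null
        end=$(date +%s)
        echo "chunk $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Example (path is an assumption; use a scratch file on the OCFS2 mount):
#   write_probe /tier2-ocfs2/write_probe.tmp 5 200
```

The one-second resolution of date +%s is coarse, but it should be enough to
tell a 1-second chunk from a 10-second stall.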
Hello Jeff,

You might want to check what the writer process is waiting on when it's
frozen. The wchan column of ps might be enough, but if not, then perhaps a
kernel stack trace of the process from /proc/<pid>/stack or from
echo t > /proc/sysrq-trigger. The latter will show other blocked processes as
well, which may be helpful in determining the cause of the freeze.

Thanks,
Herbert.

On 10/25/2012 06:32 PM, Jeff Paterson wrote:
> Hello,
>
> I would need help with our OCFS2 (1.8.0) filesystem. We are having
> problems with it since a couple days. When we write onto it, it hangs.
> [...]
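Herbert's suggestion can be sketched roughly as follows. This assumes the
frozen writers sit in uninterruptible sleep (state D), which is typical for
processes blocked in the kernel but is not confirmed anywhere in the thread.
filter_dstate is a hypothetical helper name; the ps format specifiers are
standard procps ones.

```shell
#!/bin/sh
# filter_dstate: reads "ps -eo pid,stat,wchan:32,comm" style output on stdin
# and prints only the processes whose state starts with D (uninterruptible
# sleep). The header line is skipped.
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Usage during a freeze (as root, so the kernel stacks are readable):
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
#   cat /proc/<pid>/stack            # stack of one blocked writer
#   echo t > /proc/sysrq-trigger     # all tasks; the traces go to the kernel log
```

Running the ps line a few times during a stall and comparing the wchan column
should show whether the writers are always waiting in the same place.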
I believe the problem could be due to fragmentation.

1) Can you run the following script and email me the output?
   https://oss.oracle.com/~seeda/misc/stat_sysdir.sh
   Run it as:
       stat_sysdir.sh -d <dev>

2) Can you also do the following and provide me the fs state?
       mount -t debugfs debugfs /sys/kernel/debug
       cat /sys/kernel/debug/ocfs2/*/fs_state
Hello Scott,
I had help from an Oracle developer, Srinivas, and he fixed my issue. Thanks
again Srini !!
Disclaimer: I am not technically knowledgeable about the OCFS2 filesystem, so I
will explain what I understood from a discussion with Srini.

The main issue was that my filesystem is getting more and more fragmented and
the default write pre-allocation window (localalloc bitmap) was too big for
such a fragmented filesystem. From what I understood, when you write to an
OCFS2 filesystem, the filesystem reserves a chunk of space before it begins
writing. Even if you are writing a very small file, the filesystem will always
reserve that chunk of space (I assume this helps reduce fragmentation?!).

With my filesystem setup, the size of the pre-allocated chunks was 136 MB,
which meant the filesystem needed to find 136 MB of contiguous space every time
a write was done. That caused delays because it had a hard time finding them...

Srini showed me how to reduce the pre-allocated chunk size (localalloc bitmap)
from 136 MB to 16 MB and, since then, everything works like new. The solution
to my problem was to add localalloc=16 to my filesystem mount options and
umount/mount the filesystem; after that, everything was fixed.
[root@fileserv01 ~]# grep tier2-ocfs2 /etc/fstab
LABEL=tier2-ocfs2 /tier2-ocfs2 ocfs2 _netdev,nodev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,localalloc=16 0 0
For info, you can view your current localalloc setting by looking at the
fs_state in the debugfs.
You first need to mount the virtual debugfs filesystem if it's not already
mounted:
[root@fileserv01 ~]# grep debugfs /etc/fstab
debugfs /sys/kernel/debug debugfs 0 0
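If debugfs is not listed in /etc/fstab, the check can also be done by hand.
The sketch below is one possible way to do it, not something taken from the
thread: debugfs_mountpoint is a hypothetical helper that looks up where
debugfs is mounted by reading a mounts table in /proc/mounts format.

```shell
#!/bin/sh
# debugfs_mountpoint [MOUNTS_FILE]
# Prints the first mount point whose filesystem type is debugfs, or nothing
# if debugfs is not mounted. MOUNTS_FILE defaults to /proc/mounts.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
}

# Usage (as root): mount debugfs only when it is missing, then read fs_state.
#   [ -n "$(debugfs_mountpoint)" ] || mount -t debugfs debugfs /sys/kernel/debug
#   grep "LocalAlloc =" /sys/kernel/debug/ocfs2/*/fs_state
```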
My localalloc settings before the change:
[root@fileserv01 ~]# grep "LocalAlloc =" /sys/kernel/debug/ocfs2/*/fs_state
LocalAlloc => State: 1  Descriptor: 0  Size: 17441 bits  Default: 29696 bits
My localalloc settings after the change:
[root@fileserv01 ~]# grep "LocalAlloc =" /sys/kernel/debug/ocfs2/*/fs_state
LocalAlloc => State: 1  Descriptor: 0  Size: 2048 bits  Default: 2048 bits
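The 136 MB and 16 MB figures can be cross-checked against the "Size" values in
fs_state. This is a minimal sketch under two assumptions: each bit of the
localalloc bitmap stands for one cluster, and the cluster size is 8 KB. The
8 KB value is inferred from 2048 bits matching localalloc=16; it is not stated
in the thread. bits_to_mb is a hypothetical helper name.

```shell
#!/bin/sh
# bits_to_mb BITS [CLUSTER_KB]
# Converts a localalloc size in bits (one bit per cluster, by assumption) to
# MB, rounded to one decimal. CLUSTER_KB defaults to 8, an inferred value.
bits_to_mb() {
    LC_ALL=C awk -v bits="$1" -v kb="${2:-8}" \
        'BEGIN { printf "%.1f\n", bits * kb / 1024 }'
}

# bits_to_mb 17441   # the "Size" before the change
# bits_to_mb 2048    # the "Size" after the change
```

With an 8 KB cluster size, 17441 bits comes to about 136.3 MB and 2048 bits to
exactly 16 MB, which agrees with the numbers quoted above. A filesystem with a
different cluster size needs the second argument set accordingly.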
What is missing from my post above is Srini's analysis, in which he found that
there were not many 136 MB chunks of contiguous space on my filesystem, so
this tuning was definitely going to help.
I hope this post may help you and others.
Jeff
p.s. sadly, there is currently no defragmentation tool for OCFS2
From: SKempinski at sjrwmd.com
To: jpaterson23 at hotmail.com; ocfs2-users at oss.oracle.com
Subject: RE: [Ocfs2-users] OCFS2 hanging on writes
Date: Wed, 31 Oct 2012 12:30:00 +0000
Jeff,
Have you found a resolution to this issue?
Lately we've been experiencing intermittent freezing, so I'm curious to
hear more about your issue.
Thanks, Scott
Jeff Paterson
2012-Nov-20 02:07 UTC
[Ocfs2-users] Other problems with OCFS2 hanging on writes
Hello,
A month ago, following Srini's advice, we changed our localalloc to 2048
clusters. It made sense at the time because our disk fragmentation analysis
showed many chunks of 2048 contiguous clusters available.
# ./stat_sysdir-analyze.sh stat_sysdir-output.txt-201210260843
Number |
of     |
clust. | Contiguous cluster size
--------------------------------
  3981   510 and smaller
  4929   511
  2393   1024
 63400   2048
  2682   4096
  2415   8192
   430   16384 and bigger
It all worked very well and saved the day!
Today, we began to have more issues with our disk writes hanging. This time
though it might be a bit different ...
First of all, while the filesystem is even more fragmented than before, there
are still many "2048 clusters" available:
# ./stat_sysdir-analyze.sh stat_sysdir-output.txt-201211191829
Number |
of     |
clust. | Contiguous cluster size
--------------------------------
  4407   510 and smaller
 50270   511
 13609   1024
 10338   2048
   649   4096
   347   8192
   641   16384 and bigger
Also, strangely, the problem appeared only every 20 minutes, lasting 2-3
minutes each time.

Could the problem be related to some internal process in OCFS2 that runs every
20 minutes? Orphan scans?

Or could the problem be due to the excessive fragmentation? Should I change
the localalloc size to be lower than 511 clusters instead of 2048 (i.e.
localalloc=3)? Would that actually worsen the fragmentation? For information,
I don't think the filesystem is over-utilized. Only 12 TB are used on an 18 TB
filesystem, so that should leave enough space to deal with fragmentation
properly.
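The localalloc=3 figure can be derived from the 511-cluster chunk size. This
sketch assumes an 8 KB cluster size (inferred from 2048 clusters matching
localalloc=16, not stated in the thread) and that the localalloc mount option
takes whole MB, as the localalloc=16 example suggests. max_localalloc_mb is a
hypothetical helper name.

```shell
#!/bin/sh
# max_localalloc_mb CLUSTERS [CLUSTER_KB]
# Prints the largest whole-MB localalloc value that still fits inside a
# contiguous run of CLUSTERS clusters. CLUSTER_KB defaults to 8 (inferred).
max_localalloc_mb() {
    echo $(( $1 * ${2:-8} / 1024 ))
}

# max_localalloc_mb 511    # chunk size that is now the most common
# max_localalloc_mb 2048   # chunk size the current setting relies on
```

Under these assumptions a 511-cluster run is just under 4 MB, so 3 is the
largest whole-MB value that fits, and it would reserve 384 clusters at a time.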
Finally, the problem suddenly disappeared after I shut down 2 of the 3 nodes
of the cluster. I re-enabled the 2 nodes 30 minutes later and the problem
didn't come back. On the other hand, most of the employees using that
filesystem had finished their work day, so the load is much lower now.
Any idea of what happened?
Thanks in advance,
Jeff
p.s. I attached the small script I wrote to analyze the data gathered from
Srini's script (https://oss.oracle.com/~seeda/misc/stat_sysdir.sh)
From: SKempinski at sjrwmd.com
To: jpaterson23 at hotmail.com; ocfs2-users at oss.oracle.com
Subject: RE: [Ocfs2-users] OCFS2 hanging on writes -- SOLVED
Date: Wed, 7 Nov 2012 23:44:06 +0000
Thank you for providing this write-up, Jeff. I think you explained it very
well.
This could be the issue we are experiencing, so I've implemented the
localalloc mount option. Additionally, we have begun to move some of the data
off to a new filesystem. For the past few days we have not seen the slowdown,
so, for now at least, we have a solution.
Thank you, again.
Scott
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stat_sysdir-analyze.sh
Type: application/octet-stream
Size: 924 bytes
Desc: not available
Url :
http://oss.oracle.com/pipermail/ocfs2-users/attachments/20121119/c1f9f6df/attachment-0001.obj