Tariq Saeed
2015-Sep-20 04:40 UTC
[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
Hi,

You have mounted with

>>>> (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-
>>>> ro,atime_quantum=60,localalloc=53,coherency=full,user_xattr,acl,_netdev)

localalloc=53 is the chunk size in MB the local allocator looks for in the
global bitmap file when it needs more space. The default for your fs is 3888
clusters (in the sys_stat.sh output you attached). Since your cluster size is
4k, that would be 3888*4*1024, or 15.9 MB. Your choice of localalloc=53 makes
the local alloc chunk size, in units of clusters, (53*1024)/4 = 13568. You can
check this with:

# grep -R . /sys/kernel/debug/ocfs2 | grep "LocalAlloc =>"
/sys/kernel/debug/ocfs2/0B7694290B0C41B58508755D19304B8C/fs_state:LocalAlloc => State: 1 Descriptor: 0 Size: 13568 bits Default: 13568 bits <<<< you will see this.

In the output you provided, there is no contiguous chunk of size >= 13568 under
the "Contig" column (I have looked in the first 16 or so chains and there is
none; maybe there is one way down). This means the allocator has to look at
lots and lots of chunks, and each lookup involves a disk I/O. This is the root
cause of your problem. You should unmount and mount again without specifying
any value for localalloc, or specifying 16, and you will see a dramatic
improvement in performance.

Regards,
-Tariq Saeed

On 09/19/2015 07:23 PM, Alan Hodgson wrote:
> Hey, Tariq,
>
> Thanks for taking a look at this!
>
> Output attached.
>
>
> On Saturday, September 19, 2015 06:03:12 PM you wrote:
>> Hi,
>> First suspect is a fragmented fs. Please run
>> the attached script and send the output.
>> Thanks.
>> -Tariq
>>
>> On 09/19/2015 03:43 PM, Alan Hodgson wrote:
>>> I've had this filesystem in production for 8 months or so. It's on an
>>> array of Intel S3500 SSDs on an LSI hardware raid controller (without
>>> trim).
>>>
>>> This filesystem has pretty consistently delivered >500MB/sec writes, up to
>>> 300 from any particular guest, and has otherwise been responsive.
>>>
>>> Then, within the last couple of days, it is now writing at like 25-50
>>> MB/sec on average, and seems to block reads for long enough to cause
>>> guest issues.
>>>
>>> It is a 2-node cluster; the file system is on top of a DRBD active/active
>>> cluster. The node interconnection is a dedicated 10 Gbit link.
>>>
>>> The SSD array doesn't seem to be the issue. I have local file systems on
>>> the same array, and they write at close to 1GB/sec. Not quite as fast as
>>> new, but still decent.
>>>
>>> DRBD still seems to be fast. Resync appears to be happening at over 400
>>> MB/sec, although not tested extensively as I don't want to resync the
>>> whole partition. And the issue remains regardless of whether the second
>>> node is even up.
>>>
>>> Writes to ocfs2 with either one or both nodes mounted ... 25-50 MB/sec.
>>> And super slow/blocked reads within the guests while it's doing them. The
>>> cluster is really quite screwed as a result. A straight dd to a file on
>>> the host averages 25MB/sec. Reads are fine, though, well over 1GB/sec.
>>>
>>> The file system is a little less than half full. It hosts only KVM guest
>>> images (raw sparse files).
>>>
>>> I have added maybe 300GB of data in the last 24 hours, but I do believe
>>> this started happening before that.
>>>
>>> Random details below, happy to supply anything ... thanks in advance for
>>> any help.
>>>
>>> df:
>>> /dev/drbd0  4216522032 1887421612 2329100420  45% /vmhost
>>>
>>> mount:
>>> configfs on /sys/kernel/config type configfs (rw,relatime)
>>> none on /sys/kernel/dlm type ocfs2_dlmfs (rw,relatime)
>>> /dev/drbd0 on /vmhost type ocfs2
>>> (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-
>>> ro,atime_quantum=60,localalloc=53,coherency=full,user_xattr,acl,_netdev)
>>>
>>> Kernel 3.18.9, hardened Gentoo.
>>>
>>> debugfs.ocfs2 -R "stats" /dev/drbd0:
>>> Revision: 0.90
>>> Mount Count: 0   Max Mount Count: 20
>>> State: 0   Errors: 0
>>> Check Interval: 0   Last Check: Sat Sep 19 14:02:48 2015
>>> Creator OS: 0
>>> Feature Compat: 3 backup-super strict-journal-super
>>> Feature Incompat: 14160 sparse extended-slotmap inline-data xattr
>>>     indexed-dirs refcount discontig-bg
>>> Tunefs Incomplete: 0
>>> Feature RO compat: 1 unwritten
>>> Root Blknum: 5   System Dir Blknum: 6
>>> First Cluster Group Blknum: 3
>>> Block Size Bits: 12   Cluster Size Bits: 12
>>> Max Node Slots: 8
>>> Extended Attributes Inline Size: 256
>>> Label: vmh1cluster
>>> UUID: CF2BAA51E994478587983E08B160930E
>>> Hash: 436666593 (0x1a0700e1)
>>> DX Seeds: 3101242030 1341766635 3133423927 (0xb8d932ae 0x4ff9bbeb
>>>     0xbac44137)
>>> Cluster stack: classic o2cb
>>> Cluster flags: 0
>>> Inode: 2   Mode: 00   Generation: 3336532616 (0xc6df7288)
>>> FS Generation: 3336532616 (0xc6df7288)
>>> CRC32: 00000000   ECC: 0000
>>> Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>> Dynamic Features: (0x0)
>>> User: 0 (root)   Group: 0 (root)   Size: 0
>>> Links: 0   Clusters: 1054130508
>>> ctime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
>>> atime: 0x0 0x0 -- Wed Dec 31 16:00:00.0 1969
>>> mtime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
>>> dtime: 0x0 -- Wed Dec 31 16:00:00 1969
>>> Refcount Block: 0
>>> Last Extblk: 0   Orphan Slot: 0
>>> Sub Alloc Slot: Global   Sub Alloc Bit: 6553
>>>
>>> o2info --volinfo /dev/drbd0:
>>> Label: vmh1cluster
>>> UUID: CF2BAA51E994478587983E08B160930E
>>> Block Size: 4096
>>> Cluster Size: 4096
>>> Node Slots: 8
>>> Features: backup-super strict-journal-super sparse extended-slotmap
>>> Features: inline-data xattr indexed-dirs refcount discontig-bg unwritten
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
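As a worked example of the arithmetic and the check described above (a sketch only; the UUID-named debugfs directory is specific to this volume and will differ on other systems):

# localalloc= is given in MB, but the allocator works in clusters.
# With a 4 KB cluster size, localalloc=53 translates to:
#   (53 * 1024 KB) / 4 KB per cluster = 13568 clusters
echo $(( 53 * 1024 / 4 ))      # prints 13568

# The default window for this fs is 3888 clusters:
#   3888 clusters * 4096 bytes per cluster = 15925248 bytes, about 15.9 MB
echo $(( 3888 * 4096 ))        # prints 15925248

# Confirm the window size the mounted fs is actually using
# (the hash-named subdirectory varies per volume):
grep -R . /sys/kernel/debug/ocfs2 | grep "LocalAlloc =>"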
Alan Hodgson
2015-Sep-20 20:56 UTC
[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
On Saturday, September 19, 2015 09:40:59 PM Tariq Saeed wrote:
> and there is none, maybe there is one way down). This means looking at
> lots and lots of chunks and each lookup involves a disk i/o. This is the
> root cause of your problem. You should unmount and mount again
> without specifying any value for local alloc or specifying 16 and you
> will see a dramatic improvement in performance.
> Regards,
> -Tariq Saeed

Tariq - thank you! I brought the cluster down again today and remounted with
localalloc=16, and I'm back to 700MB/sec writes.

I'm wondering if this is likely to bite me again, though? It seems I may have
screwed up by initially creating the file system with the default 4k cluster
size. I could probably bring it down over Christmas and rebuild it with better
settings if that's likely to prevent future problems.

What would you suggest for mkfs options for an ocfs2 file system that's about
4TB in size and hosts only sparse virtual machine guest images (so very few,
large sparse files, with unpredictable allocations and frequent hole punching)?
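For anyone hitting the same symptom, a minimal sketch of the remount described above, using the device and mount point from this thread (run on each node in turn with guests stopped or migrated away; the option list is illustrative and should otherwise match your existing fstab entry):

umount /vmhost
mount -t ocfs2 -o _netdev,data=ordered,coherency=full,localalloc=16 /dev/drbd0 /vmhost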