Alan Hodgson
2015-Sep-19 22:43 UTC
[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
I've had this filesystem in production for 8 months or so. It's on an array of Intel S3500 SSDs on an LSI hardware raid controller (without trim).

This filesystem has pretty consistently delivered >500MB/sec writes, up to 300 MB/sec from any particular guest, and has otherwise been responsive.

Then, within the last couple of days, it has dropped to writing at more like 25-50 MB/sec on average, and it seems to block reads for long enough to cause guest issues.

It is a 2-node cluster; the file system sits on top of a DRBD active/active device. The node interconnection is a dedicated 10 Gbit link.

The SSD array doesn't seem to be the issue. I have local file systems on the same array, and they write at close to 1GB/sec. Not quite as fast as new, but still decent.

DRBD still seems to be fast. Resync appears to be happening at over 400 MB/sec, although I haven't tested it extensively as I don't want to resync the whole partition. And the issue remains regardless of whether the second node is even up.

Writes to ocfs2 with either one or both nodes mounted ... 25-50 MB/sec. And super slow/blocked reads within the guests while it's doing them. The cluster is really quite screwed as a result. A straight dd to a file on the host averages 25 MB/sec. Reads are fine, though, well over 1GB/sec.

The file system is a little less than half full. It hosts only KVM guest images (raw sparse files).

I have added maybe 300GB of data in the last 24 hours, but I do believe this started happening before that.

Random details below, happy to supply anything ... thanks in advance for any help.

df:
/dev/drbd0     4216522032 1887421612 2329100420  45% /vmhost

mount:
configfs on /sys/kernel/config type configfs (rw,relatime)
none on /sys/kernel/dlm type ocfs2_dlmfs (rw,relatime)
/dev/drbd0 on /vmhost type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=53,coherency=full,user_xattr,acl,_netdev)

Kernel 3.18.9, hardened Gentoo.

debugfs.ocfs2 -R "stats" /dev/drbd0:

Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Sat Sep 19 14:02:48 2015
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 14160 sparse extended-slotmap inline-data xattr indexed-dirs refcount discontig-bg
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 8
Extended Attributes Inline Size: 256
Label: vmh1cluster
UUID: CF2BAA51E994478587983E08B160930E
Hash: 436666593 (0x1a0700e1)
DX Seeds: 3101242030 1341766635 3133423927 (0xb8d932ae 0x4ff9bbeb 0xbac44137)
Cluster stack: classic o2cb
Cluster flags: 0
Inode: 2   Mode: 00   Generation: 3336532616 (0xc6df7288)
FS Generation: 3336532616 (0xc6df7288)
CRC32: 00000000   ECC: 0000
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 1054130508
ctime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
atime: 0x0 0x0 -- Wed Dec 31 16:00:00.0 1969
mtime: 0x54b593da 0x0 -- Tue Jan 13 13:53:30.0 2015
dtime: 0x0 -- Wed Dec 31 16:00:00 1969
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 6553

o2info --volinfo /dev/drbd0:
Label: vmh1cluster
UUID: CF2BAA51E994478587983E08B160930E
Block Size: 4096
Cluster Size: 4096
Node Slots: 8
Features: backup-super strict-journal-super sparse extended-slotmap
Features: inline-data xattr indexed-dirs refcount discontig-bg unwritten
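(For reference, the dd test mentioned above was just a large sequential write to a file on the ocfs2 mount; a rough sketch, with the file name, size and flags being illustrative rather than the exact command used:)

# Sequential write test on the ocfs2 mount; fdatasync so the final cache
# flush is included in the reported throughput
dd if=/dev/zero of=/vmhost/ddtest.bin bs=1M count=4096 conv=fdatasync

# Corresponding read test; drop the page cache first so reads hit the array
echo 3 > /proc/sys/vm/drop_caches
dd if=/vmhost/ddtest.bin of=/dev/null bs=1M

rm /vmhost/ddtest.bin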
Tariq Saeed
2015-Sep-20 01:03 UTC
[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
Hi,

The first suspect is a fragmented fs. Please run the attached script and send the output.

Thanks,
-Tariq

On 09/19/2015 03:43 PM, Alan Hodgson wrote:
> I've had this filesystem in production for 8 months or so. It's on an array of
> Intel S3500 SSDs on an LSI hardware raid controller (without trim).
>
> This filesystem has pretty consistently delivered >500MB/sec writes, up to
> 300 MB/sec from any particular guest, and has otherwise been responsive.
>
> Then, within the last couple of days, it has dropped to writing at more like
> 25-50 MB/sec on average, and it seems to block reads for long enough to cause
> guest issues.
>
> [...]
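(For reference, the attached script itself is not reproduced in the archive. The kind of allocator and fragmentation information it is after can also be pulled by hand with debugfs.ocfs2; a rough sketch, assuming a reasonably recent ocfs2-tools and using the device from this thread, with the guest image path being only an example:)

# Superblock / allocator overview
debugfs.ocfs2 -R "stats" /dev/drbd0

# Free-space fragmentation: the global bitmap and a per-slot local
# allocator system file (slot 0000 shown; repeat for other slots in use)
debugfs.ocfs2 -R "stat //global_bitmap" /dev/drbd0
debugfs.ocfs2 -R "stat //local_alloc:0000" /dev/drbd0

# Extent count of one of the sparse guest images (path is relative to the
# filesystem root; "frag" reports clusters vs. extents for the inode)
debugfs.ocfs2 -R "frag /some-guest-image.raw" /dev/drbd0

(Heavily fragmented free space, or very high extent counts on the guest images, would support the fragmentation theory for the slow writes.)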
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stat_sysdir.sh
Type: application/x-shellscript
Size: 1508 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20150919/5c4c3b99/attachment.bin
Alan Hodgson
2015-Sep-20 01:33 UTC
[Ocfs2-users] ocfs2 file system just became very slow and unresponsive for writes
On Saturday, September 19, 2015 03:43:56 PM Alan Hodgson wrote:
> Then, within the last couple of days, it has dropped to writing at more like
> 25-50 MB/sec on average, and it seems to block reads for long enough to cause
> guest issues.

So I've had some more time to google this, and I'm thinking it's probably a fragmentation issue.

stat_sysdir.sh output is at https://bpaste.net/show/6532f24afec1

I guess I'll play with localalloc tomorrow, unless someone has a better idea of what's wrong?
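(For anyone trying this later: "playing with localalloc" means adjusting the ocfs2 local allocation window, set per mount in megabytes; it is localalloc=53 in the mount options earlier in this thread. A rough sketch of bumping it follows; 256 is only an example value, and a clean unmount/mount is safer than assuming a live remount will pick up the change:)

# Check the current window size
grep /vmhost /proc/mounts

# Unmount and remount with a larger local allocation window,
# keeping the rest of the existing options
umount /vmhost
mount -t ocfs2 -o _netdev,heartbeat=local,nointr,data=ordered,localalloc=256 /dev/drbd0 /vmhost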