Hi Liam,
I saw that your XFS uses 'imaxpct=25', which for an arbiter brick is a little bit
low.
If you have free space on the bricks, increase the maxpct to a bigger value, for
example:

xfs_growfs -m 80 /path/to/brick

That allows up to 80% of the filesystem to be used for inodes, which you can
verify with 'df -i /brick/path' (compare before and after). This way you won't
run out of inodes in the future.
Of course, always test that on non-Prod first.
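A minimal sketch of that check-and-grow sequence, using your brick2 mount point
as an example (the 80% figure is just the value from above; pick whatever fits
your free space):

  df -i /data/glusterfs/gv1/brick2             # note Inodes/IFree before the change
  xfs_info /data/glusterfs/gv1/brick2 | grep imaxpct
  xfs_growfs -m 80 /data/glusterfs/gv1/brick2  # raise the inode ceiling to 80%
  df -i /data/glusterfs/gv1/brick2             # Inodes/IFree should now be higher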
Are you using the volume as a VM disk storage domain? What is your main
workload?
Best Regards,
Strahil Nikolov
On Tuesday, July 4, 2023, 2:12 PM, Liam Smith <liam.smith at ek.co> wrote:
Hi,
Thanks for your response; please find the xfs_info for each brick on the arbiter
below:
root at uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick1
meta-data=/dev/sdc1              isize=512    agcount=31, agsize=131007 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root at uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick2
meta-data=/dev/sde1              isize=512    agcount=13, agsize=327616 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root at uk3-prod-gfs-arb-01:~# xfs_info /data/glusterfs/gv1/brick3
meta-data=/dev/sdd1              isize=512    agcount=13, agsize=327616 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931899, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
I've also copied below some df output from the arb server:
root at uk3-prod-gfs-arb-01:~# df -hi
Filesystem            Inodes IUsed IFree IUse% Mounted on
udev                    992K   473  991K    1% /dev
tmpfs                   995K   788  994K    1% /run
/dev/sda1               768K  105K  664K   14% /
tmpfs                   995K     3  995K    1% /dev/shm
tmpfs                   995K     4  995K    1% /run/lock
tmpfs                   995K    18  995K    1% /sys/fs/cgroup
/dev/sdb1               128K   113  128K    1% /var/lib/glusterd
/dev/sdd1               7.5M  2.6M  5.0M   35% /data/glusterfs/gv1/brick3
/dev/sdc1               7.5M  600K  7.0M    8% /data/glusterfs/gv1/brick1
/dev/sde1               6.4M  2.9M  3.5M   46% /data/glusterfs/gv1/brick2
uk1-prod-gfs-01:/gv1    150M  6.5M  144M    5% /mnt/gfs
tmpfs                   995K    21  995K    1% /run/user/1004
root at uk3-prod-gfs-arb-01:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  3.9G     0  3.9G   0% /dev
tmpfs                 796M  916K  795M   1% /run
/dev/sda1              12G  3.9G  7.3G  35% /
tmpfs                 3.9G  8.0K  3.9G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sdb1             2.0G  456K  1.9G   1% /var/lib/glusterd
/dev/sdd1              15G   12G  3.5G  78% /data/glusterfs/gv1/brick3
/dev/sdc1              15G  2.6G   13G  18% /data/glusterfs/gv1/brick1
/dev/sde1              15G   14G  1.8G  89% /data/glusterfs/gv1/brick2
uk1-prod-gfs-01:/gv1  300G  139G  162G  47% /mnt/gfs
tmpfs                 796M     0  796M   0% /run/user/1004
Something I forgot to mention in my initial message is that the op-version was
upgraded from 70200 to 100000, which could also have been a trigger for the
issue.
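If it helps to cross-check that, the current and maximum supported op-versions
can be compared with something like the following (assuming the release is new
enough to expose cluster.max-op-version):

  gluster volume get all cluster.op-version
  gluster volume get all cluster.max-op-version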
Thanks,
Liam Smith
Linux Systems Support Engineer, Scholar
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: 03 July 2023 18:28
To: Liam Smith <liam.smith at ek.co>; gluster-users at gluster.org
<gluster-users at gluster.org>
Subject: Re: [Gluster-users] remove_me files building up?
Hi,
you mentioned that the arbiter bricks run out of inodes. Are you using XFS? Can
you provide the xfs_info of each brick?
Best Regards,
Strahil Nikolov
On Sat, Jul 1, 2023 at 19:41, Liam Smith <liam.smith at ek.co> wrote:
Hi,
We're running a cluster with two data nodes and one arbiter, and have
sharding enabled.
We had an issue a while back where one of the servers crashed. We got the server
back up and running, ensured that all healing entries cleared, and also
increased the server spec (CPU/Mem), as this seemed to be the likely cause.
Since then, however, we've seen some strange behaviour, whereby a lot of
'remove_me' files are building up under
`/data/glusterfs/gv1/brick2/brick/.shard/.remove_me/` and
`/data/glusterfs/gv1/brick3/brick/.shard/.remove_me/`. This is causing the
arbiter to run out of space on brick2 and brick3, as the remove_me files are
constantly increasing.
brick1 appears to be fine: its disk usage increases throughout the day and drops
back down, in line with the trend of the brick on the data nodes. We see the
disk usage increase and drop throughout the day on the data nodes for brick2 and
brick3 as well, but while the arbiter follows the same upward trend, its usage
doesn't drop at any point.
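For reference, the buildup can be tracked with something like the following
rough sketch (paths as above; it just counts the pending entries and shows inode
usage per brick):

  for b in brick2 brick3; do
      d=/data/glusterfs/gv1/$b/brick/.shard/.remove_me
      echo "$b: $(find "$d" -type f | wc -l) remove_me entries"   # pending shard-deletion markers
      df -i /data/glusterfs/gv1/$b | tail -1                      # inode usage on that brick
  done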
This is the output of some gluster commands; occasional heal entries come and
go:
root at uk3-prod-gfs-arb-01:~# gluster volume info gv1
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: d3d1fdec-7df9-4f71-b9fc-660d12c2a046
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick2: uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Brick3: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick (arbiter)
Brick4: uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick5: uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Brick6: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick (arbiter)
Brick7: uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick8: uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Brick9: uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick (arbiter)
Options Reconfigured:
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
performance.client-io-threads: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.lookup-optimize: off
performance.readdir-ahead: off
cluster.readdir-optimize: off
cluster.self-heal-daemon: enable
features.shard: enable
features.shard-block-size: 512MB
cluster.min-free-disk: 10%
cluster.use-anonymous-inode: yes
root at uk3-prod-gfs-arb-01:~# gluster peer status
Number of Peers: 2
Hostname: uk2-prod-gfs-01
Uuid: 2fdfa4a2-195d-4cc5-937c-f48466e76149
State: Peer in Cluster (Connected)

Hostname: uk1-prod-gfs-01
Uuid: 43ec93d1-ad83-4103-aea3-80ded0903d88
State: Peer in Cluster (Connected)
root at uk3-prod-gfs-arb-01:~# gluster volume heal gv1 info
Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
<gfid:5b57e1f6-3e3d-4334-a0db-b2560adae6d1>
Status: Connected
Number of entries: 1

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick1/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick3/brick
Status: Connected
Number of entries: 0

Brick uk1-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
Status: Connected
Number of entries: 0

Brick uk2-prod-gfs-01:/data/glusterfs/gv1/brick2/brick
<gfid:6ba9c472-9232-4b45-b12f-a1232d6f4627>
/.shard/.remove_me
<gfid:0f042518-248d-426a-93f4-cfaa92b6ef3e>
Status: Connected
Number of entries: 3

Brick uk3-prod-gfs-arb-01:/data/glusterfs/gv1/brick2/brick
<gfid:6ba9c472-9232-4b45-b12f-a1232d6f4627>
/.shard/.remove_me
<gfid:0f042518-248d-426a-93f4-cfaa92b6ef3e>
Status: Connected
Number of entries: 3
root at uk3-prod-gfs-arb-01:~# gluster volume get all cluster.op-version
Option                                   Value
------                                   -----
cluster.op-version                       100000
We're not sure if this is a bug or if something's corrupted that we don't have
visibility of, so any pointers/suggestions on how to approach this would be
appreciated.
Thanks,
Liam
The contents of this email message and any attachments are intended solely for
the addressee(s) and may contain confidential and/or privileged information and
may be legally protected from disclosure.
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users