Hi gluster users,

I'm having an issue that I'm hoping to get some help with on a
dispersed volume (EC: 2x(4+2)) that's causing me some headaches. This is
on a cluster running Gluster 6.9 on CentOS 7.

At some point in the last week, writes to one of my bricks have started
failing due to a "No space left on device" error:

[2021-07-06 16:08:57.261307] E [MSGID: 115067]
[server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-01-server:
1853436561: WRITEV -2 (f2d6f2f8-4fd7-4692-bd60-23124897be54), client:
CTX_ID:648a7383-46c8-4ed7-a921-acafc90bec1a-GRAPH_ID:4-PID:19471-HOST:rhevh08.mgmt.triumf.ca-PC_NAME:gluster-01-client-5-RECON_NO:-5,
error-xlator: gluster-01-posix [No space left on device]

The disk is quite full (listed as 100% on the server), but does have
some writable room left:

/dev/mapper/vg--brick1-brick1   11T   11T   97G 100% /data/glusterfs/gluster-01/brick1

However, I'm not sure the amount of disk space used on the physical
drive is the true cause of the "No space left on device" errors anyway.
I can still manually write to this brick outside of Gluster, so it seems
like the operating system isn't preventing the writes from happening.

During my investigation, I noticed that one of the .glusterfs paths on the
problem server is using up much more space than it is on the other servers.
I can't quite figure out why that might be, or how it happened, and I'm
wondering if there's any advice on what the cause might have been.

I had done some package updates on the server with the issue and not on
the other servers. This included the kernel, but not the Gluster packages,
so possibly this, or the reboot to load the new kernel, caused a problem.
I have scripts on my Gluster machines to cleanly kill all of the brick
processes before rebooting, so I'm not leaning towards an abrupt shutdown
being the cause, but it's a possibility.

I'm also looking for advice on how to safely remove the problem file and
rebuild it from the other Gluster peers. I've seen some documentation on
this, but I'm a little nervous about corrupting the volume if I
misunderstand the process. I'm not free to take the volume or cluster down
and do maintenance at this point, but that might be something I'll have to
consider if it's my only option.

For reference, here's a comparison of the same path that seems to be
taking up extra space on one of the hosts:

1: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
2: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
3: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
4: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
5: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
6: 3.0T    /data/gluster-01/brick1/vol/.glusterfs/99/56

Any and all advice is appreciated.

Thanks!
--
Daniel Thomson
DevOps Engineer
t +1 604 222 7428
dthomson at triumf.ca
TRIUMF Canada's particle accelerator centre
www.triumf.ca @TRIUMFLab
4004 Wesbrook Mall
Vancouver BC V6T 2A3 Canada
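A quick way to compare space and inode usage of every brick from a single
node, assuming the volume is actually named gluster-01 (that is only
inferred from the "0-gluster-01-server" prefix in the log line above, so
adjust the name as needed), would be something like:

  # shows total/free disk space and free inodes for each brick of the volume
  gluster volume status gluster-01 detail

That makes it easy to spot a single brick that is far fuller than its peers
without logging into each server separately.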
On 06/07/2021 18:28, Dan Thomson wrote:
> The disk is quite full (listed as 100% on the server), but does have
> some writable room left [...] I can still manually write to this brick
> outside of Gluster, so it seems like the operating system isn't
> preventing the writes from happening.

Hi.

Maybe you're hitting the "reserved space for root" (usually 5%): when you
write from the server directly to the brick, you're most probably doing it
as root and using the reserved space. When you write from a client, you're
likely using a normal user and hit the "no space left" error.

Another possible issue to watch out for is inode exhaustion (I've been
bitten by it on an arbiter brick partition).

HIH,
Diego
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
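A minimal sketch for checking both of the conditions Diego mentions,
assuming the brick mount point from the original post and a volume name of
gluster-01 (inferred from the log prefix, so adjust as needed):

  # inode exhaustion also reports ENOSPC even when df shows free blocks
  df -i /data/glusterfs/gluster-01/brick1

  # Gluster itself reserves a percentage of each brick (storage.reserve);
  # once a brick falls below that threshold, writes through Gluster fail
  # with ENOSPC even though the filesystem still has free space
  gluster volume get gluster-01 storage.reserve

The 5% root reserve Diego refers to applies to ext filesystems (visible via
tune2fs -l as "Reserved block count"); an XFS brick, the CentOS 7 default,
has no equivalent root-only reserve, so on XFS the Gluster-level reserve is
the more likely explanation.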
Hi Dan,

On Mon, Jul 12, 2021 at 2:20 PM Dan Thomson <dthomson at triumf.ca> wrote:
> The disk is quite full (listed as 100% on the server), but does have
> some writable room left:
>
> /dev/mapper/vg--brick1-brick1   11T   11T   97G 100% /data/glusterfs/gluster-01/brick1
>
> however, I'm not sure if the amount of disk space used on the physical
> drive is the true cause of the "No Space Left on Device" errors anyway.
> I can still manually write to this brick outside of Gluster, so it seems
> like the operating system isn't preventing the writes from happening.

As Strahil has said, you are probably hitting the minimum space reserved
by Gluster. You can try those options. However, I don't recommend keeping
bricks above 90% utilization. All filesystems, including XFS, tend to
degrade in performance when available space is limited, and if the brick's
filesystem performs worse, Gluster performance will also drop.

> For reference, here's the comparison of the same path that seems to be
> taking up extra space on one of the hosts:
>
> 1: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 2: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 3: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 4: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 5: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 6: 3.0T    /data/gluster-01/brick1/vol/.glusterfs/99/56

This is not normal at all. In a dispersed volume all bricks should use
roughly the same amount of space.
Can you provide the output of the following commands?

# gluster volume info <volname>
# gluster volume status <volname>

Also provide the output of this command from all bricks:

# ls -ls /data/gluster-01/brick1/vol/.glusterfs/99/56

Regards,
Xavi
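On the question of identifying what is eating the extra 3.0T: for regular
files, each entry under .glusterfs/xx/yy is a hard link to the real file on
the same brick, so the inode can be used to map a GFID entry back to its
user-visible path. A rough sketch, using the brick path from the thread;
<gfid> is a placeholder to be replaced with an actual entry name:

  BRICK=/data/gluster-01/brick1/vol

  # largest entries in the directory that is 3.0T on server 6
  du -ah "$BRICK/.glusterfs/99/56" | sort -h | tail -n 10

  # map one GFID entry back to its real path by matching the inode,
  # skipping the .glusterfs tree itself
  GFID_FILE="$BRICK/.glusterfs/99/56/<gfid>"   # placeholder, substitute a real name
  find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -samefile "$GFID_FILE" -print

That only identifies the files; before removing any GFID entries by hand it
is worth checking whether the volume already has pending heals
(# gluster volume heal <volname> info) and confirming the removal procedure,
as Dan notes, since a mistake there can leave the volume inconsistent.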