Hi all,
maybe I should add some more information:
The container which filled up the space was running on node x, which
still shows a nearly filled fs:
192.168.1.x:/gvol   2.6T  2.5T  149G  95% /gluster
nearly the same situation on the underlying brick partition on node x:
zdata/brick         2.6T  2.4T  176G  94% /zbrick
On node y, where the network card crashed, glusterfs shows the same
values:
192.168.1.y:/gvol   2.6T  2.5T  149G  95% /gluster
but different values on the brick:
zdata/brick         2.9T  1.6T  1.4T  54% /zbrick
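
To narrow down where the space went, the bricks can be compared
directly on both nodes (a rough check, run on node x and on node y):

du -xsh /zbrick/* /zbrick/.glusterfs | sort -h
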
I think this happened because glusterfs still has hardlinks (under
.glusterfs) to the files we deleted directly on node x. So I can find
these files with:
find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
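
Each match lives under a path of the form
/zbrick/.glusterfs/ab/cd/<full gfid> (the standard .glusterfs layout),
so the GFID can be read off the file name and each entry inspected, for
example:

find /zbrick/.glusterfs -type f -links 1 | while read -r f; do
    echo "GFID: $(basename "$f")"   # the file name is the full GFID
    stat -c "%s bytes, mtime %y" "$f"
    file "$f"   # the content type may hint at the owning container
done
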
But now I am lost. How can I verify that these files really belong to
the right container? Or can I just delete these files, because there is
no way to access them anyway? Or does glusterfs offer a way to resolve
this situation?
Mathias
On 05.08.20 15:48, Mathias Waack wrote:
> Hi all,
>
> we are running a gluster setup with two nodes:
>
> Status of volume: gvol
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.1.x:/zbrick                   49152     0          Y       13350
> Brick 192.168.1.y:/zbrick                   49152     0          Y       5965
> Self-heal Daemon on localhost               N/A       N/A        Y       14188
> Self-heal Daemon on 192.168.1.93            N/A       N/A        Y       6003
>
> Task Status of Volume gvol
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
> The glusterfs hosts a bunch of containers with their data volumes. The
> underlying fs is zfs. A few days ago one of the containers created a
> lot of files in one of its data volumes and in the end completely
> filled up the space of the glusterfs volume. But this happened only on
> one host; on the other host there was still enough space. We finally
> were able to identify this container and found that the size of its
> data on /zbrick differed between the two hosts. Then we made the big
> mistake of deleting these files on both hosts directly in the /zbrick
> brick, not on the mounted glusterfs volume.
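>
> In hindsight the difference is (the paths below are made up for
> illustration):
>
> # what we did: removing directly on the brick bypasses gluster, so
> # its internal .glusterfs hardlinks survive and no space is freed
> rm -rf /zbrick/containers/someid/data
> # what we should have done: removing via the mount point lets gluster
> # drop the hardlinks as well
> rm -rf /gluster/containers/someid/data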
>
> Later we found the reason for this behavior: the network driver on the
> second node had partially crashed at the same time as the failed
> container started to fill up the gluster volume (we were still able to
> log in on the node, so we assumed the network was running, but the
> card was already dropping packets at that point). After rebooting the
> second node the gluster volume became available again.
>
> Now the glusterfs volume is running again, but it is still (nearly)
> full: the files created by the container are not visible anymore, but
> they still count against the free space. How can we fix this?
>
> In addition there are some files which are no longer accessible since
> this accident:
>
> tail access.log.old
> tail: cannot open 'access.log.old' for reading: Input/output error
>
> It looks like the affected files are the ones which were changed
> during the incident. Is there a way to fix this too?
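>
> The I/O errors look like split-brain on the replica; as a first check
> I would try (standard gluster CLI, the source brick in the last
> command is only an example):
>
> gluster volume heal gvol info
> gluster volume heal gvol info split-brain
> # files listed as split-brain can then be resolved one by one,
> # e.g. by picking node y's copy as the good one:
> gluster volume heal gvol split-brain source-brick \
>     192.168.1.y:/zbrick /path/to/the/affected/file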
>
> Thanks
>     Mathias