Sorry for necrobumping this, but this morning I ran into the same thing on my
Proxmox + GlusterFS cluster. In the log I can see this:
[2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
[shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
pending frames:
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
...
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2022-11-21 07:38:00 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.3
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
/lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
---------
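As an aside, raw offsets like the disperse.so frames above can usually be mapped back to a function and source line with addr2line, assuming the debug symbols matching this exact 10.3 build are installed (the path and offset below are taken from the trace):

```shell
# Resolve an in-library offset from the backtrace to a function name and
# source location. Requires debug symbols for this exact glusterfs build.
addr2line -f -e /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so 0x37a14
```

Without the debug package this only prints `??`, so it is worth installing the distribution's glusterfs debug symbols first.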
The mount point wasn't accessible, giving the "Transport endpoint is not
connected" message, and it was listed like this:
d????????? ? ? ? ? ? vmdata
I had to stop all the VMs on that Proxmox node, then stop the gluster
daemon to unmount the directory; after starting the daemon and
re-mounting, everything was working again.
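In commands, the recovery sequence was roughly the following (a sketch; the mount path /mnt/pve/vmdata and the volume/host names reflect my setup, adjust for yours):

```shell
# 1. Stop (or migrate away) every VM on this Proxmox node using the volume.
# 2. Stop the gluster daemon so the dead FUSE mount can be released.
systemctl stop glusterd

# 3. Lazy-unmount the unresponsive mount point (path is from my setup).
umount -l /mnt/pve/vmdata

# 4. Start the daemon again and re-mount the volume.
systemctl start glusterd
mount -t glusterfs g01:/vmdata /mnt/pve/vmdata
```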
My gluster volume info output is this:
Volume Name: vmdata
Type: Distributed-Disperse
Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: g01:/data/brick1/brick
Brick2: g02:/data/brick2/brick
Brick3: g03:/data/brick1/brick
Brick4: g01:/data/brick2/brick
Brick5: g02:/data/brick1/brick
Brick6: g03:/data/brick2/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.shard: enable
features.shard-block-size: 256MB
performance.read-ahead: off
performance.quick-read: off
performance.io-cache: off
server.event-threads: 2
client.event-threads: 3
performance.client-io-threads: on
performance.stat-prefetch: off
dht.force-readdirp: off
performance.force-readdirp: off
network.remote-dio: on
features.cache-invalidation: on
performance.parallel-readdir: on
performance.readdir-ahead: on
Xavi, do you think the open-behind off setting could help somehow? I tried
to understand what it does (with no luck), and whether it could impact the
performance of my VMs (I have the setup you know so well ;)).
I would like to avoid more crashes like this; Gluster 10.3 had been
working quite well since two weeks ago, until this morning.
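For reference, the option Xavi suggested further down can be inspected and toggled with the standard gluster CLI (vmdata is my volume name):

```shell
# Show the current value of open-behind for the volume.
gluster volume get vmdata performance.open-behind

# Disable it; the change should reach clients without a volume restart.
gluster volume set vmdata performance.open-behind off
```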
*Angel Docampo*
<angel.docampo at eoniantec.com> <+34-93-1592929>
On Fri, 19 Mar 2021 at 2:10, David Cunningham (<dcunningham at
voisonics.com>) wrote:
> Hi Xavi,
>
> Thank you for that information. We'll look at upgrading it.
>
>
> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com>
> wrote:
>
>> Hi David,
>>
>> with so little information it's hard to tell, but given that there are
>> several OPEN and UNLINK operations, it could be related to an already
>> fixed bug (in recent versions) in open-behind.
>>
>> You can try disabling open-behind with this command:
>>
>> # gluster volume set <volname> open-behind off
>>
>> But given that the version you are using is very old and unmaintained, I
>> would recommend upgrading to at least 8.x.
>>
>> Regards,
>>
>> Xavi
>>
>>
>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham
>> <dcunningham at voisonics.com> wrote:
>>
>>> Hello,
>>>
>>> We have a GlusterFS 5.13 server which also mounts itself with the native
>>> FUSE client. Recently the FUSE mount crashed and we found the following in
>>> the syslog. There isn't anything logged in mnt-glusterfs.log for that time.
>>> After killing all processes with a file handle open on the filesystem we
>>> were able to unmount and then remount the filesystem successfully.
>>>
>>> Would anyone have advice on how to debug this crash? Thank you in
>>> advance!
>>>
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>> ...
>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main
>>> process exited, code=killed, status=11/SEGV
>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed
>>> with result 'signal'.
>>> ...
>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service
>>> hold-off time over, scheduling restart.
>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service:
>>> Scheduled restart job, restart counter is at 2.
>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs
>>> sharedstorage...
>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point
>>> does not exist
>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a
>>> mount point
>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 (0)28 2558 3782
>>> ________
>>>
>>>
>>>
>>> Community Meeting Calendar:
>>>
>>> Schedule -
>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>