The crash seems related to a problem in the ec xlator, but I don't have
enough information to determine what it is. The crash should have generated
a core dump somewhere on the system (I don't know where Debian keeps its
core dumps). If you find it, you should be able to open it using this
command (make sure the debug symbols package is also installed before
running it):
# gdb /usr/sbin/glusterfs <path to core dump>
And then, at the gdb prompt, run this command:
(gdb) bt full
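For reference, a sketch of the whole procedure. This is hedged: where the core lands depends on kernel.core_pattern, the coredumpctl steps apply only if systemd-coredump is handling cores on this machine, and the debug-symbols package name is an assumption that may differ for the gluster.org packages.

```shell
# Sketch only: core location depends on kernel.core_pattern,
# and package names may differ on this setup.
sysctl kernel.core_pattern            # shows where the kernel writes cores

# If systemd-coredump is in use, list and extract the glusterfs core:
coredumpctl list /usr/sbin/glusterfs
coredumpctl dump /usr/sbin/glusterfs -o /tmp/glusterfs.core

# Install debug symbols first (package name is an assumption):
apt-get install glusterfs-dbg

# Non-interactive full backtrace of all threads, saved for the list:
gdb -batch -ex "thread apply all bt full" /usr/sbin/glusterfs \
    /tmp/glusterfs.core > /tmp/glusterfs-bt.txt
```

Attaching /tmp/glusterfs-bt.txt to a reply is usually enough for a first analysis.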
Regards,
Xavi
On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo <angel.docampo at
eoniantec.com> wrote:
> Hi Xavi,
>
> The OS is Debian 11 with the Proxmox kernel. The Gluster packages are the
> official ones from gluster.org (
> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/
> )
>
> The system logs showed no other issues at the time of the crash, no OOM
> kill or anything of the sort, and no other process was interacting with
> the gluster mountpoint besides Proxmox.
>
> I wasn't running gdb when it crashed, so I don't really know if I can
> obtain a more detailed trace from logs, or if there is a simple way to
> leave it running in the background to see if it happens again (or if there
> is a flag to start the systemd daemon in debug mode).
>
> Best,
>
> *Angel Docampo*
>
>
<https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
> <angel.docampo at eoniantec.com> <+34-93-1592929>
>
>
> On Mon, 21 Nov 2022 at 15:16, Xavi Hernandez (<jahernan at redhat.com>)
> wrote:
>
>> Hi Angel,
>>
>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>> angel.docampo at eoniantec.com> wrote:
>>
>>> Sorry for necrobumping this, but this morning I've suffered this on my
>>> Proxmox + GlusterFS cluster. In the log I can see this
>>>
>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
>>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>>> fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>> pending frames:
>>> frame : type(1) op(WRITE)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> frame : type(0) op(0)
>>> ...
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> frame : type(1) op(FSYNC)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 11
>>> time of crash:
>>> 2022-11-21 07:38:00 +0000
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 10.3
>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>> ---------
>>> The mount point wasn't accessible, with the "Transport endpoint is not
>>> connected" message, and it was shown like this:
>>> d????????? ? ? ? ? ? vmdata
>>>
>>> I had to stop all the VMs on that Proxmox node, then stop the gluster
>>> daemon to unmount the directory, and after starting the daemon and
>>> re-mounting, everything was working again.
>>>
>>> My gluster volume info returns this
>>>
>>> Volume Name: vmdata
>>> Type: Distributed-Disperse
>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x (2 + 1) = 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: g01:/data/brick1/brick
>>> Brick2: g02:/data/brick2/brick
>>> Brick3: g03:/data/brick1/brick
>>> Brick4: g01:/data/brick2/brick
>>> Brick5: g02:/data/brick1/brick
>>> Brick6: g03:/data/brick2/brick
>>> Options Reconfigured:
>>> nfs.disable: on
>>> transport.address-family: inet
>>> storage.fips-mode-rchecksum: on
>>> features.shard: enable
>>> features.shard-block-size: 256MB
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.io-cache: off
>>> server.event-threads: 2
>>> client.event-threads: 3
>>> performance.client-io-threads: on
>>> performance.stat-prefetch: off
>>> dht.force-readdirp: off
>>> performance.force-readdirp: off
>>> network.remote-dio: on
>>> features.cache-invalidation: on
>>> performance.parallel-readdir: on
>>> performance.readdir-ahead: on
>>>
>>> Xavi, do you think the open-behind off setting can help somehow? I did
>>> try to understand what it does (with no luck), and whether it could
>>> impact the performance of my VMs (I have the setup you know so well ;)).
>>> I would like to avoid more crashes like this; version 10.3 of gluster
>>> had been running quite well for two weeks, until this morning.
>>>
>>
>> I don't think disabling open-behind will have any visible effect on
>> performance. Open-behind is only useful for small files when the workload
>> is mostly open + read + close, and quick-read is also enabled (which is
>> not your case). The only effect it will have is that the latency "saved"
>> during open is "paid" on the next operation sent to the file, so the
>> total overall latency should be the same. Additionally, VM workloads
>> don't open files frequently, so it shouldn't matter much in any case.
>>
>> That said, I'm not sure if the problem is the same in your case. Based
>> on the stack of the crash, it seems to be an issue inside the disperse
>> module.
>>
>> What OS are you using? Are you using official packages? If so, which
>> ones?
>>
>> Is it possible to provide a backtrace from gdb ?
>>
>> Regards,
>>
>> Xavi
>>
>>
>>> *Angel Docampo*
>>>
>>>
>>>
>>>
>>> On Fri, 19 Mar 2021 at 2:10, David Cunningham (<
>>> dcunningham at voisonics.com>) wrote:
>>>
>>>> Hi Xavi,
>>>>
>>>> Thank you for that information. We'll look at upgrading it.
>>>>
>>>>
>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> with so little information it's hard to tell, but given that there are
>>>>> several OPEN and UNLINK operations, it could be related to an already
>>>>> fixed bug (in recent versions) in open-behind.
>>>>>
>>>>> You can try disabling open-behind with this command:
>>>>>
>>>>> # gluster volume set <volname> open-behind off
>>>>>
>>>>> But given that the version you are using is very old and unmaintained,
>>>>> I would recommend upgrading to at least 8.x.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Xavi
>>>>>
>>>>>
>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <
>>>>> dcunningham at voisonics.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We have a GlusterFS 5.13 server which also mounts itself with the
>>>>>> native FUSE client. Recently the FUSE mount crashed and we found the
>>>>>> following in the syslog. There isn't anything logged in
>>>>>> mnt-glusterfs.log for that time. After killing all processes with a
>>>>>> file handle open on the filesystem, we were able to unmount and then
>>>>>> remount the filesystem successfully.
>>>>>>
>>>>>> Would anyone have advice on how to debug this crash? Thank you in
>>>>>> advance!
>>>>>>
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>> ...
>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>>> ...
>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>>
>>>>>> --
>>>>>> David Cunningham, Voisonics Limited
>>>>>> http://voisonics.com/
>>>>>> USA: +1 213 221 1092
>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>> ________
>>>>>>
>>>>>>
>>>>>>
>>>>>> Community Meeting Calendar:
>>>>>>
>>>>>> Schedule -
>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>>
https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782
>>>> ________
>>>>
>>>>
>>>>
>>>> Community Meeting Calendar:
>>>>
>>>> Schedule -
>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>