I've looked in all the places the core dump should be, and I couldn't find
it anywhere. Some people say the dump file is generated in the working
directory of the application... in that case I don't know where to look, and
I hope it wasn't written to the failed mountpoint.
As Debian 11 uses systemd, I've installed systemd-coredump, so if a new crash
happens, at least I will have the exact location and a tool (coredumpctl) to
find the dump, and I will then install the debug symbols, which is
particularly tricky on Debian. But I need to wait for it to happen again;
right now the tool says there is no core dump on the system.
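For reference, these are the commands I expect to use once a new dump is
captured (a sketch based on the systemd-coredump documentation; I haven't
been able to test it yet since there is nothing to inspect):

# coredumpctl list /usr/sbin/glusterfs
# coredumpctl info <PID>
# coredumpctl gdb <PID>

The last one should open the dump directly in gdb, so I can then run the
backtrace you asked for.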
Thank you, Xavi. If this happens again (let's hope it won't), I will report
back.
Best regards!
*Angel Docampo*
<https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
<angel.docampo at eoniantec.com> <+34-93-1592929>
On Tue, 22 Nov 2022 at 10:45, Xavi Hernandez (<jahernan at redhat.com>)
wrote:
> The crash seems related to some problem in the ec xlator, but I don't have
> enough information to determine what it is. The crash should have generated
> a core dump somewhere in the system (I don't know where Debian keeps the
> core dumps). If you find it, you should be able to open it using this
> command (make sure the debug symbols package is also installed before
> running it):
>
> # gdb /usr/sbin/glusterfs <path to core dump>
>
> And then, inside gdb, run this command:
>
> (gdb) bt full
>
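> One way to find out where the dump ends up is to check the kernel's core
> pattern (I'm not sure what the Debian default is):
>
> # cat /proc/sys/kernel/core_pattern
>
> If it is a plain file name such as "core", the dump is written to the
> working directory of the crashed process; if it is a pipe to
> systemd-coredump, coredumpctl will list it.
>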
> Regards,
>
> Xavi
>
> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo <angel.docampo at eoniantec.com>
> wrote:
>
>> Hi Xavi,
>>
>> The OS is Debian 11 with the Proxmox kernel. The Gluster packages are the
>> official ones from gluster.org (
>> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/
>> )
>>
>> The system logs showed no other issues at the time of the crash, no OOM
>> kill or anything of the sort, and no other process was interacting with the
>> gluster mountpoint besides Proxmox.
>>
>> I wasn't running gdb when it crashed, so I don't really know if I can
>> obtain a more detailed trace from the logs, or if there is a simple way to
>> leave it running in the background to see if it happens again (or whether
>> there is a flag to start the systemd daemon in debug mode).
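>> The only candidates I've found so far, untested on my side, are raising the
>> client log level on the mount and attaching gdb to the already running FUSE
>> process, along these lines (the server, volume and mount path here are just
>> placeholders from my setup):
>>
>> # mount -t glusterfs -o log-level=DEBUG g01:/vmdata /mnt/vmdata
>> # gdb -p $(pidof glusterfs)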
>>
>> Best,
>>
>> *Angel Docampo*
>>
>>
>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>
>>
>> On Mon, 21 Nov 2022 at 15:16, Xavi Hernandez (<jahernan at redhat.com>)
>> wrote:
>>
>>> Hi Angel,
>>>
>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>>> angel.docampo at eoniantec.com> wrote:
>>>
>>>> Sorry for necrobumping this, but this morning I suffered this on my
>>>> Proxmox + GlusterFS cluster. In the log I can see this:
>>>>
>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
>>>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>>>> fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>> pending frames:
>>>> frame : type(1) op(WRITE)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> ...
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> patchset: git://git.gluster.org/glusterfs.git
>>>> signal received: 11
>>>> time of crash:
>>>> 2022-11-21 07:38:00 +0000
>>>> configuration details:
>>>> argp 1
>>>> backtrace 1
>>>> dlfcn 1
>>>> libpthread 1
>>>> llistxattr 1
>>>> setfsid 1
>>>> epoll.h 1
>>>> xattr.h 1
>>>> st_atim.tv_nsec 1
>>>> package-string: glusterfs 10.3
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>> ---------
>>>> The mount point wasn't accessible, failing with the "Transport endpoint
>>>> is not connected" message, and it was shown like this:
>>>> d????????? ? ? ? ? ? vmdata
>>>>
>>>> I had to stop all the VMs on that Proxmox node, then stop the gluster
>>>> daemon to unmount the directory, and after starting the daemon and
>>>> re-mounting, everything was working again.
>>>>
>>>> My gluster volume info returns this:
>>>>
>>>> Volume Name: vmdata
>>>> Type: Distributed-Disperse
>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: g01:/data/brick1/brick
>>>> Brick2: g02:/data/brick2/brick
>>>> Brick3: g03:/data/brick1/brick
>>>> Brick4: g01:/data/brick2/brick
>>>> Brick5: g02:/data/brick1/brick
>>>> Brick6: g03:/data/brick2/brick
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>> storage.fips-mode-rchecksum: on
>>>> features.shard: enable
>>>> features.shard-block-size: 256MB
>>>> performance.read-ahead: off
>>>> performance.quick-read: off
>>>> performance.io-cache: off
>>>> server.event-threads: 2
>>>> client.event-threads: 3
>>>> performance.client-io-threads: on
>>>> performance.stat-prefetch: off
>>>> dht.force-readdirp: off
>>>> performance.force-readdirp: off
>>>> network.remote-dio: on
>>>> features.cache-invalidation: on
>>>> performance.parallel-readdir: on
>>>> performance.readdir-ahead: on
>>>>
>>>> Xavi, do you think setting open-behind to off can help somehow? I did
>>>> try to understand what it does (with no luck) and whether it could impact
>>>> the performance of my VMs (I have the setup you know so well ;)).
>>>> I would like to avoid more crashes like this; gluster 10.3 had been
>>>> working quite well for two weeks, until this morning.
>>>>
>>>
>>> I don't think disabling open-behind will have any visible effect on
>>> performance. Open-behind is only useful for small files when the workload
>>> is mostly open + read + close, and quick-read is also enabled (which is
>>> not your case). The only effect it will have is that the latency "saved"
>>> during open is "paid" on the next operation sent to the file, so the total
>>> overall latency should be the same. Additionally, a VM workload doesn't
>>> open files frequently, so it shouldn't matter much in any case.
>>>
>>> That said, I'm not sure if the problem is the same in your case. Based
>>> on the stack of the crash, it seems to be an issue inside the disperse
>>> module.
>>>
>>> What OS are you using? Are you using official packages? If so, which
>>> ones?
>>>
>>> Is it possible to provide a backtrace from gdb?
>>>
>>> Regards,
>>>
>>> Xavi
>>>
>>>
>>>> *Angel Docampo*
>>>>
>>>>
>>>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>>>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>>>
>>>>
>>>> On Fri, 19 Mar 2021 at 2:10, David Cunningham (<dcunningham at voisonics.com>)
>>>> wrote:
>>>>
>>>>> Hi Xavi,
>>>>>
>>>>> Thank you for that information. We'll look at upgrading it.
>>>>>
>>>>>
>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> With so little information it's hard to tell, but given that there
>>>>>> are several OPEN and UNLINK operations, it could be related to a bug in
>>>>>> open-behind that has already been fixed in recent versions.
>>>>>>
>>>>>> You can try disabling open-behind with this command:
>>>>>>
>>>>>> # gluster volume set <volname> open-behind off
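>>>>>>
>>>>>> If you first want to check the current value, something like this should
>>>>>> also work (the full option name is performance.open-behind, if I remember
>>>>>> correctly):
>>>>>>
>>>>>> # gluster volume get <volname> performance.open-behind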
>>>>>>
>>>>>> But given that the version you are using is very old and unmaintained,
>>>>>> I would recommend upgrading to at least 8.x.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <
>>>>>> dcunningham at voisonics.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with the
>>>>>>> native FUSE client. Recently the FUSE mount crashed and we found the
>>>>>>> following in the syslog. There isn't anything logged in
>>>>>>> mnt-glusterfs.log for that time. After killing all processes with a
>>>>>>> file handle open on the filesystem, we were able to unmount and then
>>>>>>> remount the filesystem successfully.
>>>>>>>
>>>>>>> Would anyone have advice on how to debug this crash? Thank you in
>>>>>>> advance!
>>>>>>>
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>>> ...
>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>>>> ...
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>>>
>>>>>>> --
>>>>>>> David Cunningham, Voisonics Limited
>>>>>>> http://voisonics.com/
>>>>>>> USA: +1 213 221 1092
>>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>>> ________
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Community Meeting Calendar:
>>>>>>>
>>>>>>> Schedule -
>>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>>> ________
>>>>>
>>>>>
>>>>>
>>>>> Community Meeting Calendar:
>>>>>
>>>>> Schedule -
>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>