David Spisla
2019-May-17 09:17 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello Niels,

Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos <ndevos at redhat.com>:

> On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote:
> > Hello Vijay,
> > thank you for the clarification. Yes, there is an unconditional dereference
> > of stbuf. It seems plausible that this causes the crash. I think a check
> > like this should help:
> >
> >     if (buf == NULL) {
> >         goto out;
> >     }
> >     map_atime_from_server(this, buf);
> >
> > Is there a reason why buf can be NULL?
>
> It seems LOOKUP returned an error (errno=13: EACCES: Permission denied).
> This is probably something you need to handle in worm_lookup_cbk. There
> can be many reasons for a FOP to return an error; why it happened in
> this case is a little difficult to say without (much) more details.

Yes, I will look for a way to handle that case.
Is it intended that the struct stbuf is NULL when an error happens?

Regards
David Spisla

> HTH,
> Niels
>
> > Regards
> > David Spisla
> >
> > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> >
> > > Hello David,
> > >
> > > From the backtrace it looks like stbuf is NULL in map_atime_from_server()
> > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you
> > > please check if there is an unconditional dereference of stbuf in
> > > map_atime_from_server()?
> > >
> > > Regards,
> > > Vijay
> > >
> > > On Thu, May 16, 2019 at 2:36 AM David Spisla <spisla80 at gmail.com> wrote:
> > >
> > > > Hello Vijay,
> > > >
> > > > yes, we are using custom patches. It is a helper function, which is
> > > > defined in xlator_helper.c and used in worm_lookup_cbk.
> > > > Do you think this could be the problem? The function only manipulates
> > > > the atime in struct iatt.
> > > >
> > > > Regards
> > > > David Spisla
> > > >
> > > > Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > >
> > > > > Hello David,
> > > > >
> > > > > Do you have any custom patches in your deployment? I looked up v5.5 but
> > > > > could not find the following functions referred to in the core:
> > > > >
> > > > > map_atime_from_server()
> > > > > worm_lookup_cbk()
> > > > >
> > > > > Neither do I see xlator_helper.c in the codebase.
> > > > >
> > > > > Thanks,
> > > > > Vijay
> > > > >
> > > > > #0  map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at
> > > > >     ../../../../xlators/lib/src/xlator_helper.c:21
> > > > >         __FUNCTION__ = "map_to_atime_from_server"
> > > > > #1  0x00007fdef39a0382 in worm_lookup_cbk (frame=frame@entry=0x7fdeac0015c8,
> > > > >     cookie=<optimized out>, this=0x7fdef401af00, op_ret=op_ret@entry=-1,
> > > > >     op_errno=op_errno@entry=13, inode=inode@entry=0x0, buf=0x0, xdata=0x0,
> > > > >     postparent=0x0) at worm.c:531
> > > > >         priv = 0x7fdef4075378
> > > > >         ret = 0
> > > > >         __FUNCTION__ = "worm_lookup_cbk"
> > > > >
> > > > > On Thu, May 16, 2019 at 12:53 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > >
> > > > > > Hello Vijay,
> > > > > >
> > > > > > I could reproduce the issue. After doing a simple DIR listing from
> > > > > > the Win10 PowerShell, all brick processes crash. It is not the same
> > > > > > scenario mentioned before, but the crash report in the brick logs is
> > > > > > the same. Attached you find the backtrace.
> > > > > >
> > > > > > Regards
> > > > > > David Spisla
> > > > > >
> > > > > > Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > > > >
> > > > > > > Hello David,
> > > > > > >
> > > > > > > On Tue, May 7, 2019 at 2:16 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello Vijay,
> > > > > > > >
> > > > > > > > how can I create such a core file? Or will it be created
> > > > > > > > automatically if a gluster process crashes?
> > > > > > > > Maybe you can give me a hint and I will try to get a backtrace.
> > > > > > >
> > > > > > > Generation of a core file is dependent on the system configuration.
> > > > > > > `man 5 core` contains useful information to generate a core file
> > > > > > > in a directory. Once a core file is generated, you can use gdb to
> > > > > > > get a backtrace of all threads (using "thread apply all bt full").
> > > > > > >
> > > > > > > > Unfortunately this bug is not easy to reproduce because it appears
> > > > > > > > only sometimes.
> > > > > > >
> > > > > > > If the bug is not easy to reproduce, having a backtrace from the
> > > > > > > generated core would be very useful!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Vijay
> > > > > > >
> > > > > > > > Regards
> > > > > > > > David Spisla
> > > > > > > >
> > > > > > > > Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > > > > > >
> > > > > > > > > Thank you for the report, David. Do you have core files available
> > > > > > > > > on any of the servers? If yes, would it be possible for you to
> > > > > > > > > provide a backtrace?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Vijay
> > > > > > > > >
> > > > > > > > > On Mon, May 6, 2019 at 3:09 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello folks,
> > > > > > > > > >
> > > > > > > > > > we have a client application (runs on Win10) which does some
> > > > > > > > > > FOPs on a gluster volume which is accessed by SMB.
> > > > > > > > > >
> > > > > > > > > > *Scenario 1* is a READ operation which reads all files
> > > > > > > > > > successively and checks if the file data was correctly copied.
> > > > > > > > > > While doing this, all brick processes crash, and one finds this
> > > > > > > > > > crash report in every brick log:
> > > > > > > > > >
> > > > > > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
> > > > > > > > > > > pending frames:
> > > > > > > > > > > frame : type(0) op(27)
> > > > > > > > > > > frame : type(0) op(40)
> > > > > > > > > > > patchset: git://git.gluster.org/glusterfs.git
> > > > > > > > > > > signal received: 11
> > > > > > > > > > > time of crash:
> > > > > > > > > > > 2019-04-16 08:32:21
> > > > > > > > > > > configuration details:
> > > > > > > > > > > argp 1
> > > > > > > > > > > backtrace 1
> > > > > > > > > > > dlfcn 1
> > > > > > > > > > > libpthread 1
> > > > > > > > > > > llistxattr 1
> > > > > > > > > > > setfsid 1
> > > > > > > > > > > spinlock 1
> > > > > > > > > > > epoll.h 1
> > > > > > > > > > > xattr.h 1
> > > > > > > > > > > st_atim.tv_nsec 1
> > > > > > > > > > > package-string: glusterfs 5.5
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]
> > > > > > > > > > > /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]
> > > > > > > > > > > /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]
> > > > > > > > > > > /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]
> > > > > > > > > >
> > > > > > > > > > *Scenario 2*: The application just SETs Read-Only on each file
> > > > > > > > > > successively. After the 70th file was set, all the bricks crash,
> > > > > > > > > > and again one can read this crash report in every brick log:
> > > > > > > > > >
> > > > > > > > > > > [2019-05-02 07:43:39.953591] I [MSGID: 139001]
> > > > > > > > > > > [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control:
> > > > > > > > > > > client: CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0,
> > > > > > > > > > > gfid: 00000000-0000-0000-0000-000000000001,
> > > > > > > > > > > req(uid:2000,gid:2000,perm:1,ngrps:1),
> > > > > > > > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
> > > > > > > > > > > pending frames:
> > > > > > > > > > > frame : type(0) op(27)
> > > > > > > > > > > patchset: git://git.gluster.org/glusterfs.git
> > > > > > > > > > > signal received: 11
> > > > > > > > > > > time of crash:
> > > > > > > > > > > 2019-05-02 07:43:39
> > > > > > > > > > > configuration details:
> > > > > > > > > > > argp 1
> > > > > > > > > > > backtrace 1
> > > > > > > > > > > dlfcn 1
> > > > > > > > > > > libpthread 1
> > > > > > > > > > > llistxattr 1
> > > > > > > > > > > setfsid 1
> > > > > > > > > > > spinlock 1
> > > > > > > > > > > epoll.h 1
> > > > > > > > > > > xattr.h 1
> > > > > > > > > > > st_atim.tv_nsec 1
> > > > > > > > > > > package-string: glusterfs 5.5
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]
> > > > > > > > > > > /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]
> > > > > > > > > > > /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]
> > > > > > > > > > > /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]
> > > > > > > > > >
> > > > > > > > > > This happens on a 3-node Gluster v5.5 cluster on two different
> > > > > > > > > > volumes, but both volumes have the same settings:
> > > > > > > > > >
> > > > > > > > > > > Volume Name: shortterm
> > > > > > > > > > > Type: Replicate
> > > > > > > > > > > Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee
> > > > > > > > > > > Status: Started
> > > > > > > > > > > Snapshot Count: 0
> > > > > > > > > > > Number of Bricks: 1 x 3 = 3
> > > > > > > > > > > Transport-type: tcp
> > > > > > > > > > > Bricks:
> > > > > > > > > > > Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick
> > > > > > > > > > > Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick
> > > > > > > > > > > Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick
> > > > > > > > > > > Options Reconfigured:
> > > > > > > > > > > storage.reserve: 1
> > > > > > > > > > > performance.client-io-threads: off
> > > > > > > > > > > nfs.disable: on
> > > > > > > > > > > transport.address-family: inet
> > > > > > > > > > > user.smb: disable
> > > > > > > > > > > features.read-only: off
> > > > > > > > > > > features.worm: off
> > > > > > > > > > > features.worm-file-level: on
> > > > > > > > > > > features.retention-mode: enterprise
> > > > > > > > > > > features.default-retention-period: 120
> > > > > > > > > > > network.ping-timeout: 10
> > > > > > > > > > > features.cache-invalidation: on
> > > > > > > > > > > features.cache-invalidation-timeout: 600
> > > > > > > > > > > performance.nl-cache: on
> > > > > > > > > > > performance.nl-cache-timeout: 600
> > > > > > > > > > > client.event-threads: 32
> > > > > > > > > > > server.event-threads: 32
> > > > > > > > > > > cluster.lookup-optimize: on
> > > > > > > > > > > performance.stat-prefetch: on
> > > > > > > > > > > performance.cache-invalidation: on
> > > > > > > > > > > performance.md-cache-timeout: 600
> > > > > > > > > > > performance.cache-samba-metadata: on
> > > > > > > > > > > performance.cache-ima-xattrs: on
> > > > > > > > > > > performance.io-thread-count: 64
> > > > > > > > > > > cluster.use-compound-fops: on
> > > > > > > > > > > performance.cache-size: 512MB
> > > > > > > > > > > performance.cache-refresh-timeout: 10
> > > > > > > > > > > performance.read-ahead: off
> > > > > > > > > > > performance.write-behind-window-size: 4MB
> > > > > > > > > > > performance.write-behind: on
> > > > > > > > > > > storage.build-pgfid: on
> > > > > > > > > > > features.utime: on
> > > > > > > > > > > storage.ctime: on
> > > > > > > > > > > cluster.quorum-type: fixed
> > > > > > > > > > > cluster.quorum-count: 2
> > > > > > > > > > > features.bitrot: on
> > > > > > > > > > > features.scrub: Active
> > > > > > > > > > > features.scrub-freq: daily
> > > > > > > > > > > cluster.enable-shared-storage: enable
> > > > > > > > > >
> > > > > > > > > > Why can this happen to all brick processes? I don't
> > > > > > > > > > understand the crash report. The FOPs are nothing special,
> > > > > > > > > > and after restarting the brick processes everything works
> > > > > > > > > > fine and our application succeeded.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > David Spisla

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Niels de Vos
2019-May-17 09:35 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote:
> Hello Niels,
>
> Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos <ndevos at redhat.com>:
>
> > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied).
> > This is probably something you need to handle in worm_lookup_cbk. There
> > can be many reasons for a FOP to return an error; why it happened in
> > this case is a little difficult to say without (much) more details.
>
> Yes, I will look for a way to handle that case.
> Is it intended that the struct stbuf is NULL when an error happens?

Yes, in most error occasions it will not be possible to get a valid stbuf.

Niels

> Regards
> David Spisla

[...]
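[Editor's sketch] Vijay's advice earlier in the thread (`man 5 core`, then gdb with "thread apply all bt full") can be condensed into a short recipe. The core-pattern path and the brick binary/core file names below are illustrative assumptions; adjust them for your distribution:

```shell
# Allow core dumps in the current shell (see `man 5 core`)
ulimit -c unlimited

# Write cores to a fixed directory with a descriptive name
# (%e = executable name, %p = PID, %t = timestamp); needs root
mkdir -p /var/crash
sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t

# After a brick process crashes, capture a full backtrace of all
# threads from the core, as suggested in the thread
gdb --batch -ex "thread apply all bt full" \
    /usr/sbin/glusterfsd /var/crash/core.glusterfsd.12345.1557994341 \
    > backtrace.txt
```

Since the crash is hard to reproduce, leaving the core pattern configured persistently (e.g. via a sysctl.d drop-in) on all brick nodes makes it more likely a usable core is captured the next time the bug strikes.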