Vijay Bellur
2019-May-16 23:50 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello David,

From the backtrace it looks like stbuf is NULL in map_atime_from_server() as
worm_lookup_cbk got an error (op_ret = -1, op_errno = 13). Can you please
check if there is an unconditional dereference of stbuf in
map_atime_from_server()?

Regards,
Vijay

On Thu, May 16, 2019 at 2:36 AM David Spisla <spisla80 at gmail.com> wrote:

> Hello Vijay,
>
> Yes, we are using custom patches. It's a helper function, which is defined
> in xlator_helper.c and used in worm_lookup_cbk.
> Do you think this could be the problem? The function only manipulates the
> atime in struct iatt.
>
> Regards
> David Spisla
>
> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
>
>> Hello David,
>>
>> Do you have any custom patches in your deployment? I looked up v5.5 but
>> could not find the following functions referred to in the core:
>>
>> map_atime_from_server()
>> worm_lookup_cbk()
>>
>> Neither do I see xlator_helper.c in the codebase.
>>
>> Thanks,
>> Vijay
>>
>> #0  map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at
>>     ../../../../xlators/lib/src/xlator_helper.c:21
>>         __FUNCTION__ = "map_to_atime_from_server"
>> #1  0x00007fdef39a0382 in worm_lookup_cbk (frame=frame@entry=0x7fdeac0015c8,
>>     cookie=<optimized out>, this=0x7fdef401af00, op_ret=op_ret@entry=-1,
>>     op_errno=op_errno@entry=13, inode=inode@entry=0x0, buf=0x0, xdata=0x0,
>>     postparent=0x0) at worm.c:531
>>         priv = 0x7fdef4075378
>>         ret = 0
>>         __FUNCTION__ = "worm_lookup_cbk"
>>
>> On Thu, May 16, 2019 at 12:53 AM David Spisla <spisla80 at gmail.com> wrote:
>>
>>> Hello Vijay,
>>>
>>> I could reproduce the issue. After doing a simple DIR listing from Win10
>>> PowerShell, all brick processes crash. It's not the same scenario
>>> mentioned before, but the crash report in the brick logs is the same.
>>> Attached you will find the backtrace.
>>>
>>> Regards
>>> David Spisla
>>>
>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
>>>
>>>> Hello David,
>>>>
>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla <spisla80 at gmail.com> wrote:
>>>>
>>>>> Hello Vijay,
>>>>>
>>>>> How can I create such a core file? Or will it be created automatically
>>>>> if a gluster process crashes?
>>>>> Maybe you can give me a hint and I will try to get a backtrace.
>>>>
>>>> Generation of a core file depends on the system configuration. `man 5
>>>> core` contains useful information on generating a core file in a
>>>> directory. Once a core file is generated, you can use gdb to get a
>>>> backtrace of all threads (using "thread apply all bt full").
>>>>
>>>>> Unfortunately this bug is not easy to reproduce because it appears
>>>>> only sometimes.
>>>>
>>>> If the bug is not easy to reproduce, having a backtrace from the
>>>> generated core would be very useful!
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>>> Regards
>>>>> David Spisla
>>>>>
>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
>>>>>
>>>>>> Thank you for the report, David. Do you have core files available on
>>>>>> any of the servers? If yes, would it be possible for you to provide a
>>>>>> backtrace?
>>>>>>
>>>>>> Regards,
>>>>>> Vijay
>>>>>>
>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla <spisla80 at gmail.com> wrote:
>>>>>>
>>>>>>> Hello folks,
>>>>>>>
>>>>>>> We have a client application (runs on Win10) which does some FOPs on
>>>>>>> a gluster volume which is accessed by SMB.
>>>>>>>
>>>>>>> *Scenario 1* is a READ operation which reads all files successively
>>>>>>> and checks if the files' data was correctly copied. While doing this,
>>>>>>> all brick processes crash, and one finds this crash report in every
>>>>>>> brick log:
>>>>>>>
>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
>>>>>>>> pending frames:
>>>>>>>> frame : type(0) op(27)
>>>>>>>> frame : type(0) op(40)
>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>> signal received: 11
>>>>>>>> time of crash:
>>>>>>>> 2019-04-16 08:32:21
>>>>>>>> configuration details:
>>>>>>>> argp 1
>>>>>>>> backtrace 1
>>>>>>>> dlfcn 1
>>>>>>>> libpthread 1
>>>>>>>> llistxattr 1
>>>>>>>> setfsid 1
>>>>>>>> spinlock 1
>>>>>>>> epoll.h 1
>>>>>>>> xattr.h 1
>>>>>>>> st_atim.tv_nsec 1
>>>>>>>> package-string: glusterfs 5.5
>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]
>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]
>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]
>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]
>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]
>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]
>>>>>>>
>>>>>>> *Scenario 2* The application just sets Read-Only on each file
>>>>>>> successively. After the 70th file was set, all the bricks crash and
>>>>>>> again, one can read this crash report in every brick log:
>>>>>>>
>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001]
>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control:
>>>>>>>> client:
>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0,
>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001,
>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1),
>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
>>>>>>>> pending frames:
>>>>>>>> frame : type(0) op(27)
>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>> signal received: 11
>>>>>>>> time of crash:
>>>>>>>> 2019-05-02 07:43:39
>>>>>>>> configuration details:
>>>>>>>> argp 1
>>>>>>>> backtrace 1
>>>>>>>> dlfcn 1
>>>>>>>> libpthread 1
>>>>>>>> llistxattr 1
>>>>>>>> setfsid 1
>>>>>>>> spinlock 1
>>>>>>>> epoll.h 1
>>>>>>>> xattr.h 1
>>>>>>>> st_atim.tv_nsec 1
>>>>>>>> package-string: glusterfs 5.5
>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]
>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]
>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]
>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]
>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]
>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]
>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]
>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]
>>>>>>>
>>>>>>> This happens on a 3-node Gluster v5.5 cluster on two different
>>>>>>> volumes. But both volumes have the same settings:
>>>>>>>
>>>>>>>> Volume Name: shortterm
>>>>>>>> Type: Replicate
>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee
>>>>>>>> Status: Started
>>>>>>>> Snapshot Count: 0
>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick
>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick
>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick
>>>>>>>> Options Reconfigured:
>>>>>>>> storage.reserve: 1
>>>>>>>> performance.client-io-threads: off
>>>>>>>> nfs.disable: on
>>>>>>>> transport.address-family: inet
>>>>>>>> user.smb: disable
>>>>>>>> features.read-only: off
>>>>>>>> features.worm: off
>>>>>>>> features.worm-file-level: on
>>>>>>>> features.retention-mode: enterprise
>>>>>>>> features.default-retention-period: 120
>>>>>>>> network.ping-timeout: 10
>>>>>>>> features.cache-invalidation: on
>>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>>> performance.nl-cache: on
>>>>>>>> performance.nl-cache-timeout: 600
>>>>>>>> client.event-threads: 32
>>>>>>>> server.event-threads: 32
>>>>>>>> cluster.lookup-optimize: on
>>>>>>>> performance.stat-prefetch: on
>>>>>>>> performance.cache-invalidation: on
>>>>>>>> performance.md-cache-timeout: 600
>>>>>>>> performance.cache-samba-metadata: on
>>>>>>>> performance.cache-ima-xattrs: on
>>>>>>>> performance.io-thread-count: 64
>>>>>>>> cluster.use-compound-fops: on
>>>>>>>> performance.cache-size: 512MB
>>>>>>>> performance.cache-refresh-timeout: 10
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.write-behind-window-size: 4MB
>>>>>>>> performance.write-behind: on
>>>>>>>> storage.build-pgfid: on
>>>>>>>> features.utime: on
>>>>>>>> storage.ctime: on
>>>>>>>> cluster.quorum-type: fixed
>>>>>>>> cluster.quorum-count: 2
>>>>>>>> features.bitrot: on
>>>>>>>> features.scrub: Active
>>>>>>>> features.scrub-freq: daily
>>>>>>>> cluster.enable-shared-storage: enable
>>>>>>>
>>>>>>> Why can this happen to all brick processes? I don't understand the
>>>>>>> crash report. The FOPs are nothing special, and after restarting the
>>>>>>> brick processes everything works fine and our application succeeded.
>>>>>>>
>>>>>>> Regards
>>>>>>> David Spisla
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
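For context: xlator_helper.c and map_atime_from_server() belong to the custom patch set discussed above and are not part of upstream GlusterFS 5.5, so the following is only a sketch of what such a helper might look like once it carries the guard Vijay is asking about. The function name and its (this, stbuf) arguments come from the backtrace; the body, the intent of writing the server clock into the atime, and the include paths are assumptions.

/* Hypothetical sketch of the custom helper from xlator_helper.c.
 * Only the function name and parameters are taken from the backtrace;
 * everything else is assumed. Header paths follow the installed
 * glusterfs-devel layout; an in-tree build may include them differently. */
#include <time.h>               /* time() */
#include <glusterfs/xlator.h>   /* xlator_t */
#include <glusterfs/iatt.h>     /* struct iatt */

void
map_atime_from_server(xlator_t *this, struct iatt *stbuf)
{
    /* Guard first: on a failed LOOKUP the callback receives stbuf == NULL,
     * and an unconditional stbuf->ia_atime access here is exactly the
     * SIGSEGV reported in the brick logs. */
    if (!this || !stbuf)
        return;

    /* Assumed intent, based on the name and the thread: replace the
     * client-supplied atime with the server-side clock. */
    stbuf->ia_atime = time(NULL);
    stbuf->ia_atime_nsec = 0;
}

The posix-acl "Permission denied" line immediately before the crash matches op_errno 13 (EACCES): access-control denies the LOOKUP, the error is unwound up the stack with a NULL iatt, and the helper then dereferences it.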
David Spisla
2019-May-17 07:50 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello Vijay,

thank you for the clarification. Yes, there is an unconditional dereference
of stbuf. It seems plausible that this causes the crash. I think a check
like this should help:

    if (buf == NULL) {
        goto out;
    }
    map_atime_from_server(this, buf);

Is there a reason why buf can be NULL?

Regards
David Spisla

Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:

> Hello David,
>
> From the backtrace it looks like stbuf is NULL in map_atime_from_server()
> as worm_lookup_cbk got an error (op_ret = -1, op_errno = 13). Can you
> please check if there is an unconditional dereference of stbuf in
> map_atime_from_server()?
>
> Regards,
> Vijay
>
> On Thu, May 16, 2019 at 2:36 AM David Spisla <spisla80 at gmail.com> wrote:
>
> [...]
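To make the proposed check concrete, here is a minimal sketch of how the guard could sit in the patched worm_lookup_cbk(). The callback name and its argument list come from the backtrace; since the patched worm.c is not upstream, everything around the guard is an assumption, and only the op_ret/buf check plus the usual unwind pattern is the point. As for why buf can be NULL: frame #1 in the backtrace shows inode=0x0, buf=0x0, postparent=0x0 together with op_ret=-1, op_errno=13, i.e. on the error path the lookup callback is generally invoked without a valid iatt.

/* Sketch only -- the patched worm_lookup_cbk() is not in upstream v5.5.
 * This fragment would live in the patched worm.c; the essential change is
 * skipping the custom helper when the lookup failed. */
int32_t
worm_lookup_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
                int32_t op_ret, int32_t op_errno, inode_t *inode,
                struct iatt *buf, dict_t *xdata, struct iatt *postparent)
{
    /* On a failed LOOKUP (here: op_ret == -1, op_errno == 13/EACCES raised
     * by access-control) the stack unwinds with buf == NULL, so the atime
     * mapping must be skipped instead of dereferencing NULL. */
    if (op_ret >= 0 && buf != NULL)
        map_atime_from_server(this, buf);   /* custom helper, xlator_helper.c */

    /* Pass the result on unchanged, whether the lookup succeeded or not. */
    STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, inode, buf, xdata,
                        postparent);
    return 0;
}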