David Spisla
2019-May-17 09:17 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello Niels,

Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos <ndevos at redhat.com>:

> On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote:
> > Hello Vijay,
> > thank you for the clarification. Yes, there is an unconditional dereference
> > of stbuf. It seems plausible that this causes the crash. I think a check
> > like this should help:
> >
> >     if (buf == NULL) {
> >         goto out;
> >     }
> >     map_atime_from_server(this, buf);
> >
> > Is there a reason why buf can be NULL?
>
> It seems LOOKUP returned an error (errno=13: EACCES: Permission denied).
> This is probably something you need to handle in worm_lookup_cbk. There
> can be many reasons for a FOP to return an error; why it happened in
> this case is a little difficult to say without (much) more details.

Yes, I will look for a way to handle that case.
Is it intended that the struct stbuf is NULL when an error happens?

Regards
David Spisla

> HTH,
> Niels
>
> > Regards
> > David Spisla
> >
> > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> >
> > > Hello David,
> > >
> > > From the backtrace it looks like stbuf is NULL in map_atime_from_server()
> > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you
> > > please check if there is an unconditional dereference of stbuf in
> > > map_atime_from_server()?
> > >
> > > Regards,
> > > Vijay
> > >
> > > On Thu, May 16, 2019 at 2:36 AM David Spisla <spisla80 at gmail.com> wrote:
> > >
> > > > Hello Vijay,
> > > >
> > > > yes, we are using custom patches. It is a helper function, which is
> > > > defined in xlator_helper.c and used in worm_lookup_cbk.
> > > > Do you think this could be the problem? The function only manipulates
> > > > the atime in struct iatt.
> > > >
> > > > Regards
> > > > David Spisla
> > > >
> > > > Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > >
> > > > > Hello David,
> > > > >
> > > > > Do you have any custom patches in your deployment? I looked up v5.5 but
> > > > > could not find the following functions referred to in the core:
> > > > >
> > > > > map_atime_from_server()
> > > > > worm_lookup_cbk()
> > > > >
> > > > > Neither do I see xlator_helper.c in the codebase.
> > > > >
> > > > > Thanks,
> > > > > Vijay
> > > > >
> > > > > #0  map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at
> > > > >     ../../../../xlators/lib/src/xlator_helper.c:21
> > > > >         __FUNCTION__ = "map_to_atime_from_server"
> > > > > #1  0x00007fdef39a0382 in worm_lookup_cbk (frame=frame@entry=0x7fdeac0015c8,
> > > > >     cookie=<optimized out>, this=0x7fdef401af00, op_ret=op_ret@entry=-1,
> > > > >     op_errno=op_errno@entry=13, inode=inode@entry=0x0, buf=0x0, xdata=0x0,
> > > > >     postparent=0x0) at worm.c:531
> > > > >         priv = 0x7fdef4075378
> > > > >         ret = 0
> > > > >         __FUNCTION__ = "worm_lookup_cbk"
> > > > >
> > > > > On Thu, May 16, 2019 at 12:53 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > >
> > > > > > Hello Vijay,
> > > > > >
> > > > > > I could reproduce the issue. After doing a simple DIR listing from
> > > > > > the Win10 PowerShell, all brick processes crash. It is not the same
> > > > > > scenario mentioned before, but the crash report in the brick logs is
> > > > > > the same. Attached you find the backtrace.
> > > > > >
> > > > > > Regards
> > > > > > David Spisla
> > > > > >
> > > > > > Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > > > >
> > > > > > > Hello David,
> > > > > > >
> > > > > > > On Tue, May 7, 2019 at 2:16 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello Vijay,
> > > > > > > >
> > > > > > > > how can I create such a core file? Or will it be created
> > > > > > > > automatically if a gluster process crashes?
> > > > > > > > Maybe you can give me a hint and I will try to get a backtrace.
> > > > > > >
> > > > > > > Generation of a core file is dependent on the system configuration.
> > > > > > > `man 5 core` contains useful information to generate a core file
> > > > > > > in a directory. Once a core file is generated, you can use gdb to
> > > > > > > get a backtrace of all threads (using "thread apply all bt full").
> > > > > > >
> > > > > > > > Unfortunately this bug is not easy to reproduce because it appears
> > > > > > > > only sometimes.
> > > > > > >
> > > > > > > If the bug is not easy to reproduce, having a backtrace from the
> > > > > > > generated core would be very useful!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Vijay
> > > > > > >
> > > > > > > > Regards
> > > > > > > > David Spisla
> > > > > > > >
> > > > > > > > Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur <vbellur at redhat.com>:
> > > > > > > >
> > > > > > > > > Thank you for the report, David. Do you have core files available
> > > > > > > > > on any of the servers? If yes, would it be possible for you to
> > > > > > > > > provide a backtrace?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Vijay
> > > > > > > > >
> > > > > > > > > On Mon, May 6, 2019 at 3:09 AM David Spisla <spisla80 at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello folks,
> > > > > > > > > >
> > > > > > > > > > we have a client application (runs on Win10) which does some
> > > > > > > > > > FOPs on a gluster volume which is accessed by SMB.
> > > > > > > > > >
> > > > > > > > > > *Scenario 1* is a READ operation which reads all files
> > > > > > > > > > successively and checks if the file data was correctly copied.
> > > > > > > > > > While doing this, all brick processes crash, and one finds this
> > > > > > > > > > crash report in every brick log:
> > > > > > > > > >
> > > > > > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
> > > > > > > > > > > pending frames:
> > > > > > > > > > > frame : type(0) op(27)
> > > > > > > > > > > frame : type(0) op(40)
> > > > > > > > > > > patchset: git://git.gluster.org/glusterfs.git
> > > > > > > > > > > signal received: 11
> > > > > > > > > > > time of crash:
> > > > > > > > > > > 2019-04-16 08:32:21
> > > > > > > > > > > configuration details:
> > > > > > > > > > > argp 1
> > > > > > > > > > > backtrace 1
> > > > > > > > > > > dlfcn 1
> > > > > > > > > > > libpthread 1
> > > > > > > > > > > llistxattr 1
> > > > > > > > > > > setfsid 1
> > > > > > > > > > > spinlock 1
> > > > > > > > > > > epoll.h 1
> > > > > > > > > > > xattr.h 1
> > > > > > > > > > > st_atim.tv_nsec 1
> > > > > > > > > > > package-string: glusterfs 5.5
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]
> > > > > > > > > > > /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]
> > > > > > > > > > > /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]
> > > > > > > > > > > /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]
> > > > > > > > > >
> > > > > > > > > > *Scenario 2*: The application just SETs Read-Only on each file
> > > > > > > > > > successively. After the 70th file was set, all the bricks crash,
> > > > > > > > > > and again one can read this crash report in every brick log:
> > > > > > > > > >
> > > > > > > > > > > [2019-05-02 07:43:39.953591] I [MSGID: 139001]
> > > > > > > > > > > [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control:
> > > > > > > > > > > client: CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0,
> > > > > > > > > > > gfid: 00000000-0000-0000-0000-000000000001,
> > > > > > > > > > > req(uid:2000,gid:2000,perm:1,ngrps:1),
> > > > > > > > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
> > > > > > > > > > > pending frames:
> > > > > > > > > > > frame : type(0) op(27)
> > > > > > > > > > > patchset: git://git.gluster.org/glusterfs.git
> > > > > > > > > > > signal received: 11
> > > > > > > > > > > time of crash:
> > > > > > > > > > > 2019-05-02 07:43:39
> > > > > > > > > > > configuration details:
> > > > > > > > > > > argp 1
> > > > > > > > > > > backtrace 1
> > > > > > > > > > > dlfcn 1
> > > > > > > > > > > libpthread 1
> > > > > > > > > > > llistxattr 1
> > > > > > > > > > > setfsid 1
> > > > > > > > > > > spinlock 1
> > > > > > > > > > > epoll.h 1
> > > > > > > > > > > xattr.h 1
> > > > > > > > > > > st_atim.tv_nsec 1
> > > > > > > > > > > package-string: glusterfs 5.5
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]
> > > > > > > > > > > /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]
> > > > > > > > > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]
> > > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]
> > > > > > > > > > > /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]
> > > > > > > > > > > /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]
> > > > > > > > > >
> > > > > > > > > > This happens on a 3-node Gluster v5.5 cluster on two different
> > > > > > > > > > volumes, but both volumes have the same settings:
> > > > > > > > > >
> > > > > > > > > > > Volume Name: shortterm
> > > > > > > > > > > Type: Replicate
> > > > > > > > > > > Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee
> > > > > > > > > > > Status: Started
> > > > > > > > > > > Snapshot Count: 0
> > > > > > > > > > > Number of Bricks: 1 x 3 = 3
> > > > > > > > > > > Transport-type: tcp
> > > > > > > > > > > Bricks:
> > > > > > > > > > > Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick
> > > > > > > > > > > Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick
> > > > > > > > > > > Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick
> > > > > > > > > > > Options Reconfigured:
> > > > > > > > > > > storage.reserve: 1
> > > > > > > > > > > performance.client-io-threads: off
> > > > > > > > > > > nfs.disable: on
> > > > > > > > > > > transport.address-family: inet
> > > > > > > > > > > user.smb: disable
> > > > > > > > > > > features.read-only: off
> > > > > > > > > > > features.worm: off
> > > > > > > > > > > features.worm-file-level: on
> > > > > > > > > > > features.retention-mode: enterprise
> > > > > > > > > > > features.default-retention-period: 120
> > > > > > > > > > > network.ping-timeout: 10
> > > > > > > > > > > features.cache-invalidation: on
> > > > > > > > > > > features.cache-invalidation-timeout: 600
> > > > > > > > > > > performance.nl-cache: on
> > > > > > > > > > > performance.nl-cache-timeout: 600
> > > > > > > > > > > client.event-threads: 32
> > > > > > > > > > > server.event-threads: 32
> > > > > > > > > > > cluster.lookup-optimize: on
> > > > > > > > > > > performance.stat-prefetch: on
> > > > > > > > > > > performance.cache-invalidation: on
> > > > > > > > > > > performance.md-cache-timeout: 600
> > > > > > > > > > > performance.cache-samba-metadata: on
> > > > > > > > > > > performance.cache-ima-xattrs: on
> > > > > > > > > > > performance.io-thread-count: 64
> > > > > > > > > > > cluster.use-compound-fops: on
> > > > > > > > > > > performance.cache-size: 512MB
> > > > > > > > > > > performance.cache-refresh-timeout: 10
> > > > > > > > > > > performance.read-ahead: off
> > > > > > > > > > > performance.write-behind-window-size: 4MB
> > > > > > > > > > > performance.write-behind: on
> > > > > > > > > > > storage.build-pgfid: on
> > > > > > > > > > > features.utime: on
> > > > > > > > > > > storage.ctime: on
> > > > > > > > > > > cluster.quorum-type: fixed
> > > > > > > > > > > cluster.quorum-count: 2
> > > > > > > > > > > features.bitrot: on
> > > > > > > > > > > features.scrub: Active
> > > > > > > > > > > features.scrub-freq: daily
> > > > > > > > > > > cluster.enable-shared-storage: enable
> > > > > > > > > >
> > > > > > > > > > Why can this happen to all brick processes? I don't
> > > > > > > > > > understand the crash report. The FOPs are nothing special,
> > > > > > > > > > and after restarting the brick processes everything works
> > > > > > > > > > fine and our application succeeded.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > David Spisla

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Niels de Vos
2019-May-17 09:35 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote:
> Hello Niels,
>
> Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos <ndevos at redhat.com>:
>
> > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied).
> > This is probably something you need to handle in worm_lookup_cbk. There
> > can be many reasons for a FOP to return an error; why it happened in
> > this case is a little difficult to say without (much) more details.
>
> Yes, I will look for a way to handle that case.
> Is it intended that the struct stbuf is NULL when an error happens?

Yes, in most error occasions it will not be possible to get a valid stbuf.

Niels

> Regards
> David Spisla

[...]
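[Editor's sketch] Vijay's advice earlier in the thread (`man 5 core`, then gdb with "thread apply all bt full") can be condensed into a short recipe. The core-pattern path and the brick binary/core file names below are illustrative assumptions; adjust them for your distribution:

```shell
# Allow core dumps in the current shell (see `man 5 core`)
ulimit -c unlimited

# Write cores to a fixed directory with a descriptive name
# (%e = executable name, %p = PID, %t = timestamp); needs root
mkdir -p /var/crash
sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t

# After a brick process crashes, capture a full backtrace of all
# threads from the core, as suggested in the thread
gdb --batch -ex "thread apply all bt full" \
    /usr/sbin/glusterfsd /var/crash/core.glusterfsd.12345.1557994341 \
    > backtrace.txt
```

Since the crash is hard to reproduce, leaving the core pattern configured persistently (e.g. via a sysctl.d drop-in) on all brick nodes makes it more likely a usable core is captured the next time the bug strikes.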