David Spisla
2019-May-07 09:15 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello Vijay,

how can I create such a core file? Or will it be created automatically if a
gluster process crashes? Maybe you can give me a hint and I will try to get a
backtrace. Unfortunately this bug is not easy to reproduce because it appears
only sometimes.

Regards
David Spisla

On Mon, May 6, 2019 at 7:48 PM Vijay Bellur <vbellur at redhat.com> wrote:

> Thank you for the report, David. Do you have core files available on any
> of the servers? If yes, would it be possible for you to provide a
> backtrace?
>
> Regards,
> Vijay
>
> On Mon, May 6, 2019 at 3:09 AM David Spisla <spisla80 at gmail.com> wrote:
>
>> Hello folks,
>>
>> we have a client application (runs on Win10) which does some FOPs on a
>> gluster volume which is accessed by SMB.
>>
>> *Scenario 1* is a READ operation which reads all files successively and
>> checks whether the file data was copied correctly. While doing this, all
>> brick processes crash and one finds this crash report in every brick log:
>>
>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
>>> pending frames:
>>> frame : type(0) op(27)
>>> frame : type(0) op(40)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 11
>>> time of crash:
>>> 2019-04-16 08:32:21
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.5
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]
>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]
>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]
>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]
>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]
>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]
>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]
>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]
>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]
>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]
>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]
>>
>> *Scenario 2*: The application just sets Read-Only on each file
>> successively.
>> After the 70th file was set, all the bricks crash and, again, one can read
>> this crash report in every brick log:
>>
>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: client: CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
>>> pending frames:
>>> frame : type(0) op(27)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 11
>>> time of crash:
>>> 2019-05-02 07:43:39
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.5
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]
>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]
>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]
>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]
>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]
>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]
>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]
>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]
>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]
>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]
>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]
>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]
>>
>> This happens on a 3-node Gluster v5.5 cluster on two different volumes.
>> But both volumes have the same settings:
>>
>>> Volume Name: shortterm
>>> Type: Replicate
>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick
>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick
>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick
>>> Options Reconfigured:
>>> storage.reserve: 1
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>> user.smb: disable
>>> features.read-only: off
>>> features.worm: off
>>> features.worm-file-level: on
>>> features.retention-mode: enterprise
>>> features.default-retention-period: 120
>>> network.ping-timeout: 10
>>> features.cache-invalidation: on
>>> features.cache-invalidation-timeout: 600
>>> performance.nl-cache: on
>>> performance.nl-cache-timeout: 600
>>> client.event-threads: 32
>>> server.event-threads: 32
>>> cluster.lookup-optimize: on
>>> performance.stat-prefetch: on
>>> performance.cache-invalidation: on
>>> performance.md-cache-timeout: 600
>>> performance.cache-samba-metadata: on
>>> performance.cache-ima-xattrs: on
>>> performance.io-thread-count: 64
>>> cluster.use-compound-fops: on
>>> performance.cache-size: 512MB
>>> performance.cache-refresh-timeout: 10
>>> performance.read-ahead: off
>>> performance.write-behind-window-size: 4MB
>>> performance.write-behind: on
>>> storage.build-pgfid: on
>>> features.utime: on
>>> storage.ctime: on
>>> cluster.quorum-type: fixed
>>> cluster.quorum-count: 2
>>> features.bitrot: on
>>> features.scrub: Active
>>> features.scrub-freq: daily
>>> cluster.enable-shared-storage: enable
>>
>> Why can this happen to all brick processes? I don't understand the crash
>> report. The FOPs are nothing special, and after restarting the brick
>> processes everything works fine and our application succeeds.
>>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
Vijay Bellur
2019-May-07 18:08 UTC
[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Hello David,

On Tue, May 7, 2019 at 2:16 AM David Spisla <spisla80 at gmail.com> wrote:

> Hello Vijay,
>
> how can I create such a core file? Or will it be created automatically if
> a gluster process crashes? Maybe you can give me a hint and I will try to
> get a backtrace.

Generation of a core file depends on the system configuration. `man 5 core`
contains useful information on getting core files written to a directory of
your choice. Once a core file is generated, you can use gdb to get a
backtrace of all threads (using "thread apply all bt full").
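For example, something along these lines should work on a brick node (a rough
sketch only: the /var/crash directory, the core file name and the
/usr/sbin/glusterfsd path are placeholders that depend on your distribution
and settings; on systemd-based systems the core limit for the glusterd/brick
services is usually configured via LimitCORE= in the unit file rather than
ulimit):

  # allow core files of unlimited size for processes started from here
  ulimit -c unlimited

  # have the kernel write cores to a dedicated directory instead of the
  # working directory of the crashing process
  mkdir -p /var/crash
  sysctl -w kernel.core_pattern='/var/crash/core.%e.%p'

  # restart the brick processes, reproduce the crash, then extract a full
  # backtrace of all threads from the core with gdb in batch mode
  # ("core.glusterfsd.12345" stands for whatever file actually shows up)
  gdb -batch -ex 'thread apply all bt full' \
      /usr/sbin/glusterfsd /var/crash/core.glusterfsd.12345 > backtrace.txt

If they are available for your distribution, installing the glusterfs
debuginfo/dbgsym packages beforehand makes the resulting backtrace much more
readable.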
> Unfortunately this bug is not easy to reproduce because it appears only
> sometimes.

If the bug is not easy to reproduce, having a backtrace from the generated
core would be very useful!

Thanks,
Vijay