Forgot to mention: sometimes I have to force start other volumes as well; it's hard
to determine which brick process is locked up from the logs.

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        N       N/A    << Brick process is not running on any node.
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        N       N/A
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        N       N/A

Task Status of Volume rhev_vms_primary
------------------------------------------------------------------------------
There are no active volume tasks

 3081  gluster volume start rhev_vms_noshards force
 3082  gluster volume status
 3083  gluster volume start rhev_vms_primary force
 3084  gluster volume status
 3085  gluster volume start rhev_vms_primary rhev_vms
 3086  gluster volume start rhev_vms_primary rhev_vms force

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        Y       8343
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        Y       22381
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        Y       20633

Finally..

Dan

On Tue, May 29, 2018 at 8:47 PM, Dan Lavu <dan at redhat.com> wrote:
> Stefan,
>
> Sounds like a brick process is not running. I have noticed some strangeness
> in my lab when using RDMA; I often have to forcibly restart the brick
> process, often as in every single time I do a major operation: add a new
> volume, remove a volume, stop a volume, etc.
>
> gluster volume status <vol>
>
> Do any of the self-heal daemons show N/A? If that's the case, try
> forcing a restart on the volume.
>
> gluster volume start <vol> force
>
> This will also explain why your volumes aren't being replicated properly.
>
> On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
>> Dear all,
>>
>> I faced a problem with a glusterfs volume (pure distributed, _not_
>> dispersed) over RDMA transport. One user had a directory with a large
>> number of files (50,000 files), and just doing an "ls" in this directory
>> yields a "Transport endpoint not connected" error. The effect is that "ls"
>> only shows some files, but not all.
>>
>> The respective log file shows this error message:
>>
>> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
>> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
>>
>> I already set the memlock limit for glusterd to unlimited, but the
>> problem persists.
>>
>> Only going from RDMA transport to TCP transport solved the problem. (I'm
>> running the volume now in mixed mode, config.transport=tcp,rdma.) Mounting
>> with transport=rdma shows this error; mounting with transport=tcp is fine.
>>
>> However, this problem does not arise on all large directories. I didn't
>> recognize a pattern yet.
>>
>> I'm using glusterfs v3.12.6 on the servers, QDR Infiniband HCAs.
>>
>> Is this a known issue with RDMA transport?
>>
>> best wishes,
>> Stefan
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
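A minimal sketch of the check-and-force-start loop Dan describes above. The volume
name and PID are taken from his paste purely as examples; on another cluster they
would differ, and whether a dead PID is really the stuck process is something to
verify case by case.

# List volumes, then check each one; a brick or self-heal daemon row showing
# Online = N (or a Pid that no longer matches a live process) is the one to restart.
gluster volume list
gluster volume status rhev_vms_primary
ps -p 15666                               # PID from the status output; fails if that brick process has died
# Force-start the volume to respawn the missing brick/self-heal processes:
gluster volume start rhev_vms_primary force
gluster volume status rhev_vms_primary    # confirm every row now shows Online = Y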
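For reference, Stefan's remark about setting the memlock limit for glusterd to
unlimited is commonly done on a systemd host with a drop-in like the sketch below.
The drop-in file name is arbitrary, and whether this is how Stefan actually
configured it is an assumption.

# /etc/systemd/system/glusterd.service.d/memlock.conf  (drop-in; file name is arbitrary)
[Service]
LimitMEMLOCK=infinity

# Reload and restart so the limit applies to glusterd and to brick processes
# it spawns afterwards:
systemctl daemon-reload
systemctl restart glusterd
# Verify against a running brick or daemon PID:
grep "Max locked memory" /proc/<PID>/limits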
Dear Dan,

thanks for the quick reply!

I actually tried restarting all processes (and even rebooting all servers),
but the error persists. I can also confirm that all brick processes are
running. My volume is a distribute-only volume (not dispersed, no sharding).

I also tried mounting with use_readdirp=no, because the error seems to be
connected to readdirp, but this option does not change anything.

I found two options I might try: (gluster volume get myvolumename all | grep readdirp)
   performance.force-readdirp     true
   dht.force-readdirp             on
Can I turn these off safely? (Or what precisely do they do?)

I also assured that all glusterd processes have unlimited locked memory.

Just to state it clearly: I do _not_ see any data corruption. It is just that
directory listings do not work (in very rare cases) with rdma transport:
"ls" shows only a part of the files. But if I then do
   stat /path/to/known/filename
it succeeds, and even
   md5sum /path/to/known/filename/that/does/not/get/listed/with/ls
yields the correct result.

best wishes,
Stefan

> Am 30.05.2018 um 03:00 schrieb Dan Lavu <dan at redhat.com>:
>
> Forgot to mention: sometimes I have to force start other volumes as well;
> it's hard to determine which brick process is locked up from the logs.
> [...]
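Purely as mechanics for the two options Stefan found (whether they are safe to
disable is exactly his open question, and the accepted values on a given release
are an assumption), inspecting, toggling, and reverting a volume option would look
roughly like this:

# Inspect the current readdirp-related settings, as Stefan did:
gluster volume get <vol> all | grep readdirp
# If they turn out to be safe to change, they are set like any other volume option:
gluster volume set <vol> performance.force-readdirp false
gluster volume set <vol> dht.force-readdirp off
# ...and can be reverted to their defaults with:
gluster volume reset <vol> performance.force-readdirp
gluster volume reset <vol> dht.force-readdirp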
Stefan,

We'll have to let somebody else chime in. I don't work on this project; I'm just
another user and enthusiast, and I've spent (and am still spending) much time
tuning my own RDMA Gluster configuration. In short, I won't have an answer for you.

If nobody can answer, I'd suggest filing a bug; that way it can be tracked and
reviewed by the developers.

- Dan

On Wed, May 30, 2018 at 6:34 AM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
> Dear Dan,
>
> thanks for the quick reply!
> [...]
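If it does come to filing a bug as Dan suggests, the usual pieces of information
to gather would be along these lines; the log paths are the defaults on most
installs and may differ on this system.

gluster --version
gluster volume info <vol>
gluster volume status <vol>
# Client (mount) and brick logs; default locations on most installs:
ls /var/log/glusterfs/
ls /var/log/glusterfs/bricks/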