Forgot to mention: sometimes I have to force start other volumes as well; it's hard
to determine which brick process is locked up from the logs.

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        N       N/A    << Brick process is not running on any node.
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        N       N/A
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        N       N/A

Task Status of Volume rhev_vms_primary
------------------------------------------------------------------------------
There are no active volume tasks

 3081  gluster volume start rhev_vms_noshards force
 3082  gluster volume status
 3083  gluster volume start rhev_vms_primary force
 3084  gluster volume status
 3085  gluster volume start rhev_vms_primary rhev_vms
 3086  gluster volume start rhev_vms_primary rhev_vms force

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        Y       8343
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        Y       22381
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        Y       20633

Finally..

Dan

On Tue, May 29, 2018 at 8:47 PM, Dan Lavu <dan at redhat.com> wrote:
> Stefan,
>
> Sounds like a brick process is not running. I have noticed some strangeness
> in my lab when using RDMA; I often have to forcibly restart the brick
> process, often as in every single time I do a major operation: add a new
> volume, remove a volume, stop a volume, etc.
>
> gluster volume status <vol>
>
> Do any of the self-heal daemons show N/A? If that's the case, try
> forcing a restart on the volume.
>
> gluster volume start <vol> force
>
> This will also explain why your volumes aren't being replicated properly.
>
> On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
>> Dear all,
>>
>> I faced a problem with a glusterfs volume (pure distributed, _not_
>> dispersed) over RDMA transport. One user had a directory with a large
>> number of files (50,000 files), and just doing an "ls" in this directory
>> yields a "Transport endpoint not connected" error. The effect is that "ls"
>> only shows some files, but not all.
>>
>> The respective log file shows this error message:
>>
>> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
>> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
>>
>> I already set the memlock limit for glusterd to unlimited, but the
>> problem persists.
>>
>> Only going from RDMA transport to TCP transport solved the problem. (I'm
>> running the volume now in mixed mode, config.transport=tcp,rdma.) Mounting
>> with transport=rdma shows this error; mounting with transport=tcp is fine.
>>
>> However, this problem does not arise on all large directories. I didn't
>> recognize a pattern yet.
>>
>> I'm using glusterfs v3.12.6 on the servers, QDR Infiniband HCAs.
>>
>> Is this a known issue with RDMA transport?
>>
>> best wishes,
>> Stefan
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
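A minimal sketch of the check-and-force-start loop Dan describes above. The volume
name and PID are taken from his paste purely as examples; on another cluster they
would differ, and whether a dead PID is really the stuck process is something to
verify case by case.

# List volumes, then check each one; a brick or self-heal daemon row showing
# Online = N (or a Pid that no longer matches a live process) is the one to restart.
gluster volume list
gluster volume status rhev_vms_primary
ps -p 15666                               # PID from the status output; fails if that brick process has died
# Force-start the volume to respawn the missing brick/self-heal processes:
gluster volume start rhev_vms_primary force
gluster volume status rhev_vms_primary    # confirm every row now shows Online = Y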
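For reference, Stefan's remark about setting the memlock limit for glusterd to
unlimited is commonly done on a systemd host with a drop-in like the sketch below.
The drop-in file name is arbitrary, and whether this is how Stefan actually
configured it is an assumption.

# /etc/systemd/system/glusterd.service.d/memlock.conf  (drop-in; file name is arbitrary)
[Service]
LimitMEMLOCK=infinity

# Reload and restart so the limit applies to glusterd and to brick processes
# it spawns afterwards:
systemctl daemon-reload
systemctl restart glusterd
# Verify against a running brick or daemon PID:
grep "Max locked memory" /proc/<PID>/limits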
Dear Dan,

thanks for the quick reply!

I actually tried restarting all processes (and even rebooting all servers),
but the error persists. I can also confirm that all brick processes are
running. My volume is a distribute-only volume (not dispersed, no sharding).

I also tried mounting with use_readdirp=no, because the error seems to be
connected to readdirp, but this option does not change anything.

I found two options I might try: (gluster volume get myvolumename all | grep readdirp)
   performance.force-readdirp     true
   dht.force-readdirp             on
Can I turn these off safely? (Or what precisely do they do?)

I also assured that all glusterd processes have unlimited locked memory.

Just to state it clearly: I do _not_ see any data corruption. It is just that
directory listings do not work (in very rare cases) with rdma transport:
"ls" shows only a part of the files. But if I then do
   stat /path/to/known/filename
it succeeds, and even
   md5sum /path/to/known/filename/that/does/not/get/listed/with/ls
yields the correct result.

best wishes,
Stefan

> Am 30.05.2018 um 03:00 schrieb Dan Lavu <dan at redhat.com>:
>
> Forgot to mention: sometimes I have to force start other volumes as well;
> it's hard to determine which brick process is locked up from the logs.
> [...]
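Purely as mechanics for the two options Stefan found (whether they are safe to
disable is exactly his open question, and the accepted values on a given release
are an assumption), inspecting, toggling, and reverting a volume option would look
roughly like this:

# Inspect the current readdirp-related settings, as Stefan did:
gluster volume get <vol> all | grep readdirp
# If they turn out to be safe to change, they are set like any other volume option:
gluster volume set <vol> performance.force-readdirp false
gluster volume set <vol> dht.force-readdirp off
# ...and can be reverted to their defaults with:
gluster volume reset <vol> performance.force-readdirp
gluster volume reset <vol> dht.force-readdirp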
Stefan,

We'll have to let somebody else chime in. I don't work on this project; I'm just
another user and enthusiast, and I've spent (and am still spending) much time
tuning my own RDMA Gluster configuration. In short, I won't have an answer for you.

If nobody can answer, I'd suggest filing a bug; that way it can be tracked and
reviewed by the developers.

- Dan

On Wed, May 30, 2018 at 6:34 AM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
> Dear Dan,
>
> thanks for the quick reply!
> [...]
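If it does come to filing a bug as Dan suggests, the usual pieces of information
to gather would be along these lines; the log paths are the defaults on most
installs and may differ on this system.

gluster --version
gluster volume info <vol>
gluster volume status <vol>
# Client (mount) and brick logs; default locations on most installs:
ls /var/log/glusterfs/
ls /var/log/glusterfs/bricks/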