Shane StClair
2013-Oct-08 23:22 UTC
[Gluster-users] Fwd: RDMA status in Gluster 3.4.1 (mounts hanging)
Hi all,

After many days of experimentation, doc and mailing list reading, irc asking, etc, I think the crippled RDMA status in current versions of Gluster (3.3.x - 3.4.1) is a known issue. I'd like to confirm that, share my findings, and ask about any status updates/timelines.

After noticing that RDMA mounts were hanging with a new install of Gluster 3.4.1, I tested a series of different Gluster volumes. Simple (single brick), distributed, replicate, and distributed-replicate volumes were each tested for both tcp and rdma transport types. Detailed results are below, but the short version is that while *all volume types worked over tcp, only the simple (single brick) volume worked using rdma. All other volume types failed over rdma*, meaning that mount commands from the client hung forever.

*Environment details:*
OS: Debian Wheezy
Server type: Dell M610
Gluster version: 3.4.1, from Gluster Debian repository
Infiniband software: OFED 1.4.2, from Debian Wheezy stock packages
Infiniband card info: http://fpaste.org/45305/81273796/
Loaded modules: http://fpaste.org/45306/73881138/

*RDMA successful configs:*
Single brick

*RDMA failed configs:*
Distributed (2 bricks)
Replicate (2 bricks)
Distributed-Replicate (2 x 2 bricks)

*TCP successful configs (all):*
Single brick
Distributed (2 bricks)
Replicate (2 bricks)
Distributed-Replicate (2 x 2 bricks)

*Example RDMA volume creation command:*
gluster volume create dist-rdma transport rdma 192.168.255.120:/home/axiom/dist-rdma-1 192.168.255.120:/home/axiom/dist-rdma-2

*Example RDMA mounting command:*
mount -t glusterfs -o transport=rdma 192.168.255.120:/dist-rdma dist-rdma

*Logs from example failed RDMA config (distributed/two bricks):*
gluster volume info: http://fpaste.org/45298/38127208/
gluster volume status: http://fpaste.org/45299/13812721/
glusterd.vol.log excerpt: http://fpaste.org/45302/13812722/
client log: http://fpaste.org/45303/38127234/

These results somewhat agree with Justin Clift's findings during the
GlusterFest (http://www.gluster.org/community/documentation/index.php/GlusterFest) testing, which evolved into this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=978148

However, the bug report mentions that only the distributed-replicate volume variant is failing, while I'm also seeing distributed and replicate volumes fail. I'd be happy to create a new bug or update the existing bug if needed. Let me know if any additional information is needed.

Also, I dunno if there's a proper place to post a warning about RDMA's status, but it seems that a handful of people have banged their heads against this problem. I'd suggest that if the resources don't exist to address this issue by 3.4.2, a warning be issued when creating an RDMA volume, or perhaps that RDMA volume creation be disabled altogether.

Please let me know if we can be of any help in the future (testing, log output, etc.).

Best,
Shane

--
Shane StClair
Software Engineer
Axiom Consulting & Design
http://www.axiomalaska.com
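For anyone who wants to repeat the test matrix described above (four volume types, each over tcp and rdma), here is a rough sketch that only prints the create and mount commands; it does not run them. The host and brick directory come from the example commands in this message. Everything else is an assumption for illustration: the volume labels other than "dist", the brick naming beyond the dist-rdma example, the use of "replica 2" for the replicated layouts, and leaving the transport mount option off for tcp on the assumption that tcp is the default.

```python
HOST = "192.168.255.120"    # server address taken from the example commands
BRICK_ROOT = "/home/axiom"  # brick parent directory taken from the example commands

# (label, number of bricks, replica count or None).
# Labels other than "dist" are hypothetical names chosen for this sketch.
VOLUME_TYPES = [
    ("single", 1, None),   # simple, single brick
    ("dist", 2, None),     # distributed, 2 bricks
    ("rep", 2, 2),         # replicate, 2 bricks
    ("distrep", 4, 2),     # distributed-replicate, 2 x 2 bricks
]
TRANSPORTS = ["tcp", "rdma"]


def create_command(label, bricks, replica, transport):
    """Build a 'gluster volume create' command line as a string."""
    name = f"{label}-{transport}"
    parts = ["gluster", "volume", "create", name]
    if replica is not None:
        parts += ["replica", str(replica)]
    parts += ["transport", transport]
    # Brick paths follow the dist-rdma-1 / dist-rdma-2 pattern from the example.
    parts += [f"{HOST}:{BRICK_ROOT}/{name}-{i}" for i in range(1, bricks + 1)]
    return " ".join(parts)


def mount_command(label, transport):
    """Build the client-side mount command line as a string."""
    name = f"{label}-{transport}"
    parts = ["mount", "-t", "glusterfs"]
    if transport == "rdma":
        # Same option as in the example mount command; omitted for tcp (assumed default).
        parts += ["-o", "transport=rdma"]
    parts += [f"{HOST}:/{name}", name]
    return " ".join(parts)


def all_commands():
    """Yield (create, mount) command pairs for every volume type and transport."""
    for label, bricks, replica in VOLUME_TYPES:
        for transport in TRANSPORTS:
            yield (
                create_command(label, bricks, replica, transport),
                mount_command(label, transport),
            )


if __name__ == "__main__":
    for create, mount in all_commands():
        print(create)
        print(mount)
        print()
```

For the distributed/rdma case this prints the same create and mount commands as the examples above. The volumes still have to be started and the mount point directories created before mounting; those steps are left out of the sketch.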