Philippe Muller
2012-Jul-16 16:16 UTC
[Gluster-users] RDMA "not fully supported" by GlusterFS 3.3.0 ?!
Hi RedHat & GlusterFS users,

Last weekend, I worked on a GlusterFS cluster upgrade, from 3.0.3 to 3.3.0. We were using hand-made volume files defining 2 volumes, a distributed one and a replicated-distribute one, both using the "transport-type ib-verbs" option.

One of our objectives was to use the "gluster" CLI tool (which didn't exist in 3.0.3, from what I remember).

Here is what we did:
1 - Shut down all glusterfs instances
2 - Install Gluster 3.3.0
3 - Start glusterd on all hosts
4 - Create a trusted pool with all our hosts
5 - Create "compatible volumes" using the CLI tool, with the same bricks we were using with our hand-made volfiles and with the "rdma" transport (since ib-verbs was no longer an option...)
6 - Mount the volumes

Of course, we tested that scenario on VMs. No issues with data. We tested everything except... RDMA!

When we finally made the upgrade, everything went fine, except mounting the volumes. We got this kind of error message in the log files:
"E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: tcp connect to  failed (Connection refused)"
(notice the 2 white spaces between "connect to" and "failed")

That reminded me of a problem we once had with the subnet manager running on the IB switch. But this time, the switch wasn't responsible; IPoIB was still running fine...

I scratched my head more than once, thinking about what I could possibly have forgotten. Then I searched for all the information I could find about RDMA and 3.3.0.

Here is what I found:
- On page 123 of the "GlusterFS Administration Guide 3.3.0", a small note saying: "NOTE: with 3.3.0 release, transport type 'rdma' and 'tcp,rdma' are not fully supported."
- On July 7, Ling Ho started a thread on this mailing list with very similar symptoms: http://www.mail-archive.com/gluster-users at gluster.org/msg09326.html ; but he didn't get any answer.

In the urgency of the upgrade, we weren't sure rolling back to 3.0.3 was a good option (since we don't know precisely which XFS attributes were modified by 3.3.0 on the backend FS). So we switched to TCP (over IPoIB).

It's working. We are now running 3.3.0. But we are no longer taking advantage of RDMA.

So here are a few questions:
- Did I miss something that prevented me from using RDMA in 3.3.0?
- Is there a way to use RDMA in 3.3.0?
- Is there any official communication about the 3.3.0 RDMA issue?
- Is there a 3.3.x release with RDMA support planned? For when?
- Will the RDMA transport be dropped in future releases?

Thanks!
(and yeah, despite that issue, I still love GlusterFS :-)

Philippe Muller
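Steps 4 to 6 of the procedure in the message above can be sketched as a small Python helper that only builds the command lines and never runs them. This is a rough illustration, not the exact commands used on that cluster: the hostnames, brick paths, and mount point are hypothetical, the volume name "zodiac" is merely guessed from the log prefix, and the default transport is "tcp" because that is what ended up working in the thread. The mount line does nothing RDMA-specific.

```python
from typing import List

# Transport values accepted by "gluster volume create" in the 3.3 CLI.
TRANSPORTS = {"tcp", "rdma", "tcp,rdma"}


def build_upgrade_commands(
    hosts: List[str],
    volume: str,
    bricks: List[str],
    transport: str = "tcp",
    replica: int = 1,
    mount_point: str = "/mnt/gluster",  # hypothetical mount point
) -> List[List[str]]:
    """Return argv lists for steps 4-6; nothing is executed here."""
    if transport not in TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    if replica > 1 and len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")

    # Step 4: probe every other host from the first one to form the trusted pool.
    commands = [["gluster", "peer", "probe", host] for host in hosts[1:]]

    # Step 5: create the volume on the existing bricks, then start it.
    create = ["gluster", "volume", "create", volume]
    if replica > 1:
        create += ["replica", str(replica)]
    create += ["transport", transport] + bricks
    commands.append(create)
    commands.append(["gluster", "volume", "start", volume])

    # Step 6: mount through the native client, using the first host as volfile server.
    commands.append(["mount", "-t", "glusterfs", f"{hosts[0]}:/{volume}", mount_point])
    return commands


if __name__ == "__main__":
    for argv in build_upgrade_commands(
        hosts=["node1", "node2"],  # hypothetical hostnames
        volume="zodiac",           # assumed from the "0-zodiac-client-3" log prefix
        bricks=["node1:/export/brick1", "node2:/export/brick1"],  # hypothetical paths
        replica=2,
    ):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review them (and to rehearse with each transport value on test machines) before anything touches the production bricks.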
Joe Landman
2012-Jul-16 19:37 UTC
[Gluster-users] RDMA "not fully supported" by GlusterFS 3.3.0 ?!
On 07/16/2012 12:16 PM, Philippe Muller wrote:
> Here is what I found:
> - On page 123 of the "GlusterFS Administration Guide 3.3.0", a small
> note saying: "NOTE: with 3.3.0 release, transport type 'rdma' and
> 'tcp,rdma' are not fully supported."

I don't see this indicated in the 3.2.x series, though arguably, it didn't work well (tcp,rdma or even pure rdma). Last time it worked well for us was the 3.0.x series.

I definitely see it now in the 3.3.0 docs. Oh well.

Should we assume this is a feature deprecation and RDMA support will be removed going forward? Need to know soon for planning purposes ...

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
Ling Ho
2012-Jul-23 19:30 UTC
[Gluster-users] RDMA "not fully supported" by GlusterFS 3.3.0 ?!
On 07/16/2012 09:16 AM, Philippe Muller wrote:
> - On July 7, Ling Ho started a thread on this mailing list with very
> similar symptoms:
> http://www.mail-archive.com/gluster-users at gluster.org/msg09326.html ;
> but he didn't get any answer.

I just came back from a one-week vacation. Yes, I didn't get any reply from the list, and I was not able to get RDMA working when the server is configured for tcp,rdma. When I was testing, I had set up the server using rdma only and totally missed this.

I ended up using TCP over IPoIB. The performance is much better than TCP over 10G/s. However, since I am in a mixed environment, I have to do some static routing on the gluster server, basically routing the IPoIB subnet to the 10G/s subnet that the bricks are all set up with. Things have been working fine.

...
ling
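For anyone wanting to check whether their client logs show the same symptom reported in this thread, here is a minimal Python sketch that scans log lines for the quoted error and flags the case where the peer address is empty (the "2 white spaces" case). The regular expression is reconstructed from the single log line quoted in the thread; the layout of lines that do carry an address is an assumption, as is the function name.

```python
import re
from typing import Dict, Iterable, List

# Pattern built from the one error line quoted in this thread. The "peer"
# group is whatever sits between "tcp connect to" and "failed"; it is empty
# when the log line carries no address. Lines with an address are assumed
# to follow the same layout.
CONNECT_FAILED = re.compile(
    r"\[rdma\.c:\d+:tcp_connect_finish\] (?P<client>\S+): "
    r"tcp connect to (?P<peer>.*?) ?failed \((?P<reason>[^)]*)\)"
)


def find_failed_connects(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Collect client name, peer address, and reason for each matching line."""
    hits = []
    for line in lines:
        match = CONNECT_FAILED.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = (
        "E [rdma.c:4458:tcp_connect_finish] 0-zodiac-client-3: "
        "tcp connect to  failed (Connection refused)"
    )
    for hit in find_failed_connects([sample]):
        peer = hit["peer"] or "<empty address>"
        print(f"{hit['client']}: connect to {peer} failed ({hit['reason']})")
```

An empty peer field only confirms that the symptom matches the one described here; it says nothing about the underlying cause.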