Joerg Blank
2010-Dec-31 18:02 UTC
[Gluster-users] glusterfs 3.1.1 rdma module crashing when mounting volume
Hi all,

We have a small HPC cluster, and I tried to harness the spare disk space of our compute nodes to take some load off the cluster's NFS server.

I started off using glusterfs 3.0.x as packaged with Debian Squeeze (all nodes use this version).

I tried updating to glusterfs 3.1.x using the prepackaged files [1] from gluster.org, but found that I could no longer use the InfiniBand interconnect, because the packages seem to be compiled without rdma support.

To get the faster interconnect back, I repackaged glusterfs 3.1.1 from the source tarball and installed it on all nodes. However, rdma crashes when mounting a volume on the head node [2]; it works fine from the compute nodes. The only significant difference with respect to InfiniBand is that the head node uses a different NIC:

Work Nodes:
02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

Head Node:
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

If anyone has an idea how to get this working, please let me know.

Regards,

Jörg Blank

[1] http://download.gluster.com/pub/gluster/glusterfs/3.1/LATEST/Debian/

[2] Backtrace from logs:

[2010-12-24 22:45:11.516902] W [io-stats.c:1644:init] test-volume: dangling volume. check volfile
[2010-12-24 22:45:11.516943] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-24 22:45:11.516955] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-24 22:45:11.527333] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-24 22:45:11.527529] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-24 22:45:11.527541] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-24 22:45:11.527611] E [rdma.c:4789:init] test-volume-client-1: Failed to initialize IB Device
[2010-12-24 22:45:11.527623] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-24 22:45:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib/libc.so.6(+0x321e0)[0x7f8e2c3ea1e0]
/lib/libc.so.6(+0x7a126)[0x7f8e2c432126]
/usr/lib/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x37c)[0x7f8e28956e7c]
/usr/lib/libgfrpc.so.0(rpc_transport_load+0x365)[0x7f8e2cd5a035]
/usr/lib/libgfrpc.so.0(rpc_clnt_new+0xf9)[0x7f8e2cd5de59]
/usr/lib/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa9)[0x7f8e29c32b09]
/usr/lib/glusterfs/3.1.1/xlator/protocol/client.so(init+0xf1)[0x7f8e29c32cb1]
/usr/lib/libglusterfs.so.0(xlator_init+0x58)[0x7f8e2cf7c978]
/usr/lib/libglusterfs.so.0(glusterfs_graph_init+0x35)[0x7f8e2cfa5b05]
/usr/lib/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x7f8e2cfa5c48]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xba)[0x40447a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc7)[0x405cc7]
/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f8e2cd5cb75]
/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)[0x7f8e2cd5cdc9]
/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7f8e2cd57d7d]
/usr/lib/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f8e2a870c94]
/usr/lib/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0xb3)[0x7f8e2a870d63]
/usr/lib/libglusterfs.so.0(+0x3a272)[0x7f8e2cf9d272]
/usr/sbin/glusterfs(main+0x247)[0x4054c7]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f8e2c3d6c4d]
/usr/sbin/glusterfs[0x403179]

------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDirig Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
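For anyone triaging a similar log, the chain of failures leading up to the crash can be pulled out mechanically. The following is a minimal sketch, assuming the log lines follow the layout seen in the excerpt above ([timestamp] LEVEL [file:line:function] message). The names LOG_LINE and error_chain are hypothetical helpers for illustration, not part of GlusterFS.

```python
import re

# Layout inferred from the log excerpt in this thread (an assumption):
#   [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<msg>.*)"
)


def error_chain(log_text):
    """Return (function, message) for every E-level line, in log order."""
    chain = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "E":
            chain.append((match.group("func"), match.group("msg")))
    return chain


# Three lines copied from the log in the original post.
SAMPLE = """\
[2010-12-24 22:45:11.516943] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-24 22:45:11.527333] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-24 22:45:11.527529] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: test-volume-client-1: could not create CQ
"""

for func, msg in error_chain(SAMPLE):
    print(func, "->", msg)
```

Read in order, the E-level lines show that the first failure is in rdma_create_cq; the later errors and the segfault in rdma.so init follow from it.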
Jeremy Stout
2010-Dec-31 18:44 UTC
[Gluster-users] glusterfs 3.1.1 rdma module crashing when mounting volume
This looks like the same issue I experienced earlier in the month. I would suggest rebuilding GlusterFS from source using the patch that was posted here:
http://gluster.org/pipermail/gluster-users/2010-December/006141.html

The patch resolved the queue creation issues I was experiencing.

Jeremy Stout
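One way to compare the two kinds of HCA before rebuilding is to look at the completion-queue limits each device reports. The sketch below parses the output of `ibv_devinfo -v` (from libibverbs-utils), which lists a max_cqe value per device. Whether a CQ size limit is what actually triggers the "creation of send_cq failed" error here is an assumption, since the thread does not describe what the patch changes. The function names and the numbers in SAMPLE are made-up placeholders, not values from real hardware.

```python
import re


def max_cqe_by_device(devinfo_text):
    """Map each hca_id to its max_cqe, given `ibv_devinfo -v` style text."""
    limits = {}
    device = None
    for line in devinfo_text.splitlines():
        match = re.match(r"\s*hca_id:\s*(\S+)", line)
        if match:
            device = match.group(1)
            continue
        match = re.match(r"\s*max_cqe:\s*(\d+)", line)
        if match and device is not None:
            limits[device] = int(match.group(1))
    return limits


def fits(requested_cqe, device, limits):
    """True if a CQ with requested_cqe entries is within the device limit."""
    return requested_cqe <= limits[device]


# Placeholder text in the shape of `ibv_devinfo -v` output; the numbers are
# illustrative only and do not describe the MT25204 or MT26428.
SAMPLE = """\
hca_id: mthca0
    max_cq:     100
    max_cqe:    1000
"""

limits = max_cqe_by_device(SAMPLE)
print(limits)
print(fits(512, "mthca0", limits), fits(4096, "mthca0", limits))
```

On a real system the text would come from running `ibv_devinfo -v` on the head node and on a compute node, so that the reported limits of the two adapters can be compared side by side.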
Joerg Blank
2011-Jan-03 16:39 UTC
[Gluster-users] glusterfs 3.1.1 rdma module crashing when mounting volume
Hi,

> This looks like the same issue I experienced earlier in the month. I
> would suggest rebuilding GlusterFS from source using the patch that
> was posted here:
> http://gluster.org/pipermail/gluster-users/2010-December/006141.html
>
> The patch resolved the queue creation issues I was experiencing.

Thanks for the tip. I created a patched version of my package and it solved the problem.

Regards,

Jörg Blank