Hello,

I'm having some major problems with Gluster and oVirt, and I've been pulling my hair out over this, so if anybody can provide insight, that would be fantastic. I've tried both transports, TCP and RDMA, and both are proving unstable.

The first thing I'm running into, intermittently and on one specific node, is that it gets spammed with the following message:

"[2016-08-08 00:42:50.837992] E [rpc-clnt.c:357:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fb728b0f293] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fb7288d73d1] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb7288d74ee] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fb7288d8d0e] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fb7288d9528] ))))) 0-vmdata1-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2016-08-08 00:42:43.620710 (xid=0x6800b)"

Then the InfiniBand device gets bounced and VMs get stuck.

Another problem I'm seeing: once a day, or every two days, an oVirt node will hang on the Gluster mounts. Issuing a df to check the mounts just stalls; this occurs hourly if RDMA is used. Most of the time I can log into the hypervisor and remount the Gluster volumes.

This is on Fedora 23 with Gluster 3.8.1-1. The InfiniBand gear is 40Gb/s QDR QLogic, using the ib_qib module; this configuration was working with our old InfiniHost III. I couldn't get OFED to compile, so all the InfiniBand modules are the ones Fedora ships.

A volume looks like the following (please tell me if there is anything I need to adjust; the settings were pulled from several examples):

Volume Name: vmdata_ha
Type: Replicate
Volume ID: 325a5fda-a491-4c40-8502-f89776a3c642
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp,rdma
Bricks:
Brick1: deadpool.ib.runlevelone.lan:/gluster/vmdata_ha
Brick2: spidey.ib.runlevelone.lan:/gluster/vmdata_ha
Brick3: groot.ib.runlevelone.lan:/gluster/vmdata_ha (arbiter)
Options Reconfigured:
performance.least-prio-threads: 4
performance.low-prio-threads: 16
performance.normal-prio-threads: 24
performance.high-prio-threads: 24
cluster.self-heal-window-size: 32
cluster.self-heal-daemon: on
performance.md-cache-timeout: 1
performance.cache-max-file-size: 2MB
performance.io-thread-count: 32
network.ping-timeout: 5
performance.write-behind-window-size: 4MB
performance.cache-size: 256MB
performance.cache-refresh-timeout: 10
server.allow-insecure: on
network.remote-dio: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
nfs.disable: on
config.transport: tcp,rdma
performance.stat-prefetch: off
cluster.eager-lock: enable

Volume Name: vmdata1
Type: Distribute
Volume ID: 3afefcb3-887c-4315-b9dc-f4e890f786eb
Status: Started
Number of Bricks: 2
Transport-type: tcp,rdma
Bricks:
Brick1: spidey.ib.runlevelone.lan:/gluster/vmdata1
Brick2: deadpool.ib.runlevelone.lan:/gluster/vmdata1
Options Reconfigured:
config.transport: tcp,rdma
network.remote-dio: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
server.allow-insecure: on
performance.stat-prefetch: off
performance.cache-refresh-timeout: 10
performance.cache-size: 256MB
performance.write-behind-window-size: 4MB
network.ping-timeout: 5
performance.io-thread-count: 32
performance.cache-max-file-size: 2MB
performance.md-cache-timeout: 1
performance.high-prio-threads: 24
performance.normal-prio-threads: 24
performance.low-prio-threads: 16
performance.least-prio-threads: 4
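Most of those options were cherry-picked from blog posts rather than taken from the stock "virt" option group, so they may well be part of the problem. If that group is the recommended baseline for VM images, I assume I could apply it and then diff against what I have, roughly like this (vmdata_ha is my volume, and I haven't verified that the groups file actually ships in the Fedora packages):

  # apply the predefined option group for VM workloads
  # (should live in /var/lib/glusterd/groups/virt)
  gluster volume set vmdata_ha group virt

  # then see what actually took effect
  gluster volume get vmdata_ha all | grep -E 'ping-timeout|remote-dio|eager-lock|stat-prefetch'

Is that profile sane for a replica 3 arbiter volume backing oVirt, or should I keep the hand-tuned thread/cache settings above?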
/etc/glusterfs/glusterd.vol:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,tcp
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    # option rpc-auth-allow-insecure on
    option transport.socket.bind-address 0.0.0.0
    # option transport.address-family inet6
    # option base-port 49152
end-volume

I think that's a good start; thank you so much for taking the time to look at this. You can find me on freenode, nick side_control, if you want to chat. I'm GMT-5.

Cheers,

Dan
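P.S. When a node hangs, I've been poking at the mounts by hand from the hypervisor roughly like this; /mnt/scratch is just a throwaway directory of mine, and I'm going off the upstream docs for the RDMA mount syntax, so tell me if I've got that part wrong:

  # check the IB port state first (infiniband-diags / libibverbs-utils)
  ibstat
  ibv_devinfo | grep -E 'state|active_mtu'

  # manual FUSE mount over TCP
  mount -t glusterfs deadpool.ib.runlevelone.lan:/vmdata_ha /mnt/scratch

  # same volume over RDMA; the docs show either of these forms for a tcp,rdma volume
  mount -t glusterfs -o transport=rdma deadpool.ib.runlevelone.lan:/vmdata_ha /mnt/scratch
  mount -t glusterfs deadpool.ib.runlevelone.lan:/vmdata_ha.rdma /mnt/scratch

oVirt/VDSM does its own mounting of the storage domains, so this is only what I use for testing when things look stuck.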
Pranith Kumar Karampuri
2016-Aug-11 06:53 UTC
[Gluster-users] Gluster Infiniband/RDMA Help
Added Rafi, Raghavendra who work on RDMA.

On Mon, Aug 8, 2016 at 7:58 AM, Dan Lavu <dan at redhat.com> wrote:

> Hello,
>
> I'm having some major problems with Gluster and oVirt, and I've been
> pulling my hair out over this, so if anybody can provide insight, that
> would be fantastic. I've tried both transports, TCP and RDMA, and both
> are proving unstable.
>
> The first thing I'm running into, intermittently and on one specific
> node, is that it gets spammed with the following message:
>
> "[2016-08-08 00:42:50.837992] E [rpc-clnt.c:357:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fb728b0f293] (-->
> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fb7288d73d1] (-->
> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb7288d74ee] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fb7288d8d0e]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fb7288d9528] )))))
> 0-vmdata1-client-0: forced unwinding frame type(GlusterFS 3.3)
> op(WRITE(13)) called at 2016-08-08 00:42:43.620710 (xid=0x6800b)"
>
> Then the InfiniBand device gets bounced and VMs get stuck.
>
> Another problem I'm seeing: once a day, or every two days, an oVirt node
> will hang on the Gluster mounts. Issuing a df to check the mounts just
> stalls; this occurs hourly if RDMA is used. Most of the time I can log
> into the hypervisor and remount the Gluster volumes.
>
> This is on Fedora 23 with Gluster 3.8.1-1. The InfiniBand gear is 40Gb/s
> QDR QLogic, using the ib_qib module; this configuration was working with
> our old InfiniHost III. I couldn't get OFED to compile, so all the
> InfiniBand modules are the ones Fedora ships.
>
> A volume looks like the following (please tell me if there is anything I
> need to adjust; the settings were pulled from several examples):
>
> Volume Name: vmdata_ha
> Type: Replicate
> Volume ID: 325a5fda-a491-4c40-8502-f89776a3c642
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp,rdma
> Bricks:
> Brick1: deadpool.ib.runlevelone.lan:/gluster/vmdata_ha
> Brick2: spidey.ib.runlevelone.lan:/gluster/vmdata_ha
> Brick3: groot.ib.runlevelone.lan:/gluster/vmdata_ha (arbiter)
> Options Reconfigured:
> performance.least-prio-threads: 4
> performance.low-prio-threads: 16
> performance.normal-prio-threads: 24
> performance.high-prio-threads: 24
> cluster.self-heal-window-size: 32
> cluster.self-heal-daemon: on
> performance.md-cache-timeout: 1
> performance.cache-max-file-size: 2MB
> performance.io-thread-count: 32
> network.ping-timeout: 5
> performance.write-behind-window-size: 4MB
> performance.cache-size: 256MB
> performance.cache-refresh-timeout: 10
> server.allow-insecure: on
> network.remote-dio: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> storage.owner-gid: 36
> storage.owner-uid: 36
> performance.readdir-ahead: on
> nfs.disable: on
> config.transport: tcp,rdma
> performance.stat-prefetch: off
> cluster.eager-lock: enable
>
> Volume Name: vmdata1
> Type: Distribute
> Volume ID: 3afefcb3-887c-4315-b9dc-f4e890f786eb
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp,rdma
> Bricks:
> Brick1: spidey.ib.runlevelone.lan:/gluster/vmdata1
> Brick2: deadpool.ib.runlevelone.lan:/gluster/vmdata1
> Options Reconfigured:
> config.transport: tcp,rdma
> network.remote-dio: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> performance.readdir-ahead: on
> server.allow-insecure: on
> performance.stat-prefetch: off
> performance.cache-refresh-timeout: 10
> performance.cache-size: 256MB
> performance.write-behind-window-size: 4MB
> network.ping-timeout: 5
> performance.io-thread-count: 32
> performance.cache-max-file-size: 2MB
> performance.md-cache-timeout: 1
> performance.high-prio-threads: 24
> performance.normal-prio-threads: 24
> performance.low-prio-threads: 16
> performance.least-prio-threads: 4
>
>
> /etc/glusterfs/glusterd.vol:
>
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,tcp
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>     option ping-timeout 0
>     option event-threads 1
>     # option rpc-auth-allow-insecure on
>     option transport.socket.bind-address 0.0.0.0
>     # option transport.address-family inet6
>     # option base-port 49152
> end-volume
>
> I think that's a good start; thank you so much for taking the time to
> look at this. You can find me on freenode, nick side_control, if you
> want to chat. I'm GMT-5.
>
> Cheers,
>
> Dan
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

--
Pranith