冷波
2016-Jun-24 14:32 UTC
[Gluster-users] Fuse client hangs on doing multithreading IO tests
Hi,

We found a problem when doing traffic tests. We created a replicated volume with two storage nodes (CentOS 6.5). There was one FUSE client (CentOS 6.7), which did multithreaded reads and writes. Most of the I/O was reads of big files. All machines used 10GbE NICs, and the typical read throughput was 4-6Gbps (0.5-1.5GB/s).

After the test had run for several minutes, the test program hung. The throughput suddenly dropped to zero, and there was no traffic after that. If we ran df, it hung too. But we could still read and write the volume from other clients.

We tried several GlusterFS versions from 3.7.5 to 3.8.0. Each version had this problem. We also tried restoring the default GlusterFS options, but the problem persisted.

The GlusterFS version was 3.7.11 for the following stacks.

This was the stack of dd when hanging:
[<ffffffffa046d211>] wait_answer_interruptible+0x81/0xc0 [fuse]
[<ffffffffa046d42b>] __fuse_request_send+0x1db/0x2b0 [fuse]
[<ffffffffa046d512>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa0477d4a>] fuse_statfs+0xda/0x150 [fuse]
[<ffffffff811c2b64>] statfs_by_dentry+0x74/0xa0
[<ffffffff811c2c9b>] vfs_statfs+0x1b/0xb0
[<ffffffff811c2e97>] user_statfs+0x47/0xb0
[<ffffffff811c2f9a>] sys_statfs+0x2a/0x50
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

This was the stack of gluster:
[<ffffffff810b226a>] futex_wait_queue_me+0xba/0xf0
[<ffffffff810b33a0>] futex_wait+0x1c0/0x310
[<ffffffff810b4c91>] do_futex+0x121/0xae0
[<ffffffff810b56cb>] sys_futex+0x7b/0x170
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

This was the stack of the test program:
[<ffffffff810a3f74>] hrtimer_nanosleep+0xc4/0x180
[<ffffffff810a409e>] sys_nanosleep+0x6e/0x80
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Any clue?

Thanks,
Paul
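
The report above notes that df hangs on the affected client once the test stalls, with the caller blocked in fuse_statfs. As a rough illustration only (not something posted in the thread), a test harness could probe the mount with a time-limited statfs call so that the probe itself does not block forever. The function name mount_responds, the 5-second default, and the example path /mnt/glustervol are all assumptions made for this sketch.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=5.0):
    """Return True if statvfs(path) completes successfully within timeout_s.

    The call runs in a child process so that a blocked statfs() on a hung
    FUSE mount cannot block the caller. A missing path also returns False.
    """
    # The child only calls os.statvfs(), the same syscall family df relies on.
    cmd = [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path]
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child on timeout before raising.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical mount point; replace with the real FUSE mount path.
    print(mount_responds("/mnt/glustervol"))
```

Running this periodically next to the I/O test would give an approximate timestamp for when the mount stops answering, which can then be matched against the client log. It relies on the blocked call being killable, which is consistent with the wait_answer_interruptible frame in the stack above but is not guaranteed in every hang.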
FNU Raghavendra Manjunath
2016-Jun-24 15:18 UTC
[Gluster-users] [Gluster-devel] Fuse client hangs on doing multithreading IO tests
Hi,

Any idea how big the files being read were?

Can you please attach the logs from all the Gluster server and client nodes? (The logs can be found in /var/log/glusterfs.) Also, please provide /var/log/messages from all the server and client nodes.

Regards,
Raghavendra
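
One possible way to gather the requested files on each node is sketched below. It is only an illustration: the helper name collect_logs and the output file name are assumptions, and the two default paths are simply the ones named in the request above. Reading /var/log/messages normally requires root.

```python
import os
import socket
import tarfile


def collect_logs(output_path, sources=("/var/log/glusterfs", "/var/log/messages")):
    """Pack the existing paths from `sources` into a gzip-compressed tarball.

    Returns the list of source paths that were actually added.
    """
    added = []
    with tarfile.open(output_path, "w:gz") as archive:
        for src in sources:
            if os.path.exists(src):
                # Directories are added recursively by tarfile.
                archive.add(src)
                added.append(src)
    return added


if __name__ == "__main__":
    # Hypothetical output name; including the hostname keeps the
    # archives from different nodes apart.
    out = "gluster-logs-%s.tar.gz" % socket.gethostname()
    print(collect_logs(out))
```

Paths that do not exist on a given node are skipped silently, so the same script can be run unchanged on servers and clients.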