Pruner, Anne (Anne)
2013-Oct-09 18:22 UTC
[Gluster-users] Can't access volume during self-healing
I'm evaluating gluster for use in our product, and I want to ensure that I understand the failover behavior. What I'm seeing isn't great, but from the docs I've read it doesn't look like this is what everyone else is experiencing. Is this normal?

Setup:

- one volume, distributed, replicated (2), with two bricks on two different servers
- 35,000 files on the volume, about 1MB each, all in one directory (I'm open to changing this, if that's the problem. ls -l takes a really long time)
- volume is mounted (mount -t gluster) on server1

Procedure:

- I stop glusterd and glusterfsd on server1, and send a few files to the volume. This is fine; I can write and read the files.
- I start glusterd on server1, which starts glusterfsd. This triggers self-heal.
- Send a file to the server, and try to read it.
- Sending takes a couple of minutes. Reading is immediate.
- Once self-heal is done, subsequent sends and reads are immediate.

I tried profiling this operation, and it seems like it's stuck on locking the file (server1 is uca-amm3.cnda.avaya.com; server2 is uc-amm4.cnda.avaya.com):

Brick: uc-amm4.cnda.avaya.com:/media/data/brick1
------------------------------------------------
Cumulative Stats:

       Block Size:   1024b+   4096b+   8192b+  16384b+  32768b+  65536b+  131072b+
     No. of Reads:        0        0        0        0     1765    12554     91441
    No. of Writes:      112     1216    38216   144493    15648     3032       247

     %-latency   Avg-latency   Min-Latency    Max-Latency   No. of calls   Fop
     ---------   -----------   -----------    -----------   ------------   ----
          0.00       0.00 us       0.00 us        0.00 us           5141   FORGET
          0.00       0.00 us       0.00 us        0.00 us          21270   RELEASE
          0.00       0.00 us       0.00 us        0.00 us              6   RELEASEDIR
          0.00      61.00 us      61.00 us       61.00 us              1   GETXATTR
          0.00      62.00 us      62.00 us       62.00 us              1   OPENDIR
          0.00      36.86 us      27.00 us       74.00 us              7   FSTAT
          0.00      45.70 us      21.00 us       81.00 us             10   FLUSH
          0.00     123.00 us      83.00 us      135.00 us              5   OPEN
          0.00     118.29 us      41.00 us      315.00 us              7   STATFS
          0.00     419.60 us     266.00 us      539.00 us              5   CREATE
          0.00     422.69 us     118.00 us     2087.00 us             13   XATTROP
          0.00    1202.54 us      18.00 us    14631.00 us             13   ENTRYLK
          0.00     151.12 us      75.00 us      200.00 us            125   READ
          0.00      37.29 us      13.00 us     1232.00 us           1549   FINODELK
          0.00      80.78 us      43.00 us      151.00 us            762   WRITE
          0.00      74.75 us      40.00 us      371.00 us           1524   FXATTROP
          0.04    4004.48 us      95.00 us    17214.00 us           1156   READDIRP
         99.96   16538.92 us      58.00 us   976002.00 us         660602   LOOKUP

    Duration: 2820 seconds
    Data Read: 13651676656 bytes
    Data Written: 3941825592 bytes

[Interval 0 Stats are identical to the Cumulative Stats above; omitted.]

Brick: uca-amm3.cnda.avaya.com:/media/data/brick1
-------------------------------------------------
Cumulative Stats:

       Block Size:   4096b+   8192b+  16384b+  32768b+
     No. of Reads:        0        0        0        0
    No. of Writes:        1       43       72        1

     %-latency    Avg-latency   Min-Latency     Max-Latency   No. of calls   Fop
     ---------    -----------   -----------     -----------   ------------   ----
          0.00        0.00 us       0.00 us         0.00 us              5   RELEASE
          0.00        0.00 us       0.00 us         0.00 us              1   RELEASEDIR
          0.00      132.00 us     132.00 us       132.00 us              1   OPENDIR
          0.00       70.00 us      46.00 us        94.00 us              2   FLUSH
          0.00      175.00 us     175.00 us       175.00 us              1   XATTROP
          0.00      185.00 us     106.00 us       264.00 us              2   STATFS
          0.00      135.67 us      75.00 us       227.00 us              3   GETXATTR
          0.00      489.00 us     489.00 us       489.00 us              1   CREATE
          0.00      250.00 us     152.00 us       348.00 us              2   READDIR
          0.00      153.25 us     102.00 us       177.00 us              4   OPEN
          0.00      157.88 us      76.00 us       245.00 us              8   SETATTR
          0.00      330.25 us     257.00 us       430.00 us              4   MKNOD
          0.00       34.11 us      14.00 us       237.00 us            239   FINODELK
          0.00       83.54 us      62.00 us       179.00 us            117   WRITE
          0.00       99.28 us      42.00 us       298.00 us            234   FXATTROP
          0.14     5310.09 us     127.00 us     13588.00 us           1156   READDIRP
          2.55 22648341.40 us      74.00 us 113241113.00 us              5   ENTRYLK
         97.31    13061.69 us      16.00 us     47524.00 us         330308   LOOKUP

    Duration: 133 seconds
    Data Read: 0 bytes
    Data Written: 1570968 bytes

[Interval 0 Stats are identical to the Cumulative Stats above; omitted.]

Any ideas?

Thanks,
Anne
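P.S. In case anyone wants to reproduce or watch this, the sequence is roughly the sketch below. This is a minimal outline, not the exact commands: the volume name gv0, mount point /mnt/gv0, and sample file paths are placeholders, and init-script names vary by distro; the heal and profile subcommands are the standard gluster CLI.

    # on server1: take the brick offline (service names vary by distro)
    service glusterd stop
    service glusterfsd stop

    # through the client mount: writes and reads still work against server2
    cp /tmp/sample.bin /mnt/gv0/bigdir/
    md5sum /mnt/gv0/bigdir/sample.bin

    # bring server1 back; the brick restarts and self-heal kicks off
    service glusterd start

    # watch the heal queue drain
    gluster volume heal gv0 info

    # capture per-brick latency stats around the slow write
    gluster volume profile gv0 start
    time cp /tmp/sample2.bin /mnt/gv0/bigdir/   # this is the write that stalls
    gluster volume profile gv0 info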
On 10/09/2013 11:22 AM, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure
> that I understand the failover behavior. [...]
> - 35,000 files on volume, about 1MB each, all in one directory (I'm
> open to changing this, if that's the problem. ls -l takes a really
> long time)
> [rest of the original message and profiling trimmed]

What I suspect is happening is that all 35k files are being checked for self-heal before the directory can be regarded as clean and ready to lock. An easy way to test this would be to write to a file in a nearly empty directory and see whether you get the same results.

If you are using a current kernel, or an EL kernel with current backports, mounting with use-readdirp=on will make directory reads faster. I'm not sure how much faster with 35k files, though; I'd be interested in finding out.
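For concreteness, a minimal sketch of both suggestions; the server name server1, volume name gv0, mount point, and sample paths are placeholders (use-readdirp is a standard glusterfs fuse mount option):

    # quick check: does a write in a nearly empty directory stall too?
    mkdir /mnt/gv0/tiny
    time cp /tmp/sample.bin /mnt/gv0/tiny/

    # remount with READDIRPLUS enabled (needs kernel FUSE readdirplus support)
    mount -t glusterfs -o use-readdirp=on server1:/gv0 /mnt/gv0

    # or persistently, via /etc/fstab:
    # server1:/gv0  /mnt/gv0  glusterfs  use-readdirp=on,_netdev  0 0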
Toby Corkindale
2013-Oct-11 05:29 UTC
[Gluster-users] Can't access volume during self-healing
On 10/10/13 05:22, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure that
> I understand the failover behavior. [...]
> - 35,000 files on volume, about 1MB each, all in one directory (I'm open
> to changing this, if that's the problem. ls -l takes a /really/ long time)
> [rest trimmed]

I've posted to the list about this issue before, actually. We had (and still have) a similar requirement for storing a very large number of fairly small files, and originally kept them all in just a few directories in glusterfs.

It turns out that glusterfs is really badly suited to directories with large numbers of files in them. If you can split them up, do so, and performance will become tolerable again; a sketch of one way to do the split follows below. But even then it wasn't great: self-heal can swamp the network, making access for clients so slow as to cause problems.

For your use case (distributed, replicated storage for large numbers of 1MB files) I suggest you check out Riak and the Riak CS add-on. It's proven to be great for that particular use case for us.

-Toby
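A minimal bash sketch of the kind of split I mean: hash each filename into a short two-level prefix so no single directory grows unbounded. The paths and the two-level layout are illustrative, and nothing here is gluster-specific.

    # route each file into /mnt/gv0/xx/yy/ based on the md5 of its name
    shard_path() {
        local h
        h=$(printf '%s' "$1" | md5sum | cut -c1-4)   # first 4 hex chars
        printf '%s/%s' "${h:0:2}" "${h:2:2}"         # up to 65,536 buckets
    }

    dst="/mnt/gv0/$(shard_path "foo.bin")"
    mkdir -p "$dst"
    cp foo.bin "$dst/foo.bin"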