Pruner, Anne (Anne)
2013-Oct-09  18:22 UTC
[Gluster-users] Can't access volume during self-healing
I'm evaluating gluster for use in our product, and I want to ensure that I
understand the failover behavior.  What I'm seeing isn't great, but it
doesn't look from the docs I've read that this is what everyone else is
experiencing.
Is this normal?
Setup:
- one volume, distributed, replicated (2), with two bricks on two
different servers
- 35,000 files on volume, about 1 MB each, all in one directory (I'm
open to changing this, if that's the problem.  ls -l takes a really long
time)
- volume is mounted (mount -t glusterfs) on server 1
Procedure:
- I stop glusterd and glusterfsd on server1, and send a few files to
the volume.  This is fine.  I can write and read the files.
- I start glusterd on server1, and this starts glusterfsd.  This
triggers self-heal.
- Send a file to the server, and try to read it.
- Sending takes a couple of minutes.  Reading is immediate.
- Once self-heal is done, subsequent sends and reads are immediate.
I tried profiling this operation, and it seems like it's stuck on locking
the file:
(server1 is uca-amm3.cnda.avaya.com. server2 is uc-amm4.cnda.avaya.com)
Brick: uc-amm4.cnda.avaya.com:/media/data/brick1
------------------------------------------------
Cumulative Stats:
   Block Size:               1024b+                4096b+                8192b+
 No. of Reads:                    0                     0                     0
No. of Writes:                  112                  1216                 38216
   Block Size:              16384b+               32768b+               65536b+
 No. of Reads:                    0                  1765                 12554
No. of Writes:               144493                 15648                  3032
   Block Size:             131072b+
 No. of Reads:                91441
No. of Writes:                  247
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           5141      FORGET
      0.00       0.00 us       0.00 us       0.00 us          21270     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.00      61.00 us      61.00 us      61.00 us              1    GETXATTR
      0.00      62.00 us      62.00 us      62.00 us              1     OPENDIR
      0.00      36.86 us      27.00 us      74.00 us              7       FSTAT
      0.00      45.70 us      21.00 us      81.00 us             10       FLUSH
      0.00     123.00 us      83.00 us     135.00 us              5        OPEN
      0.00     118.29 us      41.00 us     315.00 us              7      STATFS
      0.00     419.60 us     266.00 us     539.00 us              5      CREATE
      0.00     422.69 us     118.00 us    2087.00 us             13     XATTROP
      0.00    1202.54 us      18.00 us   14631.00 us             13     ENTRYLK
      0.00     151.12 us      75.00 us     200.00 us            125        READ
      0.00      37.29 us      13.00 us    1232.00 us           1549    FINODELK
      0.00      80.78 us      43.00 us     151.00 us            762       WRITE
      0.00      74.75 us      40.00 us     371.00 us           1524    FXATTROP
      0.04    4004.48 us      95.00 us   17214.00 us           1156    READDIRP
     99.96   16538.92 us      58.00 us  976002.00 us         660602      LOOKUP
    Duration: 2820 seconds
   Data Read: 13651676656 bytes
Data Written: 3941825592 bytes
Interval 0 Stats:
   Block Size:               1024b+                4096b+                8192b+
 No. of Reads:                    0                     0                     0
No. of Writes:                  112                  1216                 38216
   Block Size:              16384b+               32768b+               65536b+
 No. of Reads:                    0                  1765                 12554
No. of Writes:               144493                 15648                  3032
   Block Size:             131072b+
 No. of Reads:                91441
No. of Writes:                  247
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           5141      FORGET
      0.00       0.00 us       0.00 us       0.00 us          21270     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.00      61.00 us      61.00 us      61.00 us              1    GETXATTR
      0.00      62.00 us      62.00 us      62.00 us              1     OPENDIR
      0.00      36.86 us      27.00 us      74.00 us              7       FSTAT
      0.00      45.70 us      21.00 us      81.00 us             10       FLUSH
      0.00     123.00 us      83.00 us     135.00 us              5        OPEN
      0.00     118.29 us      41.00 us     315.00 us              7      STATFS
      0.00     419.60 us     266.00 us     539.00 us              5      CREATE
      0.00     422.69 us     118.00 us    2087.00 us             13     XATTROP
      0.00    1202.54 us      18.00 us   14631.00 us             13     ENTRYLK
      0.00     151.12 us      75.00 us     200.00 us            125        READ
      0.00      37.29 us      13.00 us    1232.00 us           1549    FINODELK
      0.00      80.78 us      43.00 us     151.00 us            762       WRITE
      0.00      74.75 us      40.00 us     371.00 us           1524    FXATTROP
      0.04    4004.48 us      95.00 us   17214.00 us           1156    READDIRP
     99.96   16538.92 us      58.00 us  976002.00 us         660602      LOOKUP
    Duration: 2820 seconds
   Data Read: 13651676656 bytes
Data Written: 3941825592 bytes
Brick: uca-amm3.cnda.avaya.com:/media/data/brick1
-------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                    1                    43                    72
   Block Size:              32768b+
 No. of Reads:                    0
No. of Writes:                    1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              5     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.00     132.00 us     132.00 us     132.00 us              1     OPENDIR
      0.00      70.00 us      46.00 us      94.00 us              2       FLUSH
      0.00     175.00 us     175.00 us     175.00 us              1     XATTROP
      0.00     185.00 us     106.00 us     264.00 us              2      STATFS
      0.00     135.67 us      75.00 us     227.00 us              3    GETXATTR
      0.00     489.00 us     489.00 us     489.00 us              1      CREATE
      0.00     250.00 us     152.00 us     348.00 us              2     READDIR
      0.00     153.25 us     102.00 us     177.00 us              4        OPEN
      0.00     157.88 us      76.00 us     245.00 us              8     SETATTR
      0.00     330.25 us     257.00 us     430.00 us              4       MKNOD
      0.00      34.11 us      14.00 us     237.00 us            239    FINODELK
      0.00      83.54 us      62.00 us     179.00 us            117       WRITE
      0.00      99.28 us      42.00 us     298.00 us            234    FXATTROP
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
    Duration: 133 seconds
   Data Read: 0 bytes
Data Written: 1570968 bytes
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                    1                    43                    72
   Block Size:              32768b+
 No. of Reads:                    0
No. of Writes:                    1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              5     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.00     132.00 us     132.00 us     132.00 us              1     OPENDIR
      0.00      70.00 us      46.00 us      94.00 us              2       FLUSH
      0.00     175.00 us     175.00 us     175.00 us              1     XATTROP
      0.00     185.00 us     106.00 us     264.00 us              2      STATFS
      0.00     135.67 us      75.00 us     227.00 us              3    GETXATTR
      0.00     489.00 us     489.00 us     489.00 us              1      CREATE
      0.00     250.00 us     152.00 us     348.00 us              2     READDIR
      0.00     153.25 us     102.00 us     177.00 us              4        OPEN
      0.00     157.88 us      76.00 us     245.00 us              8     SETATTR
      0.00     330.25 us     257.00 us     430.00 us              4       MKNOD
      0.00      34.11 us      14.00 us     237.00 us            239    FINODELK
      0.00      83.54 us      62.00 us     179.00 us            117       WRITE
      0.00      99.28 us      42.00 us     298.00 us            234    FXATTROP
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
    Duration: 133 seconds
   Data Read: 0 bytes
Data Written: 1570968 bytes
Any ideas?
Thanks,
Anne
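To make the locking suspicion concrete: on the server1 brick, ENTRYLK was called 5 times with an average latency of 22648341.40 us, which is about 113 seconds in total, and the maximum single call is 113241113.00 us. So one ENTRYLK call appears to account for nearly all of that time, out of a 133 second profile window. The following is only a minimal sketch of how rows like these could be ranked by maximum latency; the regular expression and the helper names (parse_profile, slowest_by_max) are illustrative assumptions, not part of any gluster tool, and the sample rows are copied from the profile above.

```python
import re

# A few rows copied from the server1 (uca-amm3) brick profile in this post.
SAMPLE = """\
      0.00      34.11 us      14.00 us     237.00 us            239    FINODELK
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
"""

# Assumed row layout: %-latency, avg, min, max (each in "us"), call count, fop name.
ROW = re.compile(
    r"^\s*(?P<pct>[\d.]+)\s+(?P<avg>[\d.]+) us\s+(?P<min>[\d.]+) us\s+"
    r"(?P<max>[\d.]+) us\s+(?P<calls>\d+)\s+(?P<fop>[A-Z]+)\s*$"
)


def parse_profile(text):
    """Return one dict per fop row found in the profile text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "fop": m["fop"],
                "avg_us": float(m["avg"]),
                "max_us": float(m["max"]),
                "calls": int(m["calls"]),
            })
    return rows


def slowest_by_max(rows):
    """Pick the fop whose single slowest call took the longest."""
    return max(rows, key=lambda r: r["max_us"])


rows = parse_profile(SAMPLE)
worst = slowest_by_max(rows)
# Convert microseconds to seconds for readability.
print(worst["fop"], round(worst["max_us"] / 1e6, 1), "s")
```

Ranking by maximum latency rather than by %-latency is a deliberate choice here: LOOKUP dominates the percentage column only because it is called hundreds of thousands of times, while the stall on a single write shows up as one very long call.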
On 10/09/2013 11:22 AM, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure
> that I understand the failover behavior. What I'm seeing isn't great,
> but it doesn't look from the docs I've read that this is what everyone
> else is experiencing.
>
> Is this normal?
>
> Setup:
>
> - one volume, distributed, replicated (2), with two bricks on two
> different servers
>
> - 35,000 files on volume, about 1MB each, all in one directory (I'm
> open to changing this, if that's the problem. ls -l takes a /really/
> long time)
>
> - volume is mounted (mount -t glusterfs) on server 1
>
> Procedure:
>
> - I stop glusterd and glusterfsd on server1, and send a few files to
> the volume. This is fine. I can write and read the files.
>
> - I start glusterd on server1, and this starts glusterfsd. This
> triggers self-heal.
>
> - Send a file to the server, and try to read it.
>
> - Sending takes a *couple of minutes*. Reading is immediate.
>
> - Once self-heal is done, subsequent sends and reads are immediate.
>
> I tried profiling this operation, and it seems like it's stuck on
> locking the file:
>
> [Profiling deleted]
>
> Any ideas?
>
> Thanks,
>
> Anne

What I suspect is happening is that those 35k files are all being checked
for self-heal before the directory can be regarded as clean and ready to
lock. An easy way to test this would be to try writing to a file in a
nearly empty directory and see if you get the same results.

If you are using a current kernel, or an EL kernel with current backports,
mounting with use-readdirp=on will make directory reads faster. Not sure
how much faster with 35k files though. Would be interested in finding out.
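The test suggested above (write to a file in a nearly empty directory and compare against the 35k-file directory) could be scripted roughly as follows. This is a sketch under assumptions: time_write is a hypothetical helper, the file name and 1 MB size are placeholders, and the demo at the bottom uses a local temporary directory as a stand-in. On a real setup you would pass a nearly empty directory and the large directory on the mounted volume while self-heal is running.

```python
import os
import tempfile
import time


def time_write(directory, size_bytes=1024 * 1024, name="heal-test.bin"):
    """Write size_bytes to a new file in directory and return elapsed seconds."""
    path = os.path.join(directory, name)
    payload = b"\0" * size_bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # wait until the data has been handed to storage
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for a nearly empty directory on the gluster mount (assumption).
    with tempfile.TemporaryDirectory() as empty_dir:
        print(f"nearly empty directory: {time_write(empty_dir):.3f} s")
```

If the write into the nearly empty directory returns quickly while the same call against the large directory stalls for minutes, that would support the idea that the delay comes from the directory being checked for self-heal rather than from the file write itself.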
Toby Corkindale
2013-Oct-11  05:29 UTC
[Gluster-users] Can't access volume during self-healing
On 10/10/13 05:22, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure that
> I understand the failover behavior. What I'm seeing isn't great, but it
> doesn't look from the docs I've read that this is what everyone else is
> experiencing.
>
> Is this normal?
>
> Setup:
>
> - one volume, distributed, replicated (2), with two bricks on two
> different servers
>
> - 35,000 files on volume, about 1MB each, all in one directory (I'm open
> to changing this, if that's the problem. ls -l takes a /really/ long time)

I've posted to the list about this issue before, actually.

We had/have a similar requirement for storing a very large number of
fairly small files, and originally had them all in just a few directories
in GlusterFS.

It turns out that GlusterFS is really badly suited to directories with
large numbers of files in them. If you can split them up, do so, and
performance will become tolerable again.

But even then it wasn't great. Self-heal can swamp the network, making
access for clients so slow as to cause problems.

For your use case (wanting distributed, replicated storage for large
numbers of 1 MB files) I suggest you check out Riak and the Riak CS add-on.
It's proven to be great for that particular use case for us.

-Toby
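One possible way to split a flat directory up, as suggested above, is to derive a subdirectory from a hash of each file name. The sketch below is an assumption about how that might be done, not something described in this thread: shard_path, the /mnt/gluster root, and the file naming pattern are all hypothetical. With a two-character hex prefix there are 256 subdirectories, so 35,000 files would average roughly 137 per directory.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(root, filename, width=2):
    """Map filename to root/<hash prefix>/filename.

    The prefix is the first `width` hex characters of the SHA-256 digest of
    the file name, so the same name always lands in the same subdirectory.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:width] / filename


# Hypothetical mount point and file name, for illustration only.
print(shard_path("/mnt/gluster", "file-00001.dat"))
```

Hashing the name, rather than using, say, its first characters, keeps the subdirectories evenly filled even when file names share a long common prefix.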