thr3ads.net - Gluster users - [Gluster-users] Can't access volume during self-healing [Oct 2013]

Pruner, Anne (Anne)

2013-Oct-09 18:22 UTC

[Gluster-users] Can't access volume during self-healing

I'm evaluating gluster for use in our product, and I want to ensure that I
understand the failover behavior.  What I'm seeing isn't great, but it
doesn't look from the docs I've read that this is what everyone else is
experiencing.


Is this normal?

Setup:

- one volume, distributed, replicated (2), with two bricks on two
  different servers

- 35,000 files on volume, about 1MB each, all in one directory (I'm
  open to changing this, if that's the problem.  ls -l takes a really long
  time)

- volume is mounted (mount -t gluster) on server 1

Procedure:

- I stop glusterd and glusterfsd on server1, and send a few files to
  the volume.  This is fine.  I can write and read the files.

- I start glusterd on server1, and this starts glusterfsd.  This
  triggers self-heal.

- Send a file to the server, and try to read it.

- Sending takes a couple of minutes.  Reading is immediate.

- Once self-heal is done, subsequent sends and reads are immediate.

I tried profiling this operation, and it seems like it's stuck on locking
the file:


(server1 is uca-amm3.cnda.avaya.com. server2 is uc-amm4.cnda.avaya.com)

Brick: uc-amm4.cnda.avaya.com:/media/data/brick1
------------------------------------------------
Cumulative Stats:
   Block Size:               1024b+                4096b+                8192b+
 No. of Reads:                    0                     0                     0
No. of Writes:                  112                  1216                 38216

   Block Size:              16384b+               32768b+               65536b+
 No. of Reads:                    0                  1765                 12554
No. of Writes:               144493                 15648                  3032

   Block Size:             131072b+
 No. of Reads:                91441
No. of Writes:                  247
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           5141      FORGET
      0.00       0.00 us       0.00 us       0.00 us          21270     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.00      61.00 us      61.00 us      61.00 us              1    GETXATTR
      0.00      62.00 us      62.00 us      62.00 us              1     OPENDIR
      0.00      36.86 us      27.00 us      74.00 us              7       FSTAT
      0.00      45.70 us      21.00 us      81.00 us             10       FLUSH
      0.00     123.00 us      83.00 us     135.00 us              5        OPEN
      0.00     118.29 us      41.00 us     315.00 us              7      STATFS
      0.00     419.60 us     266.00 us     539.00 us              5      CREATE
      0.00     422.69 us     118.00 us    2087.00 us             13     XATTROP
      0.00    1202.54 us      18.00 us   14631.00 us             13     ENTRYLK
      0.00     151.12 us      75.00 us     200.00 us            125        READ
      0.00      37.29 us      13.00 us    1232.00 us           1549    FINODELK
      0.00      80.78 us      43.00 us     151.00 us            762       WRITE
      0.00      74.75 us      40.00 us     371.00 us           1524    FXATTROP
      0.04    4004.48 us      95.00 us   17214.00 us           1156    READDIRP
     99.96   16538.92 us      58.00 us  976002.00 us         660602      LOOKUP
    Duration: 2820 seconds
   Data Read: 13651676656 bytes
Data Written: 3941825592 bytes
Interval 0 Stats:
   Block Size:               1024b+                4096b+                8192b+
 No. of Reads:                    0                     0                     0
No. of Writes:                  112                  1216                 38216

   Block Size:              16384b+               32768b+               65536b+
 No. of Reads:                    0                  1765                 12554
No. of Writes:               144493                 15648                  3032

   Block Size:             131072b+
 No. of Reads:                91441
No. of Writes:                  247
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           5141      FORGET
      0.00       0.00 us       0.00 us       0.00 us          21270     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.00      61.00 us      61.00 us      61.00 us              1    GETXATTR
      0.00      62.00 us      62.00 us      62.00 us              1     OPENDIR
      0.00      36.86 us      27.00 us      74.00 us              7       FSTAT
      0.00      45.70 us      21.00 us      81.00 us             10       FLUSH
      0.00     123.00 us      83.00 us     135.00 us              5        OPEN
      0.00     118.29 us      41.00 us     315.00 us              7      STATFS
      0.00     419.60 us     266.00 us     539.00 us              5      CREATE
      0.00     422.69 us     118.00 us    2087.00 us             13     XATTROP
      0.00    1202.54 us      18.00 us   14631.00 us             13     ENTRYLK
      0.00     151.12 us      75.00 us     200.00 us            125        READ
      0.00      37.29 us      13.00 us    1232.00 us           1549    FINODELK
      0.00      80.78 us      43.00 us     151.00 us            762       WRITE
      0.00      74.75 us      40.00 us     371.00 us           1524    FXATTROP
      0.04    4004.48 us      95.00 us   17214.00 us           1156    READDIRP
     99.96   16538.92 us      58.00 us  976002.00 us         660602      LOOKUP
    Duration: 2820 seconds
   Data Read: 13651676656 bytes
Data Written: 3941825592 bytes
Brick: uca-amm3.cnda.avaya.com:/media/data/brick1
-------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                    1                    43                    72

   Block Size:              32768b+
 No. of Reads:                    0
No. of Writes:                    1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              5     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.00     132.00 us     132.00 us     132.00 us              1     OPENDIR
      0.00      70.00 us      46.00 us      94.00 us              2       FLUSH
      0.00     175.00 us     175.00 us     175.00 us              1     XATTROP
      0.00     185.00 us     106.00 us     264.00 us              2      STATFS
      0.00     135.67 us      75.00 us     227.00 us              3    GETXATTR
      0.00     489.00 us     489.00 us     489.00 us              1      CREATE
      0.00     250.00 us     152.00 us     348.00 us              2     READDIR
      0.00     153.25 us     102.00 us     177.00 us              4        OPEN
      0.00     157.88 us      76.00 us     245.00 us              8     SETATTR
      0.00     330.25 us     257.00 us     430.00 us              4       MKNOD
      0.00      34.11 us      14.00 us     237.00 us            239    FINODELK
      0.00      83.54 us      62.00 us     179.00 us            117       WRITE
      0.00      99.28 us      42.00 us     298.00 us            234    FXATTROP
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
    Duration: 133 seconds
   Data Read: 0 bytes
Data Written: 1570968 bytes
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+
 No. of Reads:                    0                     0                     0
No. of Writes:                    1                    43                    72

   Block Size:              32768b+
 No. of Reads:                    0
No. of Writes:                    1
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              5     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.00     132.00 us     132.00 us     132.00 us              1     OPENDIR
      0.00      70.00 us      46.00 us      94.00 us              2       FLUSH
      0.00     175.00 us     175.00 us     175.00 us              1     XATTROP
      0.00     185.00 us     106.00 us     264.00 us              2      STATFS
      0.00     135.67 us      75.00 us     227.00 us              3    GETXATTR
      0.00     489.00 us     489.00 us     489.00 us              1      CREATE
      0.00     250.00 us     152.00 us     348.00 us              2     READDIR
      0.00     153.25 us     102.00 us     177.00 us              4        OPEN
      0.00     157.88 us      76.00 us     245.00 us              8     SETATTR
      0.00     330.25 us     257.00 us     430.00 us              4       MKNOD
      0.00      34.11 us      14.00 us     237.00 us            239    FINODELK
      0.00      83.54 us      62.00 us     179.00 us            117       WRITE
      0.00      99.28 us      42.00 us     298.00 us            234    FXATTROP
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
    Duration: 133 seconds
   Data Read: 0 bytes
Data Written: 1570968 bytes
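
To make the lock wait in the uca-amm3 profile easier to see, here is a minimal Python sketch that converts the latency columns to seconds. The rows are copied from the output above (with the wrapped ENTRYLK row joined); the regular expression and the helper names parse_rows and slowest_by_max are illustrative assumptions, not part of any gluster tool. ENTRYLK averages about 22.6 s over 5 calls, roughly 113.2 s in total, and the maximum is also about 113.2 s, so a single entry-lock call appears to account for nearly all of the wait. That is in line with the couple of minutes a send takes.

```python
import re

# Rows copied from the uca-amm3 brick profile (wrapped ENTRYLK row joined).
SAMPLE = """\
      0.14    5310.09 us     127.00 us   13588.00 us           1156    READDIRP
      2.55 22648341.40 us      74.00 us 113241113.00 us              5     ENTRYLK
     97.31   13061.69 us      16.00 us   47524.00 us         330308      LOOKUP
"""

# Assumed row layout: %-latency, avg, min, max (all in us), call count, fop name.
ROW = re.compile(
    r"^\s*(?P<pct>[\d.]+)\s+(?P<avg>[\d.]+) us\s+(?P<min>[\d.]+) us\s+"
    r"(?P<max>[\d.]+) us\s+(?P<calls>\d+)\s+(?P<fop>\w+)\s*$"
)


def parse_rows(text):
    """Return one dict per fop row, with latencies converted to seconds."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers, separators, and blank lines
        rows.append({
            "fop": m["fop"],
            "calls": int(m["calls"]),
            "avg_s": float(m["avg"]) / 1e6,
            "max_s": float(m["max"]) / 1e6,
        })
    return rows


def slowest_by_max(rows):
    """Return the row with the largest single-call latency."""
    return max(rows, key=lambda r: r["max_s"])


if __name__ == "__main__":
    for r in sorted(parse_rows(SAMPLE), key=lambda r: r["max_s"], reverse=True):
        total = r["avg_s"] * r["calls"]
        print(f'{r["fop"]:10s} calls={r["calls"]:7d} '
              f'avg={r["avg_s"]:.3f}s max={r["max_s"]:.3f}s total={total:.1f}s')
```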



Any ideas?

Thanks,
Anne


Joe Julian

2013-Oct-09 19:28 UTC

On 10/09/2013 11:22 AM, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure
> that I understand the failover behavior.  What I'm seeing isn't great,
> but it doesn't look from the docs I've read that this is what everyone
> else is experiencing.
>
> Is this normal?
>
> Setup:
>
> -one volume, distributed, replicated (2), with two bricks on two 
> different servers
>
> -35,000 files on volume, about 1MB each, all in one directory (I'm
> open to changing this, if that's the problem.  ls -l takes a /really/
> long time)
>
> -volume is mounted (mount -t gluster) on server 1
>
> Procedure:
>
> -I stop glusterd and glusterfsd on server1, and send a few files to 
> the volume.  This is fine. I can write and read the files.
>
> -I start glusterd on server1, and this starts glusterfsd.  This 
> triggers self-heal.
>
> -Send a file to the server, and try to read it.
>
> -Sending takes a *couple of minutes*.  Reading is immediate.
>
> -Once self-heal is done, subsequent sends and reads are immediate.
>
> I tried profiling this operation, and it seems like it's stuck on 
> locking the file:
>
[Profiling deleted]
>
> Any ideas?
>
> Thanks,
>
> Anne
>

What I suspect is happening is that all 35k files are being checked for
self-heal before the directory can be regarded as clean and ready to
lock. An easy way to test this would be to write to a file in a
nearly empty directory and see whether you get the same results.
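
The test described above could be sketched roughly as follows in Python. The function name time_write, the probe file name, and the 1 MB default size are assumptions chosen to resemble the files in the original report; the directories passed on the command line are whatever paths exist on your mounted volume. Run it once against the 35k-file directory and once against a nearly empty one while self-heal is in progress, then compare the timings.

```python
import os
import sys
import tempfile
import time


def time_write(directory, name="probe.bin", size=1024 * 1024):
    """Create a file of `size` bytes in `directory`, fsync it, and
    return the elapsed wall-clock time in seconds."""
    path = os.path.join(directory, name)
    payload = b"\0" * size
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data out before the clock stops
    elapsed = time.monotonic() - start
    os.remove(path)  # clean up; the removal is not part of the timing
    return elapsed


if __name__ == "__main__":
    # Hypothetical usage: python probe.py /mnt/vol/bigdir /mnt/vol/emptydir
    # With no arguments, fall back to a local temp directory as a smoke test.
    targets = sys.argv[1:] or [tempfile.mkdtemp()]
    for d in targets:
        print(f"{d}: {time_write(d):.3f} s")
```

If the write into the nearly empty directory returns quickly while the one into the crowded directory stalls, that would support the idea that the delay comes from the directory itself and not from the volume as a whole.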

If you are using a current kernel, or an EL kernel with current
backports, mounting with use-readdirp=on will make directory reads
faster. I'm not sure how much faster with 35k files, though. I would be
interested in finding out.

Toby Corkindale

2013-Oct-11 05:29 UTC

On 10/10/13 05:22, Pruner, Anne (Anne) wrote:
> I'm evaluating gluster for use in our product, and I want to ensure that
> I understand the failover behavior.  What I'm seeing isn't great, but it
> doesn't look from the docs I've read that this is what everyone else is
> experiencing.
>
> Is this normal?
>
> Setup:
>
> -one volume, distributed, replicated (2), with two bricks on two
> different servers
>
> -35,000 files on volume, about 1MB each, all in one directory (I'm open
> to changing this, if that's the problem.  ls -l takes a /really/ long time)

I've actually posted to the list about this issue before.
We had (and still have) a similar requirement for storing a very large
number of fairly small files, and originally kept them all in just a few
directories in GlusterFS.
It turns out that GlusterFS is really badly suited to directories with
large numbers of files in them. If you can split them up, do so, and
performance will become tolerable again.
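
One common way to split a flat directory is to derive a subdirectory from a hash of each file name. The Python sketch below is only an illustration of that idea, not what we actually ran; the function name sharded_path, the choice of SHA-256, and the two-hex-digit directory names are all assumptions. With one level of two hex digits there are at most 256 subdirectories, so 35,000 files would average about 137 per directory.

```python
import hashlib
import os


def sharded_path(root, filename, levels=1, width=2):
    """Map `filename` to root/<hex>/.../filename, where each directory
    level is `width` hex digits taken from the SHA-256 of the name."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


if __name__ == "__main__":
    # Hypothetical file names; the mapping is deterministic, so readers
    # and writers can both compute the location from the name alone.
    for name in ("invoice-0001.dat", "invoice-0002.dat"):
        print(sharded_path("/mnt/vol/files", name))
```

Because the path depends only on the file name, no lookup table is needed, but renaming a file moves it to a different subdirectory.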

But even then it wasn't great. Self-heal can swamp the network, making
access for clients so slow as to cause problems.

For your use case (wanting distributed, replicated storage for large
numbers of 1 MB files) I suggest you check out Riak and the Riak CS
add-on. It has proven to be great for that particular use case for us.

-Toby
