Mingfan Lu
2014-Jan-22 04:05 UTC
[Gluster-users] interesting issue of replication and self-heal
I have a volume (distribute-replica (*3)); today I found an interesting problem.

node22, node23 and node24 are replica-7 from client A, but the annoying thing is that when I create a dir or write a file from the client to replica-7:

date; dd if=/dev/zero of=49 bs=1MB count=120
Wed Jan 22 11:51:41 CST 2014
120+0 records in
120+0 records out
120000000 bytes (120 MB) copied, 1.96257 s, 61.1 MB/s

I could only find the file on node23 & node24:
---------------
node23,node24
---------------
/mnt/xfsd/test-volume/test/49

(in client A, I used the find command)

I then used another machine as client B, mounted the test volume (newly mounted), and ran: find /mnt/xfsd/test-volume/test/49

After that, from client A all three nodes have the file:
---------------
node22,node23,node24
---------------
/mnt/xfsd/test-volume/test/49

But when I delete the file /mnt/xfsd/test-volume/test/49 from client A, node22 still has the file in its brick:
---------------
node22
---------------
/mnt/xfsd/test-volume/test/49

(but if I delete the newly created files from client B, they are deleted properly)

My question is: why does node22 not get the newly created/written dirs/files? Do I have to use find to trigger the self-heal to fix that?

From client A's log, I find something like:

I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-test-volume-replicate-7: no active sinks for performing self-heal on file /test/49

Is it harmless, since it is only at the information level?

I also see something like:
[2014-01-19 10:23:48.422757] E [afr-self-heal-entry.c:2376:afr_sh_post_nonblocking_entry_cbk] 0-test-volume-replicate-7: Non Blocking entrylks failed for /test/video/2014/01.
[2014-01-19 10:23:48.423042] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-test-volume-replicate-7: background entry self-heal failed on /test/video/2014/01
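(Side note: rather than running find from a client to kick the heal, the gluster CLI can usually drive this directly. A rough sketch, assuming the volume is really named test-volume and the installed 3.x release supports these heal subcommands:)

  # list files/dirs that each brick still considers in need of healing
  gluster volume heal test-volume info

  # trigger a heal of everything recorded in the heal indices
  gluster volume heal test-volume

  # or force a full crawl of the whole volume
  gluster volume heal test-volume full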
Mingfan Lu
2014-Jan-23 09:04 UTC
[Gluster-users] interesting issue of replication and self-heal
I profiled node22 and found that most of its latency comes from SETATTR and MKDIR, whereas on
node23 & node24 it comes from LOOKUP and locks. Can anyone help?
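(For reference, the per-brick numbers below come from the volume profiling feature, gathered roughly like this, assuming the volume is named test-volume:)

  gluster volume profile test-volume start
  # ... leave it enabled while the workload runs ...
  gluster volume profile test-volume info   # prints per-brick FOP latency tables like the ones below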
%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  2437540  FORGET
0.00  0.00 us  0.00 us  0.00 us  252684  RELEASE
0.00  0.00 us  0.00 us  0.00 us  2226292  RELEASEDIR
0.00  38.00 us  37.00 us  40.00 us  4  FGETXATTR
0.00  66.16 us  15.00 us  13139.00 us  596  GETXATTR
0.00  239.14 us  58.00 us  126477.00 us  1967  LINK
0.00  51.85 us  14.00 us  8298.00 us  19045  STAT
0.00  165.50 us  9.00 us  212057.00 us  20544  READDIR
0.00  1827.92 us  184.00 us  150298.00 us  2084  RENAME
0.00  49.14 us  12.00 us  5908.00 us  189019  STATFS
0.00  84.63 us  14.00 us  96016.00 us  163405  READ
0.00  29968.76 us  156.00 us  1073902.00 us  3115  CREATE
0.00  1340.25 us  6.00 us  7415357.00 us  248141  FLUSH
0.00  1616.76 us  32.00 us  13865122.00 us  229190  FTRUNCATE
0.01  1807.58 us  19.00 us  55480776.00 us  249569  OPEN
0.01  1875.11 us  10.00 us  8842171.00 us  465197  FSTAT
0.05  393296.28 us  52.00 us  56856581.00 us  9057  UNLINK
0.07  32291.01 us  192.00 us  9638107.00 us  156081  RMDIR
0.08  18339.18 us  140.00 us  5313885.00 us  337862  MKNOD
0.09  2904.39 us  18.00 us  51724741.00 us  2226290  OPENDIR
0.15  4708.15 us  27.00 us  55115760.00 us  2334864  SETXATTR
0.18  8965.91 us  68.00 us  26465968.00 us  1513280  FXATTROP
0.21  3465.29 us  74.00 us  58580783.00 us  4506602  XATTROP
0.28  4801.16 us  44.00 us  49643138.00 us  4436847  READDIRP
0.37  5935.92 us  7.00 us  56449083.00 us  4611760  ENTRYLK
1.02  4226.58 us  33.00 us  63494729.00 us  18092335  WRITE
1.50  2734.50 us  6.00 us  185109908.00 us  40971541  INODELK
4.75  348602.30 us  5.00 us  2185602946.00 us  1019332  FINODELK
14.98  33957.49 us  14.00 us  59261447.00 us  32998211  LOOKUP
26.30  807063.74 us  150.00 us  68086266.00 us  2438422  MKDIR
49.95  457402.30 us  20.00 us  67894186.00 us  8171751  SETATTR
Duration: 353678 seconds
Data Read: 21110920120 bytes
Data Written: 2338403381483 bytes
Here is node23:

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  22125898  FORGET
0.00  0.00 us  0.00 us  0.00 us  89286732  RELEASE
0.00  0.00 us  0.00 us  0.00 us  32865496  RELEASEDIR
0.00  35.50 us  23.00 us  48.00 us  2  FGETXATTR
0.00  164.04 us  29.00 us  749181.00 us  39320  FTRUNCATE
0.00  483.71 us  8.00 us  2688755.00 us  39288  LK
0.00  419.61 us  48.00 us  2183971.00 us  274939  LINK
0.00  970.55 us  145.00 us  2471745.00 us  293435  RENAME
0.00  1346.63 us  35.00 us  4462970.00 us  243238  SETATTR
0.01  285.51 us  25.00 us  2588685.00 us  3459436  SETXATTR
0.03  323.11 us  5.00 us  2074581.00 us  6977304  READDIR
0.05  12200.60 us  84.00 us  3943421.00 us  287979  RMDIR
0.07  592.75 us  7.00 us  3592073.00 us  8129847  STAT
0.07  6938.50 us  49.00 us  3268036.00 us  705818  UNLINK
0.08  19468.78 us  149.00 us  3664022.00 us  276310  MKNOD
0.09  763.31 us  8.00 us  3396903.00 us  8731725  STATFS
0.09  1715.79 us  4.00 us  5626912.00 us  3902746  FLUSH
0.10  4614.74 us  9.00 us  5835691.00 us  1574923  FSTAT
0.10  1189.55 us  13.00 us  6043163.00 us  6129885  OPENDIR
0.10  19729.66 us  131.00 us  4112832.00 us  376286  CREATE
0.13  328.26 us  24.00 us  2410049.00 us  29091424  WRITE
0.20  2107.64 us  10.00 us  5765196.00 us  6675496  GETXATTR
0.28  5317.38 us  14.00 us  7549301.00 us  3798543  OPEN
0.71  7042.79 us  47.00 us  5848284.00 us  7125716  READDIRP
0.80  743.88 us  10.00 us  7979373.00 us  76781383  READ
0.93  1802.29 us  60.00 us  11040319.00 us  36501360  FXATTROP
1.76  36083.12 us  141.00 us  3548175.00 us  3458135  MKDIR
1.83  5046.35 us  70.00 us  8120221.00 us  25765615  XATTROP
11.74  12896.99 us  4.00 us  2141920969.00 us  64590600  FINODELK
15.43  11171.78 us  5.00 us  909115697.00 us  98040443  ENTRYLK
25.46  12945.21 us  5.00 us  110968164.00 us  139545956  INODELK
39.91  9656.48 us  10.00 us  8137517.00 us  293268060  LOOKUP
Here is node24:

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  22124594  FORGET
0.00  0.00 us  0.00 us  0.00 us  89290582  RELEASE
0.00  0.00 us  0.00 us  0.00 us  26657287  RELEASEDIR
0.00  47.00 us  47.00 us  47.00 us  1  FGETXATTR
0.00  308.67 us  8.00 us  1405672.00 us  39285  LK
0.00  745.82 us  32.00 us  1690066.00 us  86586  FTRUNCATE
0.00  388.58 us  49.00 us  1348668.00 us  274927  LINK
0.00  1008.11 us  158.00 us  2443763.00 us  293423  RENAME
0.01  1094.49 us  31.00 us  2857159.00 us  290615  SETATTR
0.02  304.24 us  24.00 us  2878581.00 us  3506688  SETXATTR
0.03  279.83 us  5.00 us  3716543.00 us  6977266  READDIR
0.05  10919.43 us  83.00 us  5075633.00 us  287979  RMDIR
0.05  692.45 us  12.00 us  3951452.00 us  4692109  OPENDIR
0.06  465.87 us  6.00 us  3726826.00 us  8238785  STAT
0.07  1187.15 us  14.00 us  5361516.00 us  3626802  GETXATTR
0.07  6308.14 us  50.00 us  4281153.00 us  705476  UNLINK
0.07  16729.47 us  148.00 us  3238674.00 us  276299  MKNOD
0.08  553.69 us  8.00 us  2721668.00 us  8744855  STATFS
0.09  1462.59 us  4.00 us  5488045.00 us  3903587  FLUSH
0.10  16979.85 us  130.00 us  3471136.00 us  376279  CREATE
0.12  4818.36 us  9.00 us  6101767.00 us  1577172  FSTAT
0.15  315.32 us  24.00 us  3801518.00 us  29090837  WRITE
0.19  2539.98 us  48.00 us  4657386.00 us  4586952  READDIRP
0.23  3794.04 us  15.00 us  6487700.00 us  3798788  OPEN
0.37  393.76 us  10.00 us  3284611.00 us  58491958  READ
0.88  1524.40 us  60.00 us  7456834.00 us  36097324  FXATTROP
1.63  4429.64 us  72.00 us  7194041.00 us  22984938  XATTROP
1.74  31485.11 us  143.00 us  4705647.00 us  3458000  MKDIR
2.08  2010.98 us  4.00 us  7669056.00 us  64626004  FINODELK
18.35  11708.39 us  4.00 us  7193745.00 us  98037767  ENTRYLK
31.62  14170.24 us  5.00 us  7194060.00 us  139544869  INODELK
41.94  9273.78 us  10.00 us  7193886.00 us  282853490  LOOKUP
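(One more thing that may be worth checking directly on the bricks of replicate-7 is whether AFR still records pending heals against node22. A sketch using the brick path from this thread; getfattr is a read-only check:)

  # run on node22, node23 and node24 against the brick path
  getfattr -d -m . -e hex /mnt/xfsd/test-volume/test
  getfattr -d -m . -e hex /mnt/xfsd/test-volume/test/49
  # non-zero trusted.afr.test-volume-client-* values typically indicate pending self-heal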
On Wed, Jan 22, 2014 at 12:05 PM, Mingfan Lu <mingfan.lu at gmail.com>
wrote:
> I have a volume (distribute-replica (*3)); today I found an interesting
> problem.
>
> node22, node23 and node24 are replica-7 from client A,
> but the annoying thing is that when I create a dir or write a file from the client to
> replica-7:
>
> date;dd if=/dev/zero of=49 bs=1MB count=120
> Wed Jan 22 11:51:41 CST 2014
> 120+0 records in
> 120+0 records out
> 120000000 bytes (120 MB) copied, 1.96257 s, 61.1 MB/s
>
> but I could only find that node23 & node24 have the file
> ---------------
> node23,node24
> ---------------
> /mnt/xfsd/test-volume/test/49
>
> (in client A, I used the find command)
>
> I then used another machine as client B, mounted the test volume (newly
> mounted),
> and ran: find /mnt/xfsd/test-volume/test/49
>
> from Client A, the three nodes have the file now.
>
> ---------------
> node22,node23,node24
> ---------------
> /mnt/xfsd/test-volume/test/49
>
> but when I delete the file /mnt/xfsd/test-volume/test/49 from Client A, node22
> still has the file in its brick.
>
> ---------------
> node22
> ---------------
> /mnt/xfsd/test-volume/test/49
>
> (but if I delete the newly created files from Client B, they are deleted properly)
> my question is: why does node22 not get the newly created/written dirs/files? Do I have
> to use find to trigger the self-heal to fix that?
>
> from ClientA's log, I find something like:
>
> I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-test-volume-replicate-7:
> no active sinks for performing self-heal on file /test/49
>
> Is it harmless, since it is only at the information level?
>
> I also see something like:
> [2014-01-19 10:23:48.422757] E
> [afr-self-heal-entry.c:2376:afr_sh_post_nonblocking_entry_cbk]
> 0-test-volume-replicate-7: Non Blocking entrylks failed for
> /test/video/2014/01.
> [2014-01-19 10:23:48.423042] E
> [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk]
> 0-test-volume-replicate-7: background entry self-heal failed on
> /test/video/2014/01
>
>
>
>
>
Ted Miller
2014-Jan-27 18:43 UTC
[Gluster-users] interesting issue of replication and self-heal
On 1/21/2014 11:05 PM, Mingfan Lu wrote:
> I have a volume (distribute-replica (*3)); today I found an interesting problem.
>
> node22, node23 and node24 are replica-7 from client A,
> but the annoying thing is that when I create a dir or write a file from the client to replica-7:
>
> date; dd if=/dev/zero of=49 bs=1MB count=120
> Wed Jan 22 11:51:41 CST 2014
> 120+0 records in
> 120+0 records out
> 120000000 bytes (120 MB) copied, 1.96257 s, 61.1 MB/s
>
> but I could only find that node23 & node24 have the file
> ---------------
> node23,node24
> ---------------
> /mnt/xfsd/test-volume/test/49
> [...]
> my question is: why does node22 not get the newly created/written dirs/files? Do I have to
> use find to trigger the self-heal to fix that?
> [...]

From the paths you are listing, it looks like you may be mounting the bricks, not the gluster volume. You MUST mount the gluster volume, not the bricks that make up the volume. In your example, the mount looks like it is mounting the xfs volume. Your mount command should be something like:

mount <host name>:test-volume /mount/gluster/test-volume

If a brick is part of a gluster volume, the brick must NEVER be written to directly. Yes, what you write MAY eventually be duplicated over to the other nodes, but if and when that happens is unpredictable. That is what gives the unpredictable replication results you are seeing.

The best way to test is to run "mount". If the line where you are mounting the gluster volume doesn't say "glusterfs" on it, you have it wrong. Also, the line you use in /etc/fstab must say "glusterfs", not "xfs" or "ext4".

If you are in doubt, include the output of "mount" in your next email to the list.

Ted Miller
Elkhart, IN, USA
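(A concrete sketch of the checks described above, using the volume name test-volume from this thread; the server name node22 and the mount point are only placeholders, adjust them to your setup:)

  # what is mounted right now? the gluster mount must show type fuse.glusterfs,
  # not xfs/ext4 (which would mean a brick or local filesystem is mounted)
  mount | grep test-volume

  # a native glusterfs client mount, e.g. using node22 as the server:
  mount -t glusterfs node22:/test-volume /mount/gluster/test-volume

  # matching /etc/fstab entry:
  # node22:/test-volume  /mount/gluster/test-volume  glusterfs  defaults,_netdev  0 0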