Mingfan Lu
2014-Jan-22 04:05 UTC
[Gluster-users] interesting issue of replication and self-heal
I have a distribute-replicate volume (replica 3); today I found an interesting problem.

node22, node23 and node24 make up replica-7 as seen from client A, but the annoying thing is that when I create a dir or write a file from the client to replica-7:

date; dd if=/dev/zero of=49 bs=1MB count=120
Wed Jan 22 11:51:41 CST 2014
120+0 records in
120+0 records out
120000000 bytes (120 MB) copied, 1.96257 s, 61.1 MB/s

I could only find the file on node23 & node24:

---------------
node23, node24
---------------
/mnt/xfsd/test-volume/test/49

(from client A, I used the find command)

I then used another machine as client B and mounted the test volume (newly mounted) to run
find /mnt/xfsd/test-volume/test/49
Seen from client A, the three nodes have the file now:

---------------
node22, node23, node24
---------------
/mnt/xfsd/test-volume/test/49

But when I delete the file /mnt/xfsd/test-volume/test/49 from client A, node22 still has the file in its brick:

---------------
node22
---------------
/mnt/xfsd/test-volume/test/49

(but if I delete the newly created files from client B ...)

My question is: why does node22 not get the newly created/written dirs/files? Do I have to use find to trigger the self-heal to fix that?

From client A's log I see entries like:

I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-test-volume-replicate-7: no active sinks for performing self-heal on file /test/49

Is this harmless, since it is only at information level?

I also see entries like:

[2014-01-19 10:23:48.422757] E [afr-self-heal-entry.c:2376:afr_sh_post_nonblocking_entry_cbk] 0-test-volume-replicate-7: Non Blocking entrylks failed for /test/video/2014/01.
[2014-01-19 10:23:48.423042] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-test-volume-replicate-7: background entry self-heal failed on /test/video/2014/01
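For completeness: rather than walking the client mount with find, pending heals can usually be listed and triggered with gluster's own heal commands, and the per-brick AFR changelog xattrs show which replica is considered out of date. A minimal sketch, assuming glusterfs 3.3 or newer with the self-heal daemon running, the volume name test-volume taken from the logs above, and a brick path of /mnt/xfsd/test-volume on each node (the brick path is an assumption):

    # List files the self-heal daemon still considers pending (run on any server):
    gluster volume heal test-volume info

    # Trigger a full crawl-and-heal instead of running stat()/find on the client mount:
    gluster volume heal test-volume full

    # On each brick, inspect the AFR changelog xattrs of the suspect file;
    # non-zero trusted.afr.* counters mean heals are still pending against a replica:
    getfattr -d -m . -e hex /mnt/xfsd/test-volume/test/49

If the trusted.afr.* counters on node23/node24 stay non-zero for node22's brick, that would be consistent with node22 only catching up when something (such as the find) triggers a heal.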
Mingfan Lu
2014-Jan-23 09:04 UTC
[Gluster-users] interesting issue of replication and self-heal
I profiled the three bricks. On node22 most of the latency comes from SETATTR and MKDIR (LOOKUP is also significant), while on node23 & node24 it comes from LOOKUP and the lock calls (INODELK/ENTRYLK). Can anyone help?

Here is node22:

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  2437540  FORGET
0.00  0.00 us  0.00 us  0.00 us  252684  RELEASE
0.00  0.00 us  0.00 us  0.00 us  2226292  RELEASEDIR
0.00  38.00 us  37.00 us  40.00 us  4  FGETXATTR
0.00  66.16 us  15.00 us  13139.00 us  596  GETXATTR
0.00  239.14 us  58.00 us  126477.00 us  1967  LINK
0.00  51.85 us  14.00 us  8298.00 us  19045  STAT
0.00  165.50 us  9.00 us  212057.00 us  20544  READDIR
0.00  1827.92 us  184.00 us  150298.00 us  2084  RENAME
0.00  49.14 us  12.00 us  5908.00 us  189019  STATFS
0.00  84.63 us  14.00 us  96016.00 us  163405  READ
0.00  29968.76 us  156.00 us  1073902.00 us  3115  CREATE
0.00  1340.25 us  6.00 us  7415357.00 us  248141  FLUSH
0.00  1616.76 us  32.00 us  13865122.00 us  229190  FTRUNCATE
0.01  1807.58 us  19.00 us  55480776.00 us  249569  OPEN
0.01  1875.11 us  10.00 us  8842171.00 us  465197  FSTAT
0.05  393296.28 us  52.00 us  56856581.00 us  9057  UNLINK
0.07  32291.01 us  192.00 us  9638107.00 us  156081  RMDIR
0.08  18339.18 us  140.00 us  5313885.00 us  337862  MKNOD
0.09  2904.39 us  18.00 us  51724741.00 us  2226290  OPENDIR
0.15  4708.15 us  27.00 us  55115760.00 us  2334864  SETXATTR
0.18  8965.91 us  68.00 us  26465968.00 us  1513280  FXATTROP
0.21  3465.29 us  74.00 us  58580783.00 us  4506602  XATTROP
0.28  4801.16 us  44.00 us  49643138.00 us  4436847  READDIRP
0.37  5935.92 us  7.00 us  56449083.00 us  4611760  ENTRYLK
1.02  4226.58 us  33.00 us  63494729.00 us  18092335  WRITE
1.50  2734.50 us  6.00 us  185109908.00 us  40971541  INODELK
4.75  348602.30 us  5.00 us  2185602946.00 us  1019332  FINODELK
14.98  33957.49 us  14.00 us  59261447.00 us  32998211  LOOKUP
26.30  807063.74 us  150.00 us  68086266.00 us  2438422  MKDIR
49.95  457402.30 us  20.00 us  67894186.00 us  8171751  SETATTR

Duration: 353678 seconds
Data Read: 21110920120 bytes
Data Written: 2338403381483 bytes

Here is node23:

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  22125898  FORGET
0.00  0.00 us  0.00 us  0.00 us  89286732  RELEASE
0.00  0.00 us  0.00 us  0.00 us  32865496  RELEASEDIR
0.00  35.50 us  23.00 us  48.00 us  2  FGETXATTR
0.00  164.04 us  29.00 us  749181.00 us  39320  FTRUNCATE
0.00  483.71 us  8.00 us  2688755.00 us  39288  LK
0.00  419.61 us  48.00 us  2183971.00 us  274939  LINK
0.00  970.55 us  145.00 us  2471745.00 us  293435  RENAME
0.00  1346.63 us  35.00 us  4462970.00 us  243238  SETATTR
0.01  285.51 us  25.00 us  2588685.00 us  3459436  SETXATTR
0.03  323.11 us  5.00 us  2074581.00 us  6977304  READDIR
0.05  12200.60 us  84.00 us  3943421.00 us  287979  RMDIR
0.07  592.75 us  7.00 us  3592073.00 us  8129847  STAT
0.07  6938.50 us  49.00 us  3268036.00 us  705818  UNLINK
0.08  19468.78 us  149.00 us  3664022.00 us  276310  MKNOD
0.09  763.31 us  8.00 us  3396903.00 us  8731725  STATFS
0.09  1715.79 us  4.00 us  5626912.00 us  3902746  FLUSH
0.10  4614.74 us  9.00 us  5835691.00 us  1574923  FSTAT
0.10  1189.55 us  13.00 us  6043163.00 us  6129885  OPENDIR
0.10  19729.66 us  131.00 us  4112832.00 us  376286  CREATE
0.13  328.26 us  24.00 us  2410049.00 us  29091424  WRITE
0.20  2107.64 us  10.00 us  5765196.00 us  6675496  GETXATTR
0.28  5317.38 us  14.00 us  7549301.00 us  3798543  OPEN
0.71  7042.79 us  47.00 us  5848284.00 us  7125716  READDIRP
0.80  743.88 us  10.00 us  7979373.00 us  76781383  READ
0.93  1802.29 us  60.00 us  11040319.00 us  36501360  FXATTROP
1.76  36083.12 us  141.00 us  3548175.00 us  3458135  MKDIR
1.83  5046.35 us  70.00 us  8120221.00 us  25765615  XATTROP
11.74  12896.99 us  4.00 us  2141920969.00 us  64590600  FINODELK
15.43  11171.78 us  5.00 us  909115697.00 us  98040443  ENTRYLK
25.46  12945.21 us  5.00 us  110968164.00 us  139545956  INODELK
39.91  9656.48 us  10.00 us  8137517.00 us  293268060  LOOKUP

Here is node24:

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ----
0.00  0.00 us  0.00 us  0.00 us  22124594  FORGET
0.00  0.00 us  0.00 us  0.00 us  89290582  RELEASE
0.00  0.00 us  0.00 us  0.00 us  26657287  RELEASEDIR
0.00  47.00 us  47.00 us  47.00 us  1  FGETXATTR
0.00  308.67 us  8.00 us  1405672.00 us  39285  LK
0.00  745.82 us  32.00 us  1690066.00 us  86586  FTRUNCATE
0.00  388.58 us  49.00 us  1348668.00 us  274927  LINK
0.00  1008.11 us  158.00 us  2443763.00 us  293423  RENAME
0.01  1094.49 us  31.00 us  2857159.00 us  290615  SETATTR
0.02  304.24 us  24.00 us  2878581.00 us  3506688  SETXATTR
0.03  279.83 us  5.00 us  3716543.00 us  6977266  READDIR
0.05  10919.43 us  83.00 us  5075633.00 us  287979  RMDIR
0.05  692.45 us  12.00 us  3951452.00 us  4692109  OPENDIR
0.06  465.87 us  6.00 us  3726826.00 us  8238785  STAT
0.07  1187.15 us  14.00 us  5361516.00 us  3626802  GETXATTR
0.07  6308.14 us  50.00 us  4281153.00 us  705476  UNLINK
0.07  16729.47 us  148.00 us  3238674.00 us  276299  MKNOD
0.08  553.69 us  8.00 us  2721668.00 us  8744855  STATFS
0.09  1462.59 us  4.00 us  5488045.00 us  3903587  FLUSH
0.10  16979.85 us  130.00 us  3471136.00 us  376279  CREATE
0.12  4818.36 us  9.00 us  6101767.00 us  1577172  FSTAT
0.15  315.32 us  24.00 us  3801518.00 us  29090837  WRITE
0.19  2539.98 us  48.00 us  4657386.00 us  4586952  READDIRP
0.23  3794.04 us  15.00 us  6487700.00 us  3798788  OPEN
0.37  393.76 us  10.00 us  3284611.00 us  58491958  READ
0.88  1524.40 us  60.00 us  7456834.00 us  36097324  FXATTROP
1.63  4429.64 us  72.00 us  7194041.00 us  22984938  XATTROP
1.74  31485.11 us  143.00 us  4705647.00 us  3458000  MKDIR
2.08  2010.98 us  4.00 us  7669056.00 us  64626004  FINODELK
18.35  11708.39 us  4.00 us  7193745.00 us  98037767  ENTRYLK
31.62  14170.24 us  5.00 us  7194060.00 us  139544869  INODELK
41.94  9273.78 us  10.00 us  7193886.00 us  282853490  LOOKUP
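For reference, per-brick fop statistics like the tables above are typically collected with gluster's built-in io-stats profiler. A minimal sketch, assuming the volume is named test-volume as in this thread and that a brief profiling run on the production volume is acceptable:

    # Start collecting per-brick fop latency counters:
    gluster volume profile test-volume start

    # ... let the workload run for a while ...

    # Print cumulative %-latency / avg / min / max latency and call counts per brick:
    gluster volume profile test-volume info

    # Stop profiling again to avoid the (small) bookkeeping overhead:
    gluster volume profile test-volume stop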
Ted Miller
2014-Jan-27 18:43 UTC
[Gluster-users] interesting issue of replication and self-heal
From the paths you are listing, it looks like you may be mounting the bricks, not the gluster volume. You MUST mount the gluster volume, not the bricks that make up the volume. In your example, the mount looks like it is mounting the xfs filesystem directly. Your mount command should be something like:

mount -t glusterfs <host name>:/test-volume /mount/gluster/test-volume

If a brick is part of a gluster volume, the brick must NEVER be written to directly. Yes, what you write MAY eventually be duplicated over to the other nodes, but if and when that happens is unpredictable. That will give the unpredictable replication results that you are seeing.

The best way to test is to run "mount". If the line where you are mounting the gluster volume doesn't say "glusterfs" on it, you have it wrong. Also, the line you use in /etc/fstab must say "glusterfs", not "xfs" or "ext4". If you are in doubt, include the output of "mount" in your next email to the list.

Ted Miller
Elkhart, IN, USA
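A quick way to run the checks Ted describes (the hostnames and mount points below are illustrative placeholders, not taken from the poster's actual setup):

    # A correct client mount shows up with type fuse.glusterfs, e.g.:
    #   node22:/test-volume on /mnt/gluster/test-volume type fuse.glusterfs (rw,...)
    mount | grep glusterfs

    # The brick's local filesystem (which should never be written to directly)
    # shows up as xfs or ext4 instead:
    mount | grep xfs

    # Example /etc/fstab entry for the client mount -- note "glusterfs", not "xfs":
    # node22:/test-volume  /mnt/gluster/test-volume  glusterfs  defaults,_netdev  0 0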