Khoi Mai
2013-Dec-11 04:49 UTC
[Gluster-users] what action is required for this log entry?
Gluster community,

I am seeing the following warnings on one of my bricks, in /var/log/glusterfs/bricks/static-content.log:

[2013-12-11 04:40:06.609091] W [server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid (76240621-1362-494d-a70a-f5824c3ce56e) is not found. anonymous fd creation failed
[2013-12-11 04:40:06.610588] W [server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid (03ada1a2-ee51-4c85-a79f-a72aabde116d) is not found. anonymous fd creation failed
[2013-12-11 04:40:06.616978] W [server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid (64fbc834-e00b-4afd-800e-97d64a32de92) is not found. anonymous fd creation failed
[2013-12-11 04:40:06.617069] W [server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid (64fbc834-e00b-4afd-800e-97d64a32de92) is not found. anonymous fd creation failed
[2013-12-11 04:40:06.624845] W [server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid (27837527-5dea-4367-a050-248a6266b2db) is not found. anonymous fd creation failed

followed by:

[2013-12-11 04:40:10.462202] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:40:29.331476] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:40:53.125088] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:00.975222] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:01.517990] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
Tue Dec 10 22:41:01 CST 2013
[2013-12-11 04:41:05.874819] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:05.878135] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
Tue Dec 10 22:42:01 CST 2013
[2013-12-11 04:42:05.136054] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:42:05.330591] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:42:41.224927] W [marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker: cannot add a new contribution node

Please help me understand what is being logged in /var/log/glusterfs/bricks/static-content.log and whether any action is required on my part.
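In case it helps with diagnosis, this is roughly how I would map one of the gfids from the log back to a path on this brick. It is a minimal sketch only: it assumes the usual .glusterfs/<aa>/<bb>/<gfid> hard-link layout on the brick and that the quota marker keeps its xattrs under trusted.glusterfs.quota.*; the gfid and brick path are taken from my log and volfile, and path/to/directory in the last command is just a placeholder.

    # On host2, check whether the brick has a .glusterfs entry for one of the
    # gfids from the log (a missing entry would match the "inode ... not found" warning).
    GFID=76240621-1362-494d-a70a-f5824c3ce56e
    BRICK=/static/content
    ls -l $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

    # If the entry exists as a regular file, find the real path that shares its inode.
    find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID -not -path '*/.glusterfs/*'

    # For the marker-quota warnings, dump the quota xattrs of a suspect directory
    # (replace path/to/directory with an actual directory on the brick).
    getfattr -d -m 'trusted.glusterfs.quota' -e hex $BRICK/path/to/directory

Here is the config for this particular brick in a 4-node distributed/replicated design.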
cat /var/lib/glusterd/vols/devstatic/devstatic.host2.static-content.vol

volume devstatic-posix
    type storage/posix
    option volume-id 75832afb-f20e-4018-8d74-8550a92233fc
    option directory /static/content
end-volume

volume devstatic-access-control
    type features/access-control
    subvolumes devstatic-posix
end-volume

volume devstatic-locks
    type features/locks
    subvolumes devstatic-access-control
end-volume

volume devstatic-io-threads
    type performance/io-threads
    subvolumes devstatic-locks
end-volume

volume devstatic-index
    type features/index
    option index-base /static/content/.glusterfs/indices
    subvolumes devstatic-io-threads
end-volume

volume devstatic-marker
    type features/marker
    option quota on
    option xtime off
    option timestamp-file /var/lib/glusterd/vols/devstatic/marker.tstamp
    option volume-uuid 75832afb-f20e-4018-8d74-8550a92233fc
    subvolumes devstatic-index
end-volume

volume /static/content
    type debug/io-stats
    option count-fop-hits off
    option latency-measurement off
    subvolumes devstatic-marker
end-volume

volume devstatic-server
    type protocol/server
    option auth.addr./static/content.allow *
    option auth.login.6173ce00-d694-4793-a755-cd1d80f5001f.password 13702989-510c-44c1-9bc4-8f1f21b65403
    option auth.login./static/content.allow 6173ce00-d694-4793-a755-cd1d80f5001f
    option transport-type tcp
    subvolumes /static/content
end-volume

Khoi Mai

From: gluster-users-request at gluster.org
To: gluster-users at gluster.org
Date: 12/10/2013 05:58 AM
Subject: Gluster-users Digest, Vol 68, Issue 11
Sent by: gluster-users-bounces at gluster.org

Send Gluster-users mailing list submissions to
    gluster-users at gluster.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://supercolony.gluster.org/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to
    gluster-users-request at gluster.org

You can reach the person managing the list at
    gluster-users-owner at gluster.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."

Today's Topics:

   1. Re: Testing failover and recovery (Per Hallsmark)
   2. Gluster - replica - Unable to self-heal contents of '/' (possible split-brain) (Alexandru Coseru)
   3. Gluster infrastructure question (Heiko Krämer)
   4. Re: How reliable is XFS under Gluster? (Kal Black)
   5. Re: Gluster infrastructure question (Nux!)
   6. Scalability - File system or Object Store (Randy Breunling)
   7. Re: Scalability - File system or Object Store (Jay Vyas)
   8. Re: Gluster infrastructure question (Joe Julian)
   9. Re: [Gluster-devel] GlusterFest Test Weekend - 3.5 Test #1 (John Mark Walker)
  10. Re: Gluster infrastructure question (Nux!)
  11. compatibility between 3.3 and 3.4 (samuel)
  12. Re: Gluster infrastructure question (bernhard glomm)
  13. Re: Gluster infrastructure question (Ben Turner)
  14. Re: Gluster infrastructure question (Ben Turner)
  15. Re: Scalability - File system or Object Store (Jeff Darcy)
  16. Re: Gluster infrastructure question (Dan Mons)
  17. Re: Gluster infrastructure question (Joe Julian)
  18. Re: Gluster infrastructure question (Dan Mons)
  19. Re: [CentOS 6] Upgrade to the glusterfs version in base or in glusterfs-epel (Diep Pham Van)
  20. Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? (harry mangalam)
  21. Re: Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? (Sharuzzaman Ahmat Raslan)
  22. FW: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob)
  23. Re: Self Heal Issue GlusterFS 3.3.1 (Joe Julian)
  24. Pausing rebalance (Franco Broi)
  25.
Re: Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? (Vijay Bellur) 26. Re: Pausing rebalance (shishir gowda) 27. Re: replace-brick failing - transport.address-family not specified (Vijay Bellur) 28. Re: [CentOS 6] Upgrade to the glusterfs version in base or in glusterfs-epel (Vijay Bellur) 29. Re: Pausing rebalance (Franco Broi) 30. Re: replace-brick failing - transport.address-family not specified (Vijay Bellur) 31. Re: Pausing rebalance (Kaushal M) 32. Re: Pausing rebalance (Franco Broi) 33. Re: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob) 34. Structure needs cleaning on some files (Johan Huysmans) 35. Re: replace-brick failing - transport.address-family not specified (Bernhard Glomm) 36. Re: Structure needs cleaning on some files (Johan Huysmans) 37. Re: Gluster infrastructure question (Heiko Kr?mer) 38. Re: Errors from PHP stat() on files and directories in a glusterfs mount (Johan Huysmans) 39. Re: Gluster infrastructure question (Andrew Lau) 40. Re: replace-brick failing - transport.address-family not specified (Vijay Bellur) 41. Re: Gluster - replica - Unable to self-heal contents of '/' (possible split-brain) (Vijay Bellur) 42. Error after crash of Virtual Machine during migration (Mariusz Sobisiak) 43. Re: Structure needs cleaning on some files (Johan Huysmans) ---------------------------------------------------------------------- Message: 1 Date: Mon, 9 Dec 2013 14:12:22 +0100 From: Per Hallsmark <per at hallsmark.se> To: gluster-users at gluster.org Subject: Re: [Gluster-users] Testing failover and recovery Message-ID: <CAPaVuL-DL8R3GBNzv9fMJq-rTOYCs=NufTf-B5V7xKpoNML+7Q at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hello, Interesting, we seems to be several users with issues regarding recovery but there is no to little replies... ;-) I did some more testing over the weekend. Same initial workload (two glusterfs servers, one client that continuesly updates a file with timestamps) and then two easy testcases: 1. one of the glusterfs servers is constantly rebooting (just a initscript that sleeps for 60 seconds before issuing "reboot") 2. similar to 1 but instead of rebooting itself, it is rebooting the other glusterfs server so that the result is that they a server comes up, wait for a bit and then rebooting the other server. During the whole weekend this has progressed nicely. The client is running all the time without issues and the glusterfs that comes back (either only one or one of the servers, depending on the testcase shown above) is actively getting into sync and updates it's copy of the file. So it seems to me that we need to look deeper in the recovery case (of course, but it is interesting to know about the nice&easy usescases as well). I'm surprised that the recovery from a failover (to restore the rendundancy) isn't getting higher attention here. Are we (and others that has difficulties in this area) running a unusual usecase? BR, Per On Wed, Dec 4, 2013 at 12:17 PM, Per Hallsmark <per at hallsmark.se> wrote:> Hello, > > I've found GlusterFS to be an interesting project. Not so muchexperience> of it > (although from similar usecases with DRBD+NFS setups) so I setup some > testcase to try out failover and recovery. > > For this I have a setup with two glusterfs servers (each is a VM) andone> client (also a VM). > I'm using GlusterFS 3.4 btw. 
> > The servers manages a gluster volume created as: > > gluster volume create testvol rep 2 transport tcp gs1:/export/vda1/brick > gs2:/export/vda1/brick > gluster volume start testvol > gluster volume set testvol network.ping-timeout 5 > > Then the client mounts this volume as: > mount -t glusterfs gs1:/testvol /import/testvol > > Everything seems to work good in normal usecases, I can write/read tothe> volume, take servers down and up again etc. > > As a fault scenario, I'm testing a fault injection like this: > > 1. continuesly writing timestamps to a file on the volume from theclient.> It is automated in a smaller testscript like: > :~/glusterfs-test$ cat scripts/test-gfs-client.sh > #!/bin/sh > > gfs=/import/testvol > > while true; do > date +%s >> $gfs/timestamp.txt > ts=`tail -1 $gfs/timestamp.txt` > md5sum=`md5sum $gfs/timestamp.txt | cut -f1 -d" "` > echo "Timestamp = $ts, md5sum = $md5sum" > sleep 1 > done > :~/glusterfs-test$ > > As can be seen, the client is a quite simple user of the glusterfsvolume.> Low datarate and single user for example. > > > 2. disabling ethernet in one of the VM (ifconfig eth0 down) to simulate > like a broken network > > 3. After a short while, the failed server is brought alive again(ifconfig> eth0 up) > > Step 2 and 3 is also automated in a testscript like: > > :~/glusterfs-test$ cat scripts/fault-injection.sh > #!/bin/sh > > # fault injection script tailored for two glusterfs nodes named gs1 andgs2> > if [ "$HOSTNAME" == "gs1" ]; then > peer="gs2" > else > peer="gs1" > fi > > inject_eth_fault() { > echo "network down..." > ifconfig eth0 down > sleep 10 > ifconfig eth0 up > echo "... and network up again." > } > > recover() { > echo "recovering from fault..." > service glusterd restart > } > > while true; do > sleep 60 > if [ ! -f /tmp/nofault ]; then > if ping -c 1 $peer; then > inject_eth_fault > recover > fi > fi > done > :~/glusterfs-test$ > > > I then see that: > > A. This goes well first time, one server leaves the cluster and theclient> hang for like 8 seconds before beeing able to write to the volume again. > > B. When the failed server comes back, I can check that from both servers > they see each other and "gluster peer status" shows they believe theother> is in connected state. > > C. When the failed server comes back, it is not automatically seeking > active participation on syncing volume etc (the local storage timestamp > file isn't updated). > > D. If I do restart of glusterd service (service glusterd restart) the > failed node seems to get back like it was before. Not always though...The> chance is higher if I have long time between fault injections (long = 60 > sec or so, with a forced faulty state of 10 sec) > With a period time of some minutes, I could have the cluster servicingthe> client OK for up to 8+ hours at least. > Shortening the period, I'm easily down to like 10-15 minutes. > > E. Sooner or later I enter a state where the two servers seems to be up, > seeing it's peer (gluster peer status) and such but none is serving the > volume to the client. > I've tried to "heal" the volume in different way but it doesn't help. > Sometimes it is just that one of the timestamp copies in each of > the servers is ahead which is simpler but sometimes both the timestamp > files have added data at end that the other doesnt have. > > To the questions: > > * Is it so that from a design point of perspective, the choice in the > glusterfs team is that one shouldn't rely soley on glusterfs daemonsbeeing> able to recover from a faulty state? 
There is need for cluster manager > services (like heartbeat for example) to be part? That would make > experience C understandable and one could then take heartbeat or similar > packages to start/stop services. > > * What would then be the recommended procedure to recover from a faulty > glusterfs node? (so that experience D and E is not happening) > > * What is the expected failover timing (of course depending on config,but> say with a give ping timeout etc)? > and expected recovery timing (with similar dependency on config)? > > * What/how is glusterfs team testing to make sure that the failover, > recovery/healing functionality etc works? > > Any opinion if the testcase is bad is of course also very welcome. > > Best regards, > Per >-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/69c23114/attachment-0001.html>------------------------------ Message: 2 Date: Mon, 9 Dec 2013 15:51:31 +0200 From: "Alexandru Coseru" <alex.coseru at simplus.ro> To: <gluster-users at gluster.org> Subject: [Gluster-users] Gluster - replica - Unable to self-heal contents of '/' (possible split-brain) Message-ID: <01fe01cef4e5$c3f2cb00$4bd86100$@coseru at simplus.ro> Content-Type: text/plain; charset="us-ascii" Hello, I'm trying to build a replica volume, on two servers. The servers are: blade6 and blade7. (another blade1 in the peer, but with no volumes) The volume seems ok, but I cannot mount it from NFS. Here are some logs: [root at blade6 stor1]# df -h /dev/mapper/gluster_stor1 882G 200M 837G 1% /gluster/stor1 [root at blade7 stor1]# df -h /dev/mapper/gluster_fast 846G 158G 646G 20% /gluster/stor_fast /dev/mapper/gluster_stor1 882G 72M 837G 1% /gluster/stor1 [root at blade6 stor1]# pwd /gluster/stor1 [root at blade6 stor1]# ls -lh total 0 [root at blade7 stor1]# pwd /gluster/stor1 [root at blade7 stor1]# ls -lh total 0 [root at blade6 stor1]# gluster volume info Volume Name: stor_fast Type: Distribute Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31 Status: Started Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: blade7.xen:/gluster/stor_fast Options Reconfigured: nfs.port: 2049 Volume Name: stor1 Type: Replicate Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: blade7.xen:/gluster/stor1 Brick2: blade6.xen:/gluster/stor1 Options Reconfigured: nfs.port: 2049 [root at blade7 stor1]# gluster volume info Volume Name: stor_fast Type: Distribute Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31 Status: Started Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: blade7.xen:/gluster/stor_fast Options Reconfigured: nfs.port: 2049 Volume Name: stor1 Type: Replicate Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: blade7.xen:/gluster/stor1 Brick2: blade6.xen:/gluster/stor1 Options Reconfigured: nfs.port: 2049 [root at blade6 stor1]# gluster volume status Status of volume: stor_fast Gluster process Port Online Pid ---------------------------------------------------------------------------- -- Brick blade7.xen:/gluster/stor_fast 49152 Y 1742 NFS Server on localhost 2049 Y 20074 NFS Server on blade1.xen 2049 Y 22255 NFS Server on blade7.xen 2049 Y 7574 There are no active volume tasks Status of volume: stor1 Gluster process Port Online Pid ---------------------------------------------------------------------------- -- Brick blade7.xen:/gluster/stor1 
49154 Y 7562 Brick blade6.xen:/gluster/stor1 49154 Y 20053 NFS Server on localhost 2049 Y 20074 Self-heal Daemon on localhost N/A Y 20079 NFS Server on blade1.xen 2049 Y 22255 Self-heal Daemon on blade1.xen N/A Y 22260 NFS Server on blade7.xen 2049 Y 7574 Self-heal Daemon on blade7.xen N/A Y 7578 There are no active volume tasks [root at blade7 stor1]# gluster volume status Status of volume: stor_fast Gluster process Port Online Pid ---------------------------------------------------------------------------- -- Brick blade7.xen:/gluster/stor_fast 49152 Y 1742 NFS Server on localhost 2049 Y 7574 NFS Server on blade6.xen 2049 Y 20074 NFS Server on blade1.xen 2049 Y 22255 There are no active volume tasks Status of volume: stor1 Gluster process Port Online Pid ---------------------------------------------------------------------------- -- Brick blade7.xen:/gluster/stor1 49154 Y 7562 Brick blade6.xen:/gluster/stor1 49154 Y 20053 NFS Server on localhost 2049 Y 7574 Self-heal Daemon on localhost N/A Y 7578 NFS Server on blade1.xen 2049 Y 22255 Self-heal Daemon on blade1.xen N/A Y 22260 NFS Server on blade6.xen 2049 Y 20074 Self-heal Daemon on blade6.xen N/A Y 20079 There are no active volume tasks [root at blade6 stor1]# gluster peer status Number of Peers: 2 Hostname: blade1.xen Port: 24007 Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b State: Peer in Cluster (Connected) Hostname: blade7.xen Port: 24007 Uuid: 574eb256-30d2-4639-803e-73d905835139 State: Peer in Cluster (Connected) [root at blade7 stor1]# gluster peer status Number of Peers: 2 Hostname: blade6.xen Port: 24007 Uuid: a65cadad-ef79-4821-be41-5649fb204f3e State: Peer in Cluster (Connected) Hostname: blade1.xen Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b State: Peer in Cluster (Connected) [root at blade6 stor1]# gluster volume heal stor1 info Gathering Heal info on volume stor1 has been successful Brick blade7.xen:/gluster/stor1 Number of entries: 0 Brick blade6.xen:/gluster/stor1 Number of entries: 0 [root at blade7 stor1]# gluster volume heal stor1 info Gathering Heal info on volume stor1 has been successful Brick blade7.xen:/gluster/stor1 Number of entries: 0 Brick blade6.xen:/gluster/stor1 Number of entries: 0 When I'm trying to mount the volume with NFS, I have the following errors: [2013-12-09 13:20:52.066978] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] [2013-12-09 13:20:52.067386] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on / [2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error [2013-12-09 13:20:53.092039] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] [2013-12-09 13:20:53.092497] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on / [2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error What I'm doing wrong ? PS: Volume stor_fast works like a charm. Best Regards, -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/b0b21677/attachment-0001.html>------------------------------ Message: 3 Date: Mon, 09 Dec 2013 14:18:28 +0100 From: Heiko Kr?mer <hkraemer at anynines.de> To: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: [Gluster-users] Gluster infrastructure question Message-ID: <52A5C324.4090408 at anynines.de> Content-Type: text/plain; charset="iso-8859-1" -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Heyho guys, I'm running since years glusterfs in a small environment without big problems. Now I'm going to use glusterFS for a bigger cluster but I've some questions :) Environment: * 4 Servers * 20 x 2TB HDD, each * Raidcontroller * Raid 10 * 4x bricks => Replicated, Distributed volume * Gluster 3.4 1) I'm asking me, if I can delete the raid10 on each server and create for each HDD a separate brick. In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there any experience about the write throughput in a production system with many of bricks like in this case? In addition i'll get double of HDD capacity. 2) I've heard a talk about glusterFS and out scaling. The main point was if more bricks are in use, the scale out process will take a long time. The problem was/is the Hash-Algo. So I'm asking me how is it if I've one very big brick (Raid10 20TB on each server) or I've much more bricks, what's faster and is there any issues? Is there any experiences ? 3) Failover of a HDD is for a raid controller with HotSpare HDD not a big deal. Glusterfs will rebuild automatically if a brick fails and there are no data present, this action will perform a lot of network traffic between the mirror bricks but it will handle it equal as the raid controller right ? Thanks and cheers Heiko - -- Anynines.com Avarteq GmbH B.Sc. Informatik Heiko Kr?mer CIO Twitter: @anynines - ---- Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 Sitz: Saarbr?cken -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=bDly -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: hkraemer.vcf Type: text/x-vcard Size: 277 bytes Desc: not available URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/d70112ef/attachment-0001.vcf>------------------------------ Message: 4 Date: Mon, 9 Dec 2013 09:51:41 -0500 From: Kal Black <kaloblak at gmail.com> To: Paul Robert Marino <prmarino1 at gmail.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] How reliable is XFS under Gluster? 
Message-ID: <CADZk1LMcRjn=qG-mWbc5S8SeJtkFB2AZica2NKuU3Z7mwQ=2kQ at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Thank you all for the wonderful input, I haven't used extensively XFS so far and my concerns primarily came from reading an article (mostly the discussion after it) by Jonathan Corbetrom on LWN (http://lwn.net/Articles/476263/) and another one http://toruonu.blogspot.ca/2012/12/xfs-vs-ext4.html. They are both relatively recent and I was under the impression the XFS still has problems, in certain cases of power loss, where the metadata and the actual data are not being in sync, which might lead existing data being corrupted. But again, like Paul Robert Marino pointed out, choosing a right IO scheduler might greatly reduce the risk of this to happen. On Sun, Dec 8, 2013 at 11:04 AM, Paul Robert Marino <prmarino1 at gmail.com>wrote:> XFS is fine Ive been using it on various distros in production for > over a decade now and I've rarely had any problems with it and when I > have they have been trivial to fix which is something I honestly cant > say about ext3 or ext4. > > Usually when there is a power failure during a write if the > transaction wasn't completely committed to the disk it is rolled back > via the journal.the one exception to this is when you have a battery > backed cache where the battery discharges before power is restored, or > a very cheap consumer grade disk which uses its cache for writes and > lies about the sync state. > in either of these scenarios any file system will have problems. > > Out of any of the filesystems Ive worked with in general XFS handles > the battery discharge senario the cleanest and is the easiest to > recover. > if you have the second scenario with the cheap disks with a cache that > lies nothing will help you not even a fsync because the hardware lies. > Also the subject of fsync is a little more complicated than most > people think there are several kinds of fsync and each behaves > differently on different filesystems. PostgreSQL has documentation > about it here > http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html > looks at wal_sync_method if you would like to have a better about how > fsync works without getting too deep into the subject. > > By the way most apps don't need to do fsyncs and it would bring your > system to a crawl if they all did so take people saying > all programs should fsync with a grain of salt. > > In most cases when these problems come up its really that they didn't > set the right IO scheduler for what the server does. For example CFQ > which is the EL default can leave your write in ram cache for quite a > while before sending it to disk in an attempt to optimize your IO; > however the deadline scheduler will attempt to optimize your IO but > will predictably sync it to disk after a period of time regardless of > whether it was able to fully optimize it or not. Also there is noop > which does no optimization at all and leaves every thing to the > hardware, this is common and recommended for VM's and there is some > argument to use it with high end raid controllers for things like > financial data where you need to absolutely ensure the write happen > ASAP because there may be fines or other large penalties if you loose > any data. > > > > On Sat, Dec 7, 2013 at 3:04 AM, Franco Broi <Franco.Broi at iongeo.com> > wrote: > > Been using ZFS for about 9 months and am about to add as other 400TB,no> > issues so far. 
> > > > On 7 Dec 2013 04:23, Brian Foster <bfoster at redhat.com> wrote: > > On 12/06/2013 01:57 PM, Kal Black wrote: > >> Hello, > >> I am in the point of picking up a FS for new brick nodes. I was usedto> >> like and use ext4 until now but I recently red for an issueintroduced> by > >> a > >> patch in ext4 that breaks the distributed translator. In the sametime,> it > >> looks like the recommended FS for a brick is no longer ext4 but XFS > which > >> apparently will also be the default FS in the upcoming RedHat7. Onthe> >> other hand, XFS is being known as a file system that can be easily > >> corrupted (zeroing files) in case of a power failure. Supporters ofthe> >> file system claim that this should never happen if an application has > been > >> properly coded (properly committing/fsync-ing data to storage) andthe> >> storage itself has been properly configured (disk cash disabled on > >> individual disks and battery backed cache used on the controllers).My> >> question is, should I be worried about losing data in a power failureor> >> similar scenarios (or any) using GlusterFS and XFS? Are there best > >> practices for setting up a Gluster brick + XFS? Has the ext4 issuebeen> >> reliably fixed? (my understanding is that this will be impossibleunless> >> ext4 isn't being modified to allow popper work with Gluster) > >> > > > > Hi Kal, > > > > You are correct in that Red Hat recommends using XFS for glusterbricks.> > I'm sure there are plenty of ext4 (and other fs) users as well, soother> > users should chime in as far as real experiences with various brick > > filesystems goes. Also, I believe the dht/ext issue has been resolved > > for some time now. > > > > With regard to "XFS zeroing files on power failure," I'd suggest you > > check out the following blog post: > > > > >http://sandeen.net/wordpress/computers/xfs-does-not-null-files-and-requires-no-flux/> > > > My cursory understanding is that there were apparently situationswhere> > the inode size of a recently extended file would be written to the log > > before the actual extending data is written to disk, thus creating a > > crash window where the updated size would be seen, but not the actual > > data. In other words, this isn't a "zeroing files" behavior in as much > > as it is an ordering issue with logging the inode size. This isprobably> > why you've encountered references to fsync(), because with the fixyour> > data is still likely lost (unless/until you've run an fsync to flushto> > disk), you just shouldn't see the extended inode size unless theactual> > data made it to disk. > > > > Also note that this was fixed in 2007. ;) > > > > Brian > > > >> Best regards > >> > >> > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > >> > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users > > > > ________________________________ > > > > > > This email and any files transmitted with it are confidential and are > > intended solely for the use of the individual or entity to whom theyare> > addressed. 
If you are not the original recipient or the person > responsible > > for delivering the email to the intended recipient, be advised thatyou> have > > received this email in error, and that any use, dissemination, > forwarding, > > printing, or copying of this email is strictly prohibited. If you > received > > this email in error, please immediately notify the sender and deletethe> > original. > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/4b56a323/attachment-0001.html>------------------------------ Message: 5 Date: Mon, 09 Dec 2013 15:44:24 +0000 From: Nux! <nux at li.nux.ro> To: gluster-users at gluster.org Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <9775f8114ebbc392472010f2d9bdf432 at li.nux.ro> Content-Type: text/plain; charset=UTF-8; format=flowed On 09.12.2013 13:18, Heiko Kr?mer wrote:> 1) > I'm asking me, if I can delete the raid10 on each server and create > for each HDD a separate brick. > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there > any experience about the write throughput in a production system with > many of bricks like in this case? In addition i'll get double of HDD > capacity.I have found problems with bricks to be disruptive whereas replacing a RAID member is quite trivial. I would recommend against dropping RAID.> 3) > Failover of a HDD is for a raid controller with HotSpare HDD not a big > deal. Glusterfs will rebuild automatically if a brick fails and there > are no data present, this action will perform a lot of network traffic > between the mirror bricks but it will handle it equal as the raid > controller right ?Gluster will not "rebuild automatically" a brick, you will need to manually add/remove it. Additionally, if a brick goes bad gluster won't do anything about it, the affected volumes will just slow down or stop working at all. Again, my advice is KEEP THE RAID and set up good monitoring of drives. :) HTH Lucian -- Sent from the Delta quadrant using Borg technology! Nux! www.nux.ro ------------------------------ Message: 6 Date: Mon, 9 Dec 2013 07:57:47 -0800 From: Randy Breunling <rbreunling at gmail.com> To: gluster-users at gluster.org Cc: Randy Breunling <rbreunling at gmail.com> Subject: [Gluster-users] Scalability - File system or Object Store Message-ID: <CAJwwApQ5-SvboWV_iRGC+HJSuT25xSoz_9CBJfGDmpqT4tDJzw at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1">From any experience...which has shown to scale better...a file system oran object store? --Randy San Jose CA -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/dcf7491e/attachment-0001.html>------------------------------ Message: 7 Date: Mon, 9 Dec 2013 11:07:58 -0500 From: Jay Vyas <jayunit100 at gmail.com> To: Randy Breunling <rbreunling at gmail.com> Cc: "Gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Scalability - File system or Object Store Message-ID: <CAAu13zE4kYJ1Dt9ypOMt=M=ps7QfyPSn4LSqZ3YLYBnW5pE4yA at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" in object stores you sacrifice the consistency gauranteed by filesystems for **higher** availability. 
probably by "scale" you mean higher availability, so... the answer is probably object storage. That said, gluster is an interesting file system in that it is "object-like" --- it is really fast for lookups.... and so if you aren't really sure you need objects, you might be able to do just fine with gluster out of the box. One really cool idea that is permeating the gluster community nowadays is this "UFO" concept, -- you can easily start with regular gluster, and then layer an object store on top at a later date if you want to sacrifice posix operations for (even) higher availability. "Unified File and Object Storage - Unified file and object storage allows admins to utilize the same data store for both POSIX-style mounts as well as S3 or Swift-compatible APIs." (from http://gluster.org/community/documentation/index.php/3.3beta) On Mon, Dec 9, 2013 at 10:57 AM, Randy Breunling <rbreunling at gmail.com>wrote:> From any experience...which has shown to scale better...a file system or > an object store? > > --Randy > San Jose CA > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users >-- Jay Vyas http://jayunit100.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/e46cf569/attachment-0001.html>------------------------------ Message: 8 Date: Mon, 09 Dec 2013 08:09:24 -0800 From: Joe Julian <joe at julianfamily.org> To: Nux! <nux at li.nux.ro>,gluster-users at gluster.org Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <698ab788-9f27-44a6-bd98-a53eb25f4573 at email.android.com> Content-Type: text/plain; charset=UTF-8 Nux! <nux at li.nux.ro> wrote:>On 09.12.2013 13:18, Heiko Kr?mer wrote: >> 1) >> I'm asking me, if I can delete the raid10 on each server and create >> for each HDD a separate brick. >> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there >> any experience about the write throughput in a production system with >> many of bricks like in this case? In addition i'll get double of HDD >> capacity. > >I have found problems with bricks to be disruptive whereas replacing a >RAID member is quite trivial. I would recommend against dropping RAID. >Brick disruption has been addressed in 3.4.>> 3) >> Failover of a HDD is for a raid controller with HotSpare HDD not a >big >> deal. Glusterfs will rebuild automatically if a brick fails and there >> are no data present, this action will perform a lot of network >traffic >> between the mirror bricks but it will handle it equal as the raid >> controller right ? > >Gluster will not "rebuild automatically" a brick, you will need to >manually add/remove it.Not exactly, but you will have to manually add an attribute and "heal...full" to re-mirror the replacement.>Additionally, if a brick goes bad gluster won't do anything about it, >the affected volumes will just slow down or stop working at all. >Again, addressed in 3.4.>Again, my advice is KEEP THE RAID and set up good monitoring of drives. >I'm not arguing for or against RAID. It's another tool in our tool box. I, personally, use JBOD. Our use case has a lot of different files being used by different clients. JBOD maximizes our use of cache. ------------------------------ Message: 9 Date: Mon, 9 Dec 2013 11:28:05 -0500 (EST) From: John Mark Walker <johnmark at gluster.org> To: "Kaleb S. 
KEITHLEY" <kkeithle at redhat.com> Cc: "Gluster-users at gluster.org List" <gluster-users at gluster.org>, Gluster Devel <gluster-devel at nongnu.org> Subject: Re: [Gluster-users] [Gluster-devel] GlusterFest Test Weekend - 3.5 Test #1 Message-ID: <1654421306.26844542.1386606485161.JavaMail.root at redhat.com> Content-Type: text/plain; charset=utf-8 Incidentally, we're wrapping this up today. If you want to be included in the list of swag-receivers (t-shirt, USB car charger, and stickers), you still have a couple of hours to file a bug and have it verified by the dev team. Thanks, everyone :) -JM ----- Original Message -----> On 12/05/2013 09:31 PM, John Mark Walker wrote: > > Greetings, > > > > If you've been keeping up with our weekly meetings and the 3.5planning> > page, then you know that tomorrow, December 6, is the first testing"day"> > for 3.5. But since this is a Friday, we're going to make the partylast> > all weekend, through mid-day Monday. > > > > YUM repos with 3.5.0qa3 RPMs for EPEL-6 and Fedora 18, 19, and 20 are > available at > http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.0qa3/ > > > -- > > Kaleb > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at nongnu.org > https://lists.nongnu.org/mailman/listinfo/gluster-devel >------------------------------ Message: 10 Date: Mon, 09 Dec 2013 16:43:42 +0000 From: Nux! <nux at li.nux.ro> To: Joe Julian <joe at julianfamily.org> Cc: gluster-users at gluster.org Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <b48aa7ed1b14432fc4047c934320e941 at li.nux.ro> Content-Type: text/plain; charset=UTF-8; format=flowed On 09.12.2013 16:09, Joe Julian wrote:>> > > Brick disruption has been addressed in 3.4.Good to know! What exactly happens when the brick goes unresponsive?>> Additionally, if a brick goes bad gluster won't do anything about it, >> the affected volumes will just slow down or stop working at all. >> > > Again, addressed in 3.4.How? What is the expected behaviour now? Thanks! -- Sent from the Delta quadrant using Borg technology! Nux! www.nux.ro ------------------------------ Message: 11 Date: Mon, 9 Dec 2013 18:03:59 +0100 From: samuel <samu60 at gmail.com> To: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: [Gluster-users] compatibility between 3.3 and 3.4 Message-ID: <CAOg=WDc-JT=CfqE39qWSPTjP2OqKj4L_oCfDG8icQKVTpi+0JQ at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi all, We're playing around with new versions and uprading options. We currently have a 2x2x2 stripped-distributed-replicated volume based on 3.3.0 and we're planning to upgrade to 3.4 version. We've tried upgrading fist the clients and we've tried with 3.4.0, 3.4.1 and 3.4.2qa2 but all of them caused the same error: Failed to get stripe-size So it seems as if 3.4 clients are not compatible to 3.3 volumes. Is this assumtion right? Is there any procedure to upgrade the gluster from 3.3 to 3.4 without stopping the service? Where are the compatibility limitations between these 2 versions? Any hint or link to documentation would be highly appreciated. Thank you in advance, Samuel. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/cec50893/attachment-0001.html>------------------------------ Message: 12 Date: Mon, 9 Dec 2013 19:52:57 +0100 From: bernhard glomm <bernhard.glomm at ecologic.eu> To: Heiko Kr?mer <hkraemer at anynines.de> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <E2AB54DC-4D82-4734-9BE2-E7B0B700BBA3 at ecologic.eu> Content-Type: text/plain; charset="windows-1252" Hi Heiko, some years ago I had to deliver a reliable storage that should be easy to grow in size over time. For that I was in close contact with presto prime who produced a lot of interesting research results accessible to the public. http://www.prestoprime.org/project/public.en.html what was striking me was the general concern of how and when and with which pattern hard drives will fail, and the rebuilding time in case a "big" (i.e. 2TB+) drive fails. (one of the papers at pp was dealing in detail with that)>From that background my approach was to build relatively small raid6bricks (9 * 2 TB + 1 Hot-Spare) and connect them together with a distributed glusterfs. I never experienced any problems with that and felt quite comfortable about it. That was for just a lot of big file data exported via samba. At the same time I used another, mirrored, glusterfs as a storage backend for my VM-images, same there, no problem and much less hazel and headache than drbd and ocfs2 which I run on another system. hth best Bernhard Bernhard Glomm IT Administration Phone: +49 (30) 86880 134 Fax: +49 (30) 86880 100 Skype: bernhard.glomm.ecologic Ecologic Institut gemeinn?tzige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464 Ecologic? is a Trade Mark (TM) of Ecologic Institut gemeinn?tzige GmbH On Dec 9, 2013, at 2:18 PM, Heiko Kr?mer <hkraemer at anynines.de> wrote:> Signed PGP part > Heyho guys, > > I'm running since years glusterfs in a small environment without big > problems. > > Now I'm going to use glusterFS for a bigger cluster but I've some > questions :) > > Environment: > * 4 Servers > * 20 x 2TB HDD, each > * Raidcontroller > * Raid 10 > * 4x bricks => Replicated, Distributed volume > * Gluster 3.4 > > 1) > I'm asking me, if I can delete the raid10 on each server and create > for each HDD a separate brick. > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there > any experience about the write throughput in a production system with > many of bricks like in this case? In addition i'll get double of HDD > capacity. > > 2) > I've heard a talk about glusterFS and out scaling. The main point was > if more bricks are in use, the scale out process will take a long > time. The problem was/is the Hash-Algo. So I'm asking me how is it if > I've one very big brick (Raid10 20TB on each server) or I've much more > bricks, what's faster and is there any issues? > Is there any experiences ? > > 3) > Failover of a HDD is for a raid controller with HotSpare HDD not a big > deal. Glusterfs will rebuild automatically if a brick fails and there > are no data present, this action will perform a lot of network traffic > between the mirror bricks but it will handle it equal as the raid > controller right ? > > > > Thanks and cheers > Heiko > > > > -- > Anynines.com > > Avarteq GmbH > B.Sc. 
Informatik > Heiko Kr?mer > CIO > Twitter: @anynines > > ---- > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 > Sitz: Saarbr?cken > > <hkraemer.vcf>_______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/c95b9cc8/attachment-0001.html>-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/c95b9cc8/attachment-0001.sig>------------------------------ Message: 13 Date: Mon, 9 Dec 2013 14:26:45 -0500 (EST) From: Ben Turner <bturner at redhat.com> To: Heiko Kr?mer <hkraemer at anynines.de> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <124648027.2334242.1386617205234.JavaMail.root at redhat.com> Content-Type: text/plain; charset=utf-8 ----- Original Message -----> From: "Heiko Kr?mer" <hkraemer at anynines.de> > To: "gluster-users at gluster.org List" <gluster-users at gluster.org> > Sent: Monday, December 9, 2013 8:18:28 AM > Subject: [Gluster-users] Gluster infrastructure question > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Heyho guys, > > I'm running since years glusterfs in a small environment without big > problems. > > Now I'm going to use glusterFS for a bigger cluster but I've some > questions :) > > Environment: > * 4 Servers > * 20 x 2TB HDD, each > * Raidcontroller > * Raid 10 > * 4x bricks => Replicated, Distributed volume > * Gluster 3.4 > > 1) > I'm asking me, if I can delete the raid10 on each server and create > for each HDD a separate brick. > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there > any experience about the write throughput in a production system with > many of bricks like in this case? In addition i'll get double of HDD > capacity.Have a look at: http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf Specifically: ? RAID arrays ? More RAID LUNs for better concurrency ? For RAID6, 256-KB stripe size I use a single RAID 6 that is divided into several LUNs for my bricks. For example, on my Dell servers(with PERC6 RAID controllers) each server has 12 disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and create a new PV/VG/LV for each brick. From there I follow the recommendations listed in the presentation. HTH! -b> 2) > I've heard a talk about glusterFS and out scaling. The main point was > if more bricks are in use, the scale out process will take a long > time. The problem was/is the Hash-Algo. So I'm asking me how is it if > I've one very big brick (Raid10 20TB on each server) or I've much more > bricks, what's faster and is there any issues? > Is there any experiences ? > > 3) > Failover of a HDD is for a raid controller with HotSpare HDD not a big > deal. Glusterfs will rebuild automatically if a brick fails and there > are no data present, this action will perform a lot of network traffic > between the mirror bricks but it will handle it equal as the raid > controller right ? 
> > > > Thanks and cheers > Heiko > > > > - -- > Anynines.com > > Avarteq GmbH > B.Sc. Informatik > Heiko Kr?mer > CIO > Twitter: @anynines > > - ---- > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 > Sitz: Saarbr?cken > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.14 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY> =bDly > -----END PGP SIGNATURE----- > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 14 Date: Mon, 9 Dec 2013 14:31:00 -0500 (EST) From: Ben Turner <bturner at redhat.com> To: Heiko Kr?mer <hkraemer at anynines.de> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <1676822821.2336090.1386617460049.JavaMail.root at redhat.com> Content-Type: text/plain; charset=utf-8 ----- Original Message -----> From: "Ben Turner" <bturner at redhat.com> > To: "Heiko Kr?mer" <hkraemer at anynines.de> > Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> > Sent: Monday, December 9, 2013 2:26:45 PM > Subject: Re: [Gluster-users] Gluster infrastructure question > > ----- Original Message ----- > > From: "Heiko Kr?mer" <hkraemer at anynines.de> > > To: "gluster-users at gluster.org List" <gluster-users at gluster.org> > > Sent: Monday, December 9, 2013 8:18:28 AM > > Subject: [Gluster-users] Gluster infrastructure question > > > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Heyho guys, > > > > I'm running since years glusterfs in a small environment without big > > problems. > > > > Now I'm going to use glusterFS for a bigger cluster but I've some > > questions :) > > > > Environment: > > * 4 Servers > > * 20 x 2TB HDD, each > > * Raidcontroller > > * Raid 10 > > * 4x bricks => Replicated, Distributed volume > > * Gluster 3.4 > > > > 1) > > I'm asking me, if I can delete the raid10 on each server and create > > for each HDD a separate brick. > > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there > > any experience about the write throughput in a production system with > > many of bricks like in this case? In addition i'll get double of HDD > > capacity. > > Have a look at: > > http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdfThat one was from 2012, here is the latest: http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf -b> Specifically: > > ? RAID arrays > ? More RAID LUNs for better concurrency > ? For RAID6, 256-KB stripe size > > I use a single RAID 6 that is divided into several LUNs for my bricks.For> example, on my Dell servers(with PERC6 RAID controllers) each server has12> disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and > create a new PV/VG/LV for each brick. From there I follow the > recommendations listed in the presentation. > > HTH! 
> > -b > > > 2) > > I've heard a talk about glusterFS and out scaling. The main point was > > if more bricks are in use, the scale out process will take a long > > time. The problem was/is the Hash-Algo. So I'm asking me how is it if > > I've one very big brick (Raid10 20TB on each server) or I've much more > > bricks, what's faster and is there any issues? > > Is there any experiences ? > > > > 3) > > Failover of a HDD is for a raid controller with HotSpare HDD not a big > > deal. Glusterfs will rebuild automatically if a brick fails and there > > are no data present, this action will perform a lot of network traffic > > between the mirror bricks but it will handle it equal as the raid > > controller right ? > > > > > > > > Thanks and cheers > > Heiko > > > > > > > > - -- > > Anynines.com > > > > Avarteq GmbH > > B.Sc. Informatik > > Heiko Kr?mer > > CIO > > Twitter: @anynines > > > > - ---- > > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer > > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 > > Sitz: Saarbr?cken > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.14 (GNU/Linux) > > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > > > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B > > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 > > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ > > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs > > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 > > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY> > =bDly > > -----END PGP SIGNATURE----- > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 15 Date: Mon, 09 Dec 2013 14:57:08 -0500 From: Jeff Darcy <jdarcy at redhat.com> To: Randy Breunling <rbreunling at gmail.com>, gluster-users at gluster.org Subject: Re: [Gluster-users] Scalability - File system or Object Store Message-ID: <52A62094.1000507 at redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 12/09/2013 10:57 AM, Randy Breunling wrote:> From any experience...which has shown to scale better...a file system > or an object store?In terms of numbers of files/objects, I'd have to say object stores. S3 and Azure are both over a *trillion* objects, and I've never heard of a filesystem that size. In terms of performance it might go the other way. More importantly, I think the object stores give up too much in terms of semantics - e.g. hierarchical directories and rename, byte granularity, consistency/durability guarantees. It saddens me to see so many people working around these limitations in their apps based on object stores - duplicating each others' work, creating incompatibibility (e.g. with a half dozen "conventions" for simulating hierarchical directories), and sometimes even losing data to subtle distributed-coordination bugs. An app that uses a subset of an underlying filesystem's functionality is far more likely to be correct and portable than one that tries to build extra abstractions on top of a bare-bones object store. 
------------------------------ Message: 16 Date: Tue, 10 Dec 2013 07:58:25 +1000 From: Dan Mons <dmons at cuttingedge.com.au> To: Ben Turner <bturner at redhat.com> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>, Heiko Kr?mer <hkraemer at anynines.de> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <CACa6TycgVYLNOWkk7eO2L80hhEdQLJpgk-+Bav_dfL2gPVGpjw at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 I went with big RAID on each node (16x 3TB SATA disks in RAID6 with a hot spare per node) rather than brick-per-disk. The simple reason being that I wanted to configure distribute+replicate at the GlusterFS level, and be 100% guaranteed that the replication happened across to another node, and not to another brick on the same node. As each node only has one giant brick, the cluster is forced to replicate to a separate node each time. Some careful initial setup could probably have done the same, but I wanted to avoid the dramas of my employer expanding the cluster one node at a time later on, causing that design goal to fail as the new single node with many bricks found replication partners on itself. On a different topic, I find no real-world difference in RAID10 to RAID6 with GlusterFS. Most of the access delay in Gluster has little to do with the speed of the disk. The only downside to RAID6 is a long rebuild time if you're unlucky enough to blow a couple of drives at once. RAID50 might be a better choice if you're up at 20 drives per node. We invested in SSD caching on our nodes, and to be honest it was rather pointless. Certainly not bad, but the real-world speed boost is not noticed by end users. -Dan ---------------- Dan Mons R&D SysAdmin Unbreaker of broken things Cutting Edge http://cuttingedge.com.au On 10 December 2013 05:31, Ben Turner <bturner at redhat.com> wrote:> ----- Original Message ----- >> From: "Ben Turner" <bturner at redhat.com> >> To: "Heiko Kr?mer" <hkraemer at anynines.de> >> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> >> Sent: Monday, December 9, 2013 2:26:45 PM >> Subject: Re: [Gluster-users] Gluster infrastructure question >> >> ----- Original Message ----- >> > From: "Heiko Kr?mer" <hkraemer at anynines.de> >> > To: "gluster-users at gluster.org List" <gluster-users at gluster.org> >> > Sent: Monday, December 9, 2013 8:18:28 AM >> > Subject: [Gluster-users] Gluster infrastructure question >> > >> > -----BEGIN PGP SIGNED MESSAGE----- >> > Hash: SHA1 >> > >> > Heyho guys, >> > >> > I'm running since years glusterfs in a small environment without big >> > problems. >> > >> > Now I'm going to use glusterFS for a bigger cluster but I've some >> > questions :) >> > >> > Environment: >> > * 4 Servers >> > * 20 x 2TB HDD, each >> > * Raidcontroller >> > * Raid 10 >> > * 4x bricks => Replicated, Distributed volume >> > * Gluster 3.4 >> > >> > 1) >> > I'm asking me, if I can delete the raid10 on each server and create >> > for each HDD a separate brick. >> > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there >> > any experience about the write throughput in a production system with >> > many of bricks like in this case? In addition i'll get double of HDD >> > capacity. >> >> Have a look at: >> >> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf > > That one was from 2012, here is the latest: > >http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf> > -b > >> Specifically: >> >> ? RAID arrays >> ? 
More RAID LUNs for better concurrency >> ? For RAID6, 256-KB stripe size >> >> I use a single RAID 6 that is divided into several LUNs for my bricks.For>> example, on my Dell servers(with PERC6 RAID controllers) each serverhas 12>> disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and >> create a new PV/VG/LV for each brick. From there I follow the >> recommendations listed in the presentation. >> >> HTH! >> >> -b >> >> > 2) >> > I've heard a talk about glusterFS and out scaling. The main point was >> > if more bricks are in use, the scale out process will take a long >> > time. The problem was/is the Hash-Algo. So I'm asking me how is it if >> > I've one very big brick (Raid10 20TB on each server) or I've muchmore>> > bricks, what's faster and is there any issues? >> > Is there any experiences ? >> > >> > 3) >> > Failover of a HDD is for a raid controller with HotSpare HDD not abig>> > deal. Glusterfs will rebuild automatically if a brick fails and there >> > are no data present, this action will perform a lot of networktraffic>> > between the mirror bricks but it will handle it equal as the raid >> > controller right ? >> > >> > >> > >> > Thanks and cheers >> > Heiko >> > >> > >> > >> > - -- >> > Anynines.com >> > >> > Avarteq GmbH >> > B.Sc. Informatik >> > Heiko Kr?mer >> > CIO >> > Twitter: @anynines >> > >> > - ---- >> > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer >> > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 >> > Sitz: Saarbr?cken >> > -----BEGIN PGP SIGNATURE----- >> > Version: GnuPG v1.4.14 (GNU/Linux) >> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ >> > >> > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B >> > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 >> > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ >> > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs >> > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 >> > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY>> > =bDly >> > -----END PGP SIGNATURE----- >> > >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > http://supercolony.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 17 Date: Mon, 09 Dec 2013 14:09:11 -0800 From: Joe Julian <joe at julianfamily.org> To: Dan Mons <dmons at cuttingedge.com.au> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <52A63F87.8070107 at julianfamily.org> Content-Type: text/plain; charset=UTF-8; format=flowed Replicas are defined in the order bricks are listed in the volume create command. So gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 will replicate between server1 and server2 and replicate between server3 and server4. 
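(To make that ordering rule concrete — the hostnames and brick paths below are only placeholders, not anything from this thread — a replica 2 volume with two bricks per server would be created by interleaving the hosts, so that every consecutive pair of bricks spans two nodes:

# replica sets are formed from consecutive bricks, left to right:
# pair 1 = server1:brick1 + server2:brick1, pair 2 = server1:brick2 + server2:brick2
gluster volume create myvol replica 2 \
    server1:/data/brick1 server2:/data/brick1 \
    server1:/data/brick2 server2:/data/brick2

Listing both of server1's bricks first would instead pair them on the same node and defeat the redundancy.)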
Bricks added to a replica 2 volume after it's been created will require pairs of bricks, The best way to "force" replication to happen on another server is to just define it that way. On 12/09/2013 01:58 PM, Dan Mons wrote:> I went with big RAID on each node (16x 3TB SATA disks in RAID6 with a > hot spare per node) rather than brick-per-disk. The simple reason > being that I wanted to configure distribute+replicate at the GlusterFS > level, and be 100% guaranteed that the replication happened across to > another node, and not to another brick on the same node. As each node > only has one giant brick, the cluster is forced to replicate to a > separate node each time. > > Some careful initial setup could probably have done the same, but I > wanted to avoid the dramas of my employer expanding the cluster one > node at a time later on, causing that design goal to fail as the new > single node with many bricks found replication partners on itself. > > On a different topic, I find no real-world difference in RAID10 to > RAID6 with GlusterFS. Most of the access delay in Gluster has little > to do with the speed of the disk. The only downside to RAID6 is a > long rebuild time if you're unlucky enough to blow a couple of drives > at once. RAID50 might be a better choice if you're up at 20 drives > per node. > > We invested in SSD caching on our nodes, and to be honest it was > rather pointless. Certainly not bad, but the real-world speed boost > is not noticed by end users. > > -Dan > > ---------------- > Dan Mons > R&D SysAdmin > Unbreaker of broken things > Cutting Edge > http://cuttingedge.com.au > > > On 10 December 2013 05:31, Ben Turner <bturner at redhat.com> wrote: >> ----- Original Message ----- >>> From: "Ben Turner" <bturner at redhat.com> >>> To: "Heiko Kr?mer" <hkraemer at anynines.de> >>> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> >>> Sent: Monday, December 9, 2013 2:26:45 PM >>> Subject: Re: [Gluster-users] Gluster infrastructure question >>> >>> ----- Original Message ----- >>>> From: "Heiko Kr?mer" <hkraemer at anynines.de> >>>> To: "gluster-users at gluster.org List" <gluster-users at gluster.org> >>>> Sent: Monday, December 9, 2013 8:18:28 AM >>>> Subject: [Gluster-users] Gluster infrastructure question >>>> >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> Heyho guys, >>>> >>>> I'm running since years glusterfs in a small environment without big >>>> problems. >>>> >>>> Now I'm going to use glusterFS for a bigger cluster but I've some >>>> questions :) >>>> >>>> Environment: >>>> * 4 Servers >>>> * 20 x 2TB HDD, each >>>> * Raidcontroller >>>> * Raid 10 >>>> * 4x bricks => Replicated, Distributed volume >>>> * Gluster 3.4 >>>> >>>> 1) >>>> I'm asking me, if I can delete the raid10 on each server and create >>>> for each HDD a separate brick. >>>> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there >>>> any experience about the write throughput in a production system with >>>> many of bricks like in this case? In addition i'll get double of HDD >>>> capacity. >>> Have a look at: >>> >>>http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf>> That one was from 2012, here is the latest: >> >>http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf>> >> -b >> >>> Specifically: >>> >>> ? RAID arrays >>> ? More RAID LUNs for better concurrency >>> ? 
For RAID6, 256-KB stripe size >>> >>> I use a single RAID 6 that is divided into several LUNs for my bricks.For>>> example, on my Dell servers(with PERC6 RAID controllers) each serverhas 12>>> disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and >>> create a new PV/VG/LV for each brick. From there I follow the >>> recommendations listed in the presentation. >>> >>> HTH! >>> >>> -b >>> >>>> 2) >>>> I've heard a talk about glusterFS and out scaling. The main point was >>>> if more bricks are in use, the scale out process will take a long >>>> time. The problem was/is the Hash-Algo. So I'm asking me how is it if >>>> I've one very big brick (Raid10 20TB on each server) or I've muchmore>>>> bricks, what's faster and is there any issues? >>>> Is there any experiences ? >>>> >>>> 3) >>>> Failover of a HDD is for a raid controller with HotSpare HDD not abig>>>> deal. Glusterfs will rebuild automatically if a brick fails and there >>>> are no data present, this action will perform a lot of networktraffic>>>> between the mirror bricks but it will handle it equal as the raid >>>> controller right ? >>>> >>>> >>>> >>>> Thanks and cheers >>>> Heiko >>>> >>>> >>>> >>>> - -- >>>> Anynines.com >>>> >>>> Avarteq GmbH >>>> B.Sc. Informatik >>>> Heiko Kr?mer >>>> CIO >>>> Twitter: @anynines >>>> >>>> - ---- >>>> Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer >>>> Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 >>>> Sitz: Saarbr?cken >>>> -----BEGIN PGP SIGNATURE----- >>>> Version: GnuPG v1.4.14 (GNU/Linux) >>>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ >>>> >>>> iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B >>>> lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 >>>> GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ >>>> N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs >>>> TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 >>>> Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY>>>> =bDly >>>> -----END PGP SIGNATURE----- >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> http://supercolony.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 18 Date: Tue, 10 Dec 2013 09:38:03 +1000 From: Dan Mons <dmons at cuttingedge.com.au> To: Joe Julian <joe at julianfamily.org> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <CACa6TyenCTAgoKKsXCmrvd0G191VdBPkdNf3j4yROkT_9jTyhQ at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On 10 December 2013 08:09, Joe Julian <joe at julianfamily.org> wrote:> Replicas are defined in the order bricks are listed in the volume create > command. 
So > gluster volume create myvol replica 2 server1:/data/brick1 > server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 > will replicate between server1 and server2 and replicate between server3and> server4. > > Bricks added to a replica 2 volume after it's been created will require > pairs of bricks, > > The best way to "force" replication to happen on another server is tojust> define it that way.Yup, that's understood. The problem is when (for argument's sake) : * We've defined 4 hosts with 10 disks each * Each individual disk is a brick * Replication is defined correctly when creating the volume initially * I'm on holidays, my employer buys a single node, configures it brick-per-disk, and the IT junior adds it to the cluster All good up until that final point, and then I've got that fifth node at the end replicating to itself. Node goes down some months later, chaos ensues. Not a GlusterFS/technology problem, but a problem with what frequently happens at a human level. As a sysadmin, these are also things I need to work around, even if it means deviating from best practices. :) -Dan ------------------------------ Message: 19 Date: Tue, 10 Dec 2013 11:06:06 +0700 From: Diep Pham Van <imeo at favadi.com> To: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] [CentOS 6] Upgrade to the glusterfs version in base or in glusterfs-epel Message-ID: <20131210110606.2e217dc6 at debbox> Content-Type: text/plain; charset=US-ASCII On Mon, 9 Dec 2013 19:53:20 +0900 Nguyen Viet Cuong <mrcuongnv at gmail.com> wrote:> There is no glusterfs-server in the "base" repository, just client.Silly me. After install and attempt to mount with base version of glusterfs-fuse, I realize that I have to change 'backupvolfile-server' mount option to 'backup-volfile-servers'[1]. Links: [1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950 -- PHAM Van Diep ------------------------------ Message: 20 Date: Mon, 09 Dec 2013 20:44:06 -0800 From: harry mangalam <harry.mangalam at uci.edu> To: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? Message-ID: <34671480.j6DT7uby7B at stunted> Content-Type: text/plain; charset="us-ascii" Admittedly I should search the source, but I wonder if anyone knows this offhand. Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have picked up an 'advanced date' in the /var/log/glusterfs/gl.log file - that date string is running about 5-6 hours ahead of the system date and all the Gluster servers (which are identical and correct). The time advancement does not appear to be identical tho it's hard to tell since it only shows on errors and those update irregularly. All the clients are the same version and all the servers are the same (gluster v 3.4.0-8.el6.x86_64 This would not be of interest except that those 4 clients are losing files, unable to reliably do IO, etc on the gluster fs. They don't appear to be having problems with NFS mounts, nor with a Fraunhofer FS that is also mounted on each node, Rebooting 2 of them has no effect - they come right back with an advanced date. --- Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 415 South Circle View Dr, Irvine, CA, 92697 [shipping] MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) --- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/9cde5ba3/attachment-0001.html>------------------------------ Message: 21 Date: Tue, 10 Dec 2013 12:49:25 +0800 From: Sharuzzaman Ahmat Raslan <sharuzzaman at gmail.com> To: harry mangalam <harry.mangalam at uci.edu> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? Message-ID: <CAK+zuc=5SY7wuFXUe-i2nUXAhGr+Ddaahr_7TKYgMxgtWKh1zg at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi Harry, Did you setup ntp on each of the node, and sync the time to one single source? Thanks. On Tue, Dec 10, 2013 at 12:44 PM, harry mangalam <harry.mangalam at uci.edu>wrote:> Admittedly I should search the source, but I wonder if anyone knowsthis> offhand. > > > > Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have > picked up an 'advanced date' in the /var/log/glusterfs/gl.log file -that> date string is running about 5-6 hours ahead of the system date and allthe> Gluster servers (which are identical and correct). The time advancement > does not appear to be identical tho it's hard to tell since it onlyshows> on errors and those update irregularly. > > > > All the clients are the same version and all the servers are the same > (gluster v 3.4.0-8.el6.x86_64 > > > > This would not be of interest except that those 4 clients are losing > files, unable to reliably do IO, etc on the gluster fs. They don'tappear> to be having problems with NFS mounts, nor with a Fraunhofer FS that is > also mounted on each node, > > > > Rebooting 2 of them has no effect - they come right back with anadvanced> date. > > > > > > --- > > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine > > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 > > 415 South Circle View Dr, Irvine, CA, 92697 [shipping] > > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) > > --- > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users >-- Sharuzzaman Ahmat Raslan -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d0de4ecd/attachment-0001.html>------------------------------ Message: 22 Date: Tue, 10 Dec 2013 04:49:50 +0000 From: Bobby Jacob <bobby.jacob at alshaya.com> To: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: [Gluster-users] FW: Self Heal Issue GlusterFS 3.3.1 Message-ID: <AC3305F9C186F849B835A3E6D3C9BEFEB5A763 at KWTPRMBX001.mha.local> Content-Type: text/plain; charset="iso-8859-1" Hi, Can someone please advise on this issue. ?? Urgent. Selfheal is working every 10 minutes only. ?? 
Thanks & Regards, Bobby Jacob From: Bobby Jacob Sent: Tuesday, December 03, 2013 8:51 AM To: gluster-users at gluster.org Subject: FW: Self Heal Issue GlusterFS 3.3.1 Just and addition: on the node where the self heal is not working when I check /var/log/glusterd/glustershd.log, I see the following: [2013-12-03 05:49:18.348637] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.350273] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.354813] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.355893] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.356901] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.357730] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.359136] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.360276] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.361168] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.362135] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.363569] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.364232] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.364872] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.365777] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.367383] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) [2013-12-03 05:49:18.368075] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000) Thanks & Regards, Bobby Jacob From: gluster-users-bounces at gluster.org [ mailto:gluster-users-bounces at gluster.org] On Behalf Of Bobby Jacob Sent: Tuesday, December 03, 2013 8:48 AM To: gluster-users at gluster.org Subject: [Gluster-users] Self Heal Issue GlusterFS 3.3.1 Hi, I'm running glusterFS 3.3.1 on Centos 6.4. ? 
Gluster volume status Status of volume: glustervol Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick KWTOCUATGS001:/mnt/cloudbrick 24009 Y 20031 Brick KWTOCUATGS002:/mnt/cloudbrick 24009 Y 1260 NFS Server on localhost 38467 Y 43320 Self-heal Daemon on localhost N/A Y 43326 NFS Server on KWTOCUATGS002 38467 Y 5842 Self-heal Daemon on KWTOCUATGS002 N/A Y 5848 The self heal stops working and application write only to 1 brick and it doesn't replicate. When I check /var/log/glusterfs/glustershd.log I see the following.: [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8 [2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported [2013-12-03 05:42:32.790473] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330) [2013-12-03 05:42:32.790840] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1: Connected to 172.16.95.153:24009, attached to remote volume '/mnt/cloudbrick'. [2013-12-03 05:42:32.790884] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1: Server and Client lk-version numbers are not same, reopening the fds [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back up; going online. [2013-12-03 05:42:32.791161] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-1: Server lk version = 1 [2013-12-03 05:42:32.795103] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.798064] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.799278] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.800636] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.802223] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.803339] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.804308] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child glustervol-client-0 (Transport endpoint is not connected) [2013-12-03 05:42:32.804877] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330) [2013-12-03 05:42:32.807517] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0: Connected to 
172.16.107.154:24009, attached to remote volume '/mnt/cloudbrick'. [2013-12-03 05:42:32.807562] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0: Server and Client lk-version numbers are not same, reopening the fds [2013-12-03 05:42:32.810357] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-0: Server lk version = 1 [2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] 0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). Please delete the file from all but the preferred subvolume. [2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). Please fix the file on all backend volumes [2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). Please fix the file on all backend volumes PLEASE ADVICE. Thanks & Regards, Bobby Jacob -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.html>-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ATT00001.txt URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.txt>------------------------------ Message: 23 Date: Mon, 09 Dec 2013 20:59:21 -0800 From: Joe Julian <joe at julianfamily.org> To: Bobby Jacob <bobby.jacob at alshaya.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1 Message-ID: <1386651561.2455.12.camel at bunion-ii.julianfamily.org> Content-Type: text/plain; charset="UTF-8" On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:> Hi, > > > > I?m running glusterFS 3.3.1 on Centos 6.4. > > ? Gluster volume status > > > > Status of volume: glustervol > > Gluster process Port Online > Pid > >------------------------------------------------------------------------------> > Brick KWTOCUATGS001:/mnt/cloudbrick 24009 Y > 20031 > > Brick KWTOCUATGS002:/mnt/cloudbrick 24009 Y > 1260 > > NFS Server on localhost > 38467 Y 43320 > > Self-heal Daemon on localhost N/A > Y 43326 > > NFS Server on KWTOCUATGS002 38467 Y > 5842 > > Self-heal Daemon on KWTOCUATGS002 N/A Y > 5848 > > > > The self heal stops working and application write only to 1 brick and > it doesn?t replicate. When I check /var/log/glusterfs/glustershd.log I > see the following.: > > > > [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] > 0-socket: failed to set keep idle on socket 8 > > [2013-12-03 05:42:32.033646] W > [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: > Failed to set keep-alive: Operation not supported > > [2013-12-03 05:42:32.790473] I > [client-handshake.c:1614:select_server_supported_programs] > 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), > Version (330) > > [2013-12-03 05:42:32.790840] I > [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1: > Connected to 172.16.95.153:24009, attached to remote volume > '/mnt/cloudbrick'. 
> > [2013-12-03 05:42:32.790884] I > [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1: > Server and Client lk-version numbers are not same, reopening the fds > > [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] > 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back > up; going online. > > [2013-12-03 05:42:32.791161] I > [client-handshake.c:453:client_set_lk_version_cbk] > 0-glustervol-client-1: Server lk version = 1 > > [2013-12-03 05:42:32.795103] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.798064] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.799278] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.800636] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.802223] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.803339] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.804308] E > [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] > 0-glustervol-replicate-0: open of > <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child > glustervol-client-0 (Transport endpoint is not connected) > > [2013-12-03 05:42:32.804877] I > [client-handshake.c:1614:select_server_supported_programs] > 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), > Version (330) > > [2013-12-03 05:42:32.807517] I > [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0: > Connected to 172.16.107.154:24009, attached to remote volume > '/mnt/cloudbrick'. > > [2013-12-03 05:42:32.807562] I > [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0: > Server and Client lk-version numbers are not same, reopening the fds > > [2013-12-03 05:42:32.810357] I > [client-handshake.c:453:client_set_lk_version_cbk] > 0-glustervol-client-0: Server lk version = 1 > > [2013-12-03 05:42:32.827437] E > [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] > 0-glustervol-replicate-0: Unable to self-heal contents of > '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). > Please delete the file from all but the preferred subvolume.That file is at $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403 Try picking one to remove like it says.> > [2013-12-03 05:42:39.205157] E > [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] > 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of > '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). 
> Please fix the file on all backend volumes > > [2013-12-03 05:42:39.215793] E > [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] > 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of > '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). > Please fix the file on all backend volumes > >If that doesn't allow it to heal, you may need to find which filename that's hardlinked to. ls -li the gfid file at the path I demonstrated earlier. With that inode number in hand, find $brick -inum $inode_number Once you know which filenames it's linked with, remove all linked copies from all but one replica. Then the self-heal can continue successfully. ------------------------------ Message: 24 Date: Tue, 10 Dec 2013 13:09:38 +0800 From: Franco Broi <franco.broi at iongeo.com> To: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: [Gluster-users] Pausing rebalance Message-ID: <1386652178.1682.110.camel at tc1> Content-Type: text/plain; charset="UTF-8" Before attempting a rebalance on my existing distributed Gluster volume I thought I'd do some testing with my new storage. I created a volume consisting of 4 bricks on the same server and wrote some data to it. I then added a new brick from a another server. I ran the fix-layout and wrote some new files and could see them on the new brick. All good so far, so I started the data rebalance. After it had been running for a while I wanted to add another brick, which I obviously couldn't do while it was running so I stopped it. Even with it stopped It wouldn't let me add a brick so I tried restarting it, but it wouldn't let me do that either. I presume you just reissue the start command as there's no restart? [root at nas3 ~]# gluster vol rebalance test-volume status Node Rebalanced-files size scanned failures skipped status run time in secs --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 7 611.7GB 1358 0 10 stopped 4929.00 localhost 7 611.7GB 1358 0 10 stopped 4929.00 nas4-10g 0 0Bytes 1506 0 0 completed 8.00 volume rebalance: test-volume: success: [root at nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol volume add-brick: failed: Volume name test-volume rebalance is in progress. Please retry after completion [root at nas3 ~]# gluster vol rebalance test-volume start volume rebalance: test-volume: failed: Rebalance on test-volume is already started In the end I used the force option to make it start but was that the right thing to do? glusterfs 3.4.1 built on Oct 28 2013 11:01:59 Volume Name: test-volume Type: Distribute Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066 Status: Started Number of Bricks: 5 Transport-type: tcp Bricks: Brick1: nas3-10g:/data9/gvol Brick2: nas3-10g:/data10/gvol Brick3: nas3-10g:/data11/gvol Brick4: nas3-10g:/data12/gvol Brick5: nas4-10g:/data13/gvol ------------------------------ Message: 25 Date: Tue, 10 Dec 2013 10:42:28 +0530 From: Vijay Bellur <vbellur at redhat.com> To: harry mangalam <harry.mangalam at uci.edu>, "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from? Message-ID: <52A6A2BC.7010501 at redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 12/10/2013 10:14 AM, harry mangalam wrote:> Admittedly I should search the source, but I wonder if anyone knows this > offhand. 
> > Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have > picked up an 'advanced date' in the /var/log/glusterfs/gl.log file - > that date string is running about 5-6 hours ahead of the system date and > all the Gluster servers (which are identical and correct). The time > advancement does not appear to be identical tho it's hard to tell since > it only shows on errors and those update irregularly.The timestamps in the log file are by default in UTC. That could possibly explain why the timestamps look advanced in the log file.> > All the clients are the same version and all the servers are the same > (gluster v 3.4.0-8.el6.x86_64 > > This would not be of interest except that those 4 clients are losing > files, unable to reliably do IO, etc on the gluster fs. They don't > appear to be having problems with NFS mounts, nor with a Fraunhofer FS > that is also mounted on each node,Do you observe anything in the client log files of these machines that indicate I/O problems? Thanks, Vijay ------------------------------ Message: 26 Date: Tue, 10 Dec 2013 10:56:52 +0530 From: shishir gowda <gowda.shishir at gmail.com> To: Franco Broi <franco.broi at iongeo.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Pausing rebalance Message-ID: <CAMYy+hVgyiPMYiDtkKtA1EBbbcpJAyp3O1_1=oAqKq1dc4NN+g at mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi Franco, If a file is under migration, and a rebalance stop is encountered, then rebalance process exits only after the completion of the migration. That might be one of the reasons why you saw rebalance in progress message while trying to add the brick Could you please share the average file size in your setup? You could always check the rebalance status command to ensure rebalance has indeed completed/stopped before proceeding with the add-brick. Using add-brick force while rebalance is on-going should not be used in normal scenarios. I do see that in your case, they show stopped/completed. Glusterd logs would help in triaging the issue. Rebalance re-writes layouts, and migrates data. While this is happening, if a add-brick is done, then the cluster might go into a imbalanced stated. Hence, the check if rebalance is in progress while doing add-brick With regards, Shishir On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote:> > Before attempting a rebalance on my existing distributed Gluster volume > I thought I'd do some testing with my new storage. I created a volume > consisting of 4 bricks on the same server and wrote some data to it. I > then added a new brick from a another server. I ran the fix-layout and > wrote some new files and could see them on the new brick. All good so > far, so I started the data rebalance. After it had been running for a > while I wanted to add another brick, which I obviously couldn't do while > it was running so I stopped it. Even with it stopped It wouldn't let me > add a brick so I tried restarting it, but it wouldn't let me do that > either. I presume you just reissue the start command as there's no > restart? 
> > [root at nas3 ~]# gluster vol rebalance test-volume status > Node Rebalanced-files size > scanned failures skipped status run time in secs > --------- ----------- ----------- ----------- ----------- > ----------- ------------ -------------- > localhost 7 611.7GB 1358 0 > 10 stopped 4929.00 > localhost 7 611.7GB 1358 0 > 10 stopped 4929.00 > nas4-10g 0 0Bytes 1506 0 > 0 completed 8.00 > volume rebalance: test-volume: success: > [root at nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol > volume add-brick: failed: Volume name test-volume rebalance is in > progress. Please retry after completion > [root at nas3 ~]# gluster vol rebalance test-volume start > volume rebalance: test-volume: failed: Rebalance on test-volume isalready> started > > In the end I used the force option to make it start but was that the > right thing to do? > > glusterfs 3.4.1 built on Oct 28 2013 11:01:59 > Volume Name: test-volume > Type: Distribute > Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066 > Status: Started > Number of Bricks: 5 > Transport-type: tcp > Bricks: > Brick1: nas3-10g:/data9/gvol > Brick2: nas3-10g:/data10/gvol > Brick3: nas3-10g:/data11/gvol > Brick4: nas3-10g:/data12/gvol > Brick5: nas4-10g:/data13/gvol > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/1944e9e8/attachment-0001.html>------------------------------ Message: 27 Date: Tue, 10 Dec 2013 11:02:52 +0530 From: Vijay Bellur <vbellur at redhat.com> To: Alex Pearson <alex at apics.co.uk> Cc: gluster-users Discussion List <Gluster-users at gluster.org> Subject: Re: [Gluster-users] replace-brick failing - transport.address-family not specified Message-ID: <52A6A784.6070404 at redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 12/08/2013 05:44 PM, Alex Pearson wrote:> Hi All, > Just to assist anyone else having this issue, and so people can correctme if I'm wrong...> > It would appear that replace-brick is 'horribly broken' and should notbe used in Gluster 3.4. Instead a combination of "remove-brick ... count X ... start" should be used to remove the resilience from a volume and the brick, then "add-brick ... count X" to add the new brick.> > This does beg the question of why the hell a completely broken commandwas left in the 'stable' release of the software. This sort of thing really hurts Glusters credibility. A mention of replace-brick not being functional was made in the release note for 3.4.0: https://github.com/gluster/glusterfs/blob/release-3.4/doc/release-notes/3.4.0.md> > Ref:http://www.gluster.org/pipermail/gluster-users/2013-August/036936.html This discussion happened after the release of GlusterFS 3.4. However, I do get the point you are trying to make here. We can have an explicit warning in CLI when operations considered broken are attempted. There is a similar plan to add a warning for rdma volumes: https://bugzilla.redhat.com/show_bug.cgi?id=1017176 There is a patch under review currently to remove the replace-brick command from CLI: http://review.gluster.org/6031 This is intended for master. If you can open a bug report indicating an appropriate warning message that you would like to see when replace-brick is attempted, I would be happy to get such a fix in to both 3.4 and 3.5. 
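(For reference, a rough sketch of the remove-brick/add-brick sequence Alex describes, reusing the brick paths from his volume info; this is untested and the exact syntax should be checked against the 3.4 documentation before running it on live data:

# drop the old brick, reducing the replica count to 1 (data remains on the surviving brick)
gluster volume remove-brick media replica 1 osh1.apics.co.uk:/export/sdc/media force
# add the replacement brick at replica 2, then let self-heal repopulate it
gluster volume add-brick media replica 2 osh1.apics.co.uk:/export/WCASJ2055681/media
gluster volume heal media full

Watching "gluster volume heal media info" afterwards shows when the new brick has caught up.)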
Thanks, Vijay> > Cheers > > Alex > > ----- Original Message ----- > From: "Alex Pearson" <alex at apics.co.uk> > To: gluster-users at gluster.org > Sent: Friday, 6 December, 2013 5:25:43 PM > Subject: [Gluster-users] replace-brick failing -transport.address-family not specified> > Hello, > I have what I think is a fairly basic Gluster setup, however when I tryto carry out a replace-brick operation it consistently fails...> > Here are the command line options: > > root at osh1:~# gluster volume info media > > Volume Name: media > Type: Replicate > Volume ID: 4c290928-ba1c-4a45-ac05-85365b4ea63a > Status: Started > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: osh1.apics.co.uk:/export/sdc/media > Brick2: osh2.apics.co.uk:/export/sdb/media > > root at osh1:~# gluster volume replace-brick mediaosh1.apics.co.uk:/export/sdc/media osh1.apics.co.uk:/export/WCASJ2055681/media start> volume replace-brick: success: replace-brick started successfully > ID: 60bef96f-a5c7-4065-864e-3e0b2773d7bb > root at osh1:~# gluster volume replace-brick mediaosh1.apics.co.uk:/export/sdc/media osh1.apics.co.uk:/export/WCASJ2055681/media status> volume replace-brick: failed: Commit failed on localhost. Please checkthe log file for more details.> > root at osh1:~# tail /var/log/glusterfs/bricks/export-sdc-media.log > [2013-12-06 17:24:54.795754] E [name.c:147:client_fill_address_family]0-media-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options> [2013-12-06 17:24:57.796422] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb826e39f50]))) 0-dict: data is NULL> [2013-12-06 17:24:57.796494] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb826e39f5b]))) 0-dict: data is NULL> [2013-12-06 17:24:57.796519] E [name.c:147:client_fill_address_family]0-media-replace-brick: transport.address-family not specified. 
Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options> [2013-12-06 17:25:00.797153] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb826e39f50]))) 0-dict: data is NULL> [2013-12-06 17:25:00.797226] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb826e39f5b]))) 0-dict: data is NULL> [2013-12-06 17:25:00.797251] E [name.c:147:client_fill_address_family]0-media-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options> [2013-12-06 17:25:03.797811] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb826e39f50]))) 0-dict: data is NULL> [2013-12-06 17:25:03.797883] W [dict.c:1055:data_to_str](-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb826e39f5b]))) 0-dict: data is NULL> [2013-12-06 17:25:03.797909] E [name.c:147:client_fill_address_family]0-media-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options> > > I've tried placing the transport.address-family option in variousplaces, however it hasn't helped.> > Any help would be very much appreciated. > > Thanks in advance > > Alex >------------------------------ Message: 28 Date: Tue, 10 Dec 2013 11:04:49 +0530 From: Vijay Bellur <vbellur at redhat.com> To: Diep Pham Van <imeo at favadi.com>, "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] [CentOS 6] Upgrade to the glusterfs version in base or in glusterfs-epel Message-ID: <52A6A7F9.2090009 at redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 12/10/2013 09:36 AM, Diep Pham Van wrote:> On Mon, 9 Dec 2013 19:53:20 +0900 > Nguyen Viet Cuong <mrcuongnv at gmail.com> wrote: > >> There is no glusterfs-server in the "base" repository, just client. > Silly me. > After install and attempt to mount with base version of glusterfs-fuse, > I realize that I have to change 'backupvolfile-server' mount option to > 'backup-volfile-servers'[1].And a patch to provide backward compatibility for 'backupvolfile-server' is available now [1]. 
-Vijay [1] http://review.gluster.org/6464> > Links: > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950 >------------------------------ Message: 29 Date: Tue, 10 Dec 2013 13:39:38 +0800 From: Franco Broi <franco.broi at iongeo.com> To: shishir gowda <gowda.shishir at gmail.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Pausing rebalance Message-ID: <1386653978.1682.125.camel at tc1> Content-Type: text/plain; charset="utf-8" On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:> Hi Franco, > > > If a file is under migration, and a rebalance stop is encountered, > then rebalance process exits only after the completion of the > migration. > > That might be one of the reasons why you saw rebalance in progress > message while trying to add the brickThe status said it was stopped. I didn't do a top on the machine but are you saying that it was still rebalancing despite saying it had stopped?> > Could you please share the average file size in your setup? >Bit hard to say, I just copied some data from our main processing system. The sizes range from very small to 10's of gigabytes.> > You could always check the rebalance status command to ensure > rebalance has indeed completed/stopped before proceeding with the > add-brick. Using add-brick force while rebalance is on-going should > not be used in normal scenarios. I do see that in your case, they show > stopped/completed. Glusterd logs would help in triaging the issue.See attached.> > > Rebalance re-writes layouts, and migrates data. While this is > happening, if a add-brick is done, then the cluster might go into a > imbalanced stated. Hence, the check if rebalance is in progress while > doing add-brickI can see that but as far as I could tell, the rebalance had stopped according to the status. Just to be clear, what command restarts the rebalancing?> > > With regards, > Shishir > > > > On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote: > > Before attempting a rebalance on my existing distributed > Gluster volume > I thought I'd do some testing with my new storage. I created a > volume > consisting of 4 bricks on the same server and wrote some data > to it. I > then added a new brick from a another server. I ran the > fix-layout and > wrote some new files and could see them on the new brick. All > good so > far, so I started the data rebalance. After it had been > running for a > while I wanted to add another brick, which I obviously > couldn't do while > it was running so I stopped it. Even with it stopped It > wouldn't let me > add a brick so I tried restarting it, but it wouldn't let me > do that > either. I presume you just reissue the start command as > there's no > restart? > > [root at nas3 ~]# gluster vol rebalance test-volume status > Node Rebalanced-files > size scanned failures skipped > status run time in secs > --------- ----------- ----------- ----------- > ----------- ----------- ------------ -------------- > localhost 7 611.7GB 1358 > 0 10 stopped 4929.00 > localhost 7 611.7GB 1358 > 0 10 stopped 4929.00 > nas4-10g 0 0Bytes 1506 > 0 0 completed 8.00 > volume rebalance: test-volume: success: > [root at nas3 ~]# gluster vol add-brick test-volume > nas4-10g:/data14/gvol > volume add-brick: failed: Volume name test-volume rebalance is > in progress. 
Please retry after completion > [root at nas3 ~]# gluster vol rebalance test-volume start > volume rebalance: test-volume: failed: Rebalance on > test-volume is already started > > In the end I used the force option to make it start but was > that the > right thing to do? > > glusterfs 3.4.1 built on Oct 28 2013 11:01:59 > Volume Name: test-volume > Type: Distribute > Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066 > Status: Started > Number of Bricks: 5 > Transport-type: tcp > Bricks: > Brick1: nas3-10g:/data9/gvol > Brick2: nas3-10g:/data10/gvol > Brick3: nas3-10g:/data11/gvol > Brick4: nas3-10g:/data12/gvol > Brick5: nas4-10g:/data13/gvol > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- A non-text attachment was scrubbed... Name: etc-glusterfs-glusterd.vol.log.gz Type: application/gzip Size: 7209 bytes Desc: not available URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/adc5d486/attachment-0001.bin>------------------------------ Message: 30 Date: Tue, 10 Dec 2013 11:09:47 +0530 From: Vijay Bellur <vbellur at redhat.com> To: Nguyen Viet Cuong <mrcuongnv at gmail.com> Cc: "Gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] replace-brick failing - transport.address-family not specified Message-ID: <52A6A923.4030208 at redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:> Thanks for sharing. > > Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x in > production. >This is quite contrary to what we have seen in the community. From a development perspective too, we feel much better about 3.4.1. Are there specific instances that worked well with 3.2.x which does not work fine for you in 3.4.x? Cheers, Vijay ------------------------------ Message: 31 Date: Tue, 10 Dec 2013 11:30:21 +0530 From: Kaushal M <kshlmster at gmail.com> To: Franco Broi <franco.broi at iongeo.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Pausing rebalance Message-ID: <CAOujamU0J4Tam9ojFAmCoPqSzd5Tm1FeyfMYEBv2znMX9yN=4A at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com> wrote:> On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote: >> Hi Franco, >> >> >> If a file is under migration, and a rebalance stop is encountered, >> then rebalance process exits only after the completion of the >> migration. >> >> That might be one of the reasons why you saw rebalance in progress >> message while trying to add the brick > > The status said it was stopped. I didn't do a top on the machine but are > you saying that it was still rebalancing despite saying it had stopped? >The 'stopped' status is a little bit misleading. The rebalance process could have been migrating a large file when the stop command was issued, so the process would continue migrating that file and quit once it finished. In this time period, though the status says 'stopped' the rebalance process is actually running, which prevents other operations from happening. Ideally, we would have a 'stopping' status which would convey the correct meaning. But for now we can only verify that a rebalance process has actually stopped by monitoring the actual rebalance process. 
The rebalance process is a 'glusterfs' process with some arguments containing rebalance.>> >> Could you please share the average file size in your setup? >> > > Bit hard to say, I just copied some data from our main processing > system. The sizes range from very small to 10's of gigabytes. > >> >> You could always check the rebalance status command to ensure >> rebalance has indeed completed/stopped before proceeding with the >> add-brick. Using add-brick force while rebalance is on-going should >> not be used in normal scenarios. I do see that in your case, they show >> stopped/completed. Glusterd logs would help in triaging the issue. > > See attached. > >> >> >> Rebalance re-writes layouts, and migrates data. While this is >> happening, if a add-brick is done, then the cluster might go into a >> imbalanced stated. Hence, the check if rebalance is in progress while >> doing add-brick > > I can see that but as far as I could tell, the rebalance had stopped > according to the status. > > Just to be clear, what command restarts the rebalancing? > >> >> >> With regards, >> Shishir >> >> >> >> On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote: >> >> Before attempting a rebalance on my existing distributed >> Gluster volume >> I thought I'd do some testing with my new storage. I created a >> volume >> consisting of 4 bricks on the same server and wrote some data >> to it. I >> then added a new brick from a another server. I ran the >> fix-layout and >> wrote some new files and could see them on the new brick. All >> good so >> far, so I started the data rebalance. After it had been >> running for a >> while I wanted to add another brick, which I obviously >> couldn't do while >> it was running so I stopped it. Even with it stopped It >> wouldn't let me >> add a brick so I tried restarting it, but it wouldn't let me >> do that >> either. I presume you just reissue the start command as >> there's no >> restart? >> >> [root at nas3 ~]# gluster vol rebalance test-volume status >> Node Rebalanced-files >> size scanned failures skipped >> status run time in secs >> --------- ----------- ----------- ----------- >> ----------- ----------- ------------ -------------- >> localhost 7 611.7GB 1358 >> 0 10 stopped 4929.00 >> localhost 7 611.7GB 1358 >> 0 10 stopped 4929.00 >> nas4-10g 0 0Bytes 1506 >> 0 0 completed 8.00 >> volume rebalance: test-volume: success: >> [root at nas3 ~]# gluster vol add-brick test-volume >> nas4-10g:/data14/gvol >> volume add-brick: failed: Volume name test-volume rebalance is >> in progress. Please retry after completion >> [root at nas3 ~]# gluster vol rebalance test-volume start >> volume rebalance: test-volume: failed: Rebalance on >> test-volume is already started >> >> In the end I used the force option to make it start but was >> that the >> right thing to do? 
>> >> glusterfs 3.4.1 built on Oct 28 2013 11:01:59 >> Volume Name: test-volume >> Type: Distribute >> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066 >> Status: Started >> Number of Bricks: 5 >> Transport-type: tcp >> Bricks: >> Brick1: nas3-10g:/data9/gvol >> Brick2: nas3-10g:/data10/gvol >> Brick3: nas3-10g:/data11/gvol >> Brick4: nas3-10g:/data12/gvol >> Brick5: nas4-10g:/data13/gvol >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users >> >> > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 32 Date: Tue, 10 Dec 2013 14:32:46 +0800 From: Franco Broi <franco.broi at iongeo.com> To: Kaushal M <kshlmster at gmail.com> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Pausing rebalance Message-ID: <1386657166.1682.130.camel at tc1> Content-Type: text/plain; charset="UTF-8" Thanks for clearing that up. I had to wait about 30 minutes for all rebalancing activity to cease, then I was able to add a new brick. What does it use to migrate the files? The copy rate was pretty slow considering both bricks were on the same server, I only saw about 200MB/Sec. Each brick is a 16 disk ZFS raidz2, copying with dd I can get well over 500MB/Sec. On Tue, 2013-12-10 at 11:30 +0530, Kaushal M wrote:> On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com>wrote:> > On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote: > >> Hi Franco, > >> > >> > >> If a file is under migration, and a rebalance stop is encountered, > >> then rebalance process exits only after the completion of the > >> migration. > >> > >> That might be one of the reasons why you saw rebalance in progress > >> message while trying to add the brick > > > > The status said it was stopped. I didn't do a top on the machine butare> > you saying that it was still rebalancing despite saying it hadstopped?> > > > The 'stopped' status is a little bit misleading. The rebalance process > could have been migrating a large file when the stop command was > issued, so the process would continue migrating that file and quit > once it finished. In this time period, though the status says > 'stopped' the rebalance process is actually running, which prevents > other operations from happening. Ideally, we would have a 'stopping' > status which would convey the correct meaning. But for now we can only > verify that a rebalance process has actually stopped by monitoring the > actual rebalance process. The rebalance process is a 'glusterfs' > process with some arguments containing rebalance. > > >> > >> Could you please share the average file size in your setup? > >> > > > > Bit hard to say, I just copied some data from our main processing > > system. The sizes range from very small to 10's of gigabytes. > > > >> > >> You could always check the rebalance status command to ensure > >> rebalance has indeed completed/stopped before proceeding with the > >> add-brick. Using add-brick force while rebalance is on-going should > >> not be used in normal scenarios. I do see that in your case, theyshow> >> stopped/completed. Glusterd logs would help in triaging the issue. > > > > See attached. > > > >> > >> > >> Rebalance re-writes layouts, and migrates data. 
While this is > >> happening, if a add-brick is done, then the cluster might go into a > >> imbalanced stated. Hence, the check if rebalance is in progress while > >> doing add-brick > > > > I can see that but as far as I could tell, the rebalance had stopped > > according to the status. > > > > Just to be clear, what command restarts the rebalancing? > > > >> > >> > >> With regards, > >> Shishir > >> > >> > >> > >> On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com>wrote:> >> > >> Before attempting a rebalance on my existing distributed > >> Gluster volume > >> I thought I'd do some testing with my new storage. I createda> >> volume > >> consisting of 4 bricks on the same server and wrote some data > >> to it. I > >> then added a new brick from a another server. I ran the > >> fix-layout and > >> wrote some new files and could see them on the new brick. All > >> good so > >> far, so I started the data rebalance. After it had been > >> running for a > >> while I wanted to add another brick, which I obviously > >> couldn't do while > >> it was running so I stopped it. Even with it stopped It > >> wouldn't let me > >> add a brick so I tried restarting it, but it wouldn't let me > >> do that > >> either. I presume you just reissue the start command as > >> there's no > >> restart? > >> > >> [root at nas3 ~]# gluster vol rebalance test-volume status > >> Node Rebalanced-files > >> size scanned failures skipped > >> status run time in secs > >> --------- ----------- ----------- ----------- > >> ----------- ----------- ------------ -------------- > >> localhost 7 611.7GB 1358 > >> 0 10 stopped 4929.00 > >> localhost 7 611.7GB 1358 > >> 0 10 stopped 4929.00 > >> nas4-10g 0 0Bytes 1506 > >> 0 0 completed 8.00 > >> volume rebalance: test-volume: success: > >> [root at nas3 ~]# gluster vol add-brick test-volume > >> nas4-10g:/data14/gvol > >> volume add-brick: failed: Volume name test-volume rebalanceis> >> in progress. Please retry after completion > >> [root at nas3 ~]# gluster vol rebalance test-volume start > >> volume rebalance: test-volume: failed: Rebalance on > >> test-volume is already started > >> > >> In the end I used the force option to make it start but was > >> that the > >> right thing to do? 
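For the record, the sequence the explanation above points to, using the
same volume and brick names shown in the output, is simply to wait for
the daemon to exit and then re-issue the usual commands; as Franco
surmised, there is no separate restart command. A sketch of that
workflow, not output from this thread:

    gluster volume rebalance test-volume status                  # wait until nothing is reported as in progress
    gluster volume add-brick test-volume nas4-10g:/data14/gvol   # the add-brick should now succeed
    gluster volume rebalance test-volume start                   # re-issuing start resumes rebalancing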
> >> > >> glusterfs 3.4.1 built on Oct 28 2013 11:01:59 > >> Volume Name: test-volume > >> Type: Distribute > >> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066 > >> Status: Started > >> Number of Bricks: 5 > >> Transport-type: tcp > >> Bricks: > >> Brick1: nas3-10g:/data9/gvol > >> Brick2: nas3-10g:/data10/gvol > >> Brick3: nas3-10g:/data11/gvol > >> Brick4: nas3-10g:/data12/gvol > >> Brick5: nas4-10g:/data13/gvol > >> > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > >> > >> > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users------------------------------ Message: 33 Date: Tue, 10 Dec 2013 07:42:57 +0000 From: Bobby Jacob <bobby.jacob at alshaya.com> To: Joe Julian <joe at julianfamily.org> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1 Message-ID: <AC3305F9C186F849B835A3E6D3C9BEFEB5A841 at KWTPRMBX001.mha.local> Content-Type: text/plain; charset="utf-8" Hi, Thanks Joe, the split brain files have been removed as you recommended. How can we deal with this situation as there is no document which solves such issues. ? [root at KWTOCUATGS001 83]# gluster volume heal glustervol info Gathering Heal info on volume glustervol has been successful Brick KWTOCUATGS001:/mnt/cloudbrick Number of entries: 14 /Tommy Kolega <gfid:10429dd5-180c-432e-aa4a-8b1624b86f4b> <gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7> <gfid:3e3d77d6-2818-4766-ae3b-4f582118321b> <gfid:8bd03482-025c-4c09-8704-60be9ddfdfd8> <gfid:2685e11a-4eb9-4a92-883e-faa50edfa172> <gfid:24d83cbd-e621-4330-b0c1-ae1f0fd2580d> <gfid:197e50fa-bfc0-4651-acaa-1f3d2d73936f> <gfid:3e094ee9-c9cf-4010-82f4-6d18c1ab9ca0> <gfid:77783245-4e03-4baf-8cb4-928a57b266cb> <gfid:70340eaa-7967-41d0-855f-36add745f16f> <gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f> <gfid:b1651457-175a-43ec-b476-d91ae8b52b0b> /Tommy Kolega/lucene_index Brick KWTOCUATGS002:/mnt/cloudbrick Number of entries: 15 <gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7> <gfid:0454d0d2-d432-4ac8-8476-02a8522e4a6a> <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> <gfid:00389876-700f-4351-b00e-1c57496eed89> <gfid:0cd48d89-1dd2-47f6-9311-58224b19446e> <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> <gfid:a109c429-5885-499e-8711-09fdccd396f2> <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> /Tommy Kolega /Tommy Kolega/lucene_index <gfid:c49e9d76-e5d4-47dc-9cf1-3f858f6d07ea> <gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f> Thanks & Regards, Bobby Jacob -----Original Message----- From: Joe Julian [mailto:joe at julianfamily.org] Sent: Tuesday, December 10, 2013 7:59 AM To: Bobby Jacob Cc: gluster-users at gluster.org Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1 On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:> Hi, > > > > I?m running glusterFS 3.3.1 on Centos 6.4. > > ? 
Gluster volume status
>
> Status of volume: glustervol
> Gluster process                            Port   Online  Pid
> ------------------------------------------------------------------------------
> Brick KWTOCUATGS001:/mnt/cloudbrick        24009  Y       20031
> Brick KWTOCUATGS002:/mnt/cloudbrick        24009  Y       1260
> NFS Server on localhost                    38467  Y       43320
> Self-heal Daemon on localhost              N/A    Y       43326
> NFS Server on KWTOCUATGS002                38467  Y       5842
> Self-heal Daemon on KWTOCUATGS002          N/A    Y       5848
>
> The self heal stops working and the application writes only to 1 brick
> and it doesn't replicate. When I check /var/log/glusterfs/glustershd.log
> I see the following:
>
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
> [2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
> [2013-12-03 05:42:32.790473] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
> [2013-12-03 05:42:32.790840] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1: Connected to 172.16.95.153:24009, attached to remote volume '/mnt/cloudbrick'.
> [2013-12-03 05:42:32.790884] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back up; going online.
> [2013-12-03 05:42:32.791161] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-1: Server lk version = 1
> [2013-12-03 05:42:32.795103] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.798064] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.799278] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.800636] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.802223] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.803339] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.804308] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.804877] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
> [2013-12-03 05:42:32.807517] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0: Connected to 172.16.107.154:24009, attached to remote volume '/mnt/cloudbrick'.
> [2013-12-03 05:42:32.807562] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2013-12-03 05:42:32.810357] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-0: Server lk version = 1
> [2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] 0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). Please delete the file from all but the preferred subvolume.

That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

Try picking one to remove like it says.

> [2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). Please fix the file on all backend volumes
> [2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). Please fix the file on all backend volumes
>

If that doesn't allow it to heal, you may need to find which filename
that's hardlinked to. ls -li the gfid file at the path I demonstrated
earlier. With that inode number in hand,

find $brick -inum $inode_number

Once you know which filenames it's linked with, remove all linked copies
from all but one replica. Then the self-heal can continue successfully.

------------------------------

Message: 34
Date: Tue, 10 Dec 2013 09:30:22 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: [Gluster-users] Structure needs cleaning on some files
Message-ID: <52A6D11E.4030406 at inuits.be>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi All,

When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning

in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: remote operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure needs cleaning)

We are using gluster 3.4.1-3 on CentOS6.
Our servers are 64-bit, our clients 32-bit (we are already using
--enable-ino32 on the mountpoint)

This is my gluster configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work:
We have 2 client nodes who both have a fuse.glusterfs mountpoint.
On 1 client node we have an application which writes files.
On the other client node we have an application which reads these files.
On the node where the files are written we don't see any problem, and
can read that file without problems.
On the other node we have problems (error messages above) reading that
file.
The problem occurs when we perform an md5sum on the exact file; when we
perform an md5sum on all files in that directory there is no problem.

How can we solve this problem, as this is annoying?
The problem occurs after some time (can be days); an umount and mount
of the mountpoint solves it for some days.
Once it occurs (and we don't remount) it occurs every time.

I hope someone can help me with this problem.

Thanks,
Johan Huysmans

------------------------------

Message: 35
Date: Tue, 10 Dec 2013 08:56:56 +0000
From: "Bernhard Glomm" <bernhard.glomm at ecologic.eu>
To: vbellur at redhat.com, mrcuongnv at gmail.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] replace-brick failing - transport.address-family not specified
Message-ID: <03a55549428f5909f0b3db1dee93d8c55e3ba3c3 at ecologic.eu>
Content-Type: text/plain; charset="utf-8"

On 10.12.2013 06:39:47, Vijay Bellur wrote:
> On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
> > Thanks for sharing.
> >
> > Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x
> > in production.
>
> This is quite contrary to what we have seen in the community. From a
> development perspective too, we feel much better about 3.4.1. Are there
> specific instances that worked well with 3.2.x which does not work fine
> for you in 3.4.x?

987555 - is that fixed in 3.5? Or did it even make it into 3.4.2?
Couldn't find a note on that.
Show stopper for moving from 3.2.x to anywhere for me!

cheers

b.

> Cheers,
> Vijay
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Bernhard Glomm
IT Administration

Phone: +49 (30) 86880 134
Fax: +49 (30) 86880 100
Skype: bernhard.glomm.ecologic

Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/475454d4/attachment-0001.html>

------------------------------

Message: 36
Date: Tue, 10 Dec 2013 10:02:14 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Structure needs cleaning on some files
Message-ID: <52A6D896.1020404 at inuits.be>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

I could reproduce this problem while my mount point is running in
debug mode.
logfile is attached.

gr.
Johan Huysmans On 10-12-13 09:30, Johan Huysmans wrote:> Hi All, > > When reading some files we get this error: > md5sum: /path/to/file.xml: Structure needs cleaning > > in /var/log/glusterfs/mnt-sharedfs.log we see these errors: > [2013-12-10 08:07:32.256910] W > [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: > remote operation failed: No such file or directory > [2013-12-10 08:07:32.257436] W > [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: > remote operation failed: No such file or directory > [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] > 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure > needs cleaning) > > We are using gluster 3.4.1-3 on CentOS6. > Our servers are 64-bit, our clients 32-bit (we are already using > --enable-ino32 on the mountpoint) > > This is my gluster configuration: > Volume Name: testvolume > Type: Replicate > Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7 > Status: Started > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: SRV-1:/gluster/brick1 > Brick2: SRV-2:/gluster/brick2 > Options Reconfigured: > performance.force-readdirp: on > performance.stat-prefetch: off > network.ping-timeout: 5 > > And this is how the applications work: > We have 2 client nodes who both have a fuse.glusterfs mountpoint. > On 1 client node we have a application which writes files. > On the other client node we have a application which reads these files. > On the node where the files are written we don't see any problem, and > can read that file without problems. > On the other node we have problems (error messages above) reading that > file. > The problem occurs when we perform a md5sum on the exact file, when > perform a md5sum on all files in that directory there is no problem. > > > How can we solve this problem as this is annoying. > The problem occurs after some time (can be days), an umount and mount > of the mountpoint solves it for some days. > Once it occurs (and we don't remount) it occurs every time. > > > I hope someone can help me with this problems. > > Thanks, > Johan Huysmans > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- A non-text attachment was scrubbed... Name: gluster_debug.log Type: text/x-log Size: 16600 bytes Desc: not available URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/bdf626dc/attachment-0001.bin>------------------------------ Message: 37 Date: Tue, 10 Dec 2013 10:08:43 +0100 From: Heiko Kr?mer <hkraemer at anynines.com> To: gluster-users at gluster.org Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <52A6DA1B.3030209 at anynines.com> Content-Type: text/plain; charset="iso-8859-1" -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi guys, thanks for all these reports. Well, I think I'll change my Raid level to 6 and let the Raid controller build and rebuild all Raid members and replicate again with glusterFS. I get more capacity but I need to check if the write throughput acceptable. I think, I can't take advantage of using glusterFS with a lot of Bricks because I've found more cons as pros in my case. @Ben thx for this very detailed document! 
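Since the open question above is whether write throughput on RAID 6 will
still be acceptable, a quick before/after comparison on the mounted
volume costs little. A minimal sketch, assuming a hypothetical FUSE mount
point of /mnt/glustervol and about 4 GB of test data:

    dd if=/dev/zero of=/mnt/glustervol/tp-test bs=1M count=4000 conv=fsync   # conv=fsync flushes before dd reports MB/s
    rm -f /mnt/glustervol/tp-test

Running it against the current RAID 10 bricks first and again after the
rebuild gives a like-for-like number to judge the trade-off by.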
Cheers and Thanks Heiko On 10.12.2013 00:38, Dan Mons wrote:> On 10 December 2013 08:09, Joe Julian <joe at julianfamily.org> > wrote: >> Replicas are defined in the order bricks are listed in the volume >> create command. So gluster volume create myvol replica 2 >> server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 >> server4:/data/brick1 will replicate between server1 and server2 >> and replicate between server3 and server4. >> >> Bricks added to a replica 2 volume after it's been created will >> require pairs of bricks, >> >> The best way to "force" replication to happen on another server >> is to just define it that way. > > Yup, that's understood. The problem is when (for argument's sake) > : > > * We've defined 4 hosts with 10 disks each * Each individual disk > is a brick * Replication is defined correctly when creating the > volume initially * I'm on holidays, my employer buys a single node, > configures it brick-per-disk, and the IT junior adds it to the > cluster > > All good up until that final point, and then I've got that fifth > node at the end replicating to itself. Node goes down some months > later, chaos ensues. > > Not a GlusterFS/technology problem, but a problem with what > frequently happens at a human level. As a sysadmin, these are also > things I need to work around, even if it means deviating from best > practices. :) > > -Dan _______________________________________________ Gluster-users > mailing list Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users >- -- Anynines.com Avarteq GmbH B.Sc. Informatik Heiko Kr?mer CIO Twitter: @anynines - ---- Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 Sitz: Saarbr?cken -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSptoTAAoJELxFogM4ixOFJTsIAJBWed3AGiiI+PDC2ubfboKc UPkMc+zuirRh2+QJBAoZ4CsAv9eIZ5NowclSSby9PTq2XRjjLvMdKuI+IbXCRT4j AbMLYfP3g4Q+agXnY6N6WJ6ZIqXQ8pbCK3shYp9nBfVYkiDUT1bGk0WcgQmEWTCw ta1h17LYkworIDRtqWQAl4jr4JR4P3x4cmwOZiHCVCtlyOP02x/fN4dji6nyOtuB kQPBVsND5guQNU8Blg5cQoES5nthtuwJdkWXB+neaCZd/u3sexVSNe5m15iWbyYg mAoVvlBJ473IKATlxM5nVqcUhmjFwNcc8MMwczXxTkwniYzth53BSoltPn7kIx4=epys -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: hkraemer.vcf Type: text/x-vcard Size: 277 bytes Desc: not available URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/f663943d/attachment-0001.vcf>------------------------------ Message: 38 Date: Tue, 10 Dec 2013 10:42:43 +0100 From: Johan Huysmans <johan.huysmans at inuits.be> To: gluster-users at gluster.org, bill.mair at web.de Subject: Re: [Gluster-users] Errors from PHP stat() on files and directories in a glusterfs mount Message-ID: <52A6E213.3000109 at inuits.be> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Hi, It seems I have a related problem (just posted this on the mailing list). Do you already have a solution for this problem? gr. Johan Huysmans On 05-12-13 20:05, Bill Mair wrote:> Hi, > > I'm trying to use glusterfs to mirror the ownCloud "data" area between > 2 servers. > > They are using debian jessie due to some dependancies that I have for > other components. > > This is where my issue rears it's ugly head. This is failing because I > can't stat the files and directories on my glusterfs mount. 
> > /var/www/owncloud/data is where I am mounting the volume and I can > reproduce the error using a simple php test application, so I don't > think that it is apache or owncloud related. > > I'd be grateful for any pointers on how to resolve this problem. > > Thanks, > > Bill > > Attached is "simple.php" test and the results of executing "strace > php5 simple.php" twice, once with the glusterfs mounted > (simple.php.strace-glusterfs) and once against the file system when > unmounted (simple.php.strace-unmounted). > > ------------------------------------------------------------------------ > > Here is what I get in the gluster log when I run the test (as root): > > /var/log/glusterfs/var-www-owncloud-data.log > > [2013-12-05 18:33:50.802250] D > [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: > returning as transport is already disconnected OR there are no frames > (0 || 0) > [2013-12-05 18:33:50.825132] D > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] > 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ] > [2013-12-05 18:33:50.825322] D > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] > 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ] > [2013-12-05 18:33:50.825393] D > [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0: > Number of sources: 0 > [2013-12-05 18:33:50.825456] D > [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] > 0-gv-ocdata-replicate-0: returning read_child: 0 > [2013-12-05 18:33:50.825511] D > [afr-common.c:1380:afr_lookup_select_read_child] > 0-gv-ocdata-replicate-0: Source selected as 0 for / > [2013-12-05 18:33:50.825579] D > [afr-common.c:1117:afr_lookup_build_response_params] > 0-gv-ocdata-replicate-0: Building lookup response from 0 > [2013-12-05 18:33:50.827069] D > [afr-common.c:131:afr_lookup_xattr_req_prepare] > 0-gv-ocdata-replicate-0: /check.txt: failed to get the gfid from dict > [2013-12-05 18:33:50.829409] D > [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: > returning as transport is already disconnected OR there are no frames > (0 || 0) > [2013-12-05 18:33:50.836719] D > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] > 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ] > [2013-12-05 18:33:50.836870] D > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] > 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ] > [2013-12-05 18:33:50.836941] D > [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0: > Number of sources: 0 > [2013-12-05 18:33:50.837002] D > [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] > 0-gv-ocdata-replicate-0: returning read_child: 0 > [2013-12-05 18:33:50.837058] D > [afr-common.c:1380:afr_lookup_select_read_child] > 0-gv-ocdata-replicate-0: Source selected as 0 for /check.txt > [2013-12-05 18:33:50.837129] D > [afr-common.c:1117:afr_lookup_build_response_params] > 0-gv-ocdata-replicate-0: Building lookup response from 0 > > Other bits of information > > root at bbb-1:/var/www/owncloud# uname -a > Linux bbb-1 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l > GNU/Linux > > root at bbb-1:/var/www/owncloud# dpkg -l glusterfs-* > Desired=Unknown/Install/Remove/Purge/Hold > | >Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) > ||/ Name Version Architecture Description 
>+++-============================================-===========================-===========================-=============================================================================================> ii glusterfs-client 3.4.1-1 armhf clustered> file-system (client package) > ii glusterfs-common 3.4.1-1 armhf GlusterFS > common libraries and translator modules > ii glusterfs-server 3.4.1-1 armhf clustered > file-system (server package) > > mount > > bbb-1:gv-ocdata on /var/www/owncloud/data type fuse.glusterfs >(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)> > /etc/fstab > > UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /sdhc ext4 defaults 0 0 > bbb-1:gv-ocdata /var/www/owncloud/data glusterfs > defaults,_netdev,log-level=DEBUG 0 0 > > ls -al on the various paths > > root at bbb-1:/var/log/glusterfs# ll -d /sdhc/ > drwxrwxr-x 7 root root 4096 Nov 28 19:15 /sdhc/ > > root at bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/ > drwxrwx--- 5 www-data www-data 4096 Dec 5 00:50 /sdhc/gv-ocdata/ > > root at bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/check.txt > -rw-r--r-- 2 root root 10 Dec 5 00:50 /sdhc/gv-ocdata/check.txt > > root at bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/ > drwxrwx--- 5 www-data www-data 4096 Dec 5 00:50 /var/www/owncloud/data/ > > root at bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/check.txt > -rw-r--r-- 1 root root 10 Dec 5 00:50 /var/www/owncloud/data/check.txt > > file & dir attr information: > > root at bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data > Attribute "glusterfs.volume-id" has a 16 byte value for > /var/www/owncloud/data > > root at bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data/check.txt > root at bbb-1:/var/www/owncloud# > > root at bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/ > Attribute "glusterfs.volume-id" has a 16 byte value for /sdhc/gv-ocdata/ > Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/ > Attribute "glusterfs.dht" has a 16 byte value for /sdhc/gv-ocdata/ > Attribute "afr.gv-ocdata-client-0" has a 12 byte value for > /sdhc/gv-ocdata/ > Attribute "afr.gv-ocdata-client-1" has a 12 byte value for > /sdhc/gv-ocdata/ > > root at bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/check.txt > Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/check.txt > Attribute "afr.gv-ocdata-client-0" has a 12 byte value for > /sdhc/gv-ocdata/check.txt > Attribute "afr.gv-ocdata-client-1" has a 12 byte value for > /sdhc/gv-ocdata/check.txt > root at bbb-1:/var/www/owncloud# > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d77e25bb/attachment-0001.html>------------------------------ Message: 39 Date: Tue, 10 Dec 2013 21:03:36 +1100 From: Andrew Lau <andrew at andrewklau.com> To: Ben Turner <bturner at redhat.com> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Gluster infrastructure question Message-ID: <CAD7dF9c3uexEG++1YEHwh3zw7a1Xy+=Co_xO+zrDrggDuV2DJQ at mail.gmail.com> Content-Type: text/plain; charset="utf-8" Hi Ben, For glusterfs would you recommend the enterprise-storage or throughput-performance tuned profile? 
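Whichever profile Ben recommends, trying each one takes only a minute per
node. A minimal sketch, assuming the tuned package is installed:

    tuned-adm list                              # profiles available on this box
    tuned-adm profile throughput-performance    # switch to a candidate profile
    tuned-adm active                            # confirm which profile is applied

The same commands apply to enterprise-storage; which of the two suits
Gluster better is exactly the question being asked here.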
Thanks,
Andrew

On Tue, Dec 10, 2013 at 6:31 AM, Ben Turner <bturner at redhat.com> wrote:
> ----- Original Message -----
> > From: "Ben Turner" <bturner at redhat.com>
> > To: "Heiko Krämer" <hkraemer at anynines.de>
> > Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> > Sent: Monday, December 9, 2013 2:26:45 PM
> > Subject: Re: [Gluster-users] Gluster infrastructure question
> >
> > ----- Original Message -----
> > > From: "Heiko Krämer" <hkraemer at anynines.de>
> > > To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> > > Sent: Monday, December 9, 2013 8:18:28 AM
> > > Subject: [Gluster-users] Gluster infrastructure question
> > >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > Heyho guys,
> > >
> > > I've been running glusterfs for years in a small environment without
> > > big problems.
> > >
> > > Now I'm going to use glusterFS for a bigger cluster but I have some
> > > questions :)
> > >
> > > Environment:
> > > * 4 servers
> > > * 20 x 2TB HDD, each
> > > * Raid controller
> > > * Raid 10
> > > * 4x bricks => Replicated, Distributed volume
> > > * Gluster 3.4
> > >
> > > 1)
> > > I'm wondering if I can delete the raid10 on each server and create a
> > > separate brick for each HDD. In this case the volume would have 80
> > > bricks, i.e. 4 servers x 20 HDDs. Is there any experience about the
> > > write throughput in a production system with many bricks like this?
> > > In addition I'd get double the HDD capacity.
> >
> > Have a look at:
> >
> > http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
> http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
> > Specifically:
> >
> >   * RAID arrays
> >   * More RAID LUNs for better concurrency
> >   * For RAID6, 256-KB stripe size
> >
> > I use a single RAID 6 that is divided into several LUNs for my bricks.
> > For example, on my Dell servers (with PERC6 RAID controllers) each
> > server has 12 disks that I put into raid 6. Then I break the RAID 6
> > into 6 LUNs and create a new PV/VG/LV for each brick. From there I
> > follow the recommendations listed in the presentation.
> >
> > HTH!
> >
> > -b
> >
> > > 2)
> > > I've heard a talk about glusterFS and scaling out. The main point was
> > > that if more bricks are in use, the scale-out process will take a
> > > long time. The problem was/is the hash algorithm. So I'm asking how
> > > it is if I have one very big brick (Raid10, 20TB on each server)
> > > versus many more bricks: what's faster, and are there any issues?
> > > Are there any experiences?
> > >
> > > 3)
> > > Failover of an HDD is not a big deal for a raid controller with a
> > > HotSpare HDD. Glusterfs will rebuild automatically if a brick fails
> > > and no data are present; this action will cause a lot of network
> > > traffic between the mirror bricks, but it will handle it much like
> > > the raid controller, right?
> > >
> > > Thanks and cheers
> > > Heiko
> > >
> > > - --
> > > Anynines.com
> > >
> > > Avarteq GmbH
> > > B.Sc.
Informatik > > > Heiko Kr?mer > > > CIO > > > Twitter: @anynines > > > > > > - ---- > > > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer > > > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168 > > > Sitz: Saarbr?cken > > > -----BEGIN PGP SIGNATURE----- > > > Version: GnuPG v1.4.14 (GNU/Linux) > > > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > > > > > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B > > > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0 > > > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ > > > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs > > > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6 > > > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY> > > =bDly > > > -----END PGP SIGNATURE----- > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > http://supercolony.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/b19779ff/attachment-0001.html>------------------------------ Message: 40 Date: Tue, 10 Dec 2013 15:34:56 +0530 From: Vijay Bellur <vbellur at redhat.com> To: Bernhard Glomm <bernhard.glomm at ecologic.eu>, mrcuongnv at gmail.com Cc: gluster-users at gluster.org Subject: Re: [Gluster-users] replace-brick failing - transport.address-family not specified Message-ID: <52A6E748.5070300 at redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed On 12/10/2013 02:26 PM, Bernhard Glomm wrote:> Am 10.12.2013 06:39:47, schrieb Vijay Bellur: > > On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote: > > Thanks for sharing. > > Btw, I do believe that GlusterFS 3.2.x is much more stable than > 3.4.x in > production. > > > This is quite contrary to what we have seen in the community. From a > development perspective too, we feel much better about 3.4.1. Arethere> specific instances that worked well with 3.2.x which does not workfine> for you in 3.4.x? > > > 987555 - is that fixed in 3.5? > > Or did it even make it into 3.4.2 > > couldn't find a note on that. >Yes, this will be part of 3.4.2. Note that the original problem was due to libvirt being rigid about the ports that it needs to use for migrations. AFAIK this has been addressed in upstream libvirt as well. Through this bug fix, glusterfs provides a mechanism where it can use a separate range of ports for bricks. This configuration can be enabled to work with other applications that do not adhere with guidelines laid out by IANA. 
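For reference, the knob for this lives in glusterd's own volfile rather
than in any volume's options. The option name below (base-port) is an
assumption drawn from later releases, not from this thread, so verify it
against the 3.4.2 release notes before relying on it:

    # /etc/glusterfs/glusterd.vol -- option name assumed, check your release
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option transport-type socket
        option base-port 49152    # first port handed out to brick processes
    end-volume

glusterd needs to be restarted on each node for the change to take effect.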
Cheers, Vijay ------------------------------ Message: 41 Date: Tue, 10 Dec 2013 15:38:16 +0530 From: Vijay Bellur <vbellur at redhat.com> To: Alexandru Coseru <alex.coseru at simplus.ro>, gluster-users at gluster.org Subject: Re: [Gluster-users] Gluster - replica - Unable to self-heal contents of '/' (possible split-brain) Message-ID: <52A6E810.9050900 at redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed On 12/09/2013 07:21 PM, Alexandru Coseru wrote:> > [2013-12-09 13:20:52.066978] E > [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] > 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible > split-brain). Please delete the file from all but the preferred > subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] > > [2013-12-09 13:20:52.067386] E > [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] > 0-stor1-replicate-0: background meta-data self-heal failed on / > > [2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk] > 0-nfs: error=Input/output error > > [2013-12-09 13:20:53.092039] E > [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] > 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible > split-brain). Please delete the file from all but the preferred > subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] > > [2013-12-09 13:20:53.092497] E > [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] > 0-stor1-replicate-0: background meta-data self-heal failed on / > > [2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk] > 0-nfs: error=Input/output error > > What I?m doing wrong ?Looks like there is a metadata split-brain on /. The split-brain resolution document at [1] can possibly be of help here. -Vijay [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md> > PS: Volume stor_fast works like a charm. >Good to know, thanks! ------------------------------ Message: 42 Date: Tue, 10 Dec 2013 11:59:44 +0100 From: "Mariusz Sobisiak" <MSobisiak at ydp.pl> To: <gluster-users at gluster.org> Subject: [Gluster-users] Error after crash of Virtual Machine during migration Message-ID: <507D8C234E515F4F969362F9666D7EBBE875D1 at nagato1.intranet.ydp> Content-Type: text/plain; charset="us-ascii" Greetings, Legend: storage-gfs-3-prd - the first gluster. storage-1-saas - new gluster where "the first gluster" had to be migrated. storage-gfs-4-prd - the second gluster (which had to be migrated later). I've started command replace-brick: 'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start' During that Virtual Machine (Xen) has crashed. Now I can't abort migration and continue it again. When I try: '# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort' The command lasts about 5 minutes then finishes with no results. Apart from that Gluster after that command starts behave very strange. For example I can't do '# gluster volume heal sa_bookshelf info' because it lasts about 5 minutes and returns black screen (the same like abort). Then I restart Gluster server and Gluster returns to normal work except the replace-brick commands. When I do: '# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status' I get: Number of files migrated = 0 Current fileI can do 'volume heal info' commands etc. until I call the command: '# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'. 
# gluster --version glusterfs 3.3.1 built on Oct 22 2012 07:54:24 Repository revision: git://git.gluster.com/glusterfs.git Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License. Brick (/ydp/shared) logs (repeats the same constantly): [2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL [2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL [2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options [2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL [2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL [2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options [2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL [2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab ) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r emote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address _family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL [2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. 
Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options # gluster volume info Volume Name: sa_bookshelf Type: Distributed-Replicate Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722 Status: Started Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: storage-gfs-3-prd:/ydp/shared Brick2: storage-gfs-4-prd:/ydp/shared Brick3: storage-gfs-3-prd:/ydp/shared2 Brick4: storage-gfs-4-prd:/ydp/shared2 # gluster volume status Status of volume: sa_bookshelf Gluster process Port Online Pid ------------------------------------------------------------------------ ------ Brick storage-gfs-3-prd:/ydp/shared 24009 Y 758 Brick storage-gfs-4-prd:/ydp/shared 24009 Y 730 Brick storage-gfs-3-prd:/ydp/shared2 24010 Y 764 Brick storage-gfs-4-prd:/ydp/shared2 24010 Y 4578 NFS Server on localhost 38467 Y 770 Self-heal Daemon on localhost N/A Y 776 NFS Server on storage-1-saas 38467 Y 840 Self-heal Daemon on storage-1-saas N/A Y 846 NFS Server on storage-gfs-4-prd 38467 Y 4584 Self-heal Daemon on storage-gfs-4-prd N/A Y 4590 storage-gfs-3-prd:~# gluster peer status Number of Peers: 2 Hostname: storage-1-saas Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07 State: Peer in Cluster (Connected) Hostname: storage-gfs-4-prd Uuid: 4c384f45-873b-4c12-9683-903059132c56 State: Peer in Cluster (Connected) (from storage-1-saas)# gluster peer status Number of Peers: 2 Hostname: 172.16.3.60 Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884 State: Peer in Cluster (Connected) Hostname: storage-gfs-4-prd Uuid: 4c384f45-873b-4c12-9683-903059132c56 State: Peer in Cluster (Connected) Clients work properly. I googled for that but I found that was a bug but in 3.3.0 version. How can I repair that and continue my migration? Thank You for any help. BTW: I moved Gluster Server via Gluster 3.4: Brick Restoration - Replace Crashed Server how to. Regards, Mariusz ------------------------------ Message: 43 Date: Tue, 10 Dec 2013 12:52:29 +0100 From: Johan Huysmans <johan.huysmans at inuits.be> To: "gluster-users at gluster.org" <gluster-users at gluster.org> Subject: Re: [Gluster-users] Structure needs cleaning on some files Message-ID: <52A7007D.6020005 at inuits.be> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Hi All, It seems I can easily reproduce the problem. * on node 1 create a file (touch , cat , ...). * on node 2 take md5sum of direct file (md5sum /path/to/file) * on node 1 move file to other name (mv file file1) * on node 2 take md5sum of direct file (md5sum /path/to/file), this is still working although the file is not really there * on node 1 change file content * on node 2 take md5sum of direct file (md5sum /path/to/file), this is still working and has a changed md5sum This is really strange behaviour. Is this normal, can this be altered with a a setting? Thanks for any info, gr. Johan On 10-12-13 10:02, Johan Huysmans wrote:> I could reproduce this problem with while my mount point is running in > debug mode. > logfile is attached. > > gr. 
> Johan Huysmans > > On 10-12-13 09:30, Johan Huysmans wrote: >> Hi All, >> >> When reading some files we get this error: >> md5sum: /path/to/file.xml: Structure needs cleaning >> >> in /var/log/glusterfs/mnt-sharedfs.log we see these errors: >> [2013-12-10 08:07:32.256910] W >> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: >> remote operation failed: No such file or directory >> [2013-12-10 08:07:32.257436] W >> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: >> remote operation failed: No such file or directory >> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] >> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure >> needs cleaning) >> >> We are using gluster 3.4.1-3 on CentOS6. >> Our servers are 64-bit, our clients 32-bit (we are already using >> --enable-ino32 on the mountpoint) >> >> This is my gluster configuration: >> Volume Name: testvolume >> Type: Replicate >> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7 >> Status: Started >> Number of Bricks: 1 x 2 = 2 >> Transport-type: tcp >> Bricks: >> Brick1: SRV-1:/gluster/brick1 >> Brick2: SRV-2:/gluster/brick2 >> Options Reconfigured: >> performance.force-readdirp: on >> performance.stat-prefetch: off >> network.ping-timeout: 5 >> >> And this is how the applications work: >> We have 2 client nodes who both have a fuse.glusterfs mountpoint. >> On 1 client node we have a application which writes files. >> On the other client node we have a application which reads these files. >> On the node where the files are written we don't see any problem, and >> can read that file without problems. >> On the other node we have problems (error messages above) reading >> that file. >> The problem occurs when we perform a md5sum on the exact file, when >> perform a md5sum on all files in that directory there is no problem. >> >> >> How can we solve this problem as this is annoying. >> The problem occurs after some time (can be days), an umount and mount >> of the mountpoint solves it for some days. >> Once it occurs (and we don't remount) it occurs every time. >> >> >> I hope someone can help me with this problems. >> >> Thanks, >> Johan Huysmans >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: < http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/32f9069c/attachment-0001.html>------------------------------ _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users End of Gluster-users Digest, Vol 68, Issue 11 ********************************************* ** This email and any attachments may contain information that is confidential and/or privileged for the sole use of the intended recipient. Any use, review, disclosure, copying, distribution or reliance by others, and any forwarding of this email or its contents, without the express permission of the sender is strictly prohibited by law. If you are not the intended recipient, please contact the sender immediately, delete the e-mail and destroy all copies. 
** -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d921f3e9/attachment.html>