Martin Toth
2017-Sep-22 12:22 UTC
[Gluster-users] Upgrade Gluster 3.7 to 3.12 and add 3rd replica [howto/help]
Hi, thanks for the suggestions. Yes, "gluster peer probe node3" will be the first command, so that Gluster discovers the 3rd node. I am running the latest 3.7.x: 3.7.6-1ubuntu1 is installed, and according to https://packages.ubuntu.com/xenial/glusterfs-server the latest 3.7.x there is 3.7.6-1ubuntu1, so this should be OK.

> If you are *not* on the latest 3.7.x, you are unlikely to be able to go back to it

Do you mean the latest package from the Ubuntu repository or the latest package from the Gluster PPA (3.7.20-ubuntu1~xenial1)? Currently I am using the Ubuntu repository package, but I want to use the PPA for the upgrade because Ubuntu only has old Gluster packages in its repo.

I do not use sharding because all bricks have the same size, so it would not speed up healing of VM images in case of a heal operation. The volume is 3TB; can you estimate how long it takes to heal over a 2x1Gbit (Linux bond) connection?

I want to turn every VM off because that is required by the Gluster upgrade procedure; that is why I want to add the 3rd brick (3rd replica) at this time (after the upgrade, while the VMs are offline).

Martin

> On 22 Sep 2017, at 12:20, Diego Remolina <dijuremo at gmail.com> wrote:
>
> Procedure looks good.
>
> Remember to back up Gluster config files before update:
>
> /etc/glusterfs
> /var/lib/glusterd
>
> If you are *not* on the latest 3.7.x, you are unlikely to be able to go back to it because PPA only keeps the latest version of each major branch, so keep that in mind. With Ubuntu, every time you update, make sure to download and keep a manual copy of the .deb files. Otherwise you will have to compile the packages yourself in the event you wanted to go back.
>
> Might need before adding 3rd replica:
> gluster peer probe node3
>
> When you add the 3rd replica, it should start healing, and there may be an issue there if the VMs are running. Your plan to not have VMs up is good here. Are you using sharding?
> If you are not sharding, I/O in running VMs may be stopped for too long while a large image is healed. If you were already using sharding, you should be able to add the 3rd replica while VMs are running without much issue.
>
> Once healing is completed and if you are satisfied with 3.12, remember to bump the op-version of Gluster.
>
> Diego
>
>
> On Sep 20, 2017 19:32, "Martin Toth" <snowmailer at gmail.com> wrote:
> Hello all fellow GlusterFriends,
>
> I would like you to comment on / correct my upgrade procedure for a replica 2 volume on Gluster 3.7.x.
> Then I would like to change replica 2 to replica 3 in order to fix the quorum issue that the infrastructure currently has.
>
> Infrastructure setup:
> - all clients run on the same nodes as the servers (FUSE mounts)
> - under Gluster there is a ZFS pool running as raidz2 with an SSD SLOG/ZIL cache
> - both hypervisors run as GlusterFS nodes and also as Qemu compute nodes (Ubuntu 16.04 LTS)
> - we run Qemu VMs that access their disks via gfapi (OpenNebula)
> - we currently run a 1x2, Type: Replicate volume
>
> Current versions:
> glusterfs-* [package] 3.7.6-1ubuntu1
> qemu-* [package] 2.5+dfsg-5ubuntu10.2glusterfs3.7.14xenial1
>
> What we need (new versions):
> - upgrade GlusterFS to the 3.12 LTM version (the 3.7 packages in Ubuntu 16.04 LTS are EOL - see https://www.gluster.org/community/release-schedule/)
> - I want to use https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12 as the package repository for 3.12
> - upgrade Qemu (with built-in support for libgfapi) - https://launchpad.net/~monotek/+archive/ubuntu/qemu-glusterfs-3.12
>   (sadly Ubuntu's packages are built without libgfapi support)
> - add a third node to the replica setup of the volume (this is probably the most dangerous operation)
>
> Backup Phase
> - back up the "NFS storage" - raw DATA that runs on VMs
> - stop all running VMs
> - back up all VM images (Qcow2) outside of Gluster
>
> Upgrading Gluster Phase
> - killall glusterfs glusterfsd glusterd (on every server)
>   (this should stop all Gluster services, server and client, as they run on the same nodes)
> - install the new Gluster server and client packages from the repository mentioned above (on every server)
> - install Monotek's new Qemu GlusterFS package with gfapi support enabled (on every server)
> - /etc/init.d/glusterfs-server start (on every server)
> - /etc/init.d/glusterfs-server status - verify that everything runs OK (on every server)
> - check:
>   - gluster volume info
>   - gluster volume status
> - check the Gluster FUSE clients, whether the mounts work as expected
> - test whether various VMs are able to boot and run as expected (i.e. whether libgfapi works in Qemu)
> - reboot all nodes - do a system upgrade of packages
> - test and check again
>
> Adding a third node to the replica 2 setup (replica 2 => replica 3)
> (volumes will be mounted and up after the upgrade, and we have tested that VMs can be served with libgfapi = Gluster upgrade successfully completed)
> (next we extend replica 2 to replica 3 while volumes are mounted but no data is touched = no running VMs, only glusterfs servers and clients on the nodes)
> - issue the command: gluster volume add-brick volume replica 3 node3.san:/tank/gluster/brick1 (on the new single node - node3)
>   so we change:
>   Bricks:
>   Brick1: node1.san:/tank/gluster/brick1
>   Brick2: node2.san:/tank/gluster/brick1
>   to:
>   Bricks:
>   Brick1: node1.san:/tank/gluster/brick1
>   Brick2: node2.san:/tank/gluster/brick1
>   Brick3: node3.san:/tank/gluster/brick1
> - check gluster status
> - (is rebalance / heal required here?)
> - start all VMs and start the celebration :)
>
> My questions
> - are heal and rebalance necessary in order to go from replica 2 to replica 3?
> - is this upgrade procedure OK? What more/else should I do in order to do this upgrade correctly?
>
> Many thanks to all for the support. I hope my little preparation howto will help others in the same situation.
>
> Best Regards,
> Martin
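Diego's reminder to back up the configuration and keep the .deb files could be scripted roughly as follows. This is a sketch, not something posted in the thread: it assumes a POSIX shell, GNU tar and the Debian/Ubuntu tools dpkg-query and apt-get, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-upgrade helpers (not from the thread); run on every node.

# Archive the Gluster configuration into DEST_DIR and print the archive path.
# Without further arguments it saves the two directories Diego lists; other
# directories may be passed instead (useful for a dry run).
backup_gluster_config() {
    dest_dir=$1
    shift
    if [ "$#" -eq 0 ]; then
        set -- /etc/glusterfs /var/lib/glusterd
    fi
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/gluster-config-$(uname -n)-$(date +%Y%m%d%H%M%S).tar.gz"
    # tar drops the leading "/" from member names; restore with
    #   tar -xzf ARCHIVE -C /
    tar -czf "$archive" "$@" || return 1
    printf '%s\n' "$archive"
}

# Turn "package version" lines into "package=version" specs for apt-get.
# Lines without a version (known but not installed packages) are skipped.
pkg_specs() {
    awk 'NF == 2 { print $1 "=" $2 }'
}

# Download the .deb files of the currently installed Gluster packages into
# DEST_DIR while the repository still carries that exact version.
save_installed_gluster_debs() {
    dest_dir=$1
    mkdir -p "$dest_dir" || return 1
    specs=$(dpkg-query -W -f='${Package} ${Version}\n' 'glusterfs*' | pkg_specs)
    # Unquoted $specs on purpose: one argument per package.
    (cd "$dest_dir" && apt-get download $specs)
}
```

The pattern 'glusterfs*' is an assumption about how the packages are split; if the PPA ships the libraries under other names, add those patterns as well.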
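The "Upgrading Gluster Phase" of Martin's procedure might look like this as a per-node script. It is only a sketch: the PPA short names are derived from the Launchpad URLs above, the package names (glusterfs-server, glusterfs-client, qemu-system-x86, qemu-utils) are assumptions, and the init script name is copied from the procedure; the 3.12 packages may name the service differently, so try it on a test node first. Setting DRY_RUN=1 makes it print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical per-node sketch of the "Upgrading Gluster Phase".
# All VMs must already be stopped. DRY_RUN=1 only prints the commands.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

upgrade_node() {
    # Stop bricks, self-heal daemon, glusterd and the local FUSE clients.
    # killall fails if a process is already gone, which is fine here.
    run killall glusterfs glusterfsd glusterd || true

    # PPA short names derived from the Launchpad URLs in the procedure.
    run add-apt-repository -y ppa:gluster/glusterfs-3.12 || return 1
    run add-apt-repository -y ppa:monotek/qemu-glusterfs-3.12 || return 1
    run apt-get update || return 1
    # Assumed package names; check what the two PPAs actually publish.
    run apt-get install -y glusterfs-server glusterfs-client \
        qemu-system-x86 qemu-utils || return 1

    # Init script name as written in the procedure (3.7 Ubuntu packaging).
    run /etc/init.d/glusterfs-server start || return 1
    run /etc/init.d/glusterfs-server status

    # The same checks as in the procedure.
    run gluster volume info
    run gluster volume status
}
```

Running `DRY_RUN=1 upgrade_node` on each server first is a cheap way to review the exact command list before doing it for real.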
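The replica 2 => replica 3 step could be sketched as below, with the volume name, node and brick path taken from Martin's example. Two points are assumptions worth checking: the peer probe is issued from node1 or node2 (a node already in the trusted pool), and an explicit full heal is triggered even though Diego expects healing to start on its own. No rebalance is included, since rebalance moves data between distribute subvolumes and going from 1x2 to 1x3 adds none.

```shell
#!/bin/sh
# Hypothetical sketch: grow a 1x2 replicate volume to 1x3.
# Placeholders from Martin's example; override via the environment.
VOLUME=${VOLUME:-volume}
NEW_NODE=${NEW_NODE:-node3.san}
BRICK=${BRICK:-/tank/gluster/brick1}

add_third_replica() {
    # 1. Add the new server to the trusted pool (run on node1 or node2).
    gluster peer probe "$NEW_NODE" || return 1
    gluster peer status

    # 2. Raise the replica count and add the new brick in one command.
    gluster volume add-brick "$VOLUME" replica 3 "$NEW_NODE:$BRICK" || return 1

    # 3. Copy existing data to the new brick. Healing may already start by
    #    itself; a full heal is requested explicitly as a precaution.
    gluster volume heal "$VOLUME" full || return 1
}

# Sum of "Number of entries" over all bricks in "heal info".
# Bricks that are down report "-" and are counted as 0 here.
pending_heal_entries() {
    gluster volume heal "$VOLUME" info |
        awk -F': ' '/^Number of entries:/ { n += $2 } END { print n + 0 }'
}
```

Starting the VMs only once `pending_heal_entries` reports 0 matches the plan of keeping them offline during the heal.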
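Diego's "bump the op-version" step could be done as in the sketch below, once every node runs 3.12 and healing has finished. It relies on `gluster volume get all cluster.op-version` and `cluster.max-op-version`, which exist from Gluster 3.10 onwards; the function names are hypothetical. Raising the op-version is one-way, so it belongs at the very end.

```shell
#!/bin/sh
# Hypothetical sketch: raise the cluster op-version to the highest value
# all peers support, after the whole pool runs the new version.

current_op_version() {
    gluster volume get all cluster.op-version |
        awk '$1 == "cluster.op-version" { print $2 }'
}

max_op_version() {
    gluster volume get all cluster.max-op-version |
        awk '$1 == "cluster.max-op-version" { print $2 }'
}

bump_op_version() {
    cur=$(current_op_version)
    max=$(max_op_version)
    if [ -z "$max" ]; then
        echo "could not read cluster.max-op-version" >&2
        return 1
    fi
    # Nothing to do if the cluster already runs at the maximum.
    if [ "$cur" -ge "$max" ] 2>/dev/null; then
        echo "op-version already at $cur"
        return 0
    fi
    gluster volume set all cluster.op-version "$max"
}
```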
Diego Remolina
2017-Sep-22 12:50 UTC
Hi Martin,

> Do you mean latest package from Ubuntu repository or latest package from
> Gluster PPA (3.7.20-ubuntu1~xenial1).
> Currently I am using Ubuntu repository package, but want to use PPA for
> upgrade because Ubuntu has old packages of Gluster in repo.

When you switch to the PPA, make sure to download and keep a copy of each set of Gluster .deb packages. Otherwise, if you ever want to back out an upgrade to an older release, you will have to download the source package and build it yourself, because PPAs only keep the latest version of the binaries.

> I do not use sharding because all bricks has same size, so it will not
> speedup healing of VMs images in case of heal operation. Volume is 3TB, how
> long does it take to heal on 2x1gbit (linux bond) connection, can you
> approximate ?

Sharding is not so much about brick size. Sharding is about preventing a whole large VM file from being locked while it is being healed. It also minimizes the amount of data copied, because Gluster only heals smaller pieces instead of a whole VM image.

Say your 100GB IMG needs to be healed: the file is locked while it gets copied from one server to the other, and the running VM may not be able to use it while the heal is going on, so your VM may in fact stop working or get I/O errors. With sharding, VM images are cut into, well, shards (with a 512MB shard size, no piece is larger than 512MB), and the heal process only locks the shards being healed. So Gluster only heals the shards that changed, which are much smaller and faster to copy, and it does not need to lock the whole 100GB IMG file, which takes longer to copy; only the shard being healed is locked.

Do note that if you had never used sharding, turning it on will *not* convert your older files. Also, you should *never* turn sharding on and then back off, as that will result in corrupted VM image files. Once it is on, if you want to turn it off: stop your VMs, move all VM IMG files elsewhere, turn off sharding, and then copy the files back to the volume.
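Sharding as Diego describes it is controlled by two volume options, features.shard and features.shard-block-size. The sketch below is not from the thread: the volume name is a placeholder, 512MB mirrors the size mentioned above, and `shard_count` is a hypothetical helper showing how many pieces a 100GB image becomes.

```shell
#!/bin/sh
# Hypothetical sketch: enable sharding on a VM volume.
# Only files created afterwards are sharded; existing images stay whole.
# Never switch features.shard back off while sharded images exist.

enable_sharding() {
    volume=$1
    block_size=${2:-512MB}
    # Set the shard size first so new images use it from the start.
    gluster volume set "$volume" features.shard-block-size "$block_size" || return 1
    gluster volume set "$volume" features.shard on
}

# Number of pieces a file of SIZE_MB occupies with BLOCK_MB shards
# (ceiling division). A heal only locks and copies the changed pieces.
shard_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

shard_count 102400 512   # a 100GB (102400MB) image -> 200
```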
As for speed, I really cannot tell you, as it depends on the disks, network, etc. For example, I have a two-node setup plus an arbiter (2 nodes with bricks; the third is just the arbiter, to keep quorum if one of the brick servers goes down). I recently replaced the HDDs in one machine as the drives hit the 5-year age mark. I took the 12 drives out, added 24 drives to the machine (we had unused slots), reconfigured RAID 6, left it initializing in the background and started the heal of 13.1TB of data. My servers are connected via 10Gbit (I am not seeing reads/writes over 112MB/s); this process started last Monday at 7:20PM and it is not done yet. About 40GB still remain to be healed. My servers are used as file servers, which means lots of small files, which take longer to heal. I would think your VM images will heal much faster.

> I want to turn every VM off because its required for upgrading gluster
> procedure, thats why I want to add 3rd brick (3rd replica) at this time
> (after upgrade when VMs will be offline).
>

You could even attempt an online upgrade by adding the new node/brick running 3.12 to the mix before upgrading the other nodes from 3.7.x. However, I am not sure how that is going to work; with such a big difference in versions, it may not work well.

If you can afford the downtime for the upgrade, that will be the safest option.

Diego
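Although the real heal time depends on disks, file sizes and network, a lower bound is simply data size divided by sustained throughput. The helper below is a hypothetical back-of-the-envelope sketch using decimal units; 112MB/s is the rate Diego reports and is also roughly what one 1Gbit link delivers (with Linux bonding a single TCP connection usually stays on one link, except in balance-rr mode). Small files make the real figure much longer, as Diego's heal shows.

```shell
#!/bin/sh
# Back-of-the-envelope heal time: size / sustained throughput.
# heal_hours SIZE_TB THROUGHPUT_MBPS, with 1TB = 10^12 bytes and
# 1MB/s = 10^6 bytes/s; prints hours with one decimal place.
heal_hours() {
    awk -v tb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", tb * 1e12 / (mbps * 1e6) / 3600 }'
}

heal_hours 3 112     # Martin's 3TB at ~112MB/s -> 7.4
heal_hours 13.1 112  # Diego's 13.1TB at 112MB/s -> 32.5
```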
Martin Toth
2017-Oct-01 15:53 UTC
Hi Diego,

I've tried to upgrade and then extend Gluster with a 3rd node in a VirtualBox test environment, and everything went without problems. Sharding will not help me at this time, so I will consider upgrading from 1G to 10G before doing this procedure in production. That should lower the downtime (the healing time of VM image files on Gluster). I hope healing will be as short as possible on 10G.

Additional info for Gluster/Qemu users:
- Ubuntu does not have Qemu compiled with libgfapi support, so I've created a PPA for that: https://launchpad.net/~snowmanko/+archive/ubuntu/qemu-glusterfs-3.12 (I will try to keep this repo up to date)
- it's tested against GlusterFS 3.12.1 (libgfapi works as expected with this repo)
- also related to this problem, there is a MIR (Main Inclusion Request): https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/1274247 - it's now accepted, and I am really excited to see libgfapi compiled in by default in Ubuntu's Qemu packages in the near future

Thanks for the support.

BR,
Martin
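To confirm that an installed Qemu really has libgfapi support (for example after switching to the PPA above), one option is to look for the gluster block driver in qemu-img's help text. This is a hypothetical sketch: it assumes the "Supported formats:" line that qemu-img prints at the end of `--help`, and the QEMU_IMG override exists only to make it easy to point at another binary.

```shell
#!/bin/sh
# Hypothetical check for GlusterFS (libgfapi) support in QEMU, e.g. after:
#   add-apt-repository ppa:snowmanko/qemu-glusterfs-3.12 && apt-get update
QEMU_IMG=${QEMU_IMG:-qemu-img}

# qemu-img lists its compiled-in block drivers on the "Supported formats:"
# line of its help text; a libgfapi-enabled build includes "gluster".
qemu_has_gluster() {
    "$QEMU_IMG" --help | grep '^Supported formats:' | tr ' ' '\n' | grep -qx gluster
}
```

If it succeeds, a quick end-to-end test is `qemu-img info gluster://node1.san/volume/path/to/image.qcow2`, where the image path is a placeholder for one of your VM disks.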