Hi all

We have a 20-node, 1 PB Gluster deployment running 3.6.3, the same version we installed on day 1. There are obviously numerous performance and feature improvements that we'd like to take advantage of. However, this is a production system and we don't have a replica of it on which to test the upgrade.

We're running CentOS 6.6 with the official Gluster binaries. We rely on Gluster's NFS daemon, and also use samba-glusterfs with Samba for SMB access to our Gluster volume.

What risks might we face with an upgrade from 3.6 to 3.10/3.11? And what rollback options do we have?

More importantly, is there anyone who would be willing to work for a retainer plus worked hours to be "on call" in case we have problems during the upgrade? Someone with plenty of experience with Gluster over the years who could diagnose any issues we may hit during the upgrade. If you're interested, please e-mail me off-list. I'm, of course, interested in advice on-list as well.

Thanks

Brett.
I had a mixed experience going from 3.6.6 to 3.10.2 on a two-server setup. I have since upgraded to 3.10.3, but I still have a bad problem with specific files (see CONS below).

PROS

- Back on a "supported" version.
- Windows roaming profiles (small-file performance) improved significantly via Samba. This may be due to new tuning options added (see my tuning options for the volume below):

Volume Name: export
Type: Replicate
Volume ID: ---snip---
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.7:/bricks/hdds/brick
Brick2: 10.0.1.6:/bricks/hdds/brick
Options Reconfigured:
performance.stat-prefetch: on
performance.cache-min-file-size: 0
network.inode-lru-limit: 65536
performance.cache-invalidation: on
features.cache-invalidation: on
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-size: 10GB
cluster.server-quorum-type: server
nfs.disable: on
performance.io-thread-count: 64
performance.io-cache: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
server.event-threads: 5
client.event-threads: 5
performance.cache-max-file-size: 256MB
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
cluster.server-quorum-ratio: 51%

CONS

- New problems came up with specific files (Autodesk Revit files) for which no solution has been found, other than to stop using the samba vfs gluster plugin and also play some stupid file renaming game. See: http://lists.gluster.org/pipermail/gluster-users/2017-June/031377.html
- With 3.6.6 I had a nightly rsync process that copied all the data from the gluster server pair to another server (nightly backup). This operation used to finish between 1 and 2 AM every day. After the upgrade it is much slower, with rsync finishing between 3 and 5 AM.
- I have not looked into it much, but about 40 days after the upgrade, the gluster mount on one server became stuck and I had to reboot the servers.

As for recommendations, definitely do *not* go with 3.11, as that is *not* a long-term release. Stay with 3.10.

https://www.gluster.org/community/release-schedule/

Make sure you have the 3.6.3 rpms available to downgrade if needed. You can always go back to the previous rpms if you have them available (this is not easy if you have a mix with other distros, e.g. Ubuntu, where the PPA only has the latest .deb file for each minor version).

You must schedule downtime and bring the whole gluster down for the upgrade. Upgrade all servers, then clients, then test, test, test and test more (I did not notice my Revit file problem until users brought it to my attention).

If things are going well in your testing, then you should do the op version upgrade, but not before committing to staying with 3.10. It is true that you can lower the op version later, but then you have to manually edit several files on each server, so I say stay with the *older* op version until you are sure you want to stay on 3.10, then upgrade the op version.

https://gluster.readthedocs.io/en/latest/Upgrade-Guide/op_version/

Prior to any changes, back up the gluster server configuration folder ( /var/lib/glusterd/ ) on every single server. That will allow you to go back to the moment before the upgrade if really needed.

HTH,

Diego

On Tue, Aug 8, 2017 at 6:51 AM, Brett Randall <brett.randall at gmail.com> wrote:
> What risks might we face with an upgrade from 3.6 to 3.10/3.11? And what
> rollback options do we have?
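The configuration backup recommended above could be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not an official procedure: the function names `backup_glusterd_config` and `read_op_version` and the default destination `/root/gluster-pre-upgrade` are hypothetical, chosen only for illustration. It also assumes glusterd records the current op version as an `operating-version=` line in `glusterd.info`, which is worth confirming on one server first.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default backup location are
# hypothetical; adjust paths to your environment.

# Archive a glusterd configuration directory (default /var/lib/glusterd)
# into a timestamped tarball under the destination directory and print
# the path of the archive that was written.
backup_glusterd_config() {
    local src="${1:-/var/lib/glusterd}"
    local dest="${2:-/root/gluster-pre-upgrade}"
    local stamp archive
    stamp="$(date +%Y%m%d-%H%M%S)"
    archive="${dest}/glusterd-$(uname -n)-${stamp}.tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes the paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Print the op version recorded in glusterd.info (assumed to be stored
# as a line of the form operating-version=NNNNN).
read_op_version() {
    local info="${1:-/var/lib/glusterd/glusterd.info}"
    sed -n 's/^operating-version=//p' "$info"
}
```

Run on every server before touching any packages, for example `backup_glusterd_config && read_op_version`, and keep the printed op version next to the tarball so you know what value the cluster was on before the upgrade. Copying the archives off the servers is a sensible extra precaution.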
Thanks Diego. This is invaluable information; I appreciate it immensely. I had heard previously that you can always go back to previous Gluster binaries, but without understanding the data structures behind Gluster, I had no idea how safe that was. Backing up the lib folder makes perfect sense.

The improvements we're specifically keen on are the small-file performance improvements introduced in 3.7. I feel that a lot of the complaints we get are from people using apps that are {slowly} crawling massively deep folders via SMB. I'm hoping that the improvements made in 3.7 have stayed intact in 3.10! Otherwise, is there a generally accepted "fast and stable" version earlier than 3.10 that we should be looking at as an interim step?

Brett

________________________________
From: Diego Remolina <dijuremo at gmail.com>
Sent: Tuesday, August 8, 2017 10:39:27 PM
To: Brett Randall
Cc: gluster-users at gluster.org List
Subject: Re: [Gluster-users] Upgrading from 3.6.3 to 3.10/3.11

I had a mixed experience going from 3.6.6 to 3.10.2 on a two server setup. [...]