On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <jim.kinney at gmail.com> wrote:

> Currently running 3.12 on CentOS 7.6. Doing cleanups on split-brain and
> out-of-sync files that need healing.
>
> We need to migrate the three replica servers to Gluster v5 or v6. We will
> also need to upgrade about 80 clients. Given that a complete removal of
> Gluster will not touch the 200+TB of data on 12 volumes, we are looking at
> doing that process: stop all clients, stop all glusterd services, remove
> all of it, install the new version, set up new volumes from the old
> bricks, install new clients, mount everything.
>
> We would like to get some better performance from nfs-ganesha mounts but
> that doesn't look like an option (we have not done any parameter tweaks in
> testing yet). At a bare minimum, we would like to minimize the total
> downtime of all systems.
>
> Does this process make more sense than a version upgrade process to 4.1,
> then 5, then 6? What "gotchas" do I need to be ready for? I have until
> late May to prep and test on old, slow hardware with a small number of
> files and volumes.

You can upgrade directly from 3.12 to 6.x. I would suggest that rather than
deleting and recreating the Gluster volumes. +Hari and +Sanju for further
guidelines on the upgrade, as they recently did upgrade tests. +Soumya to
add to the nfs-ganesha aspect.

Regards,
Poornima

> --
>
> James P. Kinney III Every time you stop a school, you will have to build a
> jail. What you gain at one end you lose at the other. It's like feeding a
> dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark
> Twain http://heretothereideas.blogspot.com/
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
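For reference, after every server is running the new version, the cluster
operating version has to be bumped before new features take effect. A
minimal sketch, assuming a 6.x target; the exact op-version value should be
confirmed against the release notes of the version you end up on:

    # on any one server, after all servers are upgraded
    gluster volume get all cluster.op-version          # show the current value
    gluster volume set all cluster.op-version 60000    # assumed value for 6.x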
On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote:
>
> On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <jim.kinney at gmail.com
> <mailto:jim.kinney at gmail.com>> wrote:
>
>     Currently running 3.12 on CentOS 7.6. Doing cleanups on split-brain
>     and out-of-sync files that need healing.
>
>     We need to migrate the three replica servers to Gluster v5 or v6.
>     We will also need to upgrade about 80 clients. Given that a
>     complete removal of Gluster will not touch the 200+TB of data on 12
>     volumes, we are looking at doing that process: stop all clients,
>     stop all glusterd services, remove all of it, install the new
>     version, set up new volumes from the old bricks, install new
>     clients, mount everything.
>
>     We would like to get some better performance from nfs-ganesha
>     mounts but that doesn't look like an option (we have not done any
>     parameter tweaks in testing yet). At a bare minimum, we would like
>     to minimize the total downtime of all systems.

Could you please be more specific here? As in, are you looking for better
performance during the upgrade process or in general? Compared to 3.12,
there are a lot of performance improvements in both the glusterfs and
especially the nfs-ganesha (latest stable - V2.7.x) stacks. If you could
provide more information about your workloads (e.g., large-file,
small-file, metadata-intensive), we can make some recommendations with
respect to configuration.

Thanks,
Soumya

>
>     Does this process make more sense than a version upgrade process
>     to 4.1, then 5, then 6? What "gotchas" do I need to be ready for?
>     I have until late May to prep and test on old, slow hardware with
>     a small number of files and volumes.
>
>
> You can upgrade directly from 3.12 to 6.x. I would suggest that rather
> than deleting and recreating the Gluster volumes. +Hari and +Sanju for
> further guidelines on the upgrade, as they recently did upgrade tests.
> +Soumya to add to the nfs-ganesha aspect.
>
> Regards,
> Poornima
>
>     --
>
>     James P. Kinney III
>
>     Every time you stop a school, you will have to build a jail. What
>     you gain at one end you lose at the other. It's like feeding a dog
>     on his own tail. It won't fatten the dog.
>     - Speech 11/23/1900 Mark Twain
>
>     http://heretothereideas.blogspot.com/
>
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>     https://lists.gluster.org/mailman/listinfo/gluster-users
>
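For context on the nfs-ganesha side, a minimal sketch of what exporting a
Gluster volume through FSAL_GLUSTER looks like in ganesha.conf (the export
id, paths, hostname, and volume name below are placeholders, and any tuning
beyond this skeleton depends on the workload details asked about above):

    EXPORT {
        Export_Id = 1;                    # any unique id
        Path = "/home";                   # placeholder export path
        Pseudo = "/home";                 # NFSv4 pseudo-filesystem path
        Access_Type = RW;
        Squash = No_root_squash;
        SecType = "sys";
        FSAL {
            Name = GLUSTER;
            Hostname = "gluster-node1";   # placeholder: any server in the trusted pool
            Volume = "home";              # placeholder volume name
        }
    }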
Hi,

As mentioned above, you need not stop the whole cluster, upgrade, and then
restart the gluster processes. We did the basic rolling upgrade test with a
replica volume, and things turned out fine.

There was this minor issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1694010

To work around it, check whether your upgraded node is getting
disconnected. If it does, then you will have to (see the command sketch at
the end of this message):
1) stop the glusterd service on all the nodes (only glusterd)
2) flush the iptables (iptables -F)
3) start glusterd

If you are fine with stopping your service and upgrading all nodes at the
same time, you can go ahead with that as well.

On Sun, Mar 31, 2019 at 11:02 PM Soumya Koduri <skoduri at redhat.com> wrote:
>
> On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote:
> >
> > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <jim.kinney at gmail.com
> > <mailto:jim.kinney at gmail.com>> wrote:
> >
> >     Currently running 3.12 on CentOS 7.6. Doing cleanups on split-brain
> >     and out-of-sync files that need healing.
> >
> >     We need to migrate the three replica servers to Gluster v5 or v6.
> >     We will also need to upgrade about 80 clients. Given that a
> >     complete removal of Gluster will not touch the 200+TB of data on
> >     12 volumes, we are looking at doing that process: stop all
> >     clients, stop all glusterd services, remove all of it, install
> >     the new version, set up new volumes from the old bricks, install
> >     new clients, mount everything.
> >
> >     We would like to get some better performance from nfs-ganesha
> >     mounts but that doesn't look like an option (we have not done any
> >     parameter tweaks in testing yet). At a bare minimum, we would
> >     like to minimize the total downtime of all systems.
>
> Could you please be more specific here? As in, are you looking for better
> performance during the upgrade process or in general? Compared to 3.12,
> there are a lot of performance improvements in both the glusterfs and
> especially the nfs-ganesha (latest stable - V2.7.x) stacks. If you could
> provide more information about your workloads (e.g., large-file,
> small-file, metadata-intensive), we can make some recommendations with
> respect to configuration.
>
> Thanks,
> Soumya
>
> >
> >     Does this process make more sense than a version upgrade process
> >     to 4.1, then 5, then 6? What "gotchas" do I need to be ready for?
> >     I have until late May to prep and test on old, slow hardware with
> >     a small number of files and volumes.
> >
> >
> > You can upgrade directly from 3.12 to 6.x. I would suggest that rather
> > than deleting and recreating the Gluster volumes. +Hari and +Sanju for
> > further guidelines on the upgrade, as they recently did upgrade tests.
> > +Soumya to add to the nfs-ganesha aspect.
> >
> > Regards,
> > Poornima
> >
> >     --
> >
> >     James P. Kinney III
> >
> >     Every time you stop a school, you will have to build a jail. What
> >     you gain at one end you lose at the other. It's like feeding a
> >     dog on his own tail. It won't fatten the dog.
> >     - Speech 11/23/1900 Mark Twain
> >
> >     http://heretothereideas.blogspot.com/
> >
> >     _______________________________________________
> >     Gluster-users mailing list
> >     Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
> >     https://lists.gluster.org/mailman/listinfo/gluster-users
> >

--
Regards,
Hari Gowtham.
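A minimal command sketch of the workaround described above, assuming
systemd-managed glusterd on CentOS 7; it only needs to be run if "gluster
peer status" shows the upgraded node as disconnected:

    # 1) stop only the management daemon on every node (bricks keep serving I/O)
    systemctl stop glusterd
    # 2) flush the iptables rules, per the workaround for the bug linked above
    iptables -F
    # 3) start the management daemon again
    systemctl start glusterd
    # then confirm the peers show as connected
    gluster peer status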
On Sun, 2019-03-31 at 23:01 +0530, Soumya Koduri wrote:
> On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote:
> > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney <jim.kinney at gmail.com
> > <mailto:jim.kinney at gmail.com>> wrote:
> >
> > Currently running 3.12 on CentOS 7.6. Doing cleanups on split-brain
> > and out-of-sync files that need healing.
> >
> > We need to migrate the three replica servers to Gluster v5 or v6. We
> > will also need to upgrade about 80 clients. Given that a complete
> > removal of Gluster will not touch the 200+TB of data on 12 volumes,
> > we are looking at doing that process: stop all clients, stop all
> > glusterd services, remove all of it, install the new version, set up
> > new volumes from the old bricks, install new clients, mount
> > everything.
> >
> > We would like to get some better performance from nfs-ganesha mounts
> > but that doesn't look like an option (we have not done any parameter
> > tweaks in testing yet). At a bare minimum, we would like to minimize
> > the total downtime of all systems.
>
> Could you please be more specific here? As in, are you looking for
> better performance during the upgrade process or in general? Compared
> to 3.12, there are a lot of performance improvements in both the
> glusterfs and especially the nfs-ganesha (latest stable - V2.7.x)
> stacks. If you could provide more information about your workloads
> (e.g., large-file, small-file, metadata-intensive), we can make some
> recommendations with respect to configuration.

Sure. More details:

We are (soon to be) running a three-node, replica-only Gluster service
(2 nodes now; the third is racked, ready for sync, and being added to the
Gluster cluster). Each node has 2 external drive arrays plus one internal.
Each node has 40G IB plus 40G IP connections (with plans to upgrade to
100G). We currently have 9 volumes, each with 7TB up to 50TB of space.
Each volume is a mix of thousands of large files (>1GB), tens of thousands
of small files (~100KB), plus thousands in between.

Currently we have a 13-node computational cluster with varying GPU
abilities that mounts all of these volumes using gluster-fuse. Writes are
slow, and reads behave as if they come from a single server. I have data
from a test setup (not anywhere near the capacity of the production
system - just for testing commands and recoveries) that indicates raw NFS
without Gluster is much faster, while gluster-fuse is much slower. We have
mmap issues with Python on fuse-mounted locations; converting to NFS
solves this. We have tinkered with kernel settings to handle the
oom-killer so it will no longer kill glusterfs when an errant job eats all
the RAM (we set oom_score_adj to -1000 for all glusterfs pids; a sketch of
that follows below).

We would like to transition (smoothly!!) to Gluster 5 or 6 with
nfs-ganesha 2.7 and see some performance improvements. We will be using
corosync and pacemaker for NFS failover. It would be fantastic to be able
to saturate a 10G IPoIB (or 40G IB!) connection to each compute node in
the current computational cluster. Right now we absolutely can't get much
write speed (copying a 6.2GB file from a host to Gluster storage took
1m 21s; cp from local disk to /dev/null takes 7s). cp from Gluster to
/dev/null takes 1.0m for the same 6.2GB file. That's a 10Gbps IPoIB
connection running at only about 800Mbps.

We would like to do things like enable SSL encryption of all data flows
(we deal with PHI data in a HIPAA-regulated setting) but are concerned
about performance. We are running dual Intel Xeon E5-2630L (12 physical
cores each @ 2.4GHz) and 128GB RAM in each server node. We have 170 users.
About 20 are active at any time.
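As an aside on the OOM tweak mentioned above, here is a minimal sketch of
one way to apply it (assuming the adjustment is reapplied whenever mounts
or brick processes restart, since oom_score_adj is per-PID and not
persistent):

    # exempt FUSE clients and brick processes from the OOM killer
    for pid in $(pidof glusterfs glusterfsd); do
        echo -1000 > /proc/$pid/oom_score_adj
    done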
The current setting on /home (others are similar if not identical, maybe
nfs-disable is true for others):

# gluster volume get home all
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  off
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  1
cluster.background-self-heal-count  8
cluster.metadata-self-heal  on
cluster.data-self-heal  on
cluster.entry-self-heal  on
cluster.self-heal-daemon  enable
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  (null)
cluster.eager-lock  on
disperse.eager-lock  on
cluster.quorum-type  none
cluster.quorum-count  (null)
cluster.choose-local  true
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.stripe-block-size  128KB
cluster.stripe-coalesce  true
diagnostics.latency-measurement  off
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  off
diagnostics.brick-log-level  INFO
diagnostics.client-log-level  INFO
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.stats-dump-format  json
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  16
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  16
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.cache-size  128MB
performance.flush-behind  on
performance.nfs.flush-behind  on
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  off
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.lazy-open  yes
performance.read-after-open  no
performance.read-ahead-page-count  4
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
performance.cache-samba-metadata  false
performance.cache-capability-xattrs  true
performance.cache-ima-xattrs  true
features.encryption  off
encryption.master-key  (null)
encryption.data-key-size  256
encryption.block-size  4096
network.frame-timeout  1800
network.ping-timeout  42
network.tcp-window-size  (null)
features.lock-heal  off
features.grace-timeout  10
network.remote-dio  disable
client.event-threads  2
client.tcp-user-timeout  0
client.keepalive-time  20
client.keepalive-interval  2
client.keepalive-count  9
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  *
auth.reject  (null)
transport.keepalive  1
server.allow-insecure  (null)
server.root-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
features.lock-heal  off
features.grace-timeout  10
server.ssl  (null)
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  1
server.tcp-user-timeout  0
server.keepalive-time  20
server.keepalive-interval  2
server.keepalive-count  9
transport.listen-backlog  10
ssl.own-cert  (null)
ssl.private-key  (null)
ssl.ca-list  (null)
ssl.crl-path  (null)
ssl.certificate-depth  (null)
ssl.cipher-list  (null)
ssl.dh-param  (null)
ssl.ec-curve  (null)
performance.write-behind  on
performance.read-ahead  on
performance.readdir-ahead  off
performance.io-cache  on
performance.quick-read  on
performance.open-behind  on
performance.nl-cache  off
performance.stat-prefetch  on
performance.client-io-threads  off
performance.nfs.write-behind  on
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
performance.cache-invalidation  false
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.limit-usage  (null)
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.enable-ino32  no
nfs.mem-factor  15
nfs.export-dirs  on
nfs.export-volumes  on
nfs.addr-namelookup  off
nfs.dynamic-volumes  off
nfs.register-with-portmap  on
nfs.outstanding-rpc-limit  16
nfs.port  2049
nfs.rpc-auth-unix  on
nfs.rpc-auth-null  on
nfs.rpc-auth-allow  all
nfs.rpc-auth-reject  none
nfs.ports-insecure  off
nfs.trusted-sync  off
nfs.trusted-write  off
nfs.volume-access  read-write
nfs.export-dir
nfs.disable  off
nfs.nlm  on
nfs.acl  on
nfs.mount-udp  off
nfs.mount-rmtab  /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd  /sbin/rpc.statd
nfs.server-aux-gids  off
nfs.drc  off
nfs.drc-size  0x20000
nfs.read-size  (1 * 1048576ULL)
nfs.write-size  (1 * 1048576ULL)
nfs.readdir-size  (1 * 1048576ULL)
nfs.rdirplus  on
nfs.exports-auth-enable  (null)
nfs.auth-refresh-interval-sec  (null)
nfs.auth-cache-ttl-sec  (null)
features.read-only  off
features.worm  off
features.worm-file-level  off
features.default-retention-period  120
features.retention-mode  relax
features.auto-commit-period  180
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  -1
storage.owner-gid  -1
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  on
storage.gfid2path  on
storage.gfid2path-separator  :
storage.bd-aio  off
cluster.server-quorum-type  off
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  (null)
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
cluster.write-freq-threshold  0
cluster.read-freq-threshold  0
cluster.tier-pause  off
cluster.tier-promote-frequency  120
cluster.tier-demote-frequency  3600
cluster.watermark-hi  90
cluster.watermark-low  75
cluster.tier-mode  cache
cluster.tier-max-promote-file-size  0
cluster.tier-max-mb  4000
cluster.tier-max-files  10000
cluster.tier-query-limit  100
cluster.tier-compact  on
cluster.tier-hot-compact-frequency  604800
cluster.tier-cold-compact-frequency  604800
features.ctr-enabled  off
features.record-counters  off
features.ctr-record-metadata-heat  off
features.ctr_link_consistency  off
features.ctr_lookupheal_link_timeout  300
features.ctr_lookupheal_inode_timeout  300
features.ctr-sql-db-cachesize  12500
features.ctr-sql-db-wal-autocheckpoint  25000
features.selinux  on
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
features.shard  off
features.shard-block-size  64MB
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  off
features.cache-invalidation-timeout  60
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  gfid-hash
cluster.shd-max-threads  1
cluster.shd-wait-qlength  1024
cluster.locking-scheme  full
cluster.granular-entry-heal  no
features.locks-revocation-secs  0
features.locks-revocation-clear-all  false
features.locks-revocation-max-blocked  0
features.locks-monkey-unlocking  false
disperse.shd-max-threads  1
disperse.shd-wait-qlength  1024
disperse.cpu-extensions  auto
disperse.self-heal-window-size  1
cluster.use-compound-fops  off
performance.parallel-readdir  off
performance.rda-request-size  131072
performance.rda-low-wmark  4096
performance.rda-high-wmark  128KB
performance.rda-cache-limit  10MB
performance.nl-cache-positive-entry  false
performance.nl-cache-limit  10MB
performance.nl-cache-timeout  60
cluster.brick-multiplex  off
cluster.max-bricks-per-process  0
disperse.optimistic-change-log  on
cluster.halo-enabled  False
cluster.halo-shd-max-latency  99999
cluster.halo-nfsd-max-latency  5
cluster.halo-max-latency  5
cluster.halo-max-replicas

> Thanks,
> Soumya
>
> > Does this process make more sense than a version upgrade
> > process to 4.1, then 5, then 6? What "gotcha's" do I need to be
> > ready for? I have until late May to prep and test on old, slow
> > hardware with a small amount of files and volumes.
> >
> > You can directly upgrade from 3.12 to 6.x. I would suggest that
> > rather than deleting and creating Gluster volume. +Hari and +Sanju
> > for further guidelines on upgrade, as they recently did upgrade
> > tests. +Soumya to add to the nfs-ganesha aspect.
> >
> > Regards,
> > Poornima
> >
> > --
> >
> > James P. Kinney III
> >
> > Every time you stop a school, you will have to build a jail.
> > What you gain at one end you lose at the other. It's like
> > feeding a dog on his own tail. It won't fatten the dog.
> > - Speech 11/23/1900 Mark Twain
> >
> > http://heretothereideas.blogspot.com/
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >

--

James P. Kinney III

Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://heretothereideas.blogspot.com/
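As a footnote on the configuration dump above: it shows several
client-side performance translators (performance.client-io-threads,
performance.parallel-readdir, performance.nl-cache) still at their "off"
defaults. Purely as an illustration of the tuning syntax - not a
recommendation made in this thread, and values should be validated on the
test hardware first - small-file-oriented tuning is usually applied with
commands of this form:

    gluster volume set home performance.parallel-readdir on
    gluster volume set home performance.nl-cache on
    gluster volume set home features.cache-invalidation on
    gluster volume set home performance.cache-invalidation on
    gluster volume set home network.inode-lru-limit 200000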