Hi all, just wanted to mention that since I had sole use of our cluster over
the holidays and a complete set of backups :) I decided to test some
alternate cluster software and do some stress testing.

Stress testing involved multiple soft and *hard* resets of individual
servers and hard simultaneous resets of the entire cluster, where a hard
reset is equivalent to a power outage.

Gluster (3.8.7) coped perfectly - no data loss, no maintenance required.
Each time it came up by itself with no hand holding and started healing
nodes, which completed very quickly. VMs on gluster auto-started with no
problems, and i/o load while healing was ok. I felt quite confident in it.

The alternate cluster fs - not so good. Many times running VMs were
corrupted, and several times I lost the entire filesystem. IOPS were also
atrocious (fuse based). It's easy to claim HA when you exclude things such
as power supply failures, dodgy network switches etc.

I think Gluster's active/active quorum-based design, where every node is a
master, is a winner; active/passive systems where you have a SPOF master
are difficult to DR manage.

However :) Things I'd really like to see in Gluster:

- More flexible/easier management of servers and bricks (add/remove/replace)

- More flexible replication rules

One of the things I really *really* like with LizardFS is the powerful goal
system and chunkservers. Nodes and disks can be trivially added/removed on
the fly and chunks will be shuffled, replicated or deleted to balance the
system. Individual objects can have different goals (replication levels)
which can also be changed on the fly, and the system will rebalance them.
Objects can even be changed from simple replication to Erasure Encoded
objects and back.

I doubt this could be fitted to the existing gluster, but is there
potential for this sort of thing in Gluster 4.0? I read the design docs and
they look ambitious.

Cheers,

--
Lindsay Mathieson
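[For context on the brick-management point above, this is roughly what the
add/remove/replace workflow looks like in the 3.8 series today. It is a
sketch only; the volume name "datastore", hostnames and brick paths are
invented, and the exact steps depend on the volume layout.]

  # expanding a replica-3 volume means adding a whole new replica set
  gluster volume add-brick datastore srv4:/bricks/b1 srv5:/bricks/b1 srv6:/bricks/b1
  gluster volume rebalance datastore start

  # draining and removing that set again is a multi-step operation
  gluster volume remove-brick datastore srv4:/bricks/b1 srv5:/bricks/b1 srv6:/bricks/b1 start
  gluster volume remove-brick datastore srv4:/bricks/b1 srv5:/bricks/b1 srv6:/bricks/b1 status
  gluster volume remove-brick datastore srv4:/bricks/b1 srv5:/bricks/b1 srv6:/bricks/b1 commit

  # replacing a failed brick, then letting self-heal repopulate it
  gluster volume replace-brick datastore srv2:/bricks/b1 srv2:/bricks/b1-new commit force
  gluster volume heal datastore info

Each step works, but it is per-brick and fairly manual, which is the gap
relative to the LizardFS-style goal system described above.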
> Gluster (3.8.7) coped perfectly - no data loss, no maintenance required.
> Each time it came up by itself with no hand holding and started healing
> nodes, which completed very quickly. VMs on gluster auto-started with no
> problems, and i/o load while healing was ok. I felt quite confident in it.

Glad to hear that part went well.

> The alternate cluster fs - not so good. Many times running VMs were
> corrupted, and several times I lost the entire filesystem. IOPS were also
> atrocious (fuse based). It's easy to claim HA when you exclude things
> such as power supply failures, dodgy network switches etc.

Too true. Unfortunately, I think just about every distributed storage
system has to go through this learning curve, from not handling failure at
all, to handling the simplest/easiest cases, to handling the weird stuff
that real deployments can throw at you. It's not just about the actual
failure handling, either. Sometimes it's about things you do in the main
I/O path, such as not throwing away O_SYNC flags to claim better
performance. From the information you've provided, I'll bet that's where
your data corruption came from.

> I think Gluster's active/active quorum-based design, where every node is
> a master, is a winner; active/passive systems where you have a SPOF
> master are difficult to DR manage.

Active/passive designs create a very tough set of tradeoffs. Detecting and
responding to failures quickly enough, while also avoiding false alarms, is
like balancing on a knife edge. Then there are problems with overload
turning into failure, with failback, etc. It can all be done right and work
well, but it's *really* hard. While I guess it's better than nothing,
experience has shown that active/active designs are easier to make robust,
and the techniques for doing so have been well known for at least a decade
or so.

> However :) Things I'd really like to see in Gluster:
>
> - More flexible/easier management of servers and bricks
>   (add/remove/replace)
>
> - More flexible replication rules
>
> One of the things I really *really* like with LizardFS is the powerful
> goal system and chunkservers. Nodes and disks can be trivially
> added/removed on the fly and chunks will be shuffled, replicated or
> deleted to balance the system. Individual objects can have different
> goals (replication levels) which can also be changed on the fly, and the
> system will rebalance them. Objects can even be changed from simple
> replication to Erasure Encoded objects and back.
>
> I doubt this could be fitted to the existing gluster, but is there
> potential for this sort of thing in Gluster 4.0? I read the design docs
> and they look ambitious.

There used to be an idea called "data classification" to cover this kind of
case. You're right that setting arbitrary goals for arbitrary objects would
be too difficult. However, we could have multiple pools with different
replication/EC strategies, then use a translator like the one for tiering
to control which objects go into which pools based on some kind of policy.
To support that with a relatively small number of nodes/bricks we'd also
need to be able to split bricks into smaller units, but that's not really
all that hard.

Unfortunately, although many of these ideas have been around for at least a
year and a half, nobody has ever been freed up to work on them. Maybe, with
all of the interest in multi-tenancy to support containers and
hyperconvergence and whatever else, we might finally be able to get these
under way.
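[On the O_SYNC point above: a quick way to sanity-check whether a mount
actually honours synchronous writes is to compare a sync run against a
buffered run; the mount path below is just an example.]

  dd if=/dev/zero of=/mnt/somevol/sync-test bs=4k count=2000 oflag=dsync
  dd if=/dev/zero of=/mnt/somevol/buffered-test bs=4k count=2000

If the oflag=dsync run is not noticeably slower than the buffered one, the
layer underneath is most likely acknowledging writes before they are
durable, which is exactly the behaviour that turns a power loss into
corrupted VM images.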
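[As a rough illustration of the "multiple pools" idea mentioned above:
today the replication/EC strategy is fixed per volume, so the closest
approximation is separate volumes with different layouts. Volume names,
hosts and brick paths below are invented.]

  # a replica-3 pool for VM images
  gluster volume create vm-pool replica 3 \
      srv1:/bricks/r0 srv2:/bricks/r0 srv3:/bricks/r0

  # an erasure-coded (4+2) pool for colder data
  gluster volume create archive-pool disperse 6 redundancy 2 \
      srv1:/bricks/e0 srv2:/bricks/e0 srv3:/bricks/e0 \
      srv4:/bricks/e0 srv5:/bricks/e0 srv6:/bricks/e0

The data-classification idea would, in effect, move that choice inside a
single volume, with a policy translator deciding per file (or per tenant)
which pool a given object lands in.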
Have you done comparisons against Lustre? From what I've seen, Lustre
performance is 2x faster than a replicated gluster volume.

On 1/4/17 5:43 PM, Lindsay Mathieson wrote:
> Hi all, just wanted to mention that since I had sole use of our cluster
> over the holidays and a complete set of backups :) I decided to test
> some alternate cluster software and do some stress testing.
> [...]