Carlos Capriotti
2014-Mar-13 11:10 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Hello, all.

I am a little bit surprised by the lack of action on this topic. I hate to be "that guy", especially being new here, but it has to be done.

If I've got this right, we have here a chance of developing Gluster even further, sponsored by Google, with a dedicated programmer for the summer. In other words, if we play our cards right, we can get a free programmer and at least a good start/advance on this fantastic project.

Well, I've checked the Trello board, and there is a fair amount of things there. There are a couple of things that are not there as well. I think it would be nice to listen to the COMMUNITY (yes, that means YOU), for either suggestions, or at least a vote.

My opinion, being also my vote, in order of PERSONAL preference:

1) There is a project going on (https://forge.gluster.org/disperse) that consists of rewriting the stripe module in Gluster. This is especially important because it has a HUGE impact on Total Cost of Implementation (customer side) and Total Cost of Ownership, and also on matching what the competition has to offer. Among other things, it would allow Gluster to implement a RAIDZ/RAID5 type of fault tolerance, which is much more efficient, and would, as far as I understand, allow you to use 3 nodes as a minimum for stripe+replication. This means 25% less money in computer hardware, with increased data safety/resilience.

2) We have a recurring issue with split-brain resolution. There is an entry on Trello asking/suggesting a mechanism that arbitrates this resolution automatically. I pretty much think this could come together with another solution, which is a file replication consistency check.

3) Accelerator node project. Some storage solutions out there offer an "accelerator node", which is, in short, an extra node with a lot of RAM, possibly fast disks (SSD), that works like a proxy to the regular volumes. Active chunks of files are moved there, logs (ZIL style) are recorded on fast media, among other things.
There is NO active project for this, or Trello entry, because it is something I started discussing with a few fellows just a couple of days ago. I thought of starting to play with RAM disks (tmpfs) as scratch disks, but since we have an opportunity to do something more efficient, or at the very least start it, why not?

Now, c'mon! Time is running out. We need hands on deck here, for a simple vote! Can you share 3 lines with your thoughts?

Thanks
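The RAID5/RAIDZ efficiency argument in point 1) can be illustrated with a toy XOR-parity scheme. This is only a sketch of the idea, not Gluster's actual disperse implementation (which uses a more general erasure code); all names here are made up for illustration:

```python
# Toy 2+1 "dispersed" layout: split the data into two chunks and store
# their XOR parity on a third node. Any ONE lost chunk is recoverable
# from the other two - the RAID5/RAIDZ idea in miniature. Illustrative
# sketch only; not how Gluster's disperse translator actually works.

def split_with_parity(data: bytes) -> list[bytes]:
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")  # pad to equal size
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]  # one chunk per node

def rebuild(chunks: list, orig_len: int) -> bytes:
    a, b, parity = chunks
    if a is None:   # node 0 lost: recover a = b XOR parity
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:   # node 1 lost: recover b = a XOR parity
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:orig_len]

data = b"hello gluster"
a, b, p = split_with_parity(data)
assert rebuild([None, b, p], len(data)) == data   # survives losing node 0
assert rebuild([a, None, p], len(data)) == data   # survives losing node 1
```

This is also where the "25% less hardware" figure comes from: in a 2+1 dispersed layout, 3 nodes hold 2 nodes' worth of usable data, whereas 2-way replication needs 4 nodes for the same usable capacity, while still tolerating one node failure.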
Bernhard Glomm
2014-Mar-13 11:29 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Thanks, Carlos, for your ambition. I'm not much of a developer, but for what it's worth, here are my thoughts:

Your #1) sounds great, but does that mean object store, or will it still be whole files that are handled? Up to now I loved the feeling of having at least my files intact on the bricks if something went really wrong, rather than ending up with a huge number of xMB-sized snippets scattered around. (I know there are pros and cons, depending on scenario and size, but still...)

Your #2) could be incorporated into #1) somehow? "Minimum 3 bricks, 1 per node" with some kind of quorum mechanism?

As #3) I would vote for reliable encryption, to be able to use third-party storage. Speed alone is affected by too many other things: bandwidth, speed of the underlying storage, different speeds of different bricks... so speed I would vote down to rank #4) ;-)

Bernhard

On 13.03.2014 12:10, Carlos Capriotti wrote:
> [snip]
Jeff Darcy
2014-Mar-13 13:01 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
> I am a little bit impressed by the lack of action on this topic. I hate to be
> "that guy", specially being new here, but it has to be done.
> If I've got this right, we have here a chance of developing Gluster even
> further, sponsored by Google, with a dedicated programmer for the summer.
> In other words, if we play our cards right, we can get a free programmer and
> at least a good start/advance on this fantastic.

Welcome, Carlos. I think it's great that you're taking initiative here. However, it's also important to set proper expectations for what a GSoC intern could reasonably be expected to achieve. I've seen some amazing stuff come out of GSoC, but if we set the bar too high then we end up with incomplete code, and the student doesn't learn much except frustration.

GlusterFS consists of 430K lines of code in the core project alone. Most of it is written in a style that is generally hard for newcomers to pick up - both callback-oriented and highly concurrent, often using our own "unique" interpretation of standard concepts. It's also in an area (storage) that is not well taught in most universities. Given those facts and the short duration of GSoC, it's important to focus on projects that don't require deep knowledge of existing code, to keep the learning curve short and productive time correspondingly high.

With that in mind, let's look at some of your suggestions.

> I think it would be nice to listen to the COMMUNITY (yes, that means YOU),
> for either suggestions, or at least a vote.

It certainly would have been nice to have you at the community IRC meeting yesterday, at which we discussed release content for 3.6 based on the feature proposals here:

http://www.gluster.org/community/documentation/index.php/Planning36

The results are here:

http://titanpad.com/glusterfs-3-6-planning

> My opinion, being also my vote, in order of PERSONAL preference:
> 1) There is a project going on ( https://forge.gluster.org/disperse ), that
> consists on re-writing the stripe module on gluster. [snip]

This was decided as a core feature for 3.6. I'll let Xavier (the feature owner) answer w.r.t. whether there's any part of it that would be appropriate for GSoC.

> 2) We have a recurring issue with split-brain solution. There is an entry on
> trello asking/suggesting a mechanism that arbitrates this resolution
> automatically. I pretty much think this could come together with another
> solution that is file replication consistency check.

This is also core for 3.6, under the name "policy based split brain resolution":

http://www.gluster.org/community/documentation/index.php/Features/pbspbr

Implementing this feature requires significant knowledge of AFR, which both causes split brain and would be involved in its repair. Because it's also one of our most complicated components, and the person who just rewrote it won't be around to offer help, I don't think this project *as a whole* would be a good fit for GSoC. On the other hand, there might be specific pieces of the policy implementation (not execution) that would be a good fit.

> 3) Accelerator node project. Some storage solutions out there offer an
> "accelerator node", which is, in short, an extra node with a lot of RAM,
> eventually fast disks (SSD), and that works like a proxy to the regular
> volumes. [snip]

Looks like somebody has read the Isilon marketing materials. ;) A full production-level implementation of this, with cache consistency and so on, would be a major project. However, a non-consistent prototype good for specific use cases - especially Hadoop, as Jay mentions - would be pretty easy to build. Having a GlusterFS server (for the real clients) also be a GlusterFS client (to the real cluster) is pretty straightforward. Testing performance would also be a significant component of this, and IMO that's something more developers should learn about early in their careers.

I encourage you to keep thinking about how this could be turned into a real GSoC proposal. Keep the ideas coming!
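A "policy" in the split-brain context could be as small as a pure function that ranks the conflicting replicas of a file. The sketch below is purely hypothetical - every name and field is invented for illustration, and the real feature would live inside AFR, not in standalone code like this:

```python
# Hypothetical sketch of policy-based split-brain arbitration: given
# per-replica metadata for a file in split brain, a pluggable policy
# picks which copy should be treated as the "good" one. Names, fields,
# and policies are illustrative assumptions, not Gluster's actual design.
from dataclasses import dataclass

@dataclass
class Replica:
    brick: str
    mtime: float      # last modification time of this copy
    size: int
    pending_ops: int  # unsynced operations recorded against peers

POLICIES = {
    # prefer the most recently written copy
    "newest-mtime": lambda r: r.mtime,
    # prefer the copy with the fewest unresolved pending operations
    "fewest-pending": lambda r: -r.pending_ops,
    # prefer the largest copy (e.g. append-only logs)
    "largest-file": lambda r: r.size,
}

def arbitrate(replicas, policy):
    """Return the replica that the chosen policy ranks highest."""
    return max(replicas, key=POLICIES[policy])

r1 = Replica("server1:/brick", mtime=1000.0, size=4096, pending_ops=2)
r2 = Replica("server2:/brick", mtime=1010.0, size=2048, pending_ops=0)
assert arbitrate([r1, r2], "newest-mtime") is r2
assert arbitrate([r1, r2], "largest-file") is r1
```

The point of the sketch is Jeff's distinction above: the policy part (ranking replicas from metadata) is self-contained and testable, while the execution part (actually healing the losing copies) is the piece that requires deep AFR knowledge.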
Ted Miller
2014-Mar-13 22:38 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
> [snip]
>
> 2) We have a recurring issue with split-brain solution. There is an entry
> on trello asking/suggesting a mechanism that arbitrates this resolution
> automatically. I pretty much think this could come together with another
> solution that is file replication consistency check.

Anything to improve split-brain resolution would get my vote.

Ted Miller
Elkhart, IN
André Bauer
2014-Mar-17 16:39 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Hi,

I vote for 3, 2, 1.

But I don't like the idea of having an extra node for 3, which means the bandwidth/speed of the whole cluster is limited to the interface of the cache node (as in Ceph). I had a similar wish in mind, but wanted to have an SSD cache in front of each brick. I know this means you need 4 SSDs in a 4-node cluster, but IMHO it's better than one caching node which limits the cluster.

Kind regards

André Bauer
Administrator, MAGIX Software GmbH
Email: abauer at magix.net
Web: http://www.magix.com

On 13.03.2014 12:10, Carlos Capriotti wrote:
> [snip]
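André's bandwidth objection can be put in rough numbers. The link and device speeds below are assumed values chosen only to make the comparison concrete:

```python
# Back-of-the-envelope comparison of the two caching designs discussed
# above: one dedicated accelerator node vs. one SSD cache per brick.
# All speeds are assumed illustrative figures, not measured values.
NODES = 4
NIC_GBPS = 10   # assumed network link per node
SSD_GBPS = 4    # assumed per-SSD read throughput (~500 MB/s)

# Single accelerator node: every cached read funnels through one NIC.
single_cache_node_gbps = NIC_GBPS

# One SSD cache per brick: cached reads are served by all nodes in
# parallel, each bounded by the slower of its SSD and its NIC.
per_brick_cache_gbps = NODES * min(SSD_GBPS, NIC_GBPS)

assert per_brick_cache_gbps > single_cache_node_gbps  # 16 vs 10 Gb/s
```

Under these assumptions the per-brick design scales its aggregate cache throughput with the node count, while the single accelerator node stays pinned at its own interface speed no matter how large the cluster grows.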