Carlos Capriotti
2014-Mar-13 11:10 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Hello, all.

I am a little bit surprised by the lack of action on this topic. I hate to be "that guy", especially being new here, but it has to be done.

If I've got this right, we have here a chance of developing Gluster even further, sponsored by Google, with a dedicated programmer for the summer. In other words, if we play our cards right, we can get a free programmer and at least a good start/advance on this fantastic project.

Well, I've checked the Trello board, and there is a fair amount of things there. There are a couple of things that are not there as well. I think it would be nice to listen to the COMMUNITY (yes, that means YOU), for either suggestions, or at least a vote.

My opinion, being also my vote, in order of PERSONAL preference:

1) There is a project going on (https://forge.gluster.org/disperse) that consists of rewriting the stripe module in Gluster. This is especially important because it has a HUGE impact on Total Cost of Implementation (customer side) and Total Cost of Ownership, and also on matching what the competition has to offer. Among other things, it would allow Gluster to implement a RAIDZ/RAID5 type of fault tolerance, which is much more efficient, and would, as far as I understand, allow you to use 3 nodes as a minimum for stripe+replication. This means 25% less money in computer hardware, with increased data safety/resilience.

2) We have a recurring issue with split-brain resolution. There is an entry on Trello asking/suggesting a mechanism that arbitrates this resolution automatically. I pretty much think this could come together with another solution, which is a file replication consistency check.

3) Accelerator node project. Some storage solutions out there offer an "accelerator node", which is, in short, an extra node with a lot of RAM, possibly fast disks (SSD), that works like a proxy to the regular volumes. Active chunks of files are moved there, logs (ZIL style) are recorded on fast media, among other things.
There is NO active project for this, or Trello entry, because it is something I started discussing with a few fellows just a couple of days ago. I thought of starting to play with RAM disks (tmpfs) as scratch disks, but since we have an opportunity to do something more efficient, or at the very least start it, why not?

Now, c'mon! Time is running out. We need hands on deck here, for a simple vote! Can you share 3 lines with your thoughts?

Thanks
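The RAID5/RAIDZ efficiency argument in point 1) can be illustrated with a toy XOR-parity scheme. This is only a sketch of the idea, not Gluster's actual disperse implementation (which uses a more general erasure code); all names here are made up for illustration:

```python
# Toy 2+1 "dispersed" layout: split the data into two chunks and store
# their XOR parity on a third node. Any ONE lost chunk is recoverable
# from the other two - the RAID5/RAIDZ idea in miniature. Illustrative
# sketch only; not how Gluster's disperse translator actually works.

def split_with_parity(data: bytes) -> list[bytes]:
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")  # pad to equal size
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]  # one chunk per node

def rebuild(chunks: list, orig_len: int) -> bytes:
    a, b, parity = chunks
    if a is None:   # node 0 lost: recover a = b XOR parity
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:   # node 1 lost: recover b = a XOR parity
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:orig_len]

data = b"hello gluster"
a, b, p = split_with_parity(data)
assert rebuild([None, b, p], len(data)) == data   # survives losing node 0
assert rebuild([a, None, p], len(data)) == data   # survives losing node 1
```

This is also where the "25% less hardware" figure comes from: in a 2+1 dispersed layout, 3 nodes hold 2 nodes' worth of usable data, whereas 2-way replication needs 4 nodes for the same usable capacity, while still tolerating one node failure.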
Bernhard Glomm
2014-Mar-13 11:29 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Thanks, Carlos, for your ambition. I'm not much of a developer, but for what it's worth, here are my thoughts:

Your #1) sounds great, but does that mean object store, or will it still be whole files that are handled? Up to now I loved the feeling of having at least my files intact on the bricks if something went really wrong, rather than ending up with a huge number of xMB-sized snippets scattered around. (I know there are pros and cons, depending on scenario and size, but still...)

Your #2) could be incorporated into #1) somehow? "Minimum 3 bricks, 1 per node" with some kind of quorum mechanism?

As #3) I would vote for reliable encryption, to be able to use third-party storage. Speed alone is affected by too many other things: bandwidth, speed of the underlying storage, different speeds of different bricks... so speed I would vote down to rank #4) ;-)

Bernhard

On 13.03.2014 12:10, Carlos Capriotti wrote:
> [snip]
Jeff Darcy
2014-Mar-13 13:01 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
> I am a little bit impressed by the lack of action on this topic. I hate to be
> "that guy", specially being new here, but it has to be done.
> If I've got this right, we have here a chance of developing Gluster even
> further, sponsored by Google, with a dedicated programmer for the summer.
> In other words, if we play our cards right, we can get a free programmer and
> at least a good start/advance on this fantastic.

Welcome, Carlos. I think it's great that you're taking initiative here. However, it's also important to set proper expectations for what a GSoC intern could reasonably be expected to achieve. I've seen some amazing stuff come out of GSoC, but if we set the bar too high then we end up with incomplete code, and the student doesn't learn much except frustration.

GlusterFS consists of 430K lines of code in the core project alone. Most of it is written in a style that is generally hard for newcomers to pick up - both callback-oriented and highly concurrent, often using our own "unique" interpretation of standard concepts. It's also in an area (storage) that is not well taught in most universities. Given those facts and the short duration of GSoC, it's important to focus on projects that don't require deep knowledge of existing code, to keep the learning curve short and productive time correspondingly high.

With that in mind, let's look at some of your suggestions.

> I think it would be nice to listen to the COMMUNITY (yes, that means YOU),
> for either suggestions, or at least a vote.

It certainly would have been nice to have you at the community IRC meeting yesterday, at which we discussed release content for 3.6 based on the feature proposals here:

http://www.gluster.org/community/documentation/index.php/Planning36

The results are here:

http://titanpad.com/glusterfs-3-6-planning

> My opinion, being also my vote, in order of PERSONAL preference:
> 1) There is a project going on ( https://forge.gluster.org/disperse ), that
> consists on re-writing the stripe module on gluster. [snip]

This was decided as a core feature for 3.6. I'll let Xavier (the feature owner) answer w.r.t. whether there's any part of it that would be appropriate for GSoC.

> 2) We have a recurring issue with split-brain solution. There is an entry on
> trello asking/suggesting a mechanism that arbitrates this resolution
> automatically. I pretty much think this could come together with another
> solution that is file replication consistency check.

This is also core for 3.6, under the name "policy based split brain resolution":

http://www.gluster.org/community/documentation/index.php/Features/pbspbr

Implementing this feature requires significant knowledge of AFR, which both causes split brain and would be involved in its repair. Because it's also one of our most complicated components, and the person who just rewrote it won't be around to offer help, I don't think this project *as a whole* would be a good fit for GSoC. On the other hand, there might be specific pieces of the policy implementation (not execution) that would be a good fit.

> 3) Accelerator node project. Some storage solutions out there offer an
> "accelerator node", which is, in short, an extra node with a lot of RAM,
> eventually fast disks (SSD), and that works like a proxy to the regular
> volumes. [snip]

Looks like somebody has read the Isilon marketing materials. ;) A full production-level implementation of this, with cache consistency and so on, would be a major project. However, a non-consistent prototype good for specific use cases - especially Hadoop, as Jay mentions - would be pretty easy to build. Having a GlusterFS server (for the real clients) also be a GlusterFS client (to the real cluster) is pretty straightforward. Testing performance would also be a significant component of this, and IMO that's something more developers should learn about early in their careers.

I encourage you to keep thinking about how this could be turned into a real GSoC proposal. Keep the ideas coming!
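A "policy" in the split-brain context could be as small as a pure function that ranks the conflicting replicas of a file. The sketch below is purely hypothetical - every name and field is invented for illustration, and the real feature would live inside AFR, not in standalone code like this:

```python
# Hypothetical sketch of policy-based split-brain arbitration: given
# per-replica metadata for a file in split brain, a pluggable policy
# picks which copy should be treated as the "good" one. Names, fields,
# and policies are illustrative assumptions, not Gluster's actual design.
from dataclasses import dataclass

@dataclass
class Replica:
    brick: str
    mtime: float      # last modification time of this copy
    size: int
    pending_ops: int  # unsynced operations recorded against peers

POLICIES = {
    # prefer the most recently written copy
    "newest-mtime": lambda r: r.mtime,
    # prefer the copy with the fewest unresolved pending operations
    "fewest-pending": lambda r: -r.pending_ops,
    # prefer the largest copy (e.g. append-only logs)
    "largest-file": lambda r: r.size,
}

def arbitrate(replicas, policy):
    """Return the replica that the chosen policy ranks highest."""
    return max(replicas, key=POLICIES[policy])

r1 = Replica("server1:/brick", mtime=1000.0, size=4096, pending_ops=2)
r2 = Replica("server2:/brick", mtime=1010.0, size=2048, pending_ops=0)
assert arbitrate([r1, r2], "newest-mtime") is r2
assert arbitrate([r1, r2], "largest-file") is r1
```

The point of the sketch is Jeff's distinction above: the policy part (ranking replicas from metadata) is self-contained and testable, while the execution part (actually healing the losing copies) is the piece that requires deep AFR knowledge.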
Ted Miller
2014-Mar-13 22:38 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
> [snip]
>
> 2) We have a recurring issue with split-brain solution. There is an entry
> on trello asking/suggesting a mechanism that arbitrates this resolution
> automatically. I pretty much think this could come together with another
> solution that is file replication consistency check.

Anything to improve split-brain resolution would get my vote.

Ted Miller
Elkhart, IN
André Bauer
2014-Mar-17 16:39 UTC
[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community
Hi,

I vote for 3, 2, 1.

But I don't like the idea of having an extra node for 3, which means the bandwidth/speed of the whole cluster is limited to the interface of the cache node (as in Ceph). I had a similar wish in mind, but wanted to have an SSD cache in front of each brick. I know this means you need 4 SSDs in a 4-node cluster, but IMHO it's better than one caching node which limits the cluster.

Kind regards

André Bauer
Administrator, MAGIX Software GmbH
Email: abauer at magix.net
Web: http://www.magix.com

On 13.03.2014 12:10, Carlos Capriotti wrote:
> [snip]
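André's bandwidth objection can be put in rough numbers. The link and device speeds below are assumed values chosen only to make the comparison concrete:

```python
# Back-of-the-envelope comparison of the two caching designs discussed
# above: one dedicated accelerator node vs. one SSD cache per brick.
# All speeds are assumed illustrative figures, not measured values.
NODES = 4
NIC_GBPS = 10   # assumed network link per node
SSD_GBPS = 4    # assumed per-SSD read throughput (~500 MB/s)

# Single accelerator node: every cached read funnels through one NIC.
single_cache_node_gbps = NIC_GBPS

# One SSD cache per brick: cached reads are served by all nodes in
# parallel, each bounded by the slower of its SSD and its NIC.
per_brick_cache_gbps = NODES * min(SSD_GBPS, NIC_GBPS)

assert per_brick_cache_gbps > single_cache_node_gbps  # 16 vs 10 Gb/s
```

Under these assumptions the per-brick design scales its aggregate cache throughput with the node count, while the single accelerator node stays pinned at its own interface speed no matter how large the cluster grows.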