nathan at robotics.net
2008-Jun-16 15:34 UTC
[Lustre-discuss] Gluster then DRBD now Lustre?
I have been spending a lot of time with Gluster. I like it a lot, and on the surface it looks great. I like that I can get RAID 6-like functionality out of it; however, after testing I found it is just not ready for prime time.

Our day-one config is two servers with 10TB each in NYC and SJC. Originally the plan was to active/active mirror them, but even with gig e the delay kills your write speed.

Since Gluster did not work out, we started testing DRBD. The plan was to active/active mirror the two servers in each site and then set up scripts to copy the data we need between sites. When we need more servers, we would add them in groups of two and use Gluster (hoping it is ready in 4-6 months) to unify the DRBD groups into a larger shared namespace.

This is working in a test setup, however there are some downsides. The first is that DRBD only supports IP, so we have to run IPoIB over our InfiniBand adapters, not an ideal solution. The second is that we are using InfiniBand adapters on the CentOS 5.1 Xen kernel and can't bond them together: we need OFED 3, and it removed bonding because bonding is now in the kernel, but not in the 2.6.18 kernel we need for Xen.

Anyway, my question is: should I run Luster instead of DRBD, and is there any time frame for RAID 6-like functionality out of Lustre?

P.S. Once long long ago and far far away, Lustre had links that you could download software from. Today the only way I see to do it is to log into Sun and then download. I have scripts that I use to build stuff and this is a big pain....

><>
Nathan Stratton
CTO, BlinkMind, Inc.
nathan at robotics.net
nathan at blinkmind.com
http://www.robotics.net
http://www.blinkmind.com
On Mon, 2008-06-16 at 10:34 -0500, nathan at robotics.net wrote:
> I have been spending a lot of time with Gluster, I like it a lot, on the
> surface it looks great. I like that I can get RAID 6 like functionality
> out of it, however after testing found it is not just ready for prime
> time.

I will disclaim off the top that I know nothing about Gluster...

> Our day one config is two servers with 10TB each in NYC and SJC.

These are your "file servers"? Are they meant to be servers for users local (i.e. on a LAN) to them? NYC == New York City? What is SJC?

> Originally the plan was to active/active mirror them,

You want to mirror these two machines, in real time, over a long distance?

> but even with gig e
> the delay kills your write speed.

Of course. With real mirroring, your writes are going to be as slow as it takes the data to travel the link and be written to media, because any reliable mirroring solution requires that both sides of the mirror be written before the writer is allowed to call the write() complete.

> Since Gluster did not work out we
> started testing DRBD. The plan was to active/active mirror the two servers
> in each site and then setup scripts to copy the data we need between
> sites.

I'm not following this. If the two sites are active/active mirrored, why do you need scripts to copy data? I must be misunderstanding something about your scenario.

> This is working in a test setup, however there are some down sides. The
> first is that DRBD only supports IP, so we have to run IPoIB over
> our infiniband adapters, not an ideal solution. The second is that we are
> using infiniband adapters on centos 5.1 xen kernel and can't bind them
> together because we need OFED 3

There is no "OFED 3". Perhaps you mean 1.3?

> Anyway, my question is should I run Luster

Lustre.

> instead of DRBD and is there any
> time frame for RAID 6 like functionally out of Lustre?

There is no RAID functionality at all in Lustre.
I don't think I understand your use scenario well enough to make any other recommendations.

> P.S. Once long long ago and far far away lustre had links that you could
> download software from. Today the only way I see to do it is to log into
> sun and then download.

Correct.

> I have scripts that I use to build stuff and this
> is a big pain....

Yes, systems like OpenWRT do similar things. I'm afraid I have no solution for this problem. I certainly have not tried it, but perhaps you can analyse the SDLC log-in and download process and automate it with your account credentials. Wget or cURL should be able to simulate the process a browser would go through. Likely you will have to pay attention to, retrieve, store and send cookies throughout the process.

b.
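Brian's suggestion of scripting the log-in while carrying cookies from one request to the next could look roughly like the following Python sketch, which uses only the standard library's `http.cookiejar` and `urllib.request`. It is untested against the real site: the URLs and the form field names are hypothetical placeholders, because the actual SDLC pages (and any redirects or hidden form fields they use) would have to be inspected first. The same flow can be reproduced with wget or curl and a cookie file.

```python
"""Sketch: scripted log-in followed by a download, keeping session cookies
between requests the way a browser would.

HYPOTHETICAL: the URLs and the form field names below are placeholders,
not the real SDLC endpoints.
"""
import http.cookiejar
import shutil
import urllib.parse
import urllib.request

LOGIN_URL = "https://example.com/login"             # hypothetical
DOWNLOAD_URL = "https://example.com/lustre.tar.gz"  # hypothetical


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener that stores cookies set by each response and sends
    them back on every later request made through the same opener."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_and_download(opener: urllib.request.OpenerDirector,
                       user: str, password: str, dest: str) -> None:
    # Field names "username"/"password" are assumptions about the log-in form.
    form = urllib.parse.urlencode({"username": user, "password": password}).encode()
    with opener.open(LOGIN_URL, data=form) as resp:  # POST; session cookies captured
        resp.read()
    with opener.open(DOWNLOAD_URL) as resp, open(dest, "wb") as out:  # cookies sent
        shutil.copyfileobj(resp, out)
```

Keeping a single opener for the whole session is the key design point: the cookie jar attached to it is what carries the authenticated session from the log-in request to the download.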
On Jun 16, 2008 11:54 -0400, Brian J. Murrell wrote:
> On Mon, 2008-06-16 at 10:34 -0500, nathan at robotics.net wrote:
> > Our day one config is two servers with 10TB each in NYC and SJC.
>
> These are your "file servers"? Are they meant to be servers for users
> local (i.e. on a LAN) to them? NYC == New York City? What is SJC?

SJC == San Jose, California

> > P.S. Once long long ago and far far away lustre had links that you could
> > download software from. Today the only way I see to do it is to log into
> > sun and then download. I have scripts that I use to build stuff and this
> > is a big pain....
>
> Yes, systems like OpenWRT do similar things. I'm afraid I have no
> solution for this problem. I certainly have not tried it, but perhaps
> you can analyse the SDLC log-in and download process and automate it
> with your account credentials. Wget or CURL should be able to simulate
> the process a browser would go through. Likely you have to pay
> attention to, retrieve, store and send cookies throughout the process.

You can use CVS to download the source anonymously.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote:
> > NYC == New York City? What
> > is SJC?
>
> SJC == San Jose, California

That's what I thought, but if so, the following part loses me:

> This is working in a test setup, however there are some down sides.
> The first is that DRBD only supports IP, so we have to run IPoIB over
> our infiniband adapters, not an ideal solution.

Nathan, you won't be able to use InfiniBand between New York City and San Jose, CA, anyway, right? Even without considering IB cables' length limitation, and unless you can use some kind of dedicated, special-purpose link between your sites, the public Internet is not really able to provide bandwidth or latencies compatible with InfiniBand standards.

IP is probably your best bet here, and DRBD would probably be an appropriate candidate for this kind of job. However, you probably don't want your synchronization data unencrypted over the public pipes, and you may need an extra VPN-ish layer to ensure data confidentiality.

Cheers,
--
Kilian
nathan at robotics.net
2008-Jun-16 21:36 UTC
On Mon, 16 Jun 2008, Kilian CAVALOTTI wrote:
> On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote:
>>> NYC == New York City? What
>>> is SJC?
>>
>> SJC == San Jose, California
>
> That's what I thought, but if so, the following part loses me:
>
>> This is working in a test setup, however there are some down sides.
>> The first is that DRBD only supports IP, so we have to run IPoIB over
>> our infiniband adapters, not an ideal solution.
>
> Nathan, you won't be able to use Infiniband between New York City and
> San Jose, CA, anyway, right? Even without considering IB cables' length
> limitation, and unless you can use some kind of dedicated,
> special-purpose link between your sites, the public Internet is not
> really able to provide bandwidth nor latencies compatible with
> Infiniband standards.

OK, so in the original email, east-to-west mirroring was what we originally wanted to do, but we realized that would not be possible because of the round-trip delay, even over gig e. Instead of mirroring our traffic east-west, we are starting with 2 servers in each location tied together with InfiniBand. The InfiniBand cables are only 5 m. :) Currently we are mirroring traffic with DRBD between the two local systems in each datacenter, but we are looking at the tradeoffs of switching to Lustre, since DRBD does not support InfiniBand.

-Nathan
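Brian's point about synchronous mirroring, and Nathan's reason for dropping east-west mirroring, can be put into rough numbers. The Python sketch below assumes a writer that waits for each mirrored write to be acknowledged before issuing the next one, so every write pays a full round trip plus the time to push the data onto the link. The 64 kB write size, the 0.1 ms local RTT and the 70 ms coast-to-coast RTT are hypothetical values, not measurements from this thread, and media write time is left at zero to isolate the network effect.

```python
"""Rough model of a synchronous (active/active) mirror over a network link.

Assumption: the writer waits for each write to be acknowledged by the remote
side before issuing the next one (no pipelining). Units are decimal
(1 kB = 1000 bytes, 1 Mbit/s = 1 kbit per ms).
"""


def sync_write_ms(write_kb: float, link_mbit_s: float, rtt_ms: float,
                  media_ms: float = 0.0) -> float:
    """Time for one mirrored write: serialise it onto the link, write it
    remotely, and wait one full round trip for the acknowledgement."""
    transfer_ms = write_kb * 8 / link_mbit_s  # kbit / (kbit per ms) = ms
    return transfer_ms + rtt_ms + media_ms


def sync_throughput_mb_s(write_kb: float, link_mbit_s: float, rtt_ms: float,
                         media_ms: float = 0.0) -> float:
    """Sustained MB/s when writes of write_kb are issued back to back."""
    per_write_ms = sync_write_ms(write_kb, link_mbit_s, rtt_ms, media_ms)
    return write_kb / per_write_ms  # kB per ms == MB per s


if __name__ == "__main__":
    # Hypothetical figures: 64 kB writes over gigabit Ethernet.
    for label, rtt in (("same rack", 0.1), ("NYC <-> SJC", 70.0)):
        print(f"{label:12s} RTT {rtt:5.1f} ms -> "
              f"{sync_throughput_mb_s(64, 1000, rtt):7.2f} MB/s")
```

With these assumed numbers the same gigabit link drops from roughly 100 MB/s to under 1 MB/s once the round trip grows to tens of milliseconds. Asynchronous replication avoids paying the round trip on every write, at the cost of possibly losing the most recent writes if a site fails.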
nathan at robotics.net wrote:
> On Mon, 16 Jun 2008, Kilian CAVALOTTI wrote:
>
>> On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote:
>>>> NYC == New York City? What
>>>> is SJC?
>>> SJC == San Jose, California
>> That's what I thought, but if so, the following part loses me:
>>
>>> This is working in a test setup, however there are some down sides.
>>> The first is that DRBD only supports IP, so we have to run IPoIB over
>>> our infiniband adapters, not an ideal solution.
>> Nathan, you won't be able to use Infiniband between New York City and
>> San Jose, CA, anyway, right? Even without considering IB cables' length
>> limitation, and unless you can use some kind of dedicated,
>> special-purpose link between your sites, the public Internet is not
>> really able to provide bandwidth nor latencies compatible with
>> Infiniband standards.
>
> Ok, so in the original email east to west was what we originally wanted to
> do but realized that would not be possible because of round trip delay
> even over gig e. Instead of mirroring our traffic east west we are starting
> with 2 servers in each location tied together with Infiniband. The
> infiniband cables are only 5M. : ) Currently we are mirroring traffic with
> DRBD between the two local systems in each datacenter, but we are looking
> for the tradeoffs of switching to Lustre since DRBD does not support
> Infiniband.
>

Umm....Lustre is not a replacement for DRBD, so we're very confused over here. Lustre is a way of making a big distributed filesystem out of a bunch of storage nodes. We don't do replication; it's basically RAID 0.

So, you could use Lustre to make one big filesystem out of two local servers. You could even make one big filesystem out of your multiple locations over the WAN (it's been done).

But you can't use Lustre to mirror data (yet; wait a year).

So I think your Gluster expedition might have confused you.
Gluster and Lustre are only words that sound somewhat alike; there is _no_ relationship between the two (except the fact that there is some filesystem goop involved). You're comparing apples to knee socks if you are attempting to map Gluster experience to a Lustre setup.

cliffw
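Cliff's "basically RAID 0" description can be illustrated with plain round-robin striping: a file is cut into fixed-size stripes that are placed on the storage targets in turn, with no second copy anywhere. The sketch below is a simplification for illustration, not Lustre's actual layout code; the 1 MiB stripe size and the two-target layout are assumptions chosen to match the "two local servers" case.

```python
"""Simplified RAID 0-style (round-robin) striping of one file across targets.

Assumptions for illustration: fixed stripe size, every stripe is stored on
exactly one target, and there is no redundancy.
"""
from __future__ import annotations

MiB = 1 << 20


def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a byte offset in the file to (target index, byte offset within the
    part of the file stored on that target)."""
    stripe_no = offset // stripe_size      # which stripe of the file overall
    target = stripe_no % stripe_count      # targets are used in turn
    row = stripe_no // stripe_count        # complete passes over all targets
    return target, row * stripe_size + offset % stripe_size


if __name__ == "__main__":
    for off in (0, MiB, 2 * MiB + 5, 3 * MiB):
        print(off, "->", locate(off, MiB, 2))
```

Because consecutive stripes land on different servers, large reads and writes are spread over both of them. The flip side, in this model, is that losing either server leaves holes in every file that spans more than one stripe.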
nathan at robotics.net
2008-Jun-17 13:51 UTC
On Mon, 16 Jun 2008, Cliff White wrote:
> Umm....Lustre is not a replacement for DRBD, so we're very confused over
> here. Lustre is a way of making a big distributed filesystem out of a bunch
> of storage nodes. We don't do replication, it's basically RAID 0.

So any node does you lose the data?

> So, you could use Lustre to make one big filesystem out of two local servers.
> You could even make one big filesystem out of your multiple locations over
> the WAN (it's been done).
>
> But, you can't use Lustre to mirror data. (yet, wait a year)

The plan is to add more and more servers, so the hope was to be able to start with 2, get some redundancy, and then grow.

> So I think your Gluster expedition might have confused you. Gluster and
> Lustre are only words that sound somewhat the same, there is _no_
> relationship between the two. (except the fact that there is some filesystem
> goop involved) You're comparing apples to knee socks if you are attempting to
> map gluster experience to a Lustre setup.

Hmm, I would say they are a little more like each other than that, but I understand.

-Nathan
On Mon, 2008-06-16 at 16:36 -0500, nathan at robotics.net wrote:
>
> Ok, so in the original email east to west was what we originally wanted to
> do but realized that would not be possible because of round trip delay
> even over gig e.

You have a dedicated GigE pipe from the east coast to the west coast? What's its RTT? But yes, doing block device mirroring over that distance would be prohibitive.

> Instead of mirroring our traffic east west we are starting
> with 2 servers in each location tied together with Infiniband. The
> infiniband cables are only 5M. : ) Currently we are mirroring traffic with
> DRBD between the two local systems in each datacenter, but we are looking
> for the tradeoffs of switching to Lustre since DRBD does not support
> Infiniband.

As Cliff has said, Lustre is not a mirroring technology. It's a global filesystem. In the future, we will have replication abilities to achieve the sort of goal you are currently trying to work towards, which is "loose" mirroring over large distances (read: latencies).

b.
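As a floor for the RTT Brian asks about, the propagation delay in fibre alone can be estimated. The sketch assumes a signal speed of c divided by a refractive index of about 1.47 (roughly 204,000 km/s, a typical figure for silica fibre) and a hypothetical 4,000 km path, which is on the order of the New York to San Jose distance. Real fibre routes are longer than the straight-line distance and add equipment delay, so a measured RTT will be higher than this bound.

```python
"""Lower bound on round-trip time from propagation delay in optical fibre.

Assumptions: refractive index ~1.47 (typical silica fibre) and a path of the
given length; routing, queueing and equipment delays are ignored.
"""

C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBRE_INDEX = 1.47         # assumed refractive index


def min_rtt_ms(path_km: float, index: float = FIBRE_INDEX) -> float:
    """Round trip = twice the one-way distance divided by the signal speed."""
    speed_km_per_ms = C_KM_PER_S / index / 1000.0
    return 2.0 * path_km / speed_km_per_ms


if __name__ == "__main__":
    # Hypothetical 4,000 km path, roughly coast-to-coast in the US.
    print(f"{min_rtt_ms(4000):.1f} ms")  # about 39 ms
```

Even before any disk write, a synchronous mirror over such a path would wait on the order of 40 ms per write, which is why block device mirroring at that distance is prohibitive.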
On Tue, 2008-06-17 at 08:51 -0500, nathan at robotics.net wrote:
>
> So any node does you lose the data?

I will parse that as "so if you lose any node, you lose data?" and the answer to that is yes. If you lose an OST, you lose data. If you lose the MDT, you lose the entire filesystem. Lustre assumes that the storage you assign to it to manage is reliable. That is, when you create an OST or MDT, we very strongly suggest you create it out of some sort of reliable storage, like RAID 1/5/6 for an OST and RAID 1 for an MDT.

> The plan is to add more and more servers, so the hope was to be able to
> start with 2 and get some redundancy and then grow.

Lustre will give you that ability.

b.
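Brian's answer can be modelled in a few lines: since Lustre keeps no second copy, a file loses data if any of the OSTs holding its stripes is lost, and losing the MDT takes every file with it. This toy Python sketch is an illustration under those assumptions, not a description of Lustre internals; the file names and layouts are made up.

```python
"""Toy model of which files lose data when Lustre storage targets fail.

Assumptions for illustration: each file's layout is just the set of OST
indices holding its stripes, there is no second copy of any stripe, and the
MDT holds the namespace and layouts for every file. File names are made up.
"""
from __future__ import annotations


def files_hit(layouts: dict[str, set[int]], lost_osts: set[int],
              mdt_lost: bool = False) -> set[str]:
    """Return the names of files that lose data after the given failures."""
    if mdt_lost:
        # Without the MDT there is no namespace left: every file is affected.
        return set(layouts)
    # A file is damaged if any of its stripes lived on a lost OST.
    return {name for name, osts in layouts.items() if osts & lost_osts}


if __name__ == "__main__":
    layouts = {"a.dat": {0}, "b.dat": {1}, "c.dat": {0, 1}}  # hypothetical
    print(sorted(files_hit(layouts, {1})))                   # ['b.dat', 'c.dat']
    print(sorted(files_hit(layouts, set(), mdt_lost=True)))  # all three files
```

This is why the redundancy has to live below Lustre, inside each target: building every OST on RAID 1/5/6 and the MDT on RAID 1 makes a single disk failure invisible to the filesystem.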
On Mon, Jun 16, 2008 at 11:55:12AM -0700, Kilian CAVALOTTI wrote:
> On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote:
> > > NYC == New York City? What
> > > is SJC?
> >
> > SJC == San Jose, California
>
> That's what I thought, but if so, the following part loses me:
>
> > This is working in a test setup, however there are some down sides.
> > The first is that DRBD only supports IP, so we have to run IPoIB over
> > our infiniband adapters, not an ideal solution.
>
> Nathan, you won't be able to use Infiniband between New York City and
> San Jose, CA, anyway, right? Even without considering IB cables' length
> limitation, and unless you can use some kind of dedicated,
> special-purpose link between your sites, the public Internet is not
> really able to provide bandwidth nor latencies compatible with
> Infiniband standards.
>
> IP is probably your best bet, here, and DRBD would probably be an
> appropriate candidate for this kind of job. Although, you probably
> don't want your synchronization data unencrypted over the public pipes,
> and you may need an extra VPN-ish layer to ensure data confidentiality.

If you have a dedicated gigabit link (no congestion), InfiniBand might work pretty well. I've used the Obsidian Longbow IB WAN extenders, and got better performance using IB than over TCP. I believe there is also a version that does AES encryption.

Has anyone tried Lustre over a 100ms-latency IB WAN link?

That being said, IB WAN stuff is pretty new. It has some promise, but if you want to try this, expect to do a lot of experimenting.
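For Troy's question about a 100 ms link, the bandwidth-delay product gives a feel for why long-haul links need large buffers: to keep the pipe full, roughly link rate times RTT worth of data must be in flight, whether that limit comes from a TCP window or from the buffering along an IB WAN path. The 1 Gbit/s and 100 ms figures come from the thread; the 64 KiB window in the example is a hypothetical value chosen to show what happens when the in-flight limit is too small.

```python
"""Bandwidth-delay product (BDP) arithmetic for a long-haul link.

BDP = link rate x round-trip time: the amount of data that must be in flight
to keep the link busy. A sender limited to a smaller window can never exceed
window / RTT, no matter how fast the link is.
"""


def bdp_bytes(link_gbit_s: float, rtt_ms: float) -> float:
    """Bytes that must be outstanding to fill the link."""
    return link_gbit_s * 1e9 / 8 * rtt_ms / 1000.0


def window_limited_gbit_s(window_bytes: float, rtt_ms: float) -> float:
    """Throughput ceiling for a single stream with window_bytes in flight."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e9


if __name__ == "__main__":
    print(f"BDP at 1 Gbit/s, 100 ms: {bdp_bytes(1, 100) / 1e6:.1f} MB")  # 12.5 MB
    print(f"64 KiB window, 100 ms: "
          f"{window_limited_gbit_s(64 * 1024, 100) * 1000:.1f} Mbit/s")
```

At 100 ms a gigabit link needs about 12.5 MB in flight, while a single stream stuck at a 64 KiB window tops out around 5 Mbit/s, so the amount of buffering available end to end matters as much as the raw link speed.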
On Jun 17, 2008 10:40 -0500, Troy Benjegerdes wrote:
> If you have a dedicated gigabit link (no congestion), InfiniBand might
> work pretty well. I've used the Obsidian Longbow IB WAN extenders, and
> got better performance using IB than over TCP. I believe there is also a
> version that does AES encryption as well.
>
> Has anyone tried lustre over a 100ms latency IB wan link?

I'm not sure of the latency, but Lustre is used over a WAN at Indiana University, and they can pretty much saturate a 10GigE link with a single client... Google for "Indiana Lustre WAN" and you'll find a paper on this.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.