Robert Minvielle
2009-Feb-02 21:23 UTC
[Lustre-discuss] one server node fails, its all dead?
More testing today. I downed a server (OST) to see what would happen. Well, it does follow the FAQ :) The FAQ states:

-- begin FAQ --
I don't need failover, and don't want shared storage. How will this work?

If Lustre is configured without shared storage for failover, and a server node fails, then a client that tries to use that node will pause until the failed server is returned to operation. After a short delay (a configurable timeout value), applications waiting for those nodes can be aborted with a signal (kill or Ctrl-C), similar to the NFS soft-mount mode.

When the node is returned to service, applications which have not been aborted will continue to run without errors or data loss.
-- end FAQ --

So, if I have a server that goes down, the clients are out of luck. I have a hard time believing this is "acceptable". Ok, so it is "as good as" NFS, but I mean really, if a single storage unit fails all of my clients can do nothing? Am I missing something here or is this by design?

The real reason I ask is that I am testing Lustre against a few other DPFS to see if we will move to Lustre. So far, some things are nice, and some are not nice. Writing seems to be faster, but reading is slower (than my other test DPFSs). Contacting Sun to ask about support took forever. At least four days for them to just call me back and tell me they could not give me a price without knowing how much storage I have (ugh, a pay per byte system, great).

So, Lustre users, is it worth it? My setup would be 24 OST's with about 100TB of storage, 10G ethernet, RAID on each OST, at least 20 or so clients needing pretty fast read/write, connected via 10G ethernet (yes, I know I need a SAN but the physical locations will not allow it and the price is prohibitive, hence my looking at DPFSs)... Am I on the right track looking at Lustre, or should I go elsewhere? I also need commercial support of some kind (although it seems Sun is unsure of themselves here, they did not know who to contact when I contacted them "Lustre, we make a product called Lustre? Hold please")...
Brian J. Murrell
2009-Feb-02 21:46 UTC
[Lustre-discuss] one server node fails, its all dead?
On Mon, 2009-02-02 at 15:23 -0600, Robert Minvielle wrote:
> So, if I have a server that goes down, the clients are out of luck.

Without failover configured, yes.

> I have a hard time believing this is "acceptable".

Well, that's completely subjective of course. If it's not acceptable to you, then you can configure a second node that has access to the same (i.e. shared) storage (i.e. the OSTs or MDTs) as the failed node, and service will continue after the clients discover that the primary has failed and resume operations with the secondary. All of this happens transparently to the applications running on the clients.

> Ok, so it is "as good as" NFS,

If you don't configure failover, yes. If you configure failover then it's better.

> but I mean really, if a single storage unit fails all of my clients
> can do nothing?

No, that's not true. Even if you don't have failover configured, any clients that do not attempt to access any files (or file stripes) on that failed OST don't even notice and continue on merrily.

> Am I missing something here or is this by design?

It's by design.

> Contacting Sun to ask about support took forever. At least four days
> for them to just call me back and tell me they could not give me a
> price without knowing how much storage I have (ugh, a pay per byte
> system, great).

No. You must have misunderstood. We don't charge "per byte". IIUC, support costs are a function of how many OSSes you have.

b.
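Editorial illustration of the point above, that only clients touching files (or file stripes) on the failed OST are affected. This is a toy model, not Lustre code: the round-robin stripe placement, the file names, and the function names `osts_for_file` and `blocked_files` are all assumptions made for the sketch. Only the 24-OST count comes from the thread.

```python
def osts_for_file(start_ost, stripe_count, total_osts):
    """Return the set of OST indices holding a file's stripes.

    Assumes simple round-robin placement starting at start_ost;
    real Lustre allocation may differ.
    """
    return {(start_ost + i) % total_osts for i in range(stripe_count)}


def blocked_files(files, failed_ost, total_osts):
    """Return the sorted names of files with a stripe on the failed OST."""
    return sorted(
        name
        for name, (start, count) in files.items()
        if failed_ost in osts_for_file(start, count, total_osts)
    )


TOTAL_OSTS = 24  # the setup size mentioned in the thread

# Hypothetical layouts: file name -> (first OST index, stripe count)
files = {
    "a.dat": (0, 1),    # single stripe on OST 0
    "b.dat": (5, 4),    # stripes on OSTs 5, 6, 7, 8
    "c.dat": (22, 4),   # wraps around: OSTs 22, 23, 0, 1
    "d.dat": (10, 24),  # striped over every OST
}

# With OST 7 down, only files that have a stripe on OST 7 would block.
print(blocked_files(files, failed_ost=7, total_osts=TOTAL_OSTS))
# prints ['b.dat', 'd.dat']
```

In this model, the wider a file is striped, the more likely it is to be hit by any single OST failure. A file striped over all OSTs is always affected, while a single-stripe file is affected only when its own OST is the one that failed.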
Robert Minvielle wrote:
> More testing today. I downed a server (OST) to see what would happen.
> Well, it does follow the FAQ :) The FAQ states:
>
> -- begin FAQ --
> I don't need failover, and don't want shared storage. How will this
> work?
>
> If Lustre is configured without shared storage for failover, and a
> server node fails, then a client that tries to use that node will
> pause until the failed server is returned to operation. After a short
> delay (a configurable timeout value), applications waiting for those
> nodes can be aborted with a signal (kill or Ctrl-C), similar to the
> NFS soft-mount mode.
>
> When the node is returned to service, applications which have not
> been aborted will continue to run without errors or data loss.
>
> -- end FAQ --
>
> So, if I have a server that goes down, the clients are out of luck. I
> have a hard time believing this is "acceptable". Ok, so it is "as good
> as" NFS, but I mean really, if a single storage unit fails all of my
> clients can do nothing? Am I missing something here or is this by
> design?

It says that if a server node fails, then any client trying to access that server node will block until its function is restored. If you want neither failover nor shared storage, what is Lustre supposed to do? Other clients can write data to other nodes, and they can read data if it is on a working server node.

> The real reason I ask is that I am testing Lustre against a few other
> DPFS to see if we will move to Lustre. So far, some things are nice,
> and some are not nice. Writing seems to be faster, but reading is
> slower (than my other test DPFSs). Contacting Sun to ask about support
> took forever. At least four days for them to just call me back and
> tell me they could not give me a price without knowing how much
> storage I have (ugh, a pay per byte system, great).

Performance is going to vary greatly with the storage hardware used. Sun can help there (if you can find the right people).

> So, Lustre users, is it worth it? My setup would be 24 OST's with
> about 100TB of storage, 10G ethernet, RAID on each OST, at least 20 or
> so clients needing pretty fast read/write, connected via 10G ethernet
> (yes, I know I need a SAN but the physical locations will not allow it
> and the price is prohibitive, hence my looking at DPFSs)... Am I on
> the right track looking at Lustre, or should I go elsewhere? I also
> need commercial support of some kind (although it seems Sun is unsure
> of themselves here, they did not know who to contact when I contacted
> them "Lustre, we make a product called Lustre? Hold please"...

Other companies can provide Lustre support. Data Direct Networks provides a packaged solution with their hardware. It can be configured with failover and all the other goodies one would want to ensure maximum uptime. Terascala provides Lustre support with their hardware. Although they are working on (or may have delivered) failover configurations, their initial releases were non-shared storage, no-failover configurations.

Craig

--
Craig Tierney (craig.tierney at noaa.gov)
Andreas Dilger
2009-Feb-02 21:57 UTC
[Lustre-discuss] one server node fails, its all dead?
On Feb 02, 2009 15:23 -0600, Robert Minvielle wrote:
> So, if I have a server that goes down, the clients are out of luck. I
> have a hard time believing this is "acceptable". Ok, so it is "as good
> as" NFS, but I mean really, if a single storage unit fails all of my
> clients can do nothing? Am I missing something here or is this by
> design?

It is possible for clients to create new files while a server is down, but as you can expect it isn't possible to read any data from the failed server. In some cases users have used DRBD to do device replication instead of using shared storage.

> The real reason I ask is that I am testing Lustre against a few other
> DPFS to see if we will move to Lustre. So far, some things are nice,
> and some are not nice. Writing seems to be faster, but reading is
> slower (than my other test DPFSs). Contacting Sun to ask about support
> took forever. At least four days for them to just call me back and
> tell me they could not give me a price without knowing how much
> storage I have (ugh, a pay per byte system, great).

You can imagine that supporting the largest Lustre filesystem (1300+ OSTs with 10PB of storage and 30k+ clients) will take more effort than supporting a system with a handful of OSTs and clients. The support price is not per client, but rather per-OST, IIRC.

> So, Lustre users, is it worth it? My setup would be 24 OST's with
> about 100TB of storage, 10G ethernet, RAID on each OST, at least 20 or
> so clients needing pretty fast read/write, connected via 10G ethernet
> (yes, I know I need a SAN but the physical locations will not allow it
> and the price is prohibitive, hence my looking at DPFSs)... Am I on
> the right track looking at Lustre, or should I go elsewhere? I also
> need commercial support of some kind (although it seems Sun is unsure
> of themselves here, they did not know who to contact when I contacted
> them "Lustre, we make a product called Lustre? Hold please"...

Well, Sun is a big company, and Lustre was only acquired a year ago and does not necessarily generate a high call volume to the L1 support people, so they are not necessarily going to have information immediately handy.

Note that Lustre itself does NOT need a SAN to work, unlike some other cluster filesystems. The only SAN requirement is for failover pairs of servers.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
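Editorial illustration of why shared storage matters only for failover pairs. This sketch is a simplified model, not Lustre's recovery protocol: the `Target` class, the `serving_node` function, and the server and target names are all hypothetical. It assumes a target stays reachable as long as any server attached to its storage is up.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """One OST or MDT, plus the servers that can reach its storage."""
    name: str
    servers: list  # ordered: primary first, then any failover partner


def serving_node(target, down):
    """Return the first attached server that is up, or None.

    None models the no-failover case from the FAQ: clients of this
    target pause until a server comes back.
    """
    for server in target.servers:
        if server not in down:
            return server
    return None


# Hypothetical layout: OST0000 has no failover partner, while OST0001
# sits on storage shared between oss2 (primary) and oss3 (secondary).
targets = [
    Target("OST0000", ["oss1"]),
    Target("OST0001", ["oss2", "oss3"]),
]

down = {"oss1", "oss2"}  # both primaries have failed

for t in targets:
    node = serving_node(t, down)
    status = f"served by {node}" if node else "unavailable, clients block"
    print(f"{t.name}: {status}")
```

A target with a single attached server needs no shared storage at all, which matches the note above that Lustre itself does not require a SAN. Only the second entry in a `servers` list implies storage that two machines can reach, whether through a SAN or through replication such as DRBD.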
Kevin Van Maren
2009-Feb-03 14:46 UTC
[Lustre-discuss] one server node fails, its all dead?
On Feb 2, 2009, at 2:57 PM, Andreas Dilger <adilger at sun.com> wrote:
> On Feb 02, 2009 15:23 -0600, Robert Minvielle wrote:
>> So, if I have a server that goes down, the clients are out of luck.
>> I have a hard time believing this is "acceptable". Ok, so it is "as
>> good as" NFS, but I mean really, if a single storage unit fails all
>> of my clients can do nothing? Am I missing something here or is this
>> by design?
>
> It is possible for clients to create new files while a server is down,
> but as you can expect it isn't possible to read any data from the
> failed server. In some cases users have used DRBD to do device
> replication instead of using shared storage.
>
>> The real reason I ask is that I am testing Lustre against a few
>> other DPFS to see if we will move to Lustre. So far, some things are
>> nice, and some are not nice. Writing seems to be faster, but reading
>> is slower (than my other test DPFSs). Contacting Sun to ask about
>> support took forever. At least four days for them to just call me
>> back and tell me they could not give me a price without knowing how
>> much storage I have (ugh, a pay per byte system, great).
>
> You can imagine that supporting the largest Lustre filesystem (1300+
> OSTs with 10PB of storage and 30k+ clients) will take more effort than
> supporting a system with a handful of OSTs and clients. The support
> price is not per client, but rather per-OST, IIRC.

Just to clarify: Lustre support prices are currently based on the number of OSS servers (machines serving OSTs), not the number of OSTs, and not the size of the storage.

>> So, Lustre users, is it worth it? My setup would be 24 OST's with
>> about 100TB of storage, 10G ethernet, RAID on each OST, at least 20
>> or so clients needing pretty fast read/write, connected via 10G
>> ethernet (yes, I know I need a SAN but the physical locations will
>> not allow it and the price is prohibitive, hence my looking at
>> DPFSs)... Am I on the right track looking at Lustre, or should I go
>> elsewhere? I also need commercial support of some kind (although it
>> seems Sun is unsure of themselves here, they did not know who to
>> contact when I contacted them "Lustre, we make a product called
>> Lustre? Hold please"...
>
> Well, Sun is a big company, and Lustre was only acquired a year ago
> and does not necessarily generate a high call volume to the L1 support
> people, so they are not necessarily going to have information
> immediately handy.
>
> Note that Lustre itself does NOT need a SAN to work, unlike some other
> cluster filesystems. The only SAN requirement is for failover pairs
> of servers.

Sun also has low-cost shared storage options for Lustre -- search for Sun Storage Cluster.

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.