Hi,

I know this topic has been discussed here many times, but all the messages seem to be about Lustre 1.6. Has anything changed in Lustre 1.8 that would make it possible to set up two OSS with an OST shared using DRBD, in an active-active configuration?

I have mounted a shared OST on two OSS nodes, neither of them marked with "--failover", and it looked as if it was working, but I didn't do any stress tests for reliability. Does such a setup ever have a chance to work for real, or did it only look as if everything was OK?

--
Andrzej Godziuk
http://CloudAccess.net/
On Tue, Mar 02, 2010 at 01:09:51PM +0100, Andrew Godziuk wrote:
> Has anything changed in Lustre 1.8 that would make it possible to set
> up two OSS with an OST shared using DRBD, in an active-active
> configuration?

No, a Lustre target (OST or MDT) must *never* be active on more than one server at a time, and we have no plan to change this. Mounting the same OST on two different OSSs results in massive corruption most of the time.

> I have mounted a shared OST on two OSS nodes, none of them marked with
> "--failover", and it looked as if it was working,

In 1.6, we introduced MMP (multiple mount protection) to prevent mounting the same target twice. However, mkfs.lustre enables MMP only when a failover node is specified, which is why you were able to mount the OST twice.

> but I didn't do any stress tests for reliability. Does such setup ever
> have a chance to work for real or did it only look as if everything was OK?

No, this has no chance of working. For now, all clients are probably still connected to the OST through the same OSS, but as soon as one client reconnects to the OST through the 2nd OSS (e.g. after some request timeouts), the two OSSs will write concurrently to the same ldiskfs filesystem and corrupt it.

Johann
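To make the rule above concrete, here is a minimal toy model in Python of the "one server at a time" invariant. It is only a sketch: the class and method names (`Target`, `mount`, `at_risk`) and the server names are hypothetical, and the check is a simplified stand-in for MMP, not how MMP is actually implemented in ldiskfs.

```python
class AlreadyMountedError(Exception):
    """Raised when a protected target is already active on another server."""


class Target:
    """Toy model of a Lustre target (OST or MDT) on shared storage.

    Hypothetical illustration only: real MMP lives in the on-disk
    filesystem, not in an in-memory set like this one.
    """

    def __init__(self, name, mmp_enabled):
        self.name = name
        self.mmp_enabled = mmp_enabled
        self.mounted_on = set()  # servers that currently have the target mounted

    def mount(self, server):
        # With protection enabled, a second concurrent mount is refused.
        already_elsewhere = bool(self.mounted_on) and server not in self.mounted_on
        if self.mmp_enabled and already_elsewhere:
            raise AlreadyMountedError(
                f"{self.name} is already mounted on {sorted(self.mounted_on)}"
            )
        self.mounted_on.add(server)

    def umount(self, server):
        self.mounted_on.discard(server)

    def at_risk(self):
        # More than one active mount means two servers could write
        # to the same backing filesystem at once.
        return len(self.mounted_on) > 1
```

In this model, a target created with `mmp_enabled=False` accepts a second mount and then reports `at_risk()` as true, which mirrors the situation described in the thread; with `mmp_enabled=True` the second `mount` call raises instead.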
Johann,

Thank you for your detailed answer; it made the picture much clearer. Then I guess this part of the manual should be changed:

"The active/passive configuration is seldom used for OST servers as it doubles hardware costs without improving performance. On the other hand, an active/active cluster configuration can improve performance by serving and providing arbitrary failover protection to a number of OSTs."

to state explicitly that an active/active scenario is only possible when an OSS is active for some OSTs and passive for others.

--
Andrzej Godziuk
http://CloudAccess.net/
On Tue, Mar 02, 2010 at 02:01:06PM +0100, Andrew Godziuk wrote:
> Then I guess this part of manual should be changed:
>
> "The active/passive configuration is seldom used for OST servers as it
> doubles hardware costs without improving performance. On the other
> hand, an active/active cluster configuration can improve performance
> by serving and providing arbitrary failover protection to a number of
> OSTs."
>
> to state explicitly that active/active scenario is only possible when
> OSS is active for some OSTs and passive for some others.

Yes, I think this is explained in the next section:

"For OST failover, multiple OSS nodes are configured to be able to serve the same OST. However, only one OSS node can serve the OST at a time. An OST can be moved between OSS nodes that have access to the same storage device using umount/mount commands."

BTW, in your case, since you did not specify a failover node for the OST at mkfs time, the Lustre clients are not aware of the alternative path and thus won't try to reach the OST through the 2nd OSS. So your filesystem should still be safe, since the 2nd mount instance should never receive any client connection. However, I would still recommend that you umount the OST on the 2nd OSS asap.

Johann
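The active/active arrangement discussed above, where each OSS is active for some OSTs and on standby for the others, can be sketched as a small Python model. This is an illustration under stated assumptions, not Lustre's own configuration format: the function names, the dictionary layout, and the `oss1`/`oss2`/`OST000x` names are all hypothetical.

```python
def assign_active_active(osts, oss_a, oss_b):
    """Alternate primaries so both servers are active, each for different OSTs.

    Every OST gets exactly one active server at any moment.
    """
    layout = {}
    for i, ost in enumerate(osts):
        primary, failover = (oss_a, oss_b) if i % 2 == 0 else (oss_b, oss_a)
        layout[ost] = {"primary": primary, "failover": failover, "active": primary}
    return layout


def fail_over(layout, failed_server):
    """Move every OST that is active on failed_server to its partner.

    This stands in for unmounting the OST on one OSS and mounting it
    on the other; the OST is never active on both.
    """
    for entry in layout.values():
        if entry["active"] == failed_server:
            on_primary = entry["active"] == entry["primary"]
            entry["active"] = entry["failover"] if on_primary else entry["primary"]
    return layout


def served_by(layout, server):
    """Return the sorted list of OSTs currently active on the given server."""
    return sorted(ost for ost, entry in layout.items() if entry["active"] == server)
```

With four OSTs and two servers, each server starts out serving two OSTs; after `fail_over(layout, "oss1")` the surviving server serves all four, and each OST still has a single active server throughout.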
On Tue, Mar 2, 2010 at 2:31 PM, Johann Lombardi <johann at sun.com> wrote:
> On Tue, Mar 02, 2010 at 02:01:06PM +0100, Andrew Godziuk wrote:
>> Then I guess this part of manual should be changed:
...
>> to state explicitly that active/active scenario is only possible when
>> OSS is active for some OSTs and passive for some others.
>
> Yes, i think this is explained in the next section:
> "For OST failover, multiple OSS nodes are configured to be able to serve the
> same OST. However, only one OSS node can serve the OST at a time. An OST can be
> moved between OSS nodes that have access to the same storage device using
> umount/mount commands. "

It sounded like a contradiction to me, which is why I asked the question here. Now that I know, it sounds logical.

> BTW, in your case, since you did not specify a failover node for the OST at
> mkfs time, the lustre clients are not aware of the alternative path and thus
> won't try to reach the OST through the 2nd OSS. So your filesystem should
> still be safe since the 2nd mount instance should never receive any client
> connection. However, I would still recommend to umount the OST on the 2nd
> OSS asap.

This was just a test setup; I'll be specifying --failover in the live setup for sure.

Again, thank you very much for your help.

--
Andrzej Godziuk
http://CloudAccess.net/