Good morning, folks,

A quick question about Lustre failover as it applies to OSSs. Can failover pairs be set up in an (for lack of a better phrase) active-active configuration? I have a GPFS background, where we would split NSDs (the GPFS counterpart of OSTs) between two servers: half the NSDs would be served primarily by one server and the other half by the other server. In the case of a failover, one server would take over all the NSDs until the primary was back in production.

From the Lustre docs, this does not appear to be the standard operating procedure. Instead, it looks like an "active-passive" setup, where one OSS owns all the OSTs and the failover server is more of a warm spare: ready to kick into action when a failure occurs, but not serving any data requests during normal production. Is this a correct analysis of the Lustre side of things?

----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
On Wed, 2009-05-13 at 10:19 -0700, John White wrote:
> Good Morning Folks,
> A quick question on lustre failover as far as OSSs are concerned.
> Can failover pairs be in an (for lack of a better phrase) active-active
> setup?

You are not lacking a better phrase. That's exactly the term we use to describe what you are looking for, and yes, you most definitely can run active-active OSSes. I'd guess that a large portion of our customers who use failover are doing this.

> Looking at the lustre docs, it looks like this is not the standard
> operating procedure.

Hrm. Can you point out where you got this impression? Are you sure you are not just reading one of several scenarios?

> Rather, it looks like a "active-passive" setup
> where one OSS owns all the OSTs and the failover is more a warm spare
> ready to kick into action when a failure occurs but not serving any
> data requests while in full production.

That's certainly a valid operating mode for OSSes, and the only failover mode supported for MDSes, but active-active OSSes are most certainly supported and, I thought, documented.

b.
It is normal for an OSS server pair to serve OSTs from both servers, so in that sense it is analogous to GPFS NSD servers. The difference from GPFS (where LUNs are active on both servers all the time, even though one is the primary server) is that the secondary server does NOT serve the OSTs being served by the primary unless the primary is down and those OSTs have been failed over.

Kevin
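The ownership rule Kevin describes can be modeled in a few lines. This is a minimal Python sketch of the policy only, not of how Lustre or any failover software implements it; the server names `oss1`/`oss2`, the OST names, and the helper functions are all hypothetical. Each OST has a primary and one failover partner, and at any moment exactly one live server serves it: the primary if it is up, otherwise the partner.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ost:
    """An OST with a primary server and one failover partner (hypothetical names)."""
    name: str
    primary: str
    failover: str


def serving_server(ost: Ost, up: set[str]) -> str | None:
    """Return the single server that serves this OST, or None if neither is up.

    The secondary serves the OST only when the primary is down, so the two
    servers never serve the same OST at the same time (unlike GPFS NSDs).
    """
    if ost.primary in up:
        return ost.primary
    if ost.failover in up:
        return ost.failover
    return None


def assignments(osts: list[Ost], up: set[str]) -> dict[str, list[str]]:
    """Group OST names by the server currently serving them."""
    result: dict[str, list[str]] = {}
    for ost in osts:
        server = serving_server(ost, up)
        if server is not None:
            result.setdefault(server, []).append(ost.name)
    return result


# Active-active pair: each server is primary for half the OSTs
# and failover partner for the other half.
osts = [
    Ost("OST0000", primary="oss1", failover="oss2"),
    Ost("OST0001", primary="oss2", failover="oss1"),
    Ost("OST0002", primary="oss1", failover="oss2"),
    Ost("OST0003", primary="oss2", failover="oss1"),
]

print(assignments(osts, {"oss1", "oss2"}))
# {'oss1': ['OST0000', 'OST0002'], 'oss2': ['OST0001', 'OST0003']}
print(assignments(osts, {"oss1"}))
# {'oss1': ['OST0000', 'OST0001', 'OST0002', 'OST0003']}
```

With both servers up, each serves only its own half; when `oss2` is down, `oss1` picks up all four OSTs, which is the active-active behavior John describes from GPFS, minus the simultaneous access.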
Also note that, unlike with GPFS, you will need third-party software to handle this failover.

jab