Hi List,

I'm working for a small government organization in South America, and in
our quest for a large distributed storage solution we ran across Lustre.
We have some servers on which we're trying to build a prototype to decide
whether or not Lustre meets our requirements. Because of some hardware
issues we've temporarily chosen RHEL5 as our OS, but we will also give
Debian a try later (so I might bug you later about building Lustre from
sources, but that will be another era).

Right now we are trying to build a Lustre filesystem with 4 servers:
2 of them as MDS/MDT configured for failover, and the other two as OSSes
with one and two OSTs respectively. I've already installed the patched
kernel, lustre-modules, lustre-ldiskfs, Lustre itself and the e2fsprogs
that Lustre provides. I've also added the lnet module to modprobe.conf.

So now I guess I should start setting up my two MDS+MDT servers, and here
is where I'm stuck: the Lustre manual and Mount Conf say something about
making a filesystem, showing how to proceed with the MDT and OST, but I
don't understand how I'm supposed to set up the MDS and OSS later :-/

Also, I made separate partitions on the servers for the MDT and OST.
I *can* build the Lustre filesystem using those partitions, can't I?

Are node names configured inside Lustre, or are they just the domain
names I set up on the computers?

What do you call "sites"? (I quote from Mount Conf: "There should be one
MGS per site, not one MGS per filesystem")

If you could help me with all these little problems, it'd be great :)
Thanks a lot in advance! With many doubts still, but hoping to get some
solved soon,

ra
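(For reference, the LNET entry in /etc/modprobe.conf is usually a single
line naming the network type and interface. A minimal example, with eth0
as a placeholder for whatever interface the servers actually use:

    options lnet networks=tcp0(eth0)

Servers and clients that should talk to each other need matching LNET
network names.)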
Rayentray, ----- "Rayentray Tappa" <rtappa.ascentio at gmail.com> wrote:> Hi List, > I''m working for a small government organization in South America, and > in our quest for a large distributed storage solution we ran across > Lustre. > We have some servers in which we''re trying to build a prototype to > decide whether or not Lustre meets our requirements. > Because of some hardware issues we''ve temporarily chosen RHEL5 as our > OS but will also give Debian a try later (so I might bug you later on > building Lustre from sources, but that will be another era).It is personal choice but I find RHEL5 makes a very good Lustre server distrobition and is far easier to maintain Lustre updates on.> Right now we are trying to build a Lustre filesystem with 4 servers: > 2 of them as MDS/MDT configured for failover, and the other two as OSS > with one and two OST respectively.For testing you may want to simplify the setup and not bother with MDT failover. Failover is really only useful for the case where there is a hardware failure. Using STONITH to reboot crashed servers is usually sufficient for most small installations.> I''ve already installed the patched kernel, lustre-modules, > lustre-ldiskfs, lustre itself and the e2fsprog lustre provides. I''ve > also added the lnet module to modprobe.conf. > So, now I guess I should start setting up my two MDS+MDT servers, and > here is were I''m stuck: the lustre manual and Mount Conf say > something about making a filesystem, showing how to proceed with MDT and OST. > But I don''t understand how I''m supposed to set up the MDS and OSS later > :-/Basically you format the partitions with "mkfs.lustre" and then you mount them with "mount -t lustre /dev/blah /mnt/blah". When mounting for the first time they communicate with the MGS (config server) and create the required configs automatically.> Also, I made different partitions in the servers for the MDT and OST. > I *can* build the lustrefs using those partitions, can''t I?Yes you can use any device including partitions. You could also just use the raw unpartitioned devices (or software RAIDs, LVMs, DRDB).> Node names are configured inside lustre, or are they just the domain > names i set up in the computers?They are taken from the hostnames. When setting up failover you need to supply the failover hostname to the mkfs.lustre command.> What do you call "sites"? (I quote from Mount Conf: "There should be > one MGS per site, not one MGS per filesystem")In this context site means "organisation/company". The MGS holds the configuration for all Lustre filesystem in an organisation. You only need one. If you are testing or only plan on having a single filesystem then it makes sense to colocate the MGS and MDS on the same server/device. When you use mkfs.lustre on the MDT there is an option to make it the MGS too. Daire
On Wed, 2009-03-04 at 11:09 +0000, Daire Byrne wrote:
> Rayentray,
>
> ----- "Rayentray Tappa" <rtappa.ascentio at gmail.com> wrote:
>
> > Hi List,
> > I'm working for a small government organization in South America,
> > and in our quest for a large distributed storage solution we ran
> > across Lustre. We have some servers on which we're trying to build a
> > prototype to decide whether or not Lustre meets our requirements.
> > Because of some hardware issues we've temporarily chosen RHEL5 as
> > our OS, but we will also give Debian a try later (so I might bug you
> > later about building Lustre from sources, but that will be another
> > era).
>
> It is a personal choice, but I find RHEL5 makes a very good Lustre
> server distribution and is far easier to maintain Lustre updates on.

Thanks for the information, it's good to know we should have little
trouble maintaining this :)

> > Right now we are trying to build a Lustre filesystem with 4 servers:
> > 2 of them as MDS/MDT configured for failover, and the other two as
> > OSSes with one and two OSTs respectively.
>
> For testing you may want to simplify the setup and not bother with MDT
> failover. Failover is really only useful for the case where there is a
> hardware failure. Using STONITH to reboot crashed servers is usually
> sufficient for most small installations.

I'm testing this in order to build a large system later, where high
availability of the data store is a must - so we do want to check how
this works, what configuring it involves, what may happen in case of
failure, and how to recover from it.

If it can be configured at a later stage, i.e. after having already
tested Lustre itself, it might be a good idea to learn only a few things
at a time :)

> > I've already installed the patched kernel, lustre-modules,
> > lustre-ldiskfs, Lustre itself and the e2fsprogs that Lustre provides.
> > I've also added the lnet module to modprobe.conf.
> > So now I guess I should start setting up my two MDS+MDT servers, and
> > here is where I'm stuck: the Lustre manual and Mount Conf say
> > something about making a filesystem, showing how to proceed with the
> > MDT and OST, but I don't understand how I'm supposed to set up the
> > MDS and OSS later :-/
>
> Basically you format the partitions with "mkfs.lustre" and then you
> mount them with "mount -t lustre /dev/blah /mnt/blah". When mounting
> for the first time they communicate with the MGS (config server) and
> create the required configs automatically.

Right :) I was informed yesterday that the MDS and OSS are not actually
configured separately but created whenever I create the MDT or OST. It
makes more sense now :)

> > What do you call "sites"? (I quote from Mount Conf: "There should be
> > one MGS per site, not one MGS per filesystem")
>
> In this context "site" means "organisation/company". The MGS holds the
> configuration for all Lustre filesystems in an organisation; you only
> need one. If you are testing, or only plan on having a single
> filesystem, then it makes sense to colocate the MGS and MDS on the same
> server/device. When you use mkfs.lustre on the MDT there is an option
> to make it the MGS too.

Our idea is to have a main Lustre filesystem running, which will be
replicated in two other buildings. We still don't know how we will handle
the replication; maybe Lustre 1.8 is already out by then and we can use
that, or maybe we handle it with a separate piece of software that we
need to develop anyway for other purposes[1]. Does it make sense to have
only one MGS in that configuration?
[1] How easy/difficult is it to become involved with Lustre development?
I want to propose to the organization that, given we need replication and
Lustre intends to provide it in version 2.0, we may help with its
development. How much work would that be? (rough idea) Anyone I can
contact specifically about this?

Thanks,
--
ra
Rayentray, ----- "Rayentray Tappa" <rtappa.ascentio at gmail.com> wrote:> I''m testing this in order to build a large system later, where high > availability of the data store is a must - so we do want to check how > this works, what involves to configure it and what may happen in case > of failure and how to restore it. > > If it can be configured in a later stage, i.e. after having already > tested Lustre itself, it might be a good idea to learn only a few > things at the time :)Fair enough. I would think you can add failover at a later stage using "tune2fs.lustre" but one of the developers/gurus will need to verify that.> Our idea is to have a main Lustre running, which will be replicated > in other two buildings. We still don''t know how we will handle the > replication, maybe Lustre1.8 is already out then and we can use that, > or maybe we handle it with a separate unit of software that we need to > develop anyway with other purposes[1]. Does it make sense to have > only one MGS in that configuration?If your network between the buildings is pretty solid then a single MGS should be fine - it doesn''t really do very much I/O. We have 4 buildings and a single MGS (with consistently low loads). There is some work going on to write replication tools atm called "lreplicate". You can watch the progress here: https://bugzilla.lustre.org/show_bug.cgi?id=16855 However this is geared towards using "changelogs" which I don''t think are due until Lustre v2.0 which may be a way off yet. Again the devs can better clarify that for you.> [1] How easy/difficult is to become involved with Lustre development? > I want to propose to the organization that, given we need replication > and Lustre intends to provide it in version 2.0, we may help with its > development. How much work would that be? (rough idea). Anyone I can > contact specifically about this?We do backups from a few Lustre filesystems to a large single Lustre filesystem using "e2scan" which can quickly scan the MDT for changed files and dump out a list which can then be fed into something like rsync and rm to synchronise filesystems. The Lustre e2fsprogs RPM includes e2scan. I''m more than happy to send our bash script but it is pretty hack-tastic! Daire