Phil Schwan
2006-May-19 07:36 UTC
Clarification? was Re: [Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64nodes
Hi Rob--

On 12/10/2004 9:57, Rob Martin wrote:

> I have a similar installation, and your answer to Gabe's question has me a
> little concerned that I might be trying to use Lustre for something it
> can't do. As it happens, I'm also using SLES9 and an internal SCSI array,
> but for clarity I'm hoping these details aren't too relevant.

Already I think this is a different case -- he was trying to use Lustre like
a SAN file system, where each node directly reads and writes from a shared
disk drive. To be totally clear, this is NOT how Lustre works. If I
understand you, your disks are not shared, but rather local to a single
node. I'll keep reading.

> I'm building a small 3-server cluster. One machine (fs1) has a RAID5 array
> of 72GB drives, with one 110 GB partition set aside for Lustre. This
> machine is heavily redundant, with tape and dual power supplies, etc., and
> hosts the login LDAP database and file services.
>
> The other two machines (as1 and as2) are being built as identical web and
> application servers. Ideally, I'll be able to manually balance loads by
> turning services on or off on whichever machine I choose, just so that one
> of them is providing each service. The data store for each service lives
> on the clustered drive, served from fs1.

You didn't mention a metadata server... Somewhere you have a disk partition
formatted for the MDS, and one of those nodes is running the MDS software.
Right?

> All three machines mount a Lustre-based OST called '/cluster' on node fs1.

It is mounting an OST (which resides on an OSS), but it's also mounting an
MDS. So to be more precise, your file system clients are mounting a Lustre
volume on "/cluster", which uses an MDS (but you didn't tell me which node)
and a single OSS (fs1). Am I right?

Just as I pointed out in my previous email, if fs1 is both an OSS and a
client, then this is not a supported configuration with the
presently-available Lustre releases. It will work fine for light testing,
but it is not a recommended stable production configuration.

> Am I headed down a rabbit hole? Based on the literature, I thought each
> machine in the cluster would be capable of accessing the data on the
> drive, and that the MDSs are providing the locking features necessary to
> prevent data corruption. Theoretically, they won't be accessing the same
> files at the same time (services would be run on only one server), but I
> guess I thought even that would be ok.

The misunderstanding is that the other email was about two server nodes
writing to the same LUN of a shared disk array at the same time. That is
absolutely not how Lustre works, as they discovered.

You have only one server accessing the disk array and providing access to
it through the Lustre software services. This is exactly how Lustre works,
and you are correct: the MDS and OSS nodes all run distributed lock
servers, and they coordinate access to files to provide guaranteed POSIX
semantics, even if many nodes modify the same file at the same time.

I hope this helps--

-Phil
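As a rough illustration of the point above -- and assuming, since the thread
leaves it open, that fs1 would also host the MDS -- the following Python
sketch models the proposed node roles and flags the overlap Phil warns
about, where one node acts as both an OSS and a client. The node and role
names are purely illustrative; this is not Lustre configuration syntax.

    # Hypothetical model of the three-node cluster discussed in this thread.
    # Role names are illustrative only, and placing the MDS on fs1 is an
    # assumption the thread does not confirm.
    roles = {
        "fs1": {"mds", "oss", "client"},  # serves the OST and also mounts /cluster
        "as1": {"client"},
        "as2": {"client"},
    }

    def unsupported_nodes(roles):
        """Return nodes acting as both an object server and a client."""
        return [node for node, r in roles.items() if {"oss", "client"} <= r]

    if __name__ == "__main__":
        for node in unsupported_nodes(roles):
            print(f"warning: {node} is both an OSS and a client -- "
                  "fine for light testing, not a supported production setup")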
Rob Martin
2006-May-19 07:36 UTC
Clarification? was Re: [Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64nodes
Hello Phil,

I have a similar installation, and your answer to Gabe's question has me a
little concerned that I might be trying to use Lustre for something it
can't do. As it happens, I'm also using SLES9 and an internal SCSI array,
but for clarity I'm hoping these details aren't too relevant.

I'm building a small 3-server cluster. One machine (fs1) has a RAID5 array
of 72GB drives, with one 110 GB partition set aside for Lustre. This
machine is heavily redundant, with tape and dual power supplies, etc., and
hosts the login LDAP database and file services.

The other two machines (as1 and as2) are being built as identical web and
application servers. Ideally, I'll be able to manually balance loads by
turning services on or off on whichever machine I choose, just so that one
of them is providing each service. The data store for each service lives
on the clustered drive, served from fs1.

All three machines mount a Lustre-based OST called '/cluster' on node fs1.

Am I headed down a rabbit hole? Based on the literature, I thought each
machine in the cluster would be capable of accessing the data on the
drive, and that the MDSs are providing the locking features necessary to
prevent data corruption. Theoretically, they won't be accessing the same
files at the same time (services would be run on only one server), but I
guess I thought even that would be ok.

Help?

Rob Martin

Phil Schwan said:

> Hi Gabe--
>
> On 12/9/2004 15:55, Gabriel Afana wrote:
>
>> I have SuSE Enterprise Server 9 running on two AMD Opteron servers. I
>> have an external RAID array with one large LUN on it that I need shared
>> between the two nodes. I am using the built-in Lustre clustering
>> software, but I am having an issue with it. I've set it all up and
>> everything went smoothly, but the nodes aren't updating each other's
>> file system.
>>
>> Both servers can mount and access the external RAID ok, but the problem
>> is when I make a change on the RAID partition from one server, the other
>> server doesn't see it until it's rebooted. For example, I can create a
>> folder and access it from one server, but I don't see this folder on the
>> RAID from the other server. If I reboot that other server, then I can
>> see the folder on the RAID. Is there something missing or something I
>> forgot to do during the setup? Any idea?
>
> Yes -- this is not how Lustre works, and you are probably badly
> corrupting that file system with every modification.
>
> Lustre is not a shared-disk file system, in which lots of nodes cooperate
> to read and write a single shared LUN. Each Lustre node needs its own
> pool of backend storage, and although they might be shared for the
> purpose of failover, two servers should never ever be using the same LUN
> at the same time. This is a recipe for guaranteed corruption.
>
> If you want to have two object servers plus a metadata server, you will
> need three separate LUNs or partitions. The client file system never
> talks directly to the disk; it communicates over a network to the server
> software using the Lustre protocols.
>
> Running a file system client on the same node as an object server will
> work for light testing, but is known to be unstable under heavy load, and
> so is not a supported configuration today. We are making several fixes
> right now which will improve this situation, but they are not present in
> the version that came with SLES9. So if you plan to run this in
> production with that code, your client nodes should be separate from your
> OSS nodes.
>
> I hope this helps--
>
> -Phil
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.clusterfs.com
> https://lists.clusterfs.com/mailman/listinfo/lustre-discuss

--
Rob Martin
IT Services Manager
Nexus Builders Group, Inc.
259 West Broadway
Suite 100
Waukesha, WI 53186
262-650-2236 Direct
262-853-2339 Mobile
rob.martin@nxsbg.com
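Phil's quoted reply above boils down to a simple storage rule: the MDS and
each object server need their own LUN or partition, and no two servers may
ever use the same LUN at the same time. A minimal Python sketch of that
check -- with purely hypothetical target names and device paths, not Lustre
tooling -- might look like this:

    # Illustrative check (not a Lustre utility): every server target gets its
    # own backend LUN, and no LUN may be claimed by more than one target.
    # Device paths below are hypothetical.
    working_layout = {
        "mds":  {"node": "fs1", "lun": "/dev/sdb1"},
        "ost0": {"node": "fs1", "lun": "/dev/sdb2"},
    }

    # The layout from the earlier thread, for contrast: two nodes writing the
    # same shared LUN directly, which Lustre does not support.
    shared_layout = {
        "server1": {"node": "node1", "lun": "/dev/shared_raid_lun0"},
        "server2": {"node": "node2", "lun": "/dev/shared_raid_lun0"},
    }

    def shared_luns(layout):
        """Map each LUN claimed by more than one target to those targets."""
        claims = {}
        for target, cfg in layout.items():
            claims.setdefault(cfg["lun"], []).append(target)
        return {lun: t for lun, t in claims.items() if len(t) > 1}

    if __name__ == "__main__":
        for layout in (working_layout, shared_layout):
            for lun, targets in shared_luns(layout).items():
                print(f"error: {lun} is used by {', '.join(targets)} at once "
                      "-- a recipe for corruption")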