Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/9/2004 15:55, Gabriel Afana wrote:

> I have SuSE Enterprise Server 9 running on two AMD Opteron servers. I
> have an external RAID array with one large LUN on it that I need shared
> between the two nodes. I am using the built-in Lustre clustering
> software, but I am having an issue with it. I've set it all up and
> everything went smoothly, but the nodes aren't updating each other's
> file systems.
>
> Both servers can mount and access the external RAID OK, but when I make
> a change on the RAID partition from one server, the other server
> doesn't see it until it is rebooted. For example, I can create a folder
> and access it from one server, but I don't see this folder on the RAID
> from the other server. If I reboot that other server, then I can see
> the folder on the RAID. Is there something missing, or something I
> forgot to do during the setup? Any idea?

Yes -- this is not how Lustre works, and you are probably badly corrupting that file system with every modification.

Lustre is not a shared-disk file system, in which many nodes cooperate to read and write a single shared LUN. Each Lustre server needs its own pool of backend storage; although LUNs may be shared for the purpose of failover, two servers should never be using the same LUN at the same time. That is a recipe for guaranteed corruption.

If you want two object servers plus a metadata server, you will need three separate LUNs or partitions. The client file system never talks directly to the disk; it communicates over the network with the server software using the Lustre protocols.

Running a file system client on the same node as an object server will work for light testing, but it is known to be unstable under heavy load, so it is not a supported configuration today. We are making several fixes right now that will improve this situation, but they are not present in the version that shipped with SLES9.

So if you plan to run this in production with that code, your client nodes should be separate from your OSS nodes.

I hope this helps--

-Phil
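[Archive note: for concreteness, the separation described above -- one LUN per server role, with clients attaching only over the network -- would look roughly like this in the lmc/lconf configuration toolchain that shipped with that generation of Lustre. This is a from-memory sketch, not a tested script: the hostnames and device paths are invented, and exact lmc options vary between releases, so check `lmc --help` on your installation.]

```shell
# Declare the network identity of each node (hostnames are hypothetical).
lmc -o config.xml --add net --node mds1-node --nid mds1-node --nettype tcp
lmc -m config.xml --add net --node oss1-node --nid oss1-node --nettype tcp
lmc -m config.xml --add net --node oss2-node --nid oss2-node --nettype tcp
lmc -m config.xml --add net --node client1   --nid client1   --nettype tcp

# One metadata server on its own LUN...
lmc -m config.xml --add mds --node mds1-node --mds mds1 --fstype ext3 --dev /dev/sdb1

# ...a logical object volume (LOV) tying the OSTs together...
lmc -m config.xml --add lov --lov lov1 --mds mds1 \
    --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0

# ...and one OST per object server, each on its own separate LUN.
lmc -m config.xml --add ost --node oss1-node --lov lov1 --ost ost1 --fstype ext3 --dev /dev/sdc1
lmc -m config.xml --add ost --node oss2-node --lov lov1 --ost ost2 --fstype ext3 --dev /dev/sdd1

# The client mounts over the network; it never touches the server LUNs.
lmc -m config.xml --add client --node client1 --lov lov1 --path /mnt/lustre

# Format and start services from the shared config (run once per node):
# lconf --reformat --node mds1-node config.xml
```

Note how no two `--dev` arguments refer to the same block device: that is the invariant the message above is describing.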
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Phil,

> [full explanation of Lustre's architecture trimmed]
>
> So if you plan to run this in production with that code, your client
> nodes should be separate from your OSS nodes.
>
> I hope this helps--
>
> -Phil

Ok, now that I have a much better understanding of exactly what Lustre is and how it works, I have a question: is Lustre right for me? We are building a website that will offer a premium email service. Our setup now is two servers and one external RAID array. As I understand it, it isn't good to run the OST and the client on the same server, so for stable support I would need an additional server for the OST to access the RAID (two servers for the OSTs if I want failover support). We would also need an additional server to be the MDS (two servers for failover support). So in order to have a stable cluster with failover support, I would need at least six servers, right?

I am wondering whether this is practical for me, because our idea was to separate our customers into blocks:

Customers 1-10,000 => cluster block 1
Customers 10,001-20,000 => cluster block 2
etc.

Each cluster block would consist of everything needed to run the service (two Apache servers, one RAID array, two POP servers, two BOT servers, etc.). This way we can keep the complexity of each cluster down yet still be scalable, and in the event of a major failure in any cluster block, it will only affect a portion of the customers. Because of this, each cluster would be small, as opposed to one huge cluster supporting all the customers. This is why I am wondering whether Lustre is right for me.

My alternative thought was to add two more servers, DB server 1 and DB server 2, and have those connect directly to the RAID in an active/passive setup with a failover cluster between them. The other two servers would be the Apache/client servers, and when they need to access storage on the RAID, they would request it through the active DB server 1 (like an OST). This way the file system for the RAID lives on the DB servers (not the clients), the two Apache servers request data from the DB servers, and there can be one large LUN on the RAID. I could achieve total failover support with no single point of failure using only four servers, and still achieve the same as the equivalent Lustre setup (which would need six servers).

I know Lustre would provide failover support for the clients, but I have a ServerIron XL load balancer for the Apache servers, so that can handle the load balancing. Then again, the load balancer is very expensive, and it would be cheaper just to build two additional servers and use the Lustre cluster :-)

Do you think Lustre in one cluster with six servers would be better and more reliable than four servers in the configuration explained above (using hardware load balancing for the clients)? Sorry for all the mumbo jumbo :-) but I want to have a solid plan from the beginning so I don't have to fool around with this stuff later.

THANKS EVERYBODY!

Gabe
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Two last questions. Can the MDS and OST be on the same server? I know an OST and a client can't run together due to instability, but if I can run the MDS and OST together, that would make things easier for me for now, until I can build some more servers.

The other thing: can the MDS storage be on a small partition of the main RAID array itself? I have the two servers and the one external RAID array. I am thinking of using the two servers as clients and an additional server as a combined MDS/OST server to share the storage on the RAID -- but can I also use the RAID for the shared MDS data pool? Although it is not needed now, in the near future I will build additional servers, separate the MDS and OST, and create failover servers for them, so I will need a shared data pool for the MDS. My RAID is 1 TB; I am thinking of making one small LUN for the MDS (100 GB), then splitting the remaining 900 GB down the middle for the two OST servers (only one to start, though). Would that work, or do I need another separate external RAID for the MDS storage pool?

Thanks! (Promise, this is the last question!) :-)

Gabe

----- Original Message -----
From: "Phil Schwan" <phil@clusterfs.com>
To: "Gabriel Afana" <advertising@adtomi.com>
Cc: <lustre-discuss@lists.clusterfs.com>
Sent: Friday, December 10, 2004 6:28 AM
Subject: Re: [Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes

> [full quote of Phil's reply trimmed]
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi,

I have SuSE Enterprise Server 9 running on two AMD Opteron servers. I have an external RAID array with one large LUN on it that I need shared between the two nodes. I am using the built-in Lustre clustering software, but I am having an issue with it. I've set it all up and everything went smoothly, but the nodes aren't updating each other's file systems.

Both servers can mount and access the external RAID OK, but when I make a change on the RAID partition from one server, the other server doesn't see it until it is rebooted. For example, I can create a folder and access it from one server, but I don't see this folder on the RAID from the other server. If I reboot that other server, then I can see the folder on the RAID. Is there something missing, or something I forgot to do during the setup? Any idea?

Gabe
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/13/2004 4:55, Gabriel Afana wrote:

> The other thing: can the MDS storage be on a small partition of the
> main RAID array itself? I have the two servers and the one external
> RAID array. I am thinking of using the two servers as clients and an
> additional server as a combined MDS/OST server to share the storage on
> the RAID -- but can I also use the RAID for the shared MDS data pool?

Absolutely. The Lustre servers just use normal Linux block devices, so you can use a whole "raw" disk device, a partition, or any other block device.

> Although it is not needed now, in the near future I will build
> additional servers, separate the MDS and OST, and create failover
> servers for them, so I will need a shared data pool for the MDS. My
> RAID is 1 TB; I am thinking of making one small LUN for the MDS
> (100 GB), then splitting the remaining 900 GB down the middle for the
> two OST servers (only one to start, though). Would that work, or do I
> need another separate external RAID for the MDS storage pool?

Sounds reasonable to me.

-Phil
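[Archive note: for concreteness, the split proposed above works out as follows. This is a trivial sketch that treats the 1 TB array as a round 1000 GB; the commented partitioning commands are indicative only, and the device name is invented.]

```shell
# Carving one 1 TB (here: 1000 GB) RAID LUN into a small MDS partition
# plus two equal OST partitions, as proposed in the message above.
TOTAL_GB=1000
MDS_GB=100
OST_GB=$(( (TOTAL_GB - MDS_GB) / 2 ))

echo "MDS partition:  ${MDS_GB} GB"
echo "OST partitions: 2 x ${OST_GB} GB"
# -> MDS partition:  100 GB
# -> OST partitions: 2 x 450 GB

# The actual partitioning might look like this (DESTRUCTIVE -- example
# only, and /dev/sdb is a hypothetical device name):
# parted /dev/sdb mklabel gpt
# parted /dev/sdb mkpart primary 0GB 100GB
# parted /dev/sdb mkpart primary 100GB 550GB
# parted /dev/sdb mkpart primary 550GB 1000GB
```

Each resulting partition is then dedicated to exactly one server, which is what keeps this layout safe.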
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/12/2004 17:14, Gabriel Afana wrote:

> Ok, now that I have a much better understanding of exactly what Lustre
> is and how it works, I have a question: is Lustre right for me? [...]
> So in order to have a stable cluster with failover support, I would
> need at least six servers, right?

You can run a single node (or failover pair) that provides both MDS and OSS services. So I think you only need four nodes.

We're making progress on the issues with running a client on the OSS, so there is light at the end of that tunnel, but we're not there yet.

Hope that helps--

-Phil
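[Archive note: in the lmc configuration language of that era, combining the MDS and an OST on one node was just a matter of declaring both services against the same `--node`. A hedged, from-memory sketch with invented hostname and devices -- verify the options against your installed `lmc` before use:]

```shell
# One server ("combo1", hypothetical) carrying both the MDS and an OST.
# Each service still gets its own separate block device -- colocating
# the services does not mean sharing a LUN.
lmc -o config.xml --add net --node combo1 --nid combo1 --nettype tcp
lmc -m config.xml --add mds --node combo1 --mds mds1 --fstype ext3 --dev /dev/sdb1
lmc -m config.xml --add lov --lov lov1 --mds mds1 \
    --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
lmc -m config.xml --add ost --node combo1 --lov lov1 --ost ost1 --fstype ext3 --dev /dev/sdb2
```

With a failover pair of such combined nodes plus two client nodes, the four-node count mentioned above falls out directly.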