Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/9/2004 15:55, Gabriel Afana wrote:

> I have SuSE Enterprise Server 9 running on two AMD Opteron servers. I
> have an external RAID array with one large LUN on it that I need shared
> between the two nodes. I am using the built-in Lustre clustering
> software, but I am having an issue with it. I've set it all up and
> everything went smoothly, but the nodes aren't updating each other's
> file systems.
>
> Both servers can mount and access the external RAID OK, but when I make
> a change on the RAID partition from one server, the other server
> doesn't see it until it is rebooted. For example, I can create a folder
> and access it from one server, but I don't see this folder on the RAID
> from the other server. If I reboot that other server, then I can see
> the folder on the RAID. Is there something missing, or something I
> forgot to do during the setup? Any idea?

Yes -- this is not how Lustre works, and you are probably badly corrupting that file system with every modification.

Lustre is not a shared-disk file system, in which many nodes cooperate to read and write a single shared LUN. Each Lustre server needs its own pool of backend storage; although LUNs may be shared for the purpose of failover, two servers should never be using the same LUN at the same time. That is a recipe for guaranteed corruption.

If you want two object servers plus a metadata server, you will need three separate LUNs or partitions. The client file system never talks directly to the disk; it communicates over the network with the server software using the Lustre protocols.

Running a file system client on the same node as an object server will work for light testing, but it is known to be unstable under heavy load, so it is not a supported configuration today. We are making several fixes right now that will improve this situation, but they are not present in the version that shipped with SLES9.

So if you plan to run this in production with that code, your client nodes should be separate from your OSS nodes.

I hope this helps--

-Phil
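[Archive note: for concreteness, the separation described above -- one LUN per server role, with clients attaching only over the network -- would look roughly like this in the lmc/lconf configuration toolchain that shipped with that generation of Lustre. This is a from-memory sketch, not a tested script: the hostnames and device paths are invented, and exact lmc options vary between releases, so check `lmc --help` on your installation.]

```shell
# Declare the network identity of each node (hostnames are hypothetical).
lmc -o config.xml --add net --node mds1-node --nid mds1-node --nettype tcp
lmc -m config.xml --add net --node oss1-node --nid oss1-node --nettype tcp
lmc -m config.xml --add net --node oss2-node --nid oss2-node --nettype tcp
lmc -m config.xml --add net --node client1   --nid client1   --nettype tcp

# One metadata server on its own LUN...
lmc -m config.xml --add mds --node mds1-node --mds mds1 --fstype ext3 --dev /dev/sdb1

# ...a logical object volume (LOV) tying the OSTs together...
lmc -m config.xml --add lov --lov lov1 --mds mds1 \
    --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0

# ...and one OST per object server, each on its own separate LUN.
lmc -m config.xml --add ost --node oss1-node --lov lov1 --ost ost1 --fstype ext3 --dev /dev/sdc1
lmc -m config.xml --add ost --node oss2-node --lov lov1 --ost ost2 --fstype ext3 --dev /dev/sdd1

# The client mounts over the network; it never touches the server LUNs.
lmc -m config.xml --add client --node client1 --lov lov1 --path /mnt/lustre

# Format and start services from the shared config (run once per node):
# lconf --reformat --node mds1-node config.xml
```

Note how no two `--dev` arguments refer to the same block device: that is the invariant the message above is describing.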
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Phil,

> [full explanation of Lustre's architecture trimmed]
>
> So if you plan to run this in production with that code, your client
> nodes should be separate from your OSS nodes.
>
> I hope this helps--
>
> -Phil

Ok, now that I have a much better understanding of exactly what Lustre is and how it works, I have a question: is Lustre right for me? We are building a website that will offer a premium email service. Our setup now is two servers and one external RAID array. As I understand it, it isn't good to run the OST and the client on the same server, so for stable support I would need an additional server for the OST to access the RAID (two servers for the OSTs if I want failover support). We would also need an additional server to be the MDS (two servers for failover support). So in order to have a stable cluster with failover support, I would need at least six servers, right?

I am wondering whether this is practical for me, because our idea was to separate our customers into blocks:

Customers 1-10,000 => cluster block 1
Customers 10,001-20,000 => cluster block 2
etc.

Each cluster block would consist of everything needed to run the service (two Apache servers, one RAID array, two POP servers, two BOT servers, etc.). This way we can keep the complexity of each cluster down yet still be scalable, and in the event of a major failure in any cluster block, it will only affect a portion of the customers. Because of this, each cluster would be small, as opposed to one huge cluster supporting all the customers. This is why I am wondering whether Lustre is right for me.

My alternative thought was to add two more servers, DB server 1 and DB server 2, and have those connect directly to the RAID in an active/passive setup with a failover cluster between them. The other two servers would be the Apache/client servers, and when they need to access storage on the RAID, they would request it through the active DB server 1 (like an OST). This way the file system for the RAID lives on the DB servers (not the clients), the two Apache servers request data from the DB servers, and there can be one large LUN on the RAID. I could achieve total failover support with no single point of failure using only four servers, and still achieve the same as the equivalent Lustre setup (which would need six servers).

I know Lustre would provide failover support for the clients, but I have a ServerIron XL load balancer for the Apache servers, so that can handle the load balancing. Then again, the load balancer is very expensive, and it would be cheaper just to build two additional servers and use the Lustre cluster :-)

Do you think Lustre in one cluster with six servers would be better and more reliable than four servers in the configuration explained above (using hardware load balancing for the clients)? Sorry for all the mumbo jumbo :-) but I want to have a solid plan from the beginning so I don't have to fool around with this stuff later.

THANKS EVERYBODY!

Gabe
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Two last questions. Can the MDS and OST be on the same server? I know an OST and a client can't run together due to instability, but if I can run the MDS and OST together, that would make things easier for me for now, until I can build some more servers.

The other thing: can the MDS storage be on a small partition of the main RAID array itself? I have the two servers and the one external RAID array. I am thinking of using the two servers as clients and an additional server as a combined MDS/OST server to share the storage on the RAID -- but can I also use the RAID for the shared MDS data pool? Although it is not needed now, in the near future I will build additional servers, separate the MDS and OST, and create failover servers for them, so I will need a shared data pool for the MDS. My RAID is 1 TB; I am thinking of making one small LUN for the MDS (100 GB), then splitting the remaining 900 GB down the middle for the two OST servers (only one to start, though). Would that work, or do I need another separate external RAID for the MDS storage pool?

Thanks! (Promise, this is the last question!) :-)

Gabe

----- Original Message -----
From: "Phil Schwan" <phil@clusterfs.com>
To: "Gabriel Afana" <advertising@adtomi.com>
Cc: <lustre-discuss@lists.clusterfs.com>
Sent: Friday, December 10, 2004 6:28 AM
Subject: Re: [Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes

> [full quote of Phil's reply trimmed]
Gabriel Afana
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi,

I have SuSE Enterprise Server 9 running on two AMD Opteron servers. I have an external RAID array with one large LUN on it that I need shared between the two nodes. I am using the built-in Lustre clustering software, but I am having an issue with it. I've set it all up and everything went smoothly, but the nodes aren't updating each other's file systems.

Both servers can mount and access the external RAID OK, but when I make a change on the RAID partition from one server, the other server doesn't see it until it is rebooted. For example, I can create a folder and access it from one server, but I don't see this folder on the RAID from the other server. If I reboot that other server, then I can see the folder on the RAID. Is there something missing, or something I forgot to do during the setup? Any idea?

Gabe
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/13/2004 4:55, Gabriel Afana wrote:

> The other thing: can the MDS storage be on a small partition of the
> main RAID array itself? I have the two servers and the one external
> RAID array. I am thinking of using the two servers as clients and an
> additional server as a combined MDS/OST server to share the storage on
> the RAID -- but can I also use the RAID for the shared MDS data pool?

Absolutely. The Lustre servers just use normal Linux block devices, so you can use a whole "raw" disk device, a partition, or any other block device.

> Although it is not needed now, in the near future I will build
> additional servers, separate the MDS and OST, and create failover
> servers for them, so I will need a shared data pool for the MDS. My
> RAID is 1 TB; I am thinking of making one small LUN for the MDS
> (100 GB), then splitting the remaining 900 GB down the middle for the
> two OST servers (only one to start, though). Would that work, or do I
> need another separate external RAID for the MDS storage pool?

Sounds reasonable to me.

-Phil
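[Archive note: for concreteness, the split proposed above works out as follows. This is a trivial sketch that treats the 1 TB array as a round 1000 GB; the commented partitioning commands are indicative only, and the device name is invented.]

```shell
# Carving one 1 TB (here: 1000 GB) RAID LUN into a small MDS partition
# plus two equal OST partitions, as proposed in the message above.
TOTAL_GB=1000
MDS_GB=100
OST_GB=$(( (TOTAL_GB - MDS_GB) / 2 ))

echo "MDS partition:  ${MDS_GB} GB"
echo "OST partitions: 2 x ${OST_GB} GB"
# -> MDS partition:  100 GB
# -> OST partitions: 2 x 450 GB

# The actual partitioning might look like this (DESTRUCTIVE -- example
# only, and /dev/sdb is a hypothetical device name):
# parted /dev/sdb mklabel gpt
# parted /dev/sdb mkpart primary 0GB 100GB
# parted /dev/sdb mkpart primary 100GB 550GB
# parted /dev/sdb mkpart primary 550GB 1000GB
```

Each resulting partition is then dedicated to exactly one server, which is what keeps this layout safe.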
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Clustering problems - SuSE ES 9 w/ two AMD64 nodes
Hi Gabe--

On 12/12/2004 17:14, Gabriel Afana wrote:

> Ok, now that I have a much better understanding of exactly what Lustre
> is and how it works, I have a question: is Lustre right for me? [...]
> So in order to have a stable cluster with failover support, I would
> need at least six servers, right?

You can run a single node (or failover pair) that provides both MDS and OSS services. So I think you only need four nodes.

We're making progress on the issues with running a client on the OSS, so there is light at the end of that tunnel, but we're not there yet.

Hope that helps--

-Phil
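[Archive note: in the lmc configuration language of that era, combining the MDS and an OST on one node was just a matter of declaring both services against the same `--node`. A hedged, from-memory sketch with invented hostname and devices -- verify the options against your installed `lmc` before use:]

```shell
# One server ("combo1", hypothetical) carrying both the MDS and an OST.
# Each service still gets its own separate block device -- colocating
# the services does not mean sharing a LUN.
lmc -o config.xml --add net --node combo1 --nid combo1 --nettype tcp
lmc -m config.xml --add mds --node combo1 --mds mds1 --fstype ext3 --dev /dev/sdb1
lmc -m config.xml --add lov --lov lov1 --mds mds1 \
    --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
lmc -m config.xml --add ost --node combo1 --lov lov1 --ost ost1 --fstype ext3 --dev /dev/sdb2
```

With a failover pair of such combined nodes plus two client nodes, the four-node count mentioned above falls out directly.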