Hello (and a special hello to all my ex-co-workers from the CFS days :)

The company where I work now has grown fast in the past year and we suddenly find ourselves in need of a lot of storage. For 5 years the company ran on a 60gig server; last year we got a 1TB RAID that is now almost full. In 1-2 years we could easily be using 10-15TB of storage.

Instead of just adding another 1TB server, I need to plan for a more scalable solution. Immediately Lustre came to mind, but I'm wondering about the performance. Basically our company does niche web-hosting for "Creative Professionals", so we need fast access to the data in order to have snappy web services for our clients. Typically these are smaller files (2MB pictures, 50MB videos, .swf files, etc.).

Also I'm wondering about the best way to set this up in terms of speed and ease of growth. I want the web-servers and the storage pool to be independent of each other, so I can add web-servers as the web traffic increases, and add more storage as our storage needs grow. We have the option of an MD3000 or MD3000i for back-end storage.

I was thinking initially we could start with 2 servers, both attached to the storage array, set up as OSSs and functioning as (load-balanced) web-servers as well. In the future I could separate this out so that we have the web-servers on the "front line" mounting the data from the OSSs, which will be on a private (GigE) network.

Now, it's been years since I've played with Lustre. I'm sure some stuff will come back to me as I start using it again; other things I'll probably have to re-learn. I wanted to get some input from the Lustre community on whether or not this seems like a reasonable use for Lustre. Are there alternatives out there which might fit my needs better (specifically speed and a shared storage pool)? Also, what kind of performance can I expect? Am I out of touch to expect something similar to a directly attached RAID array?

I appreciate any and all feedback, suggestions, comments, etc.

Thanks,
- Nick

--
Nick Jennings
Senior Programmer & Systems Administrator
Creative Motion Design
nick at creativemotiondesign.com
Brian J. Murrell
2009-Jan-26 16:48 UTC
[Lustre-discuss] Performance Expectations of Lustre
On Mon, 2009-01-26 at 16:51 +0100, Nick Jennings wrote:
> Hello (and a special hello to all my ex-co-workers from the CFS days :)

And MVD days too. ;-)

> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.

Good on y'all for keeping the storage industry busy. :-)

> Instead of just adding another 1TB server, I need to plan for a more
> scalable solution. Immediately Lustre came to mind, but I'm wondering
> about the performance. Basically our company does niche web-hosting for
> "Creative Professionals" so we need fast access to the data in order to
> have snappy web services for our clients. Typically these are smaller
> files (2MB pictures, 50MB videos, .swf files, etc.).

Well, I'm not sure those files would fall within our general classification of "small files" (wherein we know we don't perform very well). Our small-file issues are usually characterized by "kernel builds" and ~ use, where files are usually much smaller than 1MB.

> Also I'm wondering about the best way to set this up in terms of speed
> and ease of growth. I want the web-servers and the storage pool to be
> independent of each other. So I can add web-servers as the web traffic
> increases, and add more storage as our storage needs grow.

Well, your web-servers would be Lustre clients. There is no relationship, or rather no requirement, in terms of the number of clients and servers being used. You use as many servers as your client load demands. So you could imagine both ends of the spectrum: relatively few clients could be used to tax quite a few servers, or a lot of clients with modest demand could require only a few servers.

> I was thinking initially we could start with 2 servers, both attached
> to the storage array, set up as OSSs and functioning as (load-balanced)
> web-servers as well.

Sounds like you are describing 2 storage servers, which would require at least 3 servers total. Don't forget about the MDS. Also don't forget about HA if that's a concern for you. You could make the 2 OSSes failover partners for each other if you are willing to accept a possible performance impact when one of the OSSes fails.

If HA is important to you, however, you need to address MDS failover with a second server to pick up the MDT should the active MDS fail.

As for OSSes being web-servers, that would require the OSS/webservers also be clients, and that is an unsupported configuration due to the risk of deadlock under memory pressure. The recommended architecture would be to make the webservers Lustre clients.

> Now, it's been years since I've played with Lustre, I'm sure some
> stuff will come back to me as I start using it again, other things I'll
> probably have to re-learn. I wanted to get some input from the Lustre
> community on whether or not this seems like a reasonable use for Lustre?

It's most certainly reasonable, if you make modifications to your architecture as above.

> performance can I expect, am I out of touch to expect something similar
> to a directly attached RAID array?

I think our generally talked-about numbers are something on the order of achieving 80% of the raw storage bandwidth (assuming a capable network and so on).
Maybe somebody who is closer to the benchmarking that we are constantly doing can comment further on how close-to-raw-disk we are achieving lately.

b.
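For readers new to Lustre, a minimal sketch of the three-role layout described above (one combined MGS/MDS, one or more OSSes, web-servers as clients), using Lustre 1.6-era commands. The hostname "mds1", the device names and the filesystem name "webfs" are placeholders; check the operations manual for the exact syntax of your release.

    # On the MDS (also acting as MGS here): format and mount the MDT
    mkfs.lustre --fsname=webfs --mgs --mdt /dev/sdb
    mkdir -p /mnt/mdt && mount -t lustre /dev/sdb /mnt/mdt

    # On each OSS: format one OST per LUN, pointing it at the MGS, then mount it
    mkfs.lustre --fsname=webfs --ost --mgsnode=mds1@tcp0 /dev/sdc
    mkdir -p /mnt/ost0 && mount -t lustre /dev/sdc /mnt/ost0

    # On each web-server (a Lustre client): mount the whole filesystem
    mkdir -p /srv/webdata
    mount -t lustre mds1@tcp0:/webfs /srv/webdata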
Hi Brian! Thanks for the reply, comments below.

Brian J. Murrell wrote:
>> Instead of just adding another 1TB server, I need to plan for a more
>> scalable solution. Immediately Lustre came to mind, but I'm wondering
>> about the performance. Basically our company does niche web-hosting for
>> "Creative Professionals" so we need fast access to the data in order to
>> have snappy web services for our clients. Typically these are smaller
>> files (2MB pictures, 50MB videos, .swf files, etc.).
>
> Well, I'm not sure those files would fall within our general
> classification of "small files" (wherein we know we don't perform very
> well). Our small-file issues are usually characterized by "kernel
> builds" and ~ use, where files are usually much smaller than 1MB.

Aha, OK, well then that's good to know. There's also some kind of read-ahead and client-side caching, right? So files which are accessed a lot will be faster to access.

>> Also I'm wondering about the best way to set this up in terms of speed
>> and ease of growth. I want the web-servers and the storage pool to be
>> independent of each other. So I can add web-servers as the web traffic
>> increases, and add more storage as our storage needs grow.
>
> Well, your web-servers would be Lustre clients. There is no
> relationship, or rather no requirement, in terms of the number of clients
> and servers being used. You use as many servers as your client load
> demands. So you could imagine both ends of the spectrum: relatively few
> clients could be used to tax quite a few servers, or a lot of clients
> with modest demand could require only a few servers.
>
>> I was thinking initially we could start with 2 servers, both attached
>> to the storage array, set up as OSSs and functioning as (load-balanced)
>> web-servers as well.
>
> Sounds like you are describing 2 storage servers, which would require at
> least 3 servers total. Don't forget about the MDS. Also don't forget
> about HA if that's a concern for you. You could make the 2 OSSes
> failover partners for each other if you are willing to accept a possible
> performance impact when one of the OSSes fails.
>
> If HA is important to you, however, you need to address MDS failover
> with a second server to pick up the MDT should the active MDS fail.

HA is definitely critical; if the storage pool becomes inaccessible we lose clients (and all fingers point at me!). However, I need to find a reasonable balance between cost / scalability / performance. The idea would be to start small, with the simplest configuration, but allow for a lot of growth. In a year's time, if we are using 5TB of data, we will be in a very good position financially and can afford a systems expansion.

So for starters, what can I get away with here? 1 OSS, 1 MDS & 1 client node? Is it a smart thing to do to have the MDS and OSS share the same storage target (just a separate partition for the MDS)? What kind of system specs are advisable for each type (MDS, OSS & client node) as far as RAM, CPU, disk configuration, etc.? Also, is it possible to add more OSSs to take over existing OSTs that another OSS was previously managing? I.e., if I have the MD3000i split into 5x1TB volumes (5 OSTs), and the OSS is getting hammered, I set another OSS up and hand off 2 or 3 OSTs from the old OSS to the new one, and set it up as failover for the remaining OSTs.
Do-able?

> As for OSSes being web-servers, that would require the OSS/webservers
> also be clients, and that is an unsupported configuration due to the risk
> of deadlock under memory pressure. The recommended architecture would
> be to make the webservers Lustre clients.

I see, so from the get-go I'm going to need an internal GigE network for OSS/client communication.

>> performance can I expect, am I out of touch to expect something similar
>> to a directly attached RAID array?
>
> I think our generally talked-about numbers are something on the order of
> achieving 80% of the raw storage bandwidth (assuming a capable network
> and so on). Maybe somebody who is closer to the benchmarking that we
> are constantly doing can comment further on how close-to-raw-disk we are
> achieving lately.

Is it safe to say my bottleneck is going to be the OSS & not the network? Is there some documentation I can read about typical setups, usage cases & methods for optimal performance?

Thanks!
-Nick
Balagopal Pillai
2009-Jan-26 19:24 UTC
[Lustre-discuss] Performance Expectations of Lustre
The MD3000 series doesn't seem to have RAID 6 support, which could be very useful with lots of SATA drives. Also, the MD3000i doesn't specify LACP support for the dual or quad Ethernet ports on the enclosure. But a PE1950 + PERC 6 with an MD1000 has RAID 6 support, and the OSS can benefit from the good Ethernet bonding support in Linux. I have a setup with eight MD1000s on two PERC 5s on two OSSes.

Balagopal
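To illustrate the bonding Balagopal mentions, this is roughly what LACP (802.3ad) bonding of two GigE ports looks like on a RHEL/CentOS-era OSS. The interface names and addresses below are placeholders, and the switch ports must also be configured for LACP for mode 802.3ad to work.

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.10
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise ifcfg-eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes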
Thank you very much for this feedback Balagopal, it's extremely useful. I will look into the MD1000 and revise my plan.

-Nick
Nick Jennings wrote:
> Hello (and a special hello to all my ex-co-workers from the CFS days :)
>
> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.
>
> Instead of just adding another 1TB server, I need to plan for a more
> scalable solution. Immediately Lustre came to mind, but I'm wondering
> about the performance. Basically our company does niche web-hosting for
> "Creative Professionals" so we need fast access to the data in order to
> have snappy web services for our clients. Typically these are smaller
> files (2MB pictures, 50MB videos, .swf files, etc.).

<snip>

I'm going to send you down a different direction based on my experience. We run a 90TB Lustre array on DataDirect storage, and while it works well, we picked a different design for our website storage. We did this because, although Lustre works well, it didn't provide the robustness we needed with a website. This is no slight to the Lustre team, just what I have observed over the last 2 years of Lustre in production. Specifically, failover takes time and locks the filesystem.

For our web storage we use MogileFS. We serve images (about 50 million and growing) and have 150TB of storage. It's never been a problem, it's written in Perl and easy to follow the code, numerous other websites use it, and it works. The only downside is that MogileFS uses an API and there is no direct filesystem access. This is manageable in a web infrastructure though.

The benefits of Lustre are speed and being able to take a pounding from clients. Neither is necessary in a web environment where, if you're lucky, you'll push 100 Mbit/sec. Again, I have large instances of both Lustre and MogileFS. For a 4-5 nines website with people pointing fingers at me if it breaks, I would go with Mogile. For a backend production system that needs to push 500+ MB/sec from 150 processing nodes, go with Lustre.

Daniel
FYI, Dell MD3000 storage does support RAID 6, but you need to upgrade the RAID controllers to the latest firmware, Version 07.35.22.60, A06. You can download it from the Dell support website.

--
Initial Release of Firmware Generation 2, featuring the following enhancements:
- Supports greater than 2TB LUNs
- Added RAID 6 support
- Enhanced IPv6 support for all ports
- Included Smart Battery (Smart BBU) management
- Enabled SNTP on management port
- Increased number of snapshots and volume copies per volume from 4 to 8 (an additional Premium Feature Key required)
--

Best Regards,
Wojciech Turek
On Jan 26, 2009 16:51 +0100, Nick Jennings wrote:
> The company where I work now has grown fast in the past year and we
> suddenly find ourselves in need of a lot of storage. For 5 years the
> company ran on a 60gig server, last year we got a 1TB RAID that is now
> almost full. In 1-2 years we could easily be using 10-15TB of storage.

Nick,
to be honest, I wouldn't necessarily recommend Lustre for a relatively small installation like this. The main benefit of using Lustre is that it scales the IO bandwidth very well with additional OSS nodes, but more nodes (and more complexity) also add more points of failure.

If you don't need more bandwidth and/or size than can be easily served from a single node, then you can use something like NFS with a single ext3 16TB filesystem today.

You didn't mention the number of web servers that will be accessing the filesystem, and of course lots of clients can bring an NFS server to its knees, so that is definitely also something to consider.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
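For comparison, the single-server NFS route Andreas describes is about this simple. The export path, subnet and hostname below are placeholders, and whether to export sync or async is a durability-vs-speed trade-off worth reading up on before deciding.

    # On the storage server, in /etc/exports:
    /export/webdata  192.168.10.0/24(rw,sync,no_root_squash)

    # Re-export after editing, with nfsd running:
    exportfs -ra

    # On each web node:
    mount -t nfs storage1:/export/webdata /srv/webdata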
Brian J. Murrell
2009-Jan-27 18:29 UTC
[Lustre-discuss] Performance Expectations of Lustre
On Mon, 2009-01-26 at 19:54 +0100, Nick Jennings wrote:
> Aha, OK, well then that's good to know. There's also some kind of
> read-ahead and client-side caching, right?

Indeed. Both of those exist.

> So files which are accessed a
> lot will be faster to access.

Yes, unless locks get revoked and the cache has to be flushed and/or invalidated. I.e. one client cannot cache (a portion of) a file that another client updates, for obvious reasons.

> HA is definitely critical; if the storage pool becomes inaccessible we
> lose clients (and all fingers point at me!).

Usual case.

> So for starters, what can I get away with here? 1 OSS, 1 MDS & 1 client
> node? Is it a smart thing to do to have the MDS and OSS share the same
> storage target (just a separate partition for the MDS)?

It's less than ideal. You will have the MDS and OSS competing for resources in the failover case.

> What kind of
> system specs are advisable for each type (MDS, OSS & client node) as far
> as RAM, CPU, disk configuration etc.?

That's completely subjective to the performance requirements you have. Lots of RAM is good on the MDS for caching, and soon lots of RAM will be good for caching on the OSS too. And lots of RAM on the clients is good also. Lots of RAM everywhere. :-) OSS CPU requirements are usually quite modest. The MDS is helped by some CPU though.

> Also, is it possible to add more
> OSSs to take over existing OSTs that another OSS was previously
> managing?

Sure.

> I.e., if I have the MD3000i split into 5x1TB volumes (5 OSTs),
> and the OSS is getting hammered, I set another OSS up and hand off 2 or
> 3 OSTs from the old OSS to the new one, and set it up as failover for
> the remaining OSTs. Do-able?

Most definitely. You will just need to regenerate the config so that the clients know where they have been moved to.

> I see, so from the get-go I'm going to need an internal GigE network for
> OSS/client communication.

Yeah.

> Is it safe to say my bottleneck is going to be the OSS & not the
> network?

I guess that depends on the quality of your GigE. If you assume, say, 80% of the GigE bandwidth, that's 100MB/s, yes? Depending on how many disks you give the OSS, what kind of interconnect you use to the disk, and what kind of bus you put the HBA and GigE cards into, you could certainly wind up with a network bottleneck.

> Is there some documentation I can read about typical setups,
> usage cases & methods for optimal performance?

Well, the ops manual is probably a good place to start: manual.lustre.org.

b.
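A very rough sketch of what handing an OST to a different OSS involves. Hostnames and device names are placeholders; the full writeconf procedure (everything unmounted, config logs regenerated starting from the MDT) is spelled out in the ops manual, and this only shows the shape of the tunefs.lustre step.

    # With the OST unmounted on the old OSS, re-register it from the new OSS,
    # declaring the old OSS as its failover partner, then mount it there
    tunefs.lustre --erase-params --mgsnode=mds1@tcp0 \
        --failnode=oss1@tcp0 --writeconf /dev/sdd
    mount -t lustre /dev/sdd /mnt/ost2

    # Client-side read-ahead can also be tuned (parameter names vary by release), e.g.
    lctl set_param llite.*.max_read_ahead_mb=40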
Nick:

In case I mixed up the capitalization in the links I sent you, http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html and http://www.ioio.ca/Lustre-tcp-bonding/images.html should work. Go easy on my old girl, she only has one processor and is a complete hack to recover data after the root stroke and jail riot last year on the main drive that I couldn't salvage. Pity it had the only copy of the code I needed yesterday.

Aside from the webserver it originates from, it should give a pretty clear visual of how far you can take it for roughly how much TCO, again with valid points for tuning to small file sizes as best as possible. If you would like to see a specific small-file benchmark from some angle, I would do my best to produce it if you tell me what to write.

Ardently;
Arden Wiebe
Nick:

On another note, I just had to run mysqlcheck -p --auto-repair on a 23,266-table database tonight, so it's probably not a good idea to do direct copies of /var/lib/mysql to the Lustre filesystem. Correlated or not, it would be better to mysqldump there instead.

Ardently;
Arden Wiebe
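For anyone taking that advice, the safer pattern is a logical dump onto the shared filesystem rather than a raw copy of the live data directory. A minimal example follows; the target path is a placeholder, and --single-transaction only gives a consistent snapshot for InnoDB tables.

    # Dump all databases to a dated file on the shared mount instead of
    # copying /var/lib/mysql while mysqld is running
    mysqldump -u root -p --all-databases --single-transaction \
        > /srv/webdata/backups/all-databases-$(date +%Y%m%d).sql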
Hi Andreas,

Andreas Dilger wrote:
> On Jan 26, 2009 16:51 +0100, Nick Jennings wrote:
>> The company where I work now has grown fast in the past year and we
>> suddenly find ourselves in need of a lot of storage. For 5 years the
>> company ran on a 60gig server, last year we got a 1TB RAID that is now
>> almost full. In 1-2 years we could easily be using 10-15TB of storage.
>
> Nick,
> to be honest, I wouldn't necessarily recommend Lustre for a relatively
> small installation like this. The main benefit of using Lustre is
> that it scales the IO bandwidth very well with additional OSS nodes,
> but more nodes (and more complexity) also add more points of failure.
>
> If you don't need more bandwidth and/or size than can be easily served
> from a single node, then you can use something like NFS with a single
> ext3 16TB filesystem today.
>
> You didn't mention the number of web servers that will be accessing the
> filesystem, and of course lots of clients can bring an NFS server to
> its knees, so that is definitely also something to consider.

Thanks for your input. I am starting to re-think my strategy here, though I've got to make a decision sometime very soon. I've considered GFS to manage the file locking, but am not sure I want to commit to it. There's also ZFS (Sun) & OCFS (Oracle), which I've only just started reading about. (NOTE: If anyone has any input on these filesystems, I'd be interested to hear it.)

NFS would be the simplest migration method, but offers the least amount of scalability. We are currently close to maxing out the resources of our single server (it's our web server, database server, mail server and DNS server), so we will most likely be scaling our infrastructure to 3-4 nodes over the course of the year, all of which will need access to the NFS server (but perhaps only 2-3 really hitting it hard). I think even with 2-3 web nodes hitting the NFS server, I'm going to be sorry I switched to NFS before next Christmas :)

There's also MogileFS, which Daniel Leaberry pointed me to (thanks for the tip!), and I've been reading about it as well, but it's likely going to be a fair amount of work to re-write a bunch of our legacy code (written by a developer who is no longer with the company) to access files via the Mogile API. Not entirely impossible, just not my idea of a good time!

So after giving it some thought, I think Lustre might require too much of an initial investment, while being a bit overkill for the task at hand. It's too bad, as I was looking forward to the idea.

-Nick