Hi,

My Lustre system specs:
Lustre 1.6.6 on RHEL4
2 Lustre file systems: one consists of 4 OSTs, the other of 20 OSTs
4 x OSS / 6 OSTs each
Storage: S2A9500
Clients: 600
Interconnect: Ethernet

I noticed that my OSSs sometimes report a very high load (around 500). I read that increasing the number of OST threads may help in such a situation, so I am trying to calculate the optimal number of OST threads for my OSSs. Each OSS has 16GB of RAM and 2 dual-core CPUs.

In the Lustre manual I read: "An OSS can have a maximum of 512 service threads and a minimum of 2 service threads. The number of service threads is a function of how much RAM and how many CPUs are on each OSS node (1 thread / 128MB * num_cpus)."

If I understand the above statement correctly, the equation to calculate the number of OST threads looks like this:

    ost_num_threads = (RAM_size * number_of_cores) / 128MB

For my particular case it gives 512 ost_num_threads, which is the Lustre maximum for this parameter. The manual says that each thread actually uses 1.5MB of RAM, so 768MB of RAM will be consumed on each of my OSSs for I/O threads. So I guess with 16GB of RAM the initial (default) value of ost_num_threads is already being set to 512, is that correct?

I know that adding more OSSs and OSTs might help, but at the moment this isn't an option for me. Is there any other way I could bring down the high load on the OSSs? Can tuning the client side help?

Best regards,
Wojciech
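For reference, here is that formula worked out with the figures given in the post above (16GB of RAM, 2 dual-core CPUs = 4 cores, and the 1.5MB-per-thread figure from the manual excerpt quoted there):

    ost_num_threads = (16384MB * 4) / 128MB  = 512   (exactly the 512 maximum)
    thread memory   = 512 threads * 1.5MB    = 768MB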
Hello!

On Jan 25, 2009, at 6:56 PM, Wojciech Turek wrote:
> For my particular case it gives 512 ost_num_threads, which is the Lustre
> maximum for this parameter. The manual says that each thread actually uses
> 1.5MB of RAM, so 768MB of RAM will be consumed on each of my OSSs for
> I/O threads.
> So I guess with 16GB of RAM the initial (default) value of ost_num_threads
> is already being set to 512, is that correct?
> I know that adding more OSSs and OSTs might help, but at the moment this
> isn't an option for me.
> Is there any other way I could bring down the high load on the OSSs? Can
> tuning the client side help?

To decrease the load you actually want to decrease the number of OST threads (the ost_num_threads module parameter to the ost.ko module).

Essentially what is happening is that your drives can only sustain a certain amount of parallel I/O activity before performance degrades due to all the seeking going on. Ideally you would set the number of OST threads to that number, but this is complicated by the fact that different workloads (i.e. different I/O sizes) change how many parallel streams the drives can handle. In any case, once you reach that point of congestion the performance only goes downhill; the extra threads just wait for I/O and contribute to your LA figures.

You need to experiment a bit to see what number of threads makes sense for you. Perhaps start with a number of threads equal to the number of actual disk spindles you have on that node (if you use RAID5+, subtract any spindles not used for actual data, e.g. 1/3 of the spindles for RAID5) and watch the performance of the clients during usual workloads (not the LA on the OSSes; it won't go much higher than the maximum threads you specify). If you feel the performance has degraded, try increasing the thread count somewhat and see how that works, until performance starts degrading again or until you reach satisfactory performance.

If your disk configuration does not have writeback cache enabled and your activity is mostly writes, you might also want to give the patch from bug 16919 a try. It removes the synchronous journal commit requirement and therefore should somewhat speed up OST writes in this case (unless you already use a fast external journal, or unless a write cache is enabled that already mitigates the synchronous journal commits).

Hope that helps.

Bye,
    Oleg
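As a minimal sketch of how that module parameter is typically set on a Lustre 1.6 OSS (the value 128 below is purely an illustrative starting point, e.g. roughly one thread per data spindle as suggested above; pick yours from your own configuration, and note the ost module has to be reloaded, i.e. the OSTs remounted, for the change to take effect):

    # /etc/modprobe.conf on each OSS -- example value only
    options ost ost_num_threads=128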
On Mon, 2009-01-26 at 00:01 -0500, Oleg Drokin wrote:
> Hello!

In addition to Oleg's suggestions...

> Essentially what is happening is that your drives can only sustain a
> certain amount of parallel I/O activity before performance degrades due
> to all the seeking going on. Ideally you would set the number of OST
> threads to that number, but this is complicated by the fact that
> different workloads (i.e. different I/O sizes) change how many parallel
> streams the drives can handle.

Understanding the performance of your storage hardware is exactly why we always recommend profiling it with the Lustre iokit, ideally prior to deployment of the filesystem. The obdfilter-survey profiles the overall throughput of your hardware, while the sgpdd-survey profiles individual disks. The former is supposed to be usable, non-destructively, on an existing filesystem; the latter, however, is absolutely destructive and should not be run anywhere you want to preserve existing data.

Now, I mention the non-destructive nature of obdfilter-survey with trepidation. That is its intent, and in the times I have used it, it has proven to be as advertised. However, I am doubtful that that specific aspect gets regularly tested by our QA department, so there is always a possibility of a bug sneaking in which voids that intent. Proceed with caution.

Anyway, the obdfilter-survey simply sends a range of workloads to your OSTs, varying the thread counts and I/O sizes. When it's all done it gives you a (figurative, or graphic if you use the plot scripts on the data) picture of the performance abilities of your storage hardware and will show you where the saturation points are.

b.
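As a rough sketch of what a local obdfilter-survey run looks like; the exact environment variables differ between lustre-iokit releases, so treat the names below (nobjhi, thrhi, size) as an assumption and check the README shipped with the iokit version matching your Lustre 1.6.6 installation before running it on an OSS:

    # run on the OSS against its local OSTs; variable names assumed from
    # the lustre-iokit documentation and may differ in your release
    nobjhi=2 thrhi=64 size=1024 ./obdfilter-survey

The thread counts it sweeps through map directly onto the ost_num_threads discussion above: the point where throughput stops improving (or starts falling) as threads increase is a reasonable candidate for the thread cap.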