Hello,

I have a Lustre 2.2 environment which looks like this:

# lfs df -h
UUID                      bytes     Used  Available  Use%  Mounted on
lustre22-MDT0000_UUID     95.0G     9.4G      79.3G   11%  /lustre[MDT:0]
lustre22-OST0000_UUID      5.5T     2.1T       3.3T   39%  /lustre[OST:0]
lustre22-OST0001_UUID      5.5T     1.2T       4.3T   22%  /lustre[OST:1]
lustre22-OST0002_UUID      5.5T  1016.0G       4.5T   18%  /lustre[OST:2]
lustre22-OST0003_UUID      5.5T   948.3G       4.5T   17%  /lustre[OST:3]
lustre22-OST0004_UUID      5.5T   812.3G       4.7T   15%  /lustre[OST:4]
lustre22-OST0005_UUID      5.5T   641.4G       4.8T   11%  /lustre[OST:5]
lustre22-OST0006_UUID      5.5T   619.4G       4.8T   11%  /lustre[OST:6]
lustre22-OST0007_UUID      5.5T   587.0G       4.9T   11%  /lustre[OST:7]
lustre22-OST0008_UUID      5.5T   539.7G       4.9T   10%  /lustre[OST:8]
OST0009               : inactive device
lustre22-OST000a_UUID      5.5T   531.3G       4.9T   10%  /lustre[OST:10]
lustre22-OST000b_UUID      5.5T   488.9G       5.0T    9%  /lustre[OST:11]
lustre22-OST000c_UUID      5.5T   451.2G       5.0T    8%  /lustre[OST:12]
lustre22-OST000d_UUID      5.5T   450.1G       5.0T    8%  /lustre[OST:13]
lustre22-OST000e_UUID      5.5T   448.8G       5.0T    8%  /lustre[OST:14]
lustre22-OST000f_UUID      5.5T   444.0G       5.0T    8%  /lustre[OST:15]
lustre22-OST0010_UUID      5.5T   422.5G       5.0T    8%  /lustre[OST:16]
lustre22-OST0011_UUID      5.5T   414.5G       5.0T    7%  /lustre[OST:17]
lustre22-OST0012_UUID      5.5T   406.9G       5.1T    7%  /lustre[OST:18]
OST0013               : inactive device

Reading through the documentation, I see that Lustre should prefer the OSTs with the most free disk space (qos_prio_free is set to 91%). However, my monitoring tells me that OST0000 is by far the most loaded, with a loadavg over 300 and network traffic 3-5x higher than the rest.

I raised qos_threshold_rr to 55% and am waiting to see the results. Right now I have clients reading and writing to this filesystem at around 600 MB/s aggregate, generating hundreds of files per job.

How soon should I expect to see the results?

What else can I do to spread the load from OST0000 evenly among the other OSTs?

--
Jure Pečar
http://jure.pecar.org
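For reference, the two allocator tunables mentioned here live on the MDS. A minimal sketch of inspecting them, assuming a Lustre 2.x layout where they are exported under lov.<fsname>-MDT0000-mdtlov (the exact parameter path can differ between versions):

mds# lctl get_param lov.lustre22-MDT0000-mdtlov.qos_threshold_rr
mds# lctl get_param lov.lustre22-MDT0000-mdtlov.qos_prio_free

The same values can also be read directly under /proc/fs/lustre/lov/lustre22-MDT0000-mdtlov/ on the MDS.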
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Jure Pecar
> Sent: Wednesday, May 08, 2013 6:13 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] OST load distribution
>
> Hello,
>
> I have a Lustre 2.2 environment which looks like this:
>
> [lfs df -h output snipped]
>
> Reading through the documentation, I see that Lustre should prefer the OSTs
> with the most free disk space (qos_prio_free is set to 91%). However, my
> monitoring tells me that OST0000 is by far the most loaded, with a loadavg
> over 300 and network traffic 3-5x higher than the rest.

Hi Jure,

The qos_prio_free setting applies after the QOS algorithm is selected.

> I raised qos_threshold_rr to 55% and am waiting to see the results. Right now
> I have clients reading and writing to this fs at around 600 MB/s aggregate,
> generating hundreds of files per job.

The qos_threshold_rr setting dictates whether the RR or QOS algorithm is used. Setting it to 55% tells the MDS to use QOS only when the difference in OST utilization is greater than 55%. You should probably go back to the default of 17% to keep the OSTs balanced, unless there is a reason to trade less evenly distributed data for performance.

> How soon am I expected to see the results?
>
> What else can I do to spread the load from OST0000 evenly among the other
> OSTs?

Best,
--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division
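Following that suggestion, a sketch of putting the threshold back to its default on the MDS, assuming the same lov.*-mdtlov parameter path as above (note that lctl set_param is not persistent across a remount):

mds# lctl set_param lov.lustre22-MDT0000-mdtlov.qos_threshold_rr=17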
On Wed, 8 May 2013 14:05:18 +0000 "Lee, Brett" <brett.lee at intel.com> wrote:

> The qos_threshold_rr setting dictates whether the RR or QOS algorithm is used. Setting it to 55% tells the MDS to use QOS only when the difference in OST utilization is greater than 55%. You should probably go back to the default of 17% to keep the OSTs balanced, unless there is a reason to trade less evenly distributed data for performance.

I noticed that lfs df -i returns the same numbers for all OSTs (19%), which means that most of them hold many more, smaller files than the first one.

After I set qos_threshold_rr to 55%, the load on the first OST slowly decreased while filesystem throughput remained about the same. I hope it stays like this, but I will keep a close eye on it.

--
Jure Pečar
http://jure.pecar.org
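A quick way to see the space/object imbalance side by side, assuming a client with the filesystem mounted at /lustre:

client# lfs df -h /lustre    # bytes used per OST
client# lfs df -i /lustre    # inodes (objects) used per OST

A mostly even inode count combined with an uneven byte count suggests a small number of large files concentrated on OST0000.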
I've seen issues like this where a user used "lfs setstripe -i 0" for their directory when they really wanted "lfs setstripe -i -1". The 0 will create all files starting on index 0 (OST 0), whereas -1 lets Lustre choose the starting OST (the default behaviour). It could be that one of your users is creating ALL their files starting on OST0000, making it busier than the rest. The successive stripes would be placed elsewhere on the filesystem.

-Marc

----
D. Marc Stearman
Lustre Operations Lead
stearman2 at llnl.gov
925.423.9670

On May 8, 2013, at 6:12 AM, Jure Pečar <pegasus at nerv.eu.org> wrote:

> [original message and lfs df -h output snipped]
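If a pinned directory layout is the cause, it should show up as a stripe offset of 0 in the directory's default layout. A sketch of checking and resetting it (the directory path here is only an example):

client# lfs getstripe -d /lustre/some/jobdir     # a default stripe offset of 0 pins new files to OST0000
client# lfs setstripe -i -1 /lustre/some/jobdir  # -1 restores default OST selection; affects only files created afterwards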
On 2013-05-08, at 7:14, "Jure Pečar" <pegasus at nerv.eu.org> wrote:

> I have a lustre 2.2 environment which looks like this:
>
> # lfs df -h
> UUID                      bytes     Used  Available  Use%  Mounted on
> lustre22-MDT0000_UUID     95.0G     9.4G      79.3G   11%  /lustre[MDT:0]
> lustre22-OST0000_UUID      5.5T     2.1T       3.3T   39%  /lustre[OST:0]
> lustre22-OST0001_UUID      5.5T     1.2T       4.3T   22%  /lustre[OST:1]
> lustre22-OST0002_UUID      5.5T  1016.0G       4.5T   18%  /lustre[OST:2]
> lustre22-OST0003_UUID      5.5T   948.3G       4.5T   17%  /lustre[OST:3]
[snip more OSTs with same usage]
>
> What else can I do to spread the load from OST0000 evenly among the other OSTs?

Once you have found the source of the problem, it may be best to do nothing if you have a high file turnover rate. Lustre will eventually balance itself out.

You can proactively find large files on this OST and migrate them to other OSTs. This will make copies of those files, and will also put a high load on OST0000. Note this is currently only safe if you "know" the migrated files are not in use, or are opened read-only. That depends on your workload and users (e.g. users not logged in or running jobs, older files, etc).

client# lfs find /lustre -ost lustre22-OST0000 -mtime +10 -size +1G > ost0000-list.txt
{edit ost0000-list.txt to only contain known inactive files}
client# lfs_migrate < ost0000-list.txt

In Lustre 2.4 it will be possible to migrate files that are in use, since it will preserve the inode numbers.

If you can't find the source of the problem, and OST0000 is getting very full, you could mark the OST inactive on the MDS node:

mds# lctl --device %lustre22-OST0000 deactivate

and no new objects will be allocated on that OST after that time.

Cheers, Andreas
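If the OST is deactivated this way, it can be made available for new allocations again once usage has evened out; a sketch, assuming the same device-by-name syntax as the deactivate command above:

mds# lctl --device %lustre22-OST0000 activate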