Hello,

I have a Lustre 2.2 environment which looks like this:

# lfs df -h
UUID                      bytes     Used  Available  Use%  Mounted on
lustre22-MDT0000_UUID     95.0G     9.4G      79.3G   11%  /lustre[MDT:0]
lustre22-OST0000_UUID      5.5T     2.1T       3.3T   39%  /lustre[OST:0]
lustre22-OST0001_UUID      5.5T     1.2T       4.3T   22%  /lustre[OST:1]
lustre22-OST0002_UUID      5.5T  1016.0G       4.5T   18%  /lustre[OST:2]
lustre22-OST0003_UUID      5.5T   948.3G       4.5T   17%  /lustre[OST:3]
lustre22-OST0004_UUID      5.5T   812.3G       4.7T   15%  /lustre[OST:4]
lustre22-OST0005_UUID      5.5T   641.4G       4.8T   11%  /lustre[OST:5]
lustre22-OST0006_UUID      5.5T   619.4G       4.8T   11%  /lustre[OST:6]
lustre22-OST0007_UUID      5.5T   587.0G       4.9T   11%  /lustre[OST:7]
lustre22-OST0008_UUID      5.5T   539.7G       4.9T   10%  /lustre[OST:8]
OST0009               : inactive device
lustre22-OST000a_UUID      5.5T   531.3G       4.9T   10%  /lustre[OST:10]
lustre22-OST000b_UUID      5.5T   488.9G       5.0T    9%  /lustre[OST:11]
lustre22-OST000c_UUID      5.5T   451.2G       5.0T    8%  /lustre[OST:12]
lustre22-OST000d_UUID      5.5T   450.1G       5.0T    8%  /lustre[OST:13]
lustre22-OST000e_UUID      5.5T   448.8G       5.0T    8%  /lustre[OST:14]
lustre22-OST000f_UUID      5.5T   444.0G       5.0T    8%  /lustre[OST:15]
lustre22-OST0010_UUID      5.5T   422.5G       5.0T    8%  /lustre[OST:16]
lustre22-OST0011_UUID      5.5T   414.5G       5.0T    7%  /lustre[OST:17]
lustre22-OST0012_UUID      5.5T   406.9G       5.1T    7%  /lustre[OST:18]
OST0013               : inactive device

Reading through the documentation, I see that Lustre should prefer the OSTs with the most free disk space (qos_prio_free is set to 91%). However, my monitoring tells me that OST0000 is by far the most loaded, with a loadavg over 300 and network traffic 3-5x higher than the rest.

I raised qos_threshold_rr to 55% and am waiting to see the results. Right now I have clients reading and writing to this filesystem at around 600 MB/s aggregate, generating hundreds of files per job.

How soon should I expect to see the results?

What else can I do to spread the load from OST0000 evenly among the other OSTs?

--
Jure Pečar
http://jure.pecar.org
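For reference, the two allocator tunables mentioned here live on the MDS. A minimal sketch of inspecting them, assuming a Lustre 2.x layout where they are exported under lov.<fsname>-MDT0000-mdtlov (the exact parameter path can differ between versions):

mds# lctl get_param lov.lustre22-MDT0000-mdtlov.qos_threshold_rr
mds# lctl get_param lov.lustre22-MDT0000-mdtlov.qos_prio_free

The same values can also be read directly under /proc/fs/lustre/lov/lustre22-MDT0000-mdtlov/ on the MDS.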
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Jure Pecar
> Sent: Wednesday, May 08, 2013 6:13 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] OST load distribution
>
> Hello,
>
> I have a Lustre 2.2 environment which looks like this:
>
> [lfs df -h output snipped]
>
> Reading through the documentation, I see that Lustre should prefer the OSTs
> with the most free disk space (qos_prio_free is set to 91%). However, my
> monitoring tells me that OST0000 is by far the most loaded, with a loadavg
> over 300 and network traffic 3-5x higher than the rest.

Hi Jure,

The qos_prio_free setting applies after the QOS algorithm is selected.

> I raised qos_threshold_rr to 55% and am waiting to see the results. Right now
> I have clients reading and writing to this fs at around 600 MB/s aggregate,
> generating hundreds of files per job.

The qos_threshold_rr setting dictates whether the RR or QOS algorithm is used. Setting it to 55% tells the MDS to use QOS only when the difference in OST utilization is greater than 55%. You should probably go back to the default of 17% to keep the OSTs balanced, unless there is a reason to trade less evenly distributed data for performance.

> How soon am I expected to see the results?
>
> What else can I do to spread the load from OST0000 evenly among the other
> OSTs?

Best,
--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division
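Following that suggestion, a sketch of putting the threshold back to its default on the MDS, assuming the same lov.*-mdtlov parameter path as above (note that lctl set_param is not persistent across a remount):

mds# lctl set_param lov.lustre22-MDT0000-mdtlov.qos_threshold_rr=17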
On Wed, 8 May 2013 14:05:18 +0000 "Lee, Brett" <brett.lee at intel.com> wrote:

> The qos_threshold_rr setting dictates whether the RR or QOS algorithm is used. Setting it to 55% tells the MDS to use QOS only when the difference in OST utilization is greater than 55%. You should probably go back to the default of 17% to keep the OSTs balanced, unless there is a reason to trade less evenly distributed data for performance.

I noticed that lfs df -i returns the same numbers for all OSTs (19%), which means that most of them hold many more, smaller files than the first one.

After I set qos_threshold_rr to 55%, the load on the first OST slowly decreased while filesystem throughput remained about the same. I hope it stays like this, but I will keep a close eye on it.

--
Jure Pečar
http://jure.pecar.org
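A quick way to see the space/object imbalance side by side, assuming a client with the filesystem mounted at /lustre:

client# lfs df -h /lustre    # bytes used per OST
client# lfs df -i /lustre    # inodes (objects) used per OST

A mostly even inode count combined with an uneven byte count suggests a small number of large files concentrated on OST0000.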
I've seen issues like this where a user used "lfs setstripe -i 0" for their directory when they really wanted "lfs setstripe -i -1". The 0 will create all files starting on index 0 (OST 0), whereas -1 lets Lustre choose the starting OST (the default behaviour). It could be that one of your users is creating ALL their files starting on OST0000, making it busier than the rest. The successive stripes would be placed elsewhere on the filesystem.

-Marc

----
D. Marc Stearman
Lustre Operations Lead
stearman2 at llnl.gov
925.423.9670

On May 8, 2013, at 6:12 AM, Jure Pečar <pegasus at nerv.eu.org> wrote:

> [original message and lfs df -h output snipped]
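If a pinned directory layout is the cause, it should show up as a stripe offset of 0 in the directory's default layout. A sketch of checking and resetting it (the directory path here is only an example):

client# lfs getstripe -d /lustre/some/jobdir     # a default stripe offset of 0 pins new files to OST0000
client# lfs setstripe -i -1 /lustre/some/jobdir  # -1 restores default OST selection; affects only files created afterwards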
On 2013-05-08, at 7:14, "Jure Pečar" <pegasus at nerv.eu.org> wrote:

> I have a lustre 2.2 environment which looks like this:
>
> # lfs df -h
> UUID                      bytes     Used  Available  Use%  Mounted on
> lustre22-MDT0000_UUID     95.0G     9.4G      79.3G   11%  /lustre[MDT:0]
> lustre22-OST0000_UUID      5.5T     2.1T       3.3T   39%  /lustre[OST:0]
> lustre22-OST0001_UUID      5.5T     1.2T       4.3T   22%  /lustre[OST:1]
> lustre22-OST0002_UUID      5.5T  1016.0G       4.5T   18%  /lustre[OST:2]
> lustre22-OST0003_UUID      5.5T   948.3G       4.5T   17%  /lustre[OST:3]
[snip more OSTs with same usage]
>
> What else can I do to spread the load from OST0000 evenly among the other OSTs?

Once you have found the source of the problem, it may be best to do nothing if you have a high file turnover rate. Lustre will eventually balance itself out.

You can proactively find large files on this OST and migrate them to other OSTs. This will make copies of those files, and will also put a high load on OST0000. Note this is currently only safe if you "know" the migrated files are not in use, or are opened read-only. That depends on your workload and users (e.g. users not logged in or running jobs, older files, etc).

client# lfs find /lustre -ost lustre22-OST0000 -mtime +10 -size +1G > ost0000-list.txt
{edit ost0000-list.txt to only contain known inactive files}
client# lfs_migrate < ost0000-list.txt

In Lustre 2.4 it will be possible to migrate files that are in use, since it will preserve the inode numbers.

If you can't find the source of the problem, and OST0000 is getting very full, you could mark the OST inactive on the MDS node:

mds# lctl --device %lustre22-OST0000 deactivate

and no new objects will be allocated on that OST after that time.

Cheers, Andreas
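If the OST is deactivated this way, it can be made available for new allocations again once usage has evened out; a sketch, assuming the same device-by-name syntax as the deactivate command above:

mds# lctl --device %lustre22-OST0000 activate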