Hello all,

We are running 1.6.6 with a shared MGS/MDT and 3 OSTs. We run a set of tests that write heavily, then we review the results and delete the data. Usually the load is evenly spread across all 3 OSTs. I noticed this afternoon that the load does not seem to be distributed:

OST0000 has a load of 50+ with iowait of around 10%
OST0001 has a load of <1 with >99% idle
OST0002 has a load of <1 with >99% idle

From a client, all 3 OSTs appear online:

[aeverett at englogin01 ~]$ lctl device_list
  0 UP mgc MGC172.16.14.10 at tcp 19dde65d-8eba-22b0-b618-f59bfbd36cde 5
  1 UP lov fortefs-clilov-f7cc4800 c86e2947-f2bf-5e47-541f-6ff3f13af9a0 4
  2 UP mdc fortefs-MDT0000-mdc-f7cc4800 c86e2947-f2bf-5e47-541f-6ff3f13af9a0 5
  3 UP osc fortefs-OST0000-osc-f7cc4800 c86e2947-f2bf-5e47-541f-6ff3f13af9a0 5
  4 UP osc fortefs-OST0001-osc-f7cc4800 c86e2947-f2bf-5e47-541f-6ff3f13af9a0 5
  5 UP osc fortefs-OST0002-osc-f7cc4800 c86e2947-f2bf-5e47-541f-6ff3f13af9a0 5
[aeverett at englogin01 ~]$

From the MGS/MDT, the health check claims Lustre is healthy:

[aeverett at lustrefs ~]$ cat /proc/fs/lustre/health_check
healthy
[aeverett at lustrefs ~]$

df confirms the lopsided writes:

OST0000:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             1.2T  602G  544G  53% /mnt/fortefs/ost0

OST0001:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             1.2T  317G  828G  28% /mnt/fortefs/ost0

OST0002:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             1.2T  315G  831G  28% /mnt/fortefs/ost0

What else should I be checking? Has the MGS/MDT lost track of OST0001 and OST0002 somehow? Clients can still read data that is on OST0001 and OST0002; I confirmed this using lfs getstripe and cat'ing files on those devices. If I edit one of those files, however, it is written to OST0000.

Regards,
Aaron
On Thu, 2009-03-19 at 14:33 -0400, Aaron Everett wrote:
> Hello all,

Hi,

> We are running 1.6.6 with a shared MGS/MDT and 3 OSTs. We run a set
> of tests that write heavily, then we review the results and delete the
> data. Usually the load is evenly spread across all 3 OSTs. I noticed
> this afternoon that the load does not seem to be distributed.

Striping, as well as file count and size, affects OST distribution. Are any of the data involved striped? Are you writing very few large files before you measure distribution?

> OST0000 has a load of 50+ with iowait of around 10%
> OST0001 has a load of <1 with >99% idle
> OST0002 has a load of <1 with >99% idle

What does lfs df say before and after a test that produces the above results? Does it bear out even use amongst the OSTs before and after the test?

> df confirms the lopsided writes:

lfs df [-i] from a client is usually more illustrative of use. As I say above, if you can quiesce the filesystem for the test, do an "lfs df; lfs df -i" before the test and again after. Assuming you were successful in quiescing, you should see the change to the OSTs that your test effected.

> OST0000:
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1             1.2T  602G  544G  53% /mnt/fortefs/ost0

What's important is what it looked like before the test, too. For all we can tell from what you've posted so far, your test could have, for example, written a single object (i.e. file) of nearly 300G.

b.
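A minimal sketch of the before/after capture Brian describes, assuming a client mount at /lustre/work; the mount point, output directory and placement of the test are illustrative, not taken from the original posts:

#!/bin/bash
# Snapshot per-OST space and inode usage around a write test so the delta
# can be attributed to that test (assumes the filesystem is otherwise quiet).
MNT=/lustre/work         # client mount point (adjust to your site)
OUT=/tmp/ost-balance     # where to keep the snapshots
mkdir -p "$OUT"

lfs df    "$MNT" > "$OUT/df.before"
lfs df -i "$MNT" > "$OUT/df-i.before"

# ... run the write-heavy test here ...

lfs df    "$MNT" > "$OUT/df.after"
lfs df -i "$MNT" > "$OUT/df-i.after"

# Show which OSTs actually grew during the test.
diff -u "$OUT/df.before"   "$OUT/df.after"
diff -u "$OUT/df-i.before" "$OUT/df-i.after"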
Thanks for the reply.

File sizes are all <1GB and most files are <1MB. For a test, I copied a typical result set from a non-Lustre mount to my Lustre directory. Total size of the test is 42GB. I included before/after results of lfs df and lfs df -i from a client.

Before test:

[root at englogin01 backups]# lfs df
UUID                   1K-blocks        Used   Available  Use%  Mounted on
fortefs-MDT0000_UUID  1878903960   129326660  1749577300    6%  /lustre/work[MDT:0]
fortefs-OST0000_UUID  1264472876   701771484   562701392   55%  /lustre/work[OST:0]
fortefs-OST0001_UUID  1264472876   396097912   868374964   31%  /lustre/work[OST:1]
fortefs-OST0002_UUID  1264472876   393607384   870865492   31%  /lustre/work[OST:2]

filesystem summary:   3793418628  1491476780  2301941848   39%  /lustre/work

[root at englogin01 backups]# lfs df -i
UUID                      Inodes       IUsed       IFree  IUse%  Mounted on
fortefs-MDT0000_UUID   497433511    33195991   464237520     6%  /lustre/work[MDT:0]
fortefs-OST0000_UUID    80289792    13585653    66704139    16%  /lustre/work[OST:0]
fortefs-OST0001_UUID    80289792     7014185    73275607     8%  /lustre/work[OST:1]
fortefs-OST0002_UUID    80289792     7013859    73275933     8%  /lustre/work[OST:2]

filesystem summary:    497433511    33195991   464237520     6%  /lustre/work

After test:

[aeverett at englogin01 ~]$ lfs df
UUID                   1K-blocks        Used   Available  Use%  Mounted on
fortefs-MDT0000_UUID  1878903960   129425104  1749478856    6%  /lustre/work[MDT:0]
fortefs-OST0000_UUID  1264472876   759191664   505281212   60%  /lustre/work[OST:0]
fortefs-OST0001_UUID  1264472876   395929536   868543340   31%  /lustre/work[OST:1]
fortefs-OST0002_UUID  1264472876   393392924   871079952   31%  /lustre/work[OST:2]

filesystem summary:   3793418628  1548514124  2244904504   40%  /lustre/work

[aeverett at englogin01 ~]$ lfs df -i
UUID                      Inodes       IUsed       IFree  IUse%  Mounted on
fortefs-MDT0000_UUID   497511996    33298931   464213065     6%  /lustre/work[MDT:0]
fortefs-OST0000_UUID    80289792    13665028    66624764    17%  /lustre/work[OST:0]
fortefs-OST0001_UUID    80289792     7013783    73276009     8%  /lustre/work[OST:1]
fortefs-OST0002_UUID    80289792     7013456    73276336     8%  /lustre/work[OST:2]

filesystem summary:    497511996    33298931   464213065     6%  /lustre/work
There are several things that could have been done. The most likely are:

1) You deactivated the OSTs on the MDS, using something like:

# lctl set_param ost.work-OST0001.active=0
# lctl set_param ost.work-OST0002.active=0

2) You set the file stripe on the directory to use only OST0, as with:

# lfs setstripe -i 0 .

I would think that you'd remember #1, so my guess would be #2, which could have happened when someone intended to do "lfs setstripe -c 0". Do an "lfs getstripe ." to check. A simple

# lfs setstripe -i -1 .

in each directory should clear it up going forward. Note that existing files will NOT be re-striped, but new files will be balanced going forward.

Kevin
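A minimal sketch of the directory-default check Kevin suggests, assuming the suspect tree sits under /lustre/work; the path, the awk parsing of the getstripe output and the "fix" switch are illustrative assumptions, not commands from the thread:

#!/bin/bash
# Report directories whose default stripe offset pins new files to one OST,
# and optionally reset them to -1 so allocation is balanced again.
TOP=/lustre/work       # tree to audit (adjust to your site)
FIX=${1:-report}       # pass "fix" as the first argument to actually reset

find "$TOP" -type d | while read -r dir; do
    # "lfs getstripe -d" prints only the directory's default stripe settings.
    offset=$(lfs getstripe -d "$dir" 2>/dev/null | awk '/stripe_offset/ {print $NF}')
    if [ -n "$offset" ] && [ "$offset" != "-1" ]; then
        echo "pinned to OST $offset: $dir"
        [ "$FIX" = "fix" ] && lfs setstripe -i -1 "$dir"
    fi
done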
Hello, I tried the suggestion of using lfs setstripe and it appears that everything is still being written only to OST0000. You mentioned the OSTs may have been deactivated. Is it possible that the last time we restarted Lustre they came up in a deactivated or read-only state? Last week we brought our Lustre machines offline to swap out UPSs.

[root at englogin01 teststripe]# pwd
/lustre/work/aeverett/teststripe
[root at englogin01 aeverett]# mkdir teststripe
[root at englogin01 aeverett]# cd teststripe/
[root at englogin01 teststripe]# lfs setstripe -i -1 .
[root at englogin01 teststripe]# cp -R /home/aeverett/RHEL4WS_update/ .

[root at englogin01 teststripe]# lfs getstripe *
OBDS:
0: fortefs-OST0000_UUID ACTIVE
1: fortefs-OST0001_UUID ACTIVE
2: fortefs-OST0002_UUID ACTIVE
RHEL4WS_update
default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0
RHEL4WS_update/rhn-packagesws.tgz
        obdidx      objid        objid   group
             0   77095451    0x498621b       0

RHEL4WS_update/rhn-packages
default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0
RHEL4WS_update/kernel
default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0
RHEL4WS_update/tools.tgz
        obdidx      objid        objid   group
             0   77096794    0x498675a       0

RHEL4WS_update/install
        obdidx      objid        objid   group
             0   77096842    0x498678a       0

RHEL4WS_update/installlinks
        obdidx      objid        objid   group
             0   77096843    0x498678b       0

RHEL4WS_update/ssh_config
        obdidx      objid        objid   group
             0   77096844    0x498678c       0

RHEL4WS_update/sshd_config
        obdidx      objid        objid   group
             0   77096845    0x498678d       0

... and it continues like this for about 100 files, with incrementing objid numbers, obdidx = 0 and group = 0 throughout.

Thanks for all the help,
Aaron
If Lustre deactivated them, there should be something in the log. You can check the status with something like:

# cat /proc/fs/lustre/osc/*-OST*-osc/active

on the MDS node (or using lctl). You can also try setting the stripe index (lfs setstripe -i) to 1 or 2, which should force allocations onto those OSTs.

Kevin
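A minimal sketch of the MDS-side check Kevin describes, with a reactivation step added; the "lctl --device ... activate" form is standard Lustre administration but an assumption here, not something taken from this thread:

#!/bin/bash
# Run on the MDS. For every MDS->OST connection (osc device), print whether
# the MDS still considers it active; inactive OSTs receive no new objects.
for f in /proc/fs/lustre/osc/*-osc/active; do
    [ -e "$f" ] || continue
    echo "$(basename "$(dirname "$f")")  active=$(cat "$f")"
done

# If an osc reports active=0 and the underlying OST is mounted and reachable,
# it can be re-enabled by device name, for example:
#   lctl dl                                    # list devices and their names
#   lctl --device fortefs-OST0001-osc activate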
I assume this is wrong:

[root at lustrefs osc]# pwd
/proc/fs/lustre/osc
[root at lustrefs osc]# ls
fortefs-OST0000-osc  num_refs
[root at lustrefs osc]#

I should be seeing something like this, correct?

fortefs-OST0000-osc  fortefs-OST0001-osc  fortefs-OST0002-osc  num_refs

This was done from the MDS.

Aaron
Clients still see all 3 OSTs, but it seems like the MGS/MDT machine is missing 2 of the 3 OSTs:

[root at lustrefs ~]# lctl device_list
  0 UP mgs MGS MGS 63
  1 UP mgc MGC172.16.14.10 at tcp 97c99c52-d7f9-1b74-171e-f71c99990707 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov fortefs-mdtlov fortefs-mdtlov_UUID 4
  4 UP mds fortefs-MDT0000 fortefs-MDT0000_UUID 57
  5 UP osc fortefs-OST0000-osc fortefs-mdtlov_UUID 5
[root at lustrefs ~]#

What steps should I take to force the OSTs to rejoin?

Thanks!
Aaron
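Nothing in the thread pins down the recovery step yet, so the following is only a cautious checklist for this situation; the OSS NIDs, target names and the final writeconf remark are assumptions, not confirmed fixes:

# 1. From the MDS, confirm the OSS nodes are reachable over LNET
#    (replace the NIDs with your OSS addresses):
lctl ping 172.16.14.11@tcp
lctl ping 172.16.14.12@tcp

# 2. On each OSS, confirm the OST target is actually mounted and exporting:
mount -t lustre
cat /proc/fs/lustre/obdfilter/fortefs-OST0001/recovery_status

# 3. Look for LustreError messages on the MDS and the OSS nodes around the
#    time of the UPS swap; a failed registration or reconnect should be logged:
grep -i lustre /var/log/messages | tail -100

# 4. If the OSTs are mounted and reachable but the MDS still has no osc
#    devices for them, remounting the OSTs is the usual next step; regenerating
#    the configuration logs with "tunefs.lustre --writeconf" on all targets
#    (with the whole filesystem stopped) is a last resort.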
Any advice on this? Any resources I can look at? I've been looking through the list archives and re-reading the Lustre manual, but I'm still stuck. Clients see all 3 OSTs, but the MGS/MDT only sees OST0000.

Thanks in advance,
Aaron
Yes, you are correct that the issue is with the MDS not seeing the OSTs. Has restarting the MDS or the clients made any difference? If there is nothing in the logs, and you haven't made any configuration changes, I don't have an answer for you as to why that is the case.

Kevin
>> >> [root at englogin01 teststripe]# lfs getstripe * >> OBDS: >> 0: fortefs-OST0000_UUID ACTIVE >> 1: fortefs-OST0001_UUID ACTIVE >> 2: fortefs-OST0002_UUID ACTIVE >> RHEL4WS_update >> default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0 >> RHEL4WS_update/rhn-packagesws.tgz >> obdidx objid objid group >> 0 77095451 0x498621b 0 >> >> RHEL4WS_update/rhn-packages >> default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0 >> RHEL4WS_update/kernel >> default stripe_count: 1 stripe_size: 1048576 stripe_offset: 0 >> RHEL4WS_update/tools.tgz >> obdidx objid objid group >> 0 77096794 0x498675a 0 >> >> RHEL4WS_update/install >> obdidx objid objid group >> 0 77096842 0x498678a 0 >> >> RHEL4WS_update/installlinks >> obdidx objid objid group >> 0 77096843 0x498678b 0 >> >> RHEL4WS_update/ssh_config >> obdidx objid objid group >> 0 77096844 0x498678c 0 >> >> RHEL4WS_update/sshd_config >> obdidx objid objid group >> 0 77096845 0x498678d 0 >> >> .............. continues on like this for about 100 files with incrementing objid numbers and obdidx = 0 and group = 0. >> >> >> Thanks for all the help, >> Aaron >> >> >> -----Original Message----- >> From: Kevin.Vanmaren at Sun.COM [mailto:Kevin.Vanmaren at Sun.COM] >> Sent: Friday, March 20, 2009 8:57 AM >> To: Aaron Everett >> Cc: Brian J. Murrell; lustre-discuss at lists.lustre.org >> Subject: Re: [Lustre-discuss] Unbalanced load across OST''s >> >> There are several things that could have been done. The most likely are: >> >> 1) you deactivated the OSTs on the MSD, using something like: >> >> # lctl set_param ost.work-OST0001.active=0 >> # lctl set_param ost.work-OST0002.active=0 >> >> 2) you set the file stripe on the directory to use only OST0, as with >> >> # lfs setstripe -i 0 . >> >> I would think that you''d remember #1, so my guess would be #2, which >> could have happened when someone intended to do "lfs setstripe -c 0". >> Do an "lfs getstripe ." A simple: >> >> "lfs setstripe -i -1 ." in each directory >> >> should clear it up going forward. Note that existing files will NOT be >> re-striped, but new files will be balanced going forward. >> >> Kevin >> >> >> Aaron Everett wrote: >> >> >>> Thanks for the reply. >>> >>> File sizes are all <1GB and most files are <1MB. For a test, I copied a typical result set from a non-lustre mount to my lustre directory. Total size of the test is 42GB. I included before/after results for lfs df -i from a client. 
>>>
>>> Before test:
>>>
>>> [root at englogin01 backups]# lfs df
>>> UUID                  1K-blocks   Used        Available   Use%  Mounted on
>>> fortefs-MDT0000_UUID  1878903960  129326660   1749577300  6%    /lustre/work[MDT:0]
>>> fortefs-OST0000_UUID  1264472876  701771484   562701392   55%   /lustre/work[OST:0]
>>> fortefs-OST0001_UUID  1264472876  396097912   868374964   31%   /lustre/work[OST:1]
>>> fortefs-OST0002_UUID  1264472876  393607384   870865492   31%   /lustre/work[OST:2]
>>>
>>> filesystem summary:   3793418628  1491476780  2301941848  39%   /lustre/work
>>>
>>> [root at englogin01 backups]# lfs df -i
>>> UUID                  Inodes      IUsed       IFree       IUse%  Mounted on
>>> fortefs-MDT0000_UUID  497433511   33195991    464237520   6%     /lustre/work[MDT:0]
>>> fortefs-OST0000_UUID  80289792    13585653    66704139    16%    /lustre/work[OST:0]
>>> fortefs-OST0001_UUID  80289792    7014185     73275607    8%     /lustre/work[OST:1]
>>> fortefs-OST0002_UUID  80289792    7013859     73275933    8%     /lustre/work[OST:2]
>>>
>>> filesystem summary:   497433511   33195991    464237520   6%     /lustre/work
>>>
>>> After test:
>>>
>>> [aeverett at englogin01 ~]$ lfs df
>>> UUID                  1K-blocks   Used        Available   Use%  Mounted on
>>> fortefs-MDT0000_UUID  1878903960  129425104   1749478856  6%    /lustre/work[MDT:0]
>>> fortefs-OST0000_UUID  1264472876  759191664   505281212   60%   /lustre/work[OST:0]
>>> fortefs-OST0001_UUID  1264472876  395929536   868543340   31%   /lustre/work[OST:1]
>>> fortefs-OST0002_UUID  1264472876  393392924   871079952   31%   /lustre/work[OST:2]
>>>
>>> filesystem summary:   3793418628  1548514124  2244904504  40%   /lustre/work
>>>
>>> [aeverett at englogin01 ~]$ lfs df -i
>>> UUID                  Inodes      IUsed       IFree       IUse%  Mounted on
>>> fortefs-MDT0000_UUID  497511996   33298931    464213065   6%     /lustre/work[MDT:0]
>>> fortefs-OST0000_UUID  80289792    13665028    66624764    17%    /lustre/work[OST:0]
>>> fortefs-OST0001_UUID  80289792    7013783     73276009    8%     /lustre/work[OST:1]
>>> fortefs-OST0002_UUID  80289792    7013456     73276336    8%     /lustre/work[OST:2]
>>>
>>> filesystem summary:   497511996   33298931    464213065   6%     /lustre/work
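[Editor's note: a minimal sketch of the check-and-reactivate sequence Kevin describes, run on the MDS. The fortefs names and device number 5 are taken from the listings in this thread; they will differ on another system, and the "1" output is illustrative. The key distinction it draws: an OSC that is present but deactivated can simply be reactivated, whereas in Aaron's case the OST0001/OST0002 OSCs are not present on the MDS at all, which points at the MDS/MGS configuration rather than a deactivated device.]

  # On the MDS: which OSC devices exist at all?
  [root at lustrefs ~]# lctl dl | grep osc
    5 UP osc fortefs-OST0000-osc fortefs-mdtlov_UUID 5

  # For each OSC that does exist, check whether it has been deactivated (1 = active, 0 = deactivated)
  [root at lustrefs ~]# cat /proc/fs/lustre/osc/fortefs-OST*-osc/active
  1

  # If an OSC shows 0, it can be brought back by its device number from lctl dl
  [root at lustrefs ~]# lctl --device 5 activate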
Is there significance to the last digit on the OST entries in device_list? I'm assuming it is some sort of status. Notice below that from OST0000 the last digits are 63, while when I run lctl dl from OST0001 and OST0002 (the OSTs I cannot write to) the last digits are 61. I just noticed it this morning.

[aeverett at lustrefs01 ~]$ lctl dl
  0 UP mgc MGC172.16.14.10 at tcp c951a722-71ad-368c-d791-c4e0efee7120 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter fortefs-OST0000 fortefs-OST0000_UUID 63
[aeverett at lustrefs01 ~]$ ssh lustrefs02 lctl dl
  0 UP mgc MGC172.16.14.10 at tcp 5389dc05-63d9-2ae9-1f21-2d9471e82166 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter fortefs-OST0001 fortefs-OST0001_UUID 61
[aeverett at lustrefs01 ~]$ ssh lustrefs03 lctl dl
  0 UP mgc MGC172.16.14.10 at tcp 01325f6d-ee28-aa97-910c-fef806443a99 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter fortefs-OST0002 fortefs-OST0002_UUID 61
[aeverett at lustrefs01 ~]$

MDT:
[aeverett at lustrefs01 ~]$ ssh lustrefs lctl dl
  0 UP mgs MGS MGS 67
  1 UP mgc MGC172.16.14.10 at tcp 97c99c52-d7f9-1b74-171e-f71c99990707 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov fortefs-mdtlov fortefs-mdtlov_UUID 4
  4 UP mds fortefs-MDT0000 fortefs-MDT0000_UUID 61
  5 UP osc fortefs-OST0000-osc fortefs-mdtlov_UUID 5
[aeverett at lustrefs01 ~]$

The log files on the OSTs are clean, but I did notice that there are errors on the MDT. I don't know how I missed these earlier:

Mar 26 09:19:35 lustrefs kernel: LustreError: 5816:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:57:45 lustrefs kernel: LustreError: 5823:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:57:48 lustrefs kernel: LustreError: 5481:0:(lov_ea.c:243:lsm_unpackmd_plain()) OST index 1 missing
Mar 26 10:57:48 lustrefs kernel: LustreError: 5481:0:(lov_ea.c:243:lsm_unpackmd_plain()) Skipped 156 previous similar messages
Mar 26 10:57:48 lustrefs kernel: LustreError: 5497:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:57:51 lustrefs kernel: LustreError: 5482:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:57:51 lustrefs kernel: LustreError: 5482:0:(lov_ea.c:238:lsm_unpackmd_plain()) Skipped 5 previous similar messages
Mar 26 10:57:56 lustrefs kernel: LustreError: 7299:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:57:56 lustrefs kernel: LustreError: 7299:0:(lov_ea.c:238:lsm_unpackmd_plain()) Skipped 60 previous similar messages
Mar 26 10:58:07 lustrefs kernel: LustreError: 5510:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:58:07 lustrefs kernel: LustreError: 5510:0:(lov_ea.c:238:lsm_unpackmd_plain()) Skipped 116 previous similar messages
Mar 26 10:58:27 lustrefs kernel: LustreError: 7320:0:(lov_ea.c:238:lsm_unpackmd_plain()) OST index 2 more than OST count 2
Mar 26 10:58:27 lustrefs kernel: LustreError: 7320:0:(lov_ea.c:238:lsm_unpackmd_plain()) Skipped 240 previous similar messages
Mar 26 11:00:13 lustrefs kernel: LustreError: 4200:0:(lov_ea.c:243:lsm_unpackmd_plain()) OST index 1 missing
Mar 26 11:00:13 lustrefs kernel: LustreError: 4200:0:(lov_ea.c:243:lsm_unpackmd_plain()) Skipped 506 previous similar messages

If I can't get the OSTs back online, I will move the data off, recreate the filesystem, and replace the data. I just wish I understood the cause...
Thanks for all the help,
Aaron
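[Editor's note: the lov_ea errors above come from the code that unpacks a file's striping metadata on the MDS. Read loosely, "OST index 2 more than OST count 2" and "OST index 1 missing" say that existing file layouts reference OST indices 1 and 2 while the MDT-side LOV no longer has all three targets attached, which is consistent with lctl dl on the MDS showing only the OST0000 OSC. One quick way to see what the LOV actually knows about, assuming the 1.6 proc layout, is to compare its target list on the MDS with a client's. The output shown for the MDS is an inference from this thread, not something posted in it.]

  # On the MDS: targets known to the MDT-side LOV (expected: one entry per OST)
  [root at lustrefs ~]# cat /proc/fs/lustre/lov/fortefs-mdtlov/target_obd
  0: fortefs-OST0000_UUID ACTIVE

  # On a client: the client-side LOV still lists all three OSTs, matching lfs getstripe
  [aeverett at englogin01 ~]$ cat /proc/fs/lustre/lov/fortefs-clilov-*/target_obd
  0: fortefs-OST0000_UUID ACTIVE
  1: fortefs-OST0001_UUID ACTIVE
  2: fortefs-OST0002_UUID ACTIVE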
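[Editor's note: nobody in the thread has suggested this yet, so it is offered only as a sketch of the usual 1.6.x fallback before giving up and recreating the filesystem: regenerating the configuration logs with tunefs.lustre --writeconf so the OSTs re-register with the MGS. It rewrites the MGS configuration logs, so it should be checked against the operations manual for the exact Lustre version and done with backups in hand. The mount points and /dev/sdb1 device names below are taken from the thread where shown and are otherwise illustrative.]

  # Quiesce first: unmount all clients, then each OST, then the MDT/MGS.

  # On the combined MGS/MDT node (MDT mount point illustrative):
  [root at lustrefs ~]#   umount /mnt/fortefs/mdt
  [root at lustrefs ~]#   tunefs.lustre --writeconf /dev/sdb1

  # On each OSS, for its OST device:
  [root at lustrefs01 ~]# umount /mnt/fortefs/ost0
  [root at lustrefs01 ~]# tunefs.lustre --writeconf /dev/sdb1

  # Remount in order: MGS/MDT first, then each OST, then the clients.
  # Each target re-registers with the MGS as it mounts, rebuilding the config log,
  # after which lctl dl on the MDS should again show an OSC for every OST.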