We are seeing some disturbing (probably due to our ignorance) behavior from Lustre 1.6.3 right now. We have 8 OSSs with 3 OSTs per OSS (24 physical LUNs). We just created a brand new Lustre file system across this configuration using the default mkfs.lustre formatting options. We have this file system mounted across 400 clients.

At the moment, we have 63 IOzone threads running on roughly 60 different clients. The balance among the OSSs is terrible, and within each OSS, the balance across the OSTs (LUNs) is even worse. We have one OSS with a load of 100 and another that is not being touched. On several of the OSSs, only one OST (LUN) is being used while the other two are ignored entirely.

This is really just a bunch of random I/O (both large and small block) from a bunch of random clients (as will occur in real life), and our Lustre implementation is not making very good use of the available resources. Can this be tuned? What are we doing wrong? The 1.6 operations manual (version 1.9) does not say a lot about options for balancing the workload among OSSs/OSTs. Shouldn't Lustre be doing a better job (by default) of distributing the workload?

Charlie Taylor
UF HPC Center

FWIW, the servers are dual-processor, dual-core Opterons (275s) with 4GB RAM each. They are running CentOS 5 with a 2.6.18-8.1.14.el5Lustre kernel (patched Lustre, SMP) and the deadline I/O scheduler. If it matters, our OSTs sit atop LVM2 volumes (for management). The back-end storage is all Fibre Channel RAID (Xyratex). We have tuned the servers and know that we can get roughly 500 MB/s per server across a striped *local* file system.
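For reference, a quick way to see how full and how busy each OST actually is: the commands below are a rough sketch, with the mount point /lustre and the 5-second iostat interval chosen purely for illustration, and lfs option syntax differs a little between Lustre versions.

    # From any client: per-OST space usage for the mounted file system
    lfs df -h /lustre

    # From any client: the file system's default stripe settings;
    # with the default stripe count of 1, each file lives on one OST
    lfs getstripe -d /lustre

    # On an OSS: watch how hard each backing LUN is being driven
    iostat -x 5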
Charles Taylor wrote:
> At the moment, we have 63 IOzone threads running on roughly 60
> different clients. The balance among the OSSs is terrible, and
> within each OSS, the balance across the OSTs (LUNs) is even worse.
> We have one OSS with a load of 100 and another that is not being
> touched. On several of the OSSs, only one OST (LUN) is being used
> while the other two are ignored entirely.
>
> This is really just a bunch of random I/O (both large and small
> block) from a bunch of random clients (as will occur in real life),
> and our Lustre implementation is not making very good use of the
> available resources. Can this be tuned? What are we doing wrong?
> The 1.6 operations manual (version 1.9) does not say a lot about
> options for balancing the workload among OSSs/OSTs. Shouldn't
> Lustre be doing a better job (by default) of distributing the
> workload?

Actually, Lustre 1.6 does take OST load into account. When it creates objects, it chooses OSTs according to their usage and also tries to pick OSTs on different OSSs. If OST usage is roughly even, Lustre picks the OST by a round-robin policy. I would start by checking your OST usage. You can also use lfs find to list the files located on the heavily loaded OSS and copy them to redistribute the data.

Thanks
WangDi
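A minimal sketch of that suggestion, assuming the file system is mounted at /lustre and lfs df reports the overloaded OST under a UUID such as testfs-OST0005_UUID (both names are placeholders; lfs find option spellings vary slightly across Lustre releases):

    # Identify the unusually full OST(s)
    lfs df /lustre

    # List files that have objects on the suspect OST
    lfs find --obd testfs-OST0005_UUID /lustre > /tmp/files_on_ost5

    # Rewriting a file makes the MDS allocate new objects for it,
    # normally on emptier OSTs, which spreads the data back out
    while read f; do
        cp "$f" "$f.migrate" && mv "$f.migrate" "$f"
    done < /tmp/files_on_ost5

Rewriting files in place this way is only safe while nothing else has them open.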
Ok. I guess this was something we were doing in our tests. Now running over 100 IOzone threads across a random mix of IB (o2ib) and TCP (Ethernet<->IPoIB) clients, all the OSTs on all the OSSs are going at 100%. Very nice! And interactive response is still very good. :)

Charlie Taylor
UF HPC Center

On Nov 29, 2007, at 12:55 PM, Charles Taylor wrote:
> At the moment, we have 63 IOzone threads running on roughly 60
> different clients. The balance among the OSSs is terrible, and
> within each OSS, the balance across the OSTs (LUNs) is even worse.
> We have one OSS with a load of 100 and another that is not being
> touched. On several of the OSSs, only one OST (LUN) is being used
> while the other two are ignored entirely.
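For context, per-client load in a test like this can be driven with IOzone's throughput mode; a hypothetical invocation (thread count, file sizes, and target paths are made up for illustration) would look something like:

    # 4 writer/reader threads on this client, 4 GB per file, 1 MB records
    iozone -i 0 -i 1 -t 4 -s 4g -r 1m \
           -F /lustre/scratch/t1 /lustre/scratch/t2 /lustre/scratch/t3 /lustre/scratch/t4

    # On each OSS: per-OST I/O counters (1.6-era /proc layout, from memory;
    # the exact path may differ on other Lustre versions)
    grep -H . /proc/fs/lustre/obdfilter/*/stats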