thr3ads.net - Lustre discuss - [Lustre-discuss] Performance problems with Lustre 1.6.1 [Oct 2007]

Juan Piernas Canovas

2007-Oct-01 22:56 UTC

[Lustre-discuss] Performance problems with Lustre 1.6.1

Hi all,

I have set up a small Lustre file system with 1 MDS and 8 OSS/OST. The 
particularity of our system is that every OSS is also a client of the 
file system (there are 8 clients altogether).

The file system has a 1 GB file striped across all the OSTs. On every 
OST, there is a process which reads the file chunks stored locally, 
i.e., in its own OST (since the processes have the striping information 
of the file, each one knows which portions of the file are stored in its 
OST).

The problem that I have is that, when the stripe size is 1MB (which means 
that there are 1024 chunks in total, or 128 chunks per OST), it takes 
more than 400 seconds to read the file, and the network traffic is very 
high. However, if the stripe size is 128 MB (8 chunks altogether, one 
per OST), it takes only around 100 seconds to read the file, and the 
network traffic is 1/10th the previous one. Note that, in both cases, 
the data I/O operations are local and that the processes read the same 
amount of data.
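
The chunk counts quoted above can be checked with a minimal sketch, assuming plain round-robin (RAID-0 style) striping that starts at the first OST of the layout. The function name `local_chunks` and the parameter `stripe_index` are hypothetical; the real layout of a file should be taken from `lfs getstripe` rather than assumed.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def local_chunks(file_size, stripe_size, stripe_count, stripe_index):
    """Return the (offset, length) extents stored at one position of the layout.

    Assumes simple round-robin striping: chunk k of the file lives on the
    OST at position k % stripe_count in the file's stripe layout.
    """
    extents = []
    offset = stripe_index * stripe_size
    step = stripe_size * stripe_count  # distance between two local chunks
    while offset < file_size:
        extents.append((offset, min(stripe_size, file_size - offset)))
        offset += step
    return extents


if __name__ == "__main__":
    for stripe_mb in (1, 128):
        extents = local_chunks(GiB, stripe_mb * MiB, 8, stripe_index=3)
        local_mib = sum(length for _, length in extents) // MiB
        print(f"stripe {stripe_mb} MiB: {len(extents)} local chunks, {local_mib} MiB")
```

With a 1 MiB stripe each process has 128 separate 1 MiB extents to visit, while with a 128 MiB stripe it has a single contiguous extent; the total amount of local data (128 MiB) is the same in both cases.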

Could this be a problem with the lock mechanism and the caching on the 
clients? If so, I have seen that the ldlm can be disabled, but how? 
(The processes read from disjoint parts of the file, so they do not 
really need the ldlm service).

Thanks in advance,

    Juan.

Kilian CAVALOTTI

2007-Oct-01 23:01 UTC

Hi Juan,

On Monday 01 October 2007 03:56:33 pm Juan Piernas Canovas wrote:
> I have set up a small Lustre file system with 1 MDS and 8 OSS/OST.
> The particularity of our system is that every OSS is also a client of
> the file system (there are 8 clients altogether).

I'm not sure if that's related, but I recall reading in the Lustre
manual that running a client and an OST on the same machine could lead 
to a whole range of unexpected results, including deadlocks:
http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-26-1.html#wp1072362

Cheers,
-- 
Kilian

Juan Piernas Canovas

2007-Oct-01 23:08 UTC

Hi Kilian,

Thanks for your reply. Yes, I have also read that a deadlock can occur, 
but I have not had that problem so far.

I have forgotten to mention that if I use 8 independent files, one per 
OST, and every process reads the file in its OST, the performance is 
even better. Therefore, I assume (maybe wrongly) that this is a 
consistency/synchronization problem when several processes access the 
same file, even when all of them are reading non-overlapping portions.

Regards,

    Juan.

Kilian CAVALOTTI wrote:
> Hi Juan,
>
> On Monday 01 October 2007 03:56:33 pm Juan Piernas Canovas wrote:
>> I have set up a small Lustre file system with 1 MDS and 8 OSS/OST.
>> The particularity of our system is that every OSS is also a client of
>> the file system (there are 8 clients altogether).
>
> I'm not sure if that's related, but I recall reading in the Lustre
> manual that running a client and an OST on the same machine could lead 
> to a whole range of unexpected results, including deadlocks:
> http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-26-1.html#wp1072362
>
> Cheers,

Andreas Dilger

2007-Oct-02 06:01 UTC

On Oct 01, 2007  15:56 -0700, Juan Piernas Canovas wrote:
> I have set up a small Lustre file system with 1 MDS and 8 OSS/OST. The 
> particularity of our system is that every OSS is also a client of the 
> file system (there are 8 clients altogether).
> 
> The file system has a 1 GB file striped across all the OSTs. On every 
> OST, there is a process which reads the file chunks stored locally, 
> i.e., in its own OST (since the processes have the striping information 
> of the file, each one knows which portions of the file are stored in its 
> OST).
> 
> The problem that I have is that, when the stripe size is 1MB (which means 
> that there are 1024 chunks in total, or 128 chunks per OST), it takes 
> more than 400 seconds to read the file, and the network traffic is very 
> high. However, if the stripe size is 128 MB (8 chunks altogether, one 
> per OST), it takes only around 100 seconds to read the file, and the 
> network traffic is 1/10th the previous one. Note that, in both cases, 
> the data I/O operations are local and that the processes read the same 
> amount of data.

It sounds like the readahead is reading the "unused" parts of the file
on the other OSTs.  Are you also reading data from disk in 1MB chunks,
or in smaller chunks?  You should read at the stripe size for best
performance in this test.
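
A minimal sketch of what "reading at the stripe size" could look like for one of the reader processes, assuming a POSIX system and the same round-robin layout as in the original post. The helper names `pread_full` and `read_local_stripes` and the parameter `stripe_index` are hypothetical; nothing here is a Lustre API, only ordinary `pread()` calls sized to one whole chunk each.

```python
import os


def pread_full(fd, length, offset):
    """Call pread() until `length` bytes are read or EOF is reached."""
    parts = []
    while length > 0:
        buf = os.pread(fd, length, offset)
        if not buf:  # end of file
            break
        parts.append(buf)
        offset += len(buf)
        length -= len(buf)
    return b"".join(parts)


def read_local_stripes(path, stripe_size, stripe_count, stripe_index):
    """Read only the chunks at one position of the layout, one request per chunk.

    Returns the number of bytes read. Assumes round-robin striping, so the
    local chunks start at stripe_index * stripe_size and repeat every
    stripe_size * stripe_count bytes.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        file_size = os.fstat(fd).st_size
        offset = stripe_index * stripe_size
        while offset < file_size:
            want = min(stripe_size, file_size - offset)
            data = pread_full(fd, want, offset)
            total += len(data)  # a real reader would process `data` here
            offset += stripe_size * stripe_count
    finally:
        os.close(fd)
    return total
```

Each request covers exactly one local chunk and then jumps over the chunks held by the other OSTs, so the access pattern seen by the client is a series of large, non-contiguous reads rather than a sequential stream of small ones.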

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

Juan Piernas Canovas

2007-Oct-10 00:13 UTC

Andreas Dilger wrote:
> On Oct 01, 2007  15:56 -0700, Juan Piernas Canovas wrote:
>> I have set up a small Lustre file system with 1 MDS and 8 OSS/OST. The 
>> particularity of our system is that every OSS is also a client of the 
>> file system (there are 8 clients altogether).
>>
>> The file system has a 1 GB file striped across all the OSTs. On every 
>> OST, there is a process which reads the file chunks stored locally, 
>> i.e., in its own OST (since the processes have the striping information
>> of the file, each one knows which portions of the file are stored in its
>> OST).
>>
>> The problem that I have is that, when the stripe size is 1MB (which means
>> that there are 1024 chunks in total, or 128 chunks per OST), it takes 
>> more than 400 seconds to read the file, and the network traffic is very
>> high. However, if the stripe size is 128 MB (8 chunks altogether, one 
>> per OST), it takes only around 100 seconds to read the file, and the 
>> network traffic is 1/10th the previous one. Note that, in both cases, 
>> the data I/O operations are local and that the processes read the same 
>> amount of data.
>
> It sounds like the readahead is reading the "unused" parts of the file
> on the other OSTs.  Are you also reading data from disk in 1MB chunks,
> or in smaller chunks?  You should read at the stripe size for best
> performance in this test.

Hi Andreas,

Thank you. You are right. One problem was that the readahead made a 
process on an OST read chunks from other OSTs. That explains the network 
traffic. The other problem was the size of the I/O requests, which was 
too "small" (16 KB). The interesting point is that the (small) request
size was the same in both configurations (1 MB and 128 MB stripe sizes) 
but, even then, the times were very different.

Regards,

    Juan.

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.

Andreas Dilger

2007-Oct-10 23:25 UTC

On Oct 09, 2007  17:13 -0700, Juan Piernas Canovas wrote:
> Andreas Dilger wrote:
> >On Oct 01, 2007  15:56 -0700, Juan Piernas Canovas wrote:
> >>The problem that I have is that, when the stripe size is 1MB (which means
> >>that there are 1024 chunks in total, or 128 chunks per OST), it takes
> >>more than 400 seconds to read the file, and the network traffic is very
> >>high. However, if the stripe size is 128 MB (8 chunks altogether, one
> >>per OST), it takes only around 100 seconds to read the file, and the
> >>network traffic is 1/10th the previous one. Note that, in both cases,
> >>the data I/O operations are local and that the processes read the same
> >>amount of data.
> >
> >It sounds like the readahead is reading the "unused" parts of the file
> >on the other OSTs.  Are you also reading data from disk in 1MB chunks,
> >or in smaller chunks?  You should read at the stripe size for best
> >performance in this test.
> 
> Thank you. You are right. One problem was that the readahead made a 
> process on an OST read chunks from other OSTs. That explains the network 
> traffic. The other problem was the size of the I/O requests, which was 
> too "small" (16 KB). The interesting point is that the (small) request
> size was the same in both configurations (1 MB and 128 MB stripe sizes) 
> but, even then, the times were very different.

In the 128MB stripe case, the readahead is mostly reading within the
"local" stripe, so the overhead is minimal (some overflow into the next
stripe but not so much).  In the 1MB stripe case, the majority of the
readahead will be in irrelevant parts of the filesystem and will slow
down the overall performance.
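
The difference between the two cases can be illustrated with a toy model, offered only as a sketch: it assumes round-robin striping and a fixed readahead window that extends forward in file-offset order from the current read position. The function name `remote_fraction` and the 40 MiB window used in the example are illustrative assumptions, not values taken from this thread; real readahead windows grow and shrink dynamically.

```python
MiB = 1024 * 1024


def remote_fraction(stripe_size, stripe_count, window, start):
    """Fraction of [start, start + window) that lies in other OSTs' chunks.

    The "local" OST is the one that owns the chunk containing `start`.
    Toy model only: fixed-size window, plain round-robin striping.
    """
    owner = (start // stripe_size) % stripe_count
    remote = 0
    pos = start
    end = start + window
    while pos < end:
        chunk_end = (pos // stripe_size + 1) * stripe_size
        span = min(chunk_end, end) - pos
        if (pos // stripe_size) % stripe_count != owner:
            remote += span  # these bytes would come over the network
        pos += span
    return remote / window


if __name__ == "__main__":
    window = 40 * MiB  # hypothetical window size, for illustration only
    print(remote_fraction(1 * MiB, 8, window, start=0))
    print(remote_fraction(128 * MiB, 8, window, start=0))
    print(remote_fraction(128 * MiB, 8, window, start=100 * MiB))
```

Under these assumptions, a window that spans many 1 MiB chunks is mostly made of other OSTs' data (about 7 of every 8 chunks with 8 OSTs), while with 128 MiB chunks the window only leaves the local chunk when the reader is near its end.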

If you read with read size == stripe size you will get "random" read
heuristics (i.e. no readahead) and performance should be good.  Of
course, Lustre _should_ implement smarter strided readahead, but it
doesn't yet (patches welcome :-).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
