Grigory Shamov
2012-Dec-06 19:06 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Hi,

On our cluster, when there is load on the Lustre FS, at some point it slows down precipitously, and there are very many "slow IO" and "slow setattr" messages on the OSS servers:

======
[2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy IO load
[2988758.408974] Lustre: Skipped 276 previous similar messages
[2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
[2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy IO load
[2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to heavy IO load
[2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to heavy IO load
[2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to heavy IO load
[2988823.125419] Lustre: Skipped 1 previous similar message
[2988823.125432] Lustre: scratch-OST0004: slow preprw_write setup 55s due to heavy IO load
[2988856.236914] Lustre: scratch-OST0004: slow direct_io 33s due to heavy IO load
[2988856.236922] Lustre: Skipped 323 previous similar messages
[2988892.543942] Lustre: scratch-OST0004: slow i_mutex 48s due to heavy IO load
[2988892.543950] Lustre: Skipped 280 previous similar messages
[2988892.545310] Lustre: scratch-OST0004: slow setattr 55s due to heavy IO load
[2988892.547328] Lustre: scratch-OST0004: slow parent lock 42s due to heavy IO load
[2988892.547334] Lustre: Skipped 4 previous similar messages
[2988958.306720] Lustre: scratch-OST0004: slow setattr 52s due to heavy IO load
[2988958.306724] Lustre: Skipped 1 previous similar message
[2988958.310818] Lustre: scratch-OST0004: slow parent lock 59s due to heavy IO load
[2989040.406738] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
========

I wonder whether mounting it on the clients with "noatime" and/or changing atime_diff would help get rid of these Lustre slowdowns? Right now /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server is 60.

I've tried to Google it first, and apparently "noatime" is not supported for 1.8, and changing atime_diff is the preferred way?

Could you please advise which way is better/possible, and how does one change atime_diff? Will it help? Does it require, say, a client remount, etc.?

Any ideas and advice would be greatly appreciated! Thank you very much in advance.

--
Grigory Shamov
HPC Analyst, Westgrid/Compute Canada
E2-588 EITC Building, University of Manitoba
(204) 474-9625
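The mechanical part of the question (how atime_diff is changed) maps directly onto the proc path quoted above; a minimal sketch, with 600 seconds as a purely illustrative value and the exact behaviour to be verified against a 1.8.7 system:

    # read the current setting (same value as the proc file above)
    lctl get_param mds.scratch-MDT0000.atime_diff

    # raise it; writing to the proc file directly is equivalent
    lctl set_param mds.scratch-MDT0000.atime_diff=600
    echo 600 > /proc/fs/lustre/mds/scratch-MDT0000/atime_diff

Since this is a server-side MDS tunable, no client remount should be needed, though that is an assumption to confirm against the 1.8 manual.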
Colin Faber
2012-Dec-06 19:28 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Hi,

The messages indicate overloaded backend storage. You could try that; another option may be to statically set the maximum number of threads on the OSS, which should reduce load on the system and push the backlog out to your clients (hopefully).

-cf

On 12/06/2012 12:06 PM, Grigory Shamov wrote:
> I wonder if mounting it on clients with "noatime" and/or changing the atime_diff would help get rid of these Lustre slowdowns? Right now we have: /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server is 60.
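A sketch of the "statically set the maximum number of threads" suggestion, assuming the usual Lustre 1.8 tunables; the value 160 is the one reported later in this thread, and the module-option form only takes effect when the ost module is loaded:

    # runtime cap on OSS I/O service threads (threads already started are not stopped)
    lctl set_param ost.OSS.ost_io.threads_max=160

    # persistent form, in /etc/modprobe.d/lustre.conf (or modprobe.conf) on the OSS
    options ost oss_num_threads=160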
Dilger, Andreas
2012-Dec-06 19:41 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
On 12/6/12 12:06 PM, "Grigory Shamov" <gas5x at yahoo.com> wrote:
> I wonder if mounting it on clients with "noatime" and/or changing the
> atime_diff would help to get rid of these Lustre slowdowns? Right now we
> have: /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server
> is 60.

No atime updates are ever written to disk on the OSTs, and at most only once every 10 minutes on the MDT. This is very likely due to small IO from the client or similar. Check "lctl get_param obdfilter.*.brw_stats" to see what kind of IO pattern the clients are sending.
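Since the console messages single out one target, the same histogram can be pulled for just that OST; a minimal sketch using the target name from the log above:

    # per-target I/O size histogram for the OST named in the "slow ..." messages
    lctl get_param obdfilter.scratch-OST0004.brw_stats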
Grigory Shamov
2012-Dec-06 19:45 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Dear Colin,

Thanks for the reply!

We reduced the number of OST threads earlier, from the original DDN setting of 256 to 160. It looks like that made things better, but the problem still persists. Reducing the number of OST threads to a number smaller than the number of clients seems to cause problems too..

Also, do you know whether having the OSS servers in an active-active failover configuration affects Lustre performance? Could it be that it forces a sync on all I/O, or something of this sort?

--
Grigory Shamov

--- On Thu, 12/6/12, Colin Faber <colin_faber at xyratex.com> wrote:
> The messages indicate overloaded backend storage. You could try this,
> another option may be to statically set the maximum number of threads on
> the OSS, this should reduce load to the system and push the backlogs to
> your clients (hopefully)
Grigory Shamov
2012-Dec-06 19:58 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Dear Andreas,

Thank you for the reply!

So, on one of our OSS servers the load is now 160. According to collectl, only one OST does most of the work. (We don't do striping on this FS, unless users do it manually on their subdirectories.)

I've run the obdfilter stats, and for disk I/O size I get:

disk I/O size          ios   % cum % |       ios   % cum %
      4K:        282890357  34    34 |  22425884  44    44
      8K:         18651648   2    36 |    503635   0    45
     16K:         31817375   3    40 |   1415935   2    48
     32K:         47552890   5    46 |    308395   0    48
     64K:         61437915   7    53 |    248666   0    49
    128K:         72863407   8    62 |    520857   1    50
    256K:         26320421   3    65 |   1144803   2    52
    512K:         15805554   1    67 |   1703988   3    55
      1M:        264536729  32   100 |  22336867  44   100

Am I looking at the right table? So, does it mean that we have small 4K I/O, which is 34% for reads and 44% for writes, and that this is the cause of the problem?

--
Grigory Shamov

--- On Thu, 12/6/12, Dilger, Andreas <andreas.dilger at intel.com> wrote:
> No atime updates are ever written to disk on the OSTs, and at most only
> once every 10 minutes on the MDT. This is very likely due to small IO
> from the client or similar. Check "lctl get_param obdfilter.*.brw_stats"
> to see what kind of IO pattern the clients are sending.
Colin Faber
2012-Dec-06 20:01 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Hi Grigory,

The active-active failover configuration should make no difference here unless you're running block-level replication between hosts (outside the scope of Lustre).

What tuning do you currently have in place? Also, what kind of client workload are you seeing (large or small file I/O)?

-cf

On 12/06/2012 12:45 PM, Grigory Shamov wrote:
> We reduced the number of OST threads earlier, from the original DDN setting of 256 to 160. It looks like that made things better, but the problem still persists.
> Also, do you know whether having the OSS servers in an active-active failover configuration affects Lustre performance? Could it be that it forces a sync on all I/O, or something of this sort?
Mohr Jr, Richard Frank (Rick Mohr)
2012-Dec-07 18:49 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
On Dec 6, 2012, at 2:58 PM, Grigory Shamov wrote:
> So, on one of our OSS servers the load is now 160. According to collectl, only one OST does most of the work. (We don't do striping on this FS, unless users do it manually on their subdirectories.)

This sounds similar to situations we see every now and then. The load on the OSS server climbs until it is roughly equal to the number of OSS threads (which sounds like your case with load=oss_threads=160), but only a single OST is performing any significant IO. This seems to arise when parallel jobs access the same file which has stripe_count=1. The OSS is bombarded with so many requests to a single OST that they backlog and tie up all the OSS threads. At that point, all IO to the OSS slows to a crawl no matter which OST on the OSS is being used. This becomes problematic because even a modest-sized job can effectively DOS an OSS server.

When you encounter these problems, is the IO to the affected OST primarily one-way (ie - mostly reads or mostly writes)? In our cases, we tend to see this when parallel jobs are reading from a common file. There are a couple of things that I have found that help:

1) Increase the file striping a lot. This helps spread the load over more OSTs. We have had success with striping even relatively small files (~10 GB) over 100+ OSTs. Not only does it reduce load on the OSS, but it usually speeds up the application significantly.

2) Make sure caching is enabled on the OSS. For us, this seems to help mostly when lots of processes are reading the same file.

Not sure if your situation is exactly like what I have seen, but maybe some of that info can help a bit.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
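A sketch of suggestion 1), with the directory and stripe counts purely illustrative; on 1.8 the new striping applies only to files created after the setstripe, so it is typically set on the directory holding the shared input before the files are written:

    # stripe new files in this directory across all available OSTs
    lfs setstripe -c -1 /scratch/project/shared_inputs

    # or pick an explicit stripe count
    lfs setstripe -c 8 /scratch/project/shared_inputs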
> 2) Make sure caching is enabled on the oss.

How do you check/enable this? Is it not enabled by default?

Cheers,
Mark

----- Original Message -----
From: "Mohr Jr, Richard Frank (Rick Mohr)" <rmohr at utk.edu>
To: "Grigory Shamov" <gas5x at yahoo.com>
Cc: lustre-discuss at lists.lustre.org
Sent: Saturday, 8 December, 2012 5:19:31 AM
Subject: Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Spyro Polymiadis
2012-Dec-09 10:29 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Maybe from here?
https://fs.hlrs.de/projects/craydoc/docs/books/S-0010-31/html-S-0010-31/z1112312952ebishop.html

3.3.6 Disabling OSS Read Cache and Writethrough Cache

Lustre uses the Linux page cache to provide read-only caching of data on object storage servers (OSS). This strategy reduces disk access time caused by repeated reads from an OST.

OSS read cache is enabled by default, but you can disable it by setting /proc parameters. For example, invoke the following on the OSS:

nid00008:~ # lctl set_param obdfilter.*.read_cache_enable 0

Writethrough cache can also be disabled. This prevents file writes from ending up in the read cache. To disable writethrough cache, invoke the following on the OSS:

nid00008:~ # lctl set_param obdfilter.*.writethrough_cache_enable 0

----- Original Message -----
From: "Mark Day" <mark.day at rsp.com.au>
To: "Mohr Jr, Richard Frank (Rick Mohr)" <rmohr at utk.edu>
Cc: lustre-discuss at lists.lustre.org
Sent: Saturday, 8 December, 2012 10:52:28 AM
Subject: Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?

> > 2) Make sure caching is enabled on the oss.
>
> How do you check/enable this? Is it not enabled by default?
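Conversely, the same parameters answer the check/enable question; a minimal sketch for the OSS, where 1 means enabled (the default, per the document quoted above):

    # check the current settings
    lctl get_param obdfilter.*.read_cache_enable obdfilter.*.writethrough_cache_enable

    # turn them back on if they have been disabled
    lctl set_param obdfilter.*.read_cache_enable=1
    lctl set_param obdfilter.*.writethrough_cache_enable=1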
Grigory Shamov
2012-Dec-10 16:43 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
Dear Richard,

Thank you very much for the reply (somehow my email filter ate it, so I knew of it only from Mark's quotation).

Yes, it seems that your analysis explains our situation. We do see it when there is predominantly "reading" activity on one of the OSTs. Actually, the volume of reading was small, which is why we couldn't even locate an application that does it. It can then be explained as really a blocking situation, not a throughput problem.

The way our system is configured, the number of OSTs is small (13). We have zero load on the MDS, and stripe count 1. The system is running and about 60% full; I wonder what would be the best strategy to change the striping now. I understand that if I just change the stripe count on the Lustre root dir, it will affect only newly created files/directories. Should I copy the users' files, stripe their directories, and then copy the data back? That sounds somewhat dangerous, especially if the users do some unusual things with symlinks..

--
Grigory Shamov
HPC Analyst, Westgrid/Compute Canada
E2-588 EITC Building, University of Manitoba
(204) 474-9625

--- On Fri, 12/7/12, Mark Day <mark.day at rsp.com.au> wrote:
> > 2) Make sure caching is enabled on the oss.
>
> How do you check/enable this? Is it not enabled by default?
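One way to restripe existing data without copying whole user trees out and back is the file-by-file copy-and-rename the message alludes to; a minimal sketch with placeholder paths and stripe counts (symlinks and files in active use need separate care):

    # widen the default striping on the directory (affects newly created files only)
    lfs setstripe -c -1 /scratch/project/shared_inputs

    # restripe one existing hot file by copying it under the new default
    # and renaming the copy over the original
    cp /scratch/project/shared_inputs/big_input.dat /scratch/project/shared_inputs/big_input.dat.restripe
    mv /scratch/project/shared_inputs/big_input.dat.restripe /scratch/project/shared_inputs/big_input.dat

Later 1.8.x releases also ship an lfs_migrate helper script that automates this copy-and-rename; whether it is present in a given installation is worth checking before relying on it.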
Mohr Jr, Richard Frank (Rick Mohr)
2012-Dec-10 16:59 UTC
[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
On Dec 10, 2012, at 11:43 AM, Grigory Shamov wrote:
> I wonder what would be the best strategy to change the striping now. I understand that if I just change the stripe count on the Lustre root dir, it will affect only newly created files/directories. Should I copy the users' files, stripe their directories, and then copy the data back? That sounds somewhat dangerous, especially if the users do some unusual things with symlinks..

In our case, we approached the problem with user training. We didn't make any changes to the file system itself. In general, these users had small to medium files, so we wanted them to use stripe count 1 for their files. The primary exception was the shared input files that were hit hard by multiple readers. We contacted the users, explained the situation, and gave them pointers on using "lfs setstripe". Once they started running jobs and seeing 30+% speedups, they were happy to manage the striping for their files.

I don't know your situation, so I can't say if that approach would work for you. My experience has been that no matter what you choose for the default stripe count, someone will create files that don't work well with the default. So there always seems to be some need for user education.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
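A sketch of the kind of "lfs setstripe" pointers described above, with names and counts purely illustrative; since striping is fixed at file creation, it is set on the directory (or on an empty file) before the data is written:

    # stripe everything created in this directory across all OSTs
    lfs setstripe -c -1 input_data/

    # or pre-create one wide-striped file before filling it (13 OSTs in this thread)
    lfs setstripe -c 13 input_data/mesh.bin

    # confirm the layout
    lfs getstripe input_data/mesh.bin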