Erich Focht
2007-Sep-26 15:50 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
Ooops, this was actually meant for lustre-discuss, not for lustre-devel.

---------- Forwarded Message ----------

Subject: [Lustre-devel] max_sectors_kb change doesn't help
Date: Wednesday 26 September 2007 17:41
From: Erich Focht <efocht at hpce.nec.com>
To: lustre-devel at clusterfs.com

Hi,

in /proc/fs/lustre/obdfilter/*/brw_stats I found that the disk I/O is done
in 512K pieces. Following the Lustre manual I changed
/sys/block/DEVICE/queue/max_sectors_kb to 1024 but the I/O sizes in Lustre
didn't change.

Is there any place in Lustre where I can enforce 1MB I/Os? What else can I do?
My RAID is connected through an mpt fusion driver and I cannot find any place
where the I/O size is limited to 512KB. In fact, without Lustre I am able to
write with larger I/Os, so I suspect it's not the driver's fault...

Regards,
Erich
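(For readers following along: the sysfs change referred to above is typically
made along these lines. This is only a sketch; sdX is a placeholder device,
not the one actually used in the original post.)

    cat /sys/block/sdX/queue/max_sectors_kb        # current limit, e.g. 512
    echo 1024 > /sys/block/sdX/queue/max_sectors_kb

    # then compare against what the OST reports on the server side
    grep -A 2 "disk I/O size" /proc/fs/lustre/obdfilter/*/brw_stats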
Andreas Dilger
2007-Sep-26 23:20 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Sep 26, 2007 17:50 +0200, Erich Focht wrote:
> ---------- Forwarded Message ----------
>
> Subject: [Lustre-devel] max_sectors_kb change doesn't help
> Date: Wednesday 26 September 2007 17:41
> From: Erich Focht <efocht at hpce.nec.com>
> To: lustre-devel at clusterfs.com
>
> Hi,
>
> in /proc/fs/lustre/obdfilter/*/brw_stats I found that the disk I/O is done
> in 512K pieces. Following the Lustre manual I changed
> /sys/block/DEVICE/queue/max_sectors_kb to 1024 but the I/O sizes in Lustre
> didn't change.
>
> Is there any place in Lustre where I can enforce 1MB I/Os? What else can I do?
> My RAID is connected through an mpt fusion driver and I cannot find any place
> where the I/O size is limited to 512KB. In fact, without Lustre I am able to
> write with larger I/Os, so I suspect it's not the driver's fault...

Did you try remounting the OSTs after changing the setting?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
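(A minimal remount sequence for one OST would look roughly like this; the
mount point and device are placeholders and a Lustre 1.6-style server mount
is assumed.)

    umount /mnt/ost0
    echo 1024 > /sys/block/sdX/queue/max_sectors_kb
    mount -t lustre /dev/sdX /mnt/ost0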
Erich Focht
2007-Sep-27 08:40 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
Hi Andreas,

On Thursday 27 September 2007 01:20, Andreas Dilger wrote:
> On Sep 26, 2007 17:50 +0200, Erich Focht wrote:
> > in 512K pieces. Following the Lustre manual I changed
> > /sys/block/DEVICE/queue/max_sectors_kb to 1024 but the I/O sizes in Lustre
> > didn't change.
>
> Did you try remounting the OSTs after changing the setting?

Yes, actually when I changed the settings the lustre modules were not loaded
yet. Where is the detection of the maximally allowed I/O size done in Lustre?

Thanks,
Erich
Andreas Dilger
2007-Sep-27 08:52 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Sep 27, 2007 10:40 +0200, Erich Focht wrote:
> On Thursday 27 September 2007 01:20, Andreas Dilger wrote:
> > On Sep 26, 2007 17:50 +0200, Erich Focht wrote:
> > > in 512K pieces. Following the Lustre manual I changed
> > > /sys/block/DEVICE/queue/max_sectors_kb to 1024 but the I/O sizes in Lustre
> > > didn't change.
> >
> > Did you try remounting the OSTs after changing the setting?
>
> Yes, actually when I changed the settings the lustre modules were not loaded
> yet. Where is the detection of the maximally allowed I/O size done in Lustre?

In fact there isn't any such detection in Lustre - it will push pages into
an IO until the block layer tells it to stop.

Please check /proc/fs/lustre/obdfilter/*/brw_stats to see if the IO requests
coming from the client are 1MB in size (256 pages), and if yes then the issue
would likely be in the block layer.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
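(A quick way to do that check on the OSS might look like the following sketch;
it only prints the first few rows of each histogram.)

    for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
        echo "== $f"
        grep -A 2 "pages per bulk" "$f"    # RPC size coming from the clients
        grep -A 2 "disk I/O size"  "$f"    # what actually hits the disk
    done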
Erich Focht
2007-Sep-27 09:28 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Thursday 27 September 2007 10:52, Andreas Dilger wrote:
> In fact there isn't any such detection in Lustre - it will push pages into
> an IO until the block layer tells it to stop.
>
> Please check /proc/fs/lustre/obdfilter/*/brw_stats to see if the IO requests
> coming from the client are 1MB in size (256 pages), and if yes then the issue
> would likely be in the block layer.

The output is below. I see 256 pages per transfer. But I also see "disk
fragmented I/Os". Sounds somehow related, but can I influence the
fragmentation?

BTW: I'm running on a RHEL5 system, with noop I/O scheduler. The disks are
now connected through Emulex FC controllers, but I see the same behavior with
SAS storage attached through LSI Logic HCAs.

                              read     |      write
pages per bulk r/w     rpcs  %  cum %  |  rpcs   %  cum %
256:                      0  0     0   |   955 100   100

                              read     |      write
discontiguous pages    rpcs  %  cum %  |  rpcs   %  cum %
0:                        0  0     0   |   955 100   100

                              read     |      write
discontiguous blocks   rpcs  %  cum %  |  rpcs   %  cum %
0:                        0  0     0   |   955 100   100

                              read     |      write
disk fragmented I/Os    ios  %  cum %  |   ios   %  cum %
2:                        0  0     0   |   955 100   100

                              read     |      write
disk I/Os in flight     ios  %  cum %  |   ios   %  cum %
1:                        0  0     0   |   216  11    11
2:                        0  0     0   |   220  11    22
3:                        0  0     0   |   194  10    32
4:                        0  0     0   |   198  10    43
5:                        0  0     0   |   166   8    52
6:                        0  0     0   |   165   8    60
7:                        0  0     0   |   122   6    67
8:                        0  0     0   |   121   6    73
9:                        0  0     0   |   116   6    79
10:                       0  0     0   |   115   6    85
11:                       0  0     0   |    95   4    90
12:                       0  0     0   |    94   4    95
13:                       0  0     0   |    35   1    97
14:                       0  0     0   |    32   1    98
15:                       0  0     0   |     9   0    99
16:                       0  0     0   |     9   0    99
17:                       0  0     0   |     2   0    99
18:                       0  0     0   |     1   0   100

                              read     |      write
I/O time (1/1000s)      ios  %  cum %  |   ios   %  cum %
4:                        0  0     0   |     3   0     0
8:                        0  0     0   |    17   1     2
16:                       0  0     0   |    98  10    12
32:                       0  0     0   |   326  34    46
64:                       0  0     0   |   370  38    85
128:                      0  0     0   |   129  13    98
256:                      0  0     0   |    10   1    99
512:                      0  0     0   |     2   0   100

                              read     |      write
disk I/O size           ios  %  cum %  |   ios   %  cum %
512K:                     0  0     0   |  1910 100   100

Thanks, best regards,
Erich

--
Dr. Erich Focht
Solution Architecture Group, Linux R&D
NEC High Performance Computing Europe
Andreas Dilger
2007-Sep-27 10:34 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Sep 27, 2007 11:28 +0200, Erich Focht wrote:
> On Thursday 27 September 2007 10:52, Andreas Dilger wrote:
> > In fact there isn't any such detection in Lustre - it will push pages into
> > an IO until the block layer tells it to stop.
> >
> > Please check /proc/fs/lustre/obdfilter/*/brw_stats to see if the IO requests
> > coming from the client are 1MB in size (256 pages), and if yes then the issue
> > would likely be in the block layer.
>
> The output is below. I see 256 pages per transfer. But I also see "disk
> fragmented I/Os". Sounds somehow related, but can I influence the
> fragmentation?
>
> pages per bulk r/w     rpcs  %  cum %  |  rpcs   %  cum %
> 256:                      0  0     0   |   955 100   100
>
>                               read     |      write
> disk fragmented I/Os    ios  %  cum %  |   ios   %  cum %
> 2:                        0  0     0   |   955 100   100
>
>                               read     |      write
> disk I/O size           ios  %  cum %  |   ios   %  cum %
> 512K:                     0  0     0   |  1910 100   100

This generally points to the underlying layer fragmenting the IO, since the
"disk fragmented I/O" counter is only incremented when we can't add a page to
the existing bio (see "frags" in lustre/obdfilter/filter_io_26.c:filter_do_bio()).
The culprit is in "can_be_merged()" or "bio_add_page()".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
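(One way to watch the fragmentation happen is to capture the D_INODE debug
messages emitted by filter_do_bio() on the OSS. The commands below are a
sketch from memory; the debug mask name "inode" and the exact lctl syntax may
differ between Lustre versions.)

    echo +inode > /proc/sys/lnet/debug     # enable D_INODE messages
    # ... run the write load ...
    lctl dk /tmp/lustre-debug.log          # dump the kernel debug buffer
    grep "bio++" /tmp/lustre-debug.log     # the "have to fragment" messages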
Erich Focht
2007-Oct-01 12:02 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
Hi Andreas,

On Thursday 27 September 2007 12:34, Andreas Dilger wrote:
> >                               read     |      write
> > disk I/O size           ios  %  cum %  |   ios   %  cum %
> > 512K:                     0  0     0   |  1910 100   100
>
> This generally points to the underlying layer fragmenting the IO, since the
> "disk fragmented I/O" counter is only incremented when we can't add a page to
> the existing bio (see "frags" in lustre/obdfilter/filter_io_26.c:filter_do_bio()).
> The culprit is in "can_be_merged()" or "bio_add_page()".

the Lustre debugging messages look like this:

00002000:00000002:3:1191233501.646369:0:15619:0:(filter_io_26.c:339:filter_do_bio()) bio++ sz 524288 vcnt 128(256) sectors 1024(1024) psg 18(128) hsg 18(64)

and are printed by the code:

                                /* Dang! I have to fragment this I/O */
                                CDEBUG(D_INODE, "bio++ sz %d vcnt %d(%d) "
                                       "sectors %d(%d) psg %d(%d) hsg %d(%d)\n",
                                       bio->bi_size,
                                       bio->bi_vcnt, bio->bi_max_vecs,
                                       bio->bi_size >> 9, q->max_sectors,
                                       bio_phys_segments(q, bio),
                                       q->max_phys_segments,
                                       bio_hw_segments(q, bio),
                                       q->max_hw_segments);

This actually suggests that q->max_sectors is 1024, although
/sys/block/sd*/queue/max_sectors_kb is set to 2048 (i.e. the value should be
4096 sectors).

Could this problem come from multipath? It is "assembling" the dm-* devices
out of the SCSI devices, presents the SCSI devices as "slaves", but has no
own settings for the queue parameters in /sys/block/dm-*. I tried increasing
the SCSI member devices' queue max_sectors_kb before starting the multipathd,
but it didn't help. Uhmmm, yes, I am using multipath... forgot to mention
earlier.

Best regards,
Erich
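(For what it's worth, comparing the limits seen by the SCSI members with the
dm device can be done roughly as below. The device names are placeholders,
and on this 2.6.18-based kernel the dm-* devices may not expose a queue/
directory at all, which is exactly the point above.)

    # limits of the SCSI member devices
    for d in sdX sdY; do
        echo -n "$d: "; cat /sys/block/$d/queue/max_sectors_kb
    done

    # the multipath device itself (the file may simply not exist)
    cat /sys/block/dm-0/queue/max_sectors_kb 2>/dev/null \
        || echo "dm-0 exposes no max_sectors_kb"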
Andreas Dilger
2007-Oct-02 05:10 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Oct 01, 2007 14:02 +0200, Erich Focht wrote:
> On Thursday 27 September 2007 12:34, Andreas Dilger wrote:
> > >                               read     |      write
> > > disk I/O size           ios  %  cum %  |   ios   %  cum %
> > > 512K:                     0  0     0   |  1910 100   100
> >
> > This generally points to the underlying layer fragmenting the IO, since the
> > "disk fragmented I/O" counter is only incremented when we can't add a page to
> > the existing bio (see "frags" in lustre/obdfilter/filter_io_26.c:filter_do_bio()).
> > The culprit is in "can_be_merged()" or "bio_add_page()".
>
> the Lustre debugging messages look like this:
>
> 00002000:00000002:3:1191233501.646369:0:15619:0:(filter_io_26.c:339:filter_do_bio()) bio++ sz 524288 vcnt 128(256) sectors 1024(1024) psg 18(128) hsg 18(64)
>
> and are printed by the code:
>
>                                 /* Dang! I have to fragment this I/O */
>                                 CDEBUG(D_INODE, "bio++ sz %d vcnt %d(%d) "
>                                        "sectors %d(%d) psg %d(%d) hsg %d(%d)\n",
>                                        bio->bi_size,
>                                        bio->bi_vcnt, bio->bi_max_vecs,
>                                        bio->bi_size >> 9, q->max_sectors,
>                                        bio_phys_segments(q, bio),
>                                        q->max_phys_segments,
>                                        bio_hw_segments(q, bio),
>                                        q->max_hw_segments);
>
> This actually suggests that q->max_sectors is 1024, although
> /sys/block/sd*/queue/max_sectors_kb is set to 2048 (i.e. the value should be
> 4096 sectors).
>
> Could this problem come from multipath? It is "assembling" the dm-* devices
> out of the SCSI devices, presents the SCSI devices as "slaves", but has no
> own settings for the queue parameters in /sys/block/dm-*. I tried increasing
> the SCSI member devices' queue max_sectors_kb before starting the multipathd,
> but it didn't help. Uhmmm, yes, I am using multipath... forgot to mention
> earlier.

It seems entirely possible, I haven't looked at the multipath code myself.
It should really build the q->max_* values as the minimum of all of the
underlying devices instead of using a default value.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
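(To illustrate the "minimum over the underlying devices" idea, a sketch like
the following would show what the stacked limit ought to be. It assumes the
/sys/block/dm-*/slaves/ symlinks resolve to the slave devices' sysfs
directories, which may not be the case on this 2.6.18 kernel; dm-0 is a
placeholder.)

    min=
    for f in /sys/block/dm-0/slaves/*/queue/max_sectors_kb; do
        v=$(cat "$f")
        # keep the smallest value seen so far
        if [ -z "$min" ] || [ "$v" -lt "$min" ]; then
            min=$v
        fi
    done
    echo "minimum max_sectors_kb across slaves: ${min:-unknown}"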
Erich Focht
2007-Oct-06 11:23 UTC
[Lustre-discuss] Fwd: max_sectors_kb change doesn't help
On Tuesday 02 October 2007 07:10, Andreas Dilger wrote:
> > Could this problem come from multipath? It is "assembling" the dm-* devices
> > out of the SCSI devices, presents the SCSI devices as "slaves", but has no
> > own settings for the queue parameters in /sys/block/dm-*. I tried increasing
> > the SCSI member devices' queue max_sectors_kb before starting the multipathd,
> > but it didn't help. Uhmmm, yes, I am using multipath... forgot to mention
> > earlier.
>
> It seems entirely possible, I haven't looked at the multipath code myself.
> It should really build the q->max_* values as the minimum of all of the
> underlying devices instead of using a default value.

Without multipath (mounting the SCSI devices directly) the change of
max_sectors_kb has the expected effect: Lustre stops fragmenting and I/Os are
done in 1MB chunks. I reckon I was unable to find the place where max_sectors
for the multipath queues is set; I will continue to look into this.

Regards,
Erich
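(The comparison described above, sketched with placeholder names rather than
the exact commands used: raise the limit on the SCSI device, mount the OST on
it directly instead of the dm-* device, run some writes, and recheck the
histogram.)

    echo 1024 > /sys/block/sdX/queue/max_sectors_kb
    mount -t lustre /dev/sdX /mnt/ost0
    # ... write load ...
    grep -A 2 "disk I/O size" /proc/fs/lustre/obdfilter/*/brw_stats
    # direct sd mount: 1M rows appear; via dm-*: everything stays at 512K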