Hello everyone,

I've pushed out another set of changes to the unstable trees, and these
include:

* RAID 1+0 support

  mkfs.btrfs -d raid10 -m raid10 /dev/sd...

  4 or more drives required.

* async work queues for checksumming writes

* Better back references in the multi-device data structs

The async work queues include code to checksum data pages without the FS
mutex held, greatly increasing streaming write throughput.  On my 4 drive
system, I was getting around 120MB/s writes with checksumming on.  Now I
get 180MB/s, which is disk speed.

The rest of the week will be spent doing hot add/remove of devices.

Happy testing ;)

-chris
Chris Mason <chris.mason@oracle.com> writes:
>
> The async work queues include code to checksum data pages without the FS
> mutex

Are they able to distribute work to other cores?

-Andi
On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > The async work queues include code to checksum data pages without the FS
> > mutex
>
> Are they able to distribute work to other cores?

Yes, it just uses a workqueue.  The current implementation is pretty simple;
it surely could be more effective at spreading the work around.

I'm testing a variant that only tosses over to the async queue for pdflush,
so inline reclaim stays inline.

-chris
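For readers who want a concrete picture, a minimal sketch of what handing the
checksum step off to a workqueue can look like follows; all names here
(csum_wq, async_csum, csum_worker, csum_async) are illustrative, not the
actual btrfs code.

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/bio.h>
#include <linux/slab.h>

static struct workqueue_struct *csum_wq;	/* illustrative name */

struct async_csum {
	struct work_struct work;
	struct bio *bio;		/* data pages waiting for checksums */
};

static void csum_worker(struct work_struct *work)
{
	struct async_csum *ac = container_of(work, struct async_csum, work);

	/* checksum the pages in ac->bio here, then submit the bio for IO */
	kfree(ac);
}

/* queue a bio for async checksumming instead of doing it under the FS mutex */
static int csum_async(struct bio *bio)
{
	struct async_csum *ac = kmalloc(sizeof(*ac), GFP_NOFS);

	if (!ac)
		return -ENOMEM;
	ac->bio = bio;
	INIT_WORK(&ac->work, csum_worker);
	queue_work(csum_wq, &ac->work);
	return 0;
}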
Chris Mason <chris.mason@oracle.com> writes:
> On Wednesday 16 April 2008, Andi Kleen wrote:
> > Chris Mason <chris.mason@oracle.com> writes:
> > > The async work queues include code to checksum data pages without the FS
> > > mutex
> >
> > Are they able to distribute work to other cores?
>
> Yes, it just uses a workqueue.

Unfortunately work queues don't do that by default currently.  They
tend to process on the current CPU only.

> The current implementation is pretty simple; it surely could be more
> effective at spreading the work around.
>
> I'm testing a variant that only tosses over to the async queue for pdflush,
> so inline reclaim stays inline.

Longer term I would hope that write checksumming will be basically free,
by doing a combined csum-copy at write() time.  The only problem is where
to store the checksum between the write and the final IO: there's no space
in struct page.

The same could also be done for read(), but that might be a little more
tricky because it would require delayed error reporting, and it might be
difficult to do for partial blocks.

-Andi
On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> > > Chris Mason <chris.mason@oracle.com> writes:
> > > > The async work queues include code to checksum data pages without the
> > > > FS mutex
> > >
> > > Are they able to distribute work to other cores?
> >
> > Yes, it just uses a workqueue.
>
> Unfortunately work queues don't do that by default currently.  They
> tend to process on the current CPU only.

Well, I see multiple work queue threads using CPU time, but I haven't spent
much time optimizing it.  There's definitely room for improvement.

> > The current implementation is pretty simple; it surely could be more
> > effective at spreading the work around.
> >
> > I'm testing a variant that only tosses over to the async queue for
> > pdflush, so inline reclaim stays inline.
>
> Longer term I would hope that write checksumming will be basically free,
> by doing a combined csum-copy at write() time.  The only problem is where
> to store the checksum between the write and the final IO: there's no space
> in struct page.

Checksumming at write() time is easier (except for mmap) because I can toss
the csum directly into the btree inside btrfs_file_write.  The current code
avoids that complexity and does it all at writeout.

One advantage to the current code is that I'm able to optimize tree searches
away by checksumming a bunch of pages at a time.  Multiple pages worth of
checksums get stored in a single btree item, so at least for the btree
operations the current code is fairly optimal.

> The same could also be done for read(), but that might be a little more
> tricky because it would require delayed error reporting, and it might be
> difficult to do for partial blocks.

Yeah, it doesn't quite fit with how the kernel does reads.  For now it is
much easier if the retry-other-mirror operation happens long before
copy_to_user.

-chris
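A rough illustration of the "many checksums per btree item" layout described
above; this is only a sketch of the idea, not the real btrfs on-disk format,
and the names are made up.

#include <linux/types.h>

#define CSUM_BLOCKSIZE	4096	/* one checksum per 4K block, for illustration */

/*
 * Illustrative layout only: the item records where a run of contiguous
 * blocks starts, and the payload is a packed array of checksums, so one
 * tree search/insert covers a whole run of pages instead of one page.
 */
struct csum_run_item {
	__le64 start;		/* logical byte offset of the first block */
	__le32 nr;		/* number of blocks covered by this item */
	__le32 sums[];		/* nr checksums, one per block */
};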
Chris Mason wrote:
> On Wednesday 16 April 2008, Andi Kleen wrote:
> > Chris Mason <chris.mason@oracle.com> writes:
> > > On Wednesday 16 April 2008, Andi Kleen wrote:
> > > > Chris Mason <chris.mason@oracle.com> writes:
> > > > > The async work queues include code to checksum data pages without the
> > > > > FS mutex
> > > > Are they able to distribute work to other cores?
> > > Yes, it just uses a workqueue.
> > Unfortunately work queues don't do that by default currently.  They
> > tend to process on the current CPU only.
>
> Well, I see multiple work queue threads using CPU time, but I haven't spent
> much time optimizing it.  There's definitely room for improvement.

That's likely because you submit from multiple CPUs.  But with a single
submitter running on a single CPU there shouldn't be any load balancing
currently.

-Andi
On Wed, Apr 16 2008, Andi Kleen wrote:
> Chris Mason wrote:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> > > Chris Mason <chris.mason@oracle.com> writes:
> > > > On Wednesday 16 April 2008, Andi Kleen wrote:
> > > > > Chris Mason <chris.mason@oracle.com> writes:
> > > > > > The async work queues include code to checksum data pages without
> > > > > > the FS mutex
> > > > > Are they able to distribute work to other cores?
> > > > Yes, it just uses a workqueue.
> > > Unfortunately work queues don't do that by default currently.  They
> > > tend to process on the current CPU only.
> >
> > Well, I see multiple work queue threads using CPU time, but I haven't spent
> > much time optimizing it.  There's definitely room for improvement.
>
> That's likely because you submit from multiple CPUs.  But with a single
> submitter running on a single CPU there shouldn't be any load balancing
> currently.

There have been various implementations of queue_work_on() posted
through the years; I've had one version that I've used off and on for a
long time:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=c68c42fd6df96f5b3fb5b8b47c571f233d054c71

Then you need some balancing decider on top of that, of course.

-- 
Jens Axboe
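Assuming a queue_work_on(cpu, wq, work) call along the lines of that patch,
a caller spreading jobs by hand could look roughly like the sketch below;
the "balancing decider" is exactly the open part, and the names here are
illustrative rather than from any real submission.

#include <linux/workqueue.h>
#include <linux/cpumask.h>

/*
 * Hand-rolled spreading on top of a queue_work_on()-style interface:
 * round-robin the jobs across online CPUs.  Illustration only; there is
 * no locking around the cursor, and whether the caller should be making
 * this placement decision at all is questioned below.
 */
static int csum_next_cpu = -1;

static void queue_csum_spread(struct workqueue_struct *wq,
			      struct work_struct *work)
{
	csum_next_cpu = cpumask_next(csum_next_cpu, cpu_online_mask);
	if (csum_next_cpu >= nr_cpu_ids)
		csum_next_cpu = cpumask_first(cpu_online_mask);
	queue_work_on(csum_next_cpu, wq, work);
}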
> There have been various implementations of queue_work_on() posted
> through the years; I've had one version that I've used off and on for a
> long time:

queue_work_on() is the wrong interface, I think.  You rather want a pool
of non-pinned threads that are then load balanced by the scheduler (which
knows best what CPUs have cycles available).

-Andi
On Wed, Apr 16 2008, Andi Kleen wrote:
> > There have been various implementations of queue_work_on() posted
> > through the years; I've had one version that I've used off and on for a
> > long time:
>
> queue_work_on() is the wrong interface, I think.  You rather want a pool
> of non-pinned threads that are then load balanced by the scheduler (which
> knows best what CPUs have cycles available).

Yeah, that actually sounds like the best interface.  What I described
typically ends up trying to be too clever; you really want to leave any
scheduling decisions to the scheduler.

-- 
Jens Axboe
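A bare-bones sketch of that alternative: a handful of kthreads left unbound
so the scheduler places them wherever cycles are available, pulling jobs off
one shared list.  Names and structure are made up for illustration; this is
not code from the thread.

#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct pool_job {
	struct list_head list;
	void (*fn)(struct pool_job *job);	/* e.g. checksum one bio */
};

static LIST_HEAD(pool_jobs);
static DEFINE_SPINLOCK(pool_lock);
static DECLARE_WAIT_QUEUE_HEAD(pool_wait);

/* add a job to the shared list and kick any idle pool thread */
static void pool_queue(struct pool_job *job)
{
	spin_lock(&pool_lock);
	list_add_tail(&job->list, &pool_jobs);
	spin_unlock(&pool_lock);
	wake_up(&pool_wait);
}

static int pool_thread(void *unused)
{
	while (!kthread_should_stop()) {
		struct pool_job *job = NULL;

		spin_lock(&pool_lock);
		if (!list_empty(&pool_jobs)) {
			job = list_first_entry(&pool_jobs, struct pool_job, list);
			list_del(&job->list);
		}
		spin_unlock(&pool_lock);

		if (job)
			job->fn(job);
		else
			wait_event_interruptible(pool_wait,
					!list_empty(&pool_jobs) ||
					kthread_should_stop());
	}
	return 0;
}

/* start nr unbound threads; the scheduler decides where they run */
static void pool_start(int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		kthread_run(pool_thread, NULL, "csum-pool/%d", i);
}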
On Wednesday 16 April 2008, Andi Kleen wrote:
> > There have been various implementations of queue_work_on() posted
> > through the years; I've had one version that I've used off and on for a
> > long time:
>
> queue_work_on() is the wrong interface, I think.  You rather want a pool
> of non-pinned threads that are then load balanced by the scheduler (which
> knows best what CPUs have cycles available).

Fair enough, I'll tune things a bit.

-chris