thr3ads.net - flac dev - [Flac-dev] Proof-of-concept multithreaded FLAC encoder [May 2008]

If this information is useful, please help other people find it:
Share via:

Frederick Akalin

2008-May-06 18:31 UTC

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

Hey FLAC devs,

I managed to hack out a proof-of-concept multithreaded FLAC encoder
based on the example libFLAC one.  It turned out to be fairly
straightforward to get near-linear speedup; I can encode a 636 MB wave
file in 6.8s with 8 threads on an 8-core 3.0 GHz Xeon vs. 31.4s with a
single thread.

Basically I mmap() the input file, divide up the mmap()ed region into
nearly equal pieces, parcel them out to encode threads, and write out
the output chunks (asynchronously) in order as soon as they're ready.
I had to hack up libFLAC a bit:

- I exposed update_metadata_() and FLAC__stream_encoder_set_do_md5().
- I added the function FLAC__stream_encoder_set_current_frame_number().

I turned off md5 calculation in the example libFLAC encoder for ease
of comparison and also verification as it interacts badly (crashes)
with FLAC__stream_encoder_set_current_frame_number().  I also zero out
the min/max frame size fields in the metadata, which should be the only
byte difference between the files the multithreaded encoder outputs and
the ones the example libFLAC one outputs (well, and md5).

Patch file for flac 1.2.1 or CVS: http://www.akalin.cx/patch-libFLAC.in
Source file for multithreaded FLAC encoder: http://www.akalin.cx/mt_encode.c

mt_encode.c should compile on gcc 4.x with -Wall -Werror -g -O2 -ansi .
Usage is simply "mt_encode.c input.wav output.flac [num_threads]".

Of course, I'm not suggesting that the patch file above be committed;
it was just to get the proof-of-concept multithreaded encoder working
with a minimum of fuss.  However, looking at the earlier discussion on
parallelizing flac (
http://lists.xiph.org/pipermail/flac-dev/2007-September/002312.html )
I think there are a number of misconceptions floating around.  I don't
think the existing encoding APIs (stream_encoder.h) should be
retrofitted to support multithreading, but maybe a separate file- or
block-based API could be written that shares most of the code with the
existing stream APIs.  I agree that the stream-based API should always
be primary, but there's no reason that it couldn't co-exist with an
alternate parallel-friendly API.  As an end-result I envision a
utility similar to the flac command-line utility (flac-mt?) that
supports multithreading, but with only a subset of the functionality
of flac, according to feasability and demand.

As a first step, a function that simply encodes a block of memory to
an output buffer with a given frame number etc. would be helpful, as
well a simple function to build and write out a streaminfo metadata
block to memory.  Maybe also exposing FLAC's implementation of md5
calculation in the form of a function that md5s a memory block, too.
I might have time to hammer something out for review.  What do you
guys think?

--
Frederick Akalin
http://www.akalin.cx

Brian Willoughby

2008-May-06 21:03 UTC

head link

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

Frederick,

This is great news!  Thanks for your effort.

Your proof-of-concept raises a few questions for me:

1) I know that the ratio of uncompressed to compressed data is  
unpredictable, but I never really considered whether the input block  
size or the output block size is constant.  I'm assuming that if  
you're breaking the uncompressed input file into multiple pieces,  
then the uncompressed block size must be constant, while the output  
block size becomes variable.  I suppose the FLAC format allows for  
some variation in block size anyway.  My question (finally) is: Are  
there any anomalies at the points in the stream where you concatenate  
the compressed blocks from the multiple threads?  Do you end up with  
partial blocks in order to fit things together?

2) How hard would it be to enable md5 calculation?

3) Do you accelerate decode as well as encode?  I'm thinking that the  
variable block size would require each thread to scan its block for  
the start of a new block header, and also continue processing past  
the end of its region to grab any partial block skipped by the  
adjacent thread.  Scanning the entire file for block offsets would  
probably expend a significant amount of time, so it might be better  
for each thread to scan within its own region.  There's probably no  
way to weave the output together, though, until all threads have  
completed - at least the ones processing earlier parts of the time line.

I agree that it would be useful to have a parallel file- or block- 
based API.  It would be more effort to maintain, but there is a gain  
for the added work, now that multiple processors is more common, even  
on laptops.  The flac-mt command-line also seems like a good idea.

Brian Willoughby
Sound Consulting

On May 6, 2008, at 11:31, Frederick Akalin wrote:> I managed to hack out a proof-of-concept multithreaded FLAC encoder
> based on the example libFLAC one.  It turned out to be fairly
> straightforward to get near-linear speedup; I can encode a 636 MB wave
> file in 6.8s with 8 threads on an 8-core 3.0 GHz Xeon vs. 31.4s with a
> single thread.
>
> Basically I mmap() the input file, divide up the mmap()ed region into
> nearly equal pieces, parcel them out to encode threads, and write out
> the output chunks (asynchronously) in order as soon as they're ready.
>
> I turned off md5 calculation in the example libFLAC encoder for ease
> of comparison and also verification as it interacts badly (crashes)
> with FLAC__stream_encoder_set_current_frame_number().  I also zero out
> the min/max frame size fields in the metadata, which should be the  
> only
> byte difference between the files the multithreaded encoder outputs  
> and
> the ones the example libFLAC one outputs (well, and md5).
>
> I think there are a number of misconceptions floating around.  I don't
> think the existing encoding APIs (stream_encoder.h) should be
> retrofitted to support multithreading, but maybe a separate file- or
> block-based API could be written that shares most of the code with the
> existing stream APIs.  I agree that the stream-based API should always
> be primary, but there's no reason that it couldn't co-exist with an
> alternate parallel-friendly API.  As an end-result I envision a
> utility similar to the flac command-line utility (flac-mt?) that
> supports multithreading, but with only a subset of the functionality
> of flac, according to feasability and demand.
>
> As a first step, a function that simply encodes a block of memory to
> an output buffer with a given frame number etc. would be helpful, as
> well a simple function to build and write out a streaminfo metadata
> block to memory.  Maybe also exposing FLAC's implementation of md5
> calculation in the form of a function that md5s a memory block, too.
> I might have time to hammer something out for review.  What do you
> guys think?

Frederick Akalin

2008-May-06 21:20 UTC

head link

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

On Tue, May 6, 2008 at 2:03 PM, Brian Willoughby <brianw at sounds.wa.com>
wrote:> Frederick,
>
>  This is great news!  Thanks for your effort.
>
>  Your proof-of-concept raises a few questions for me:
>
>  1) I know that the ratio of uncompressed to compressed data is
> unpredictable, but I never really considered whether the input block size
or
> the output block size is constant.  I'm assuming that if you're
breaking the
> uncompressed input file into multiple pieces, then the uncompressed block
> size must be constant, while the output block size becomes variable.  I
> suppose the FLAC format allows for some variation in block size anyway.  My
> question (finally) is: Are there any anomalies at the points in the stream
> where you concatenate the compressed blocks from the multiple threads?  Do
> you end up with partial blocks in order to fit things together?
Yes, the input block size is constant (for regular wavs at least)
whereas the output block size is not.  There are no anomalies in the
concatenation as I make sure the input file is divided up on block
boundaries.
>  2) How hard would it be to enable md5 calculation?
Not hard, I don't think.  Since md5 is not really parallelizable I
think it suffices to spawn off a separate thread to calculate the md5
for the whole file at the same time as the encode, then wait for that
thread too and fire off a separate write request for the metadata.
>  3) Do you accelerate decode as well as encode?  I'm thinking that the
> variable block size would require each thread to scan its block for the
> start of a new block header, and also continue processing past the end of
> its region to grab any partial block skipped by the adjacent thread.
> Scanning the entire file for block offsets would probably expend a
> significant amount of time, so it might be better for each thread to scan
> within its own region.  There's probably no way to weave the output
> together, though, until all threads have completed - at least the ones
> processing earlier parts of the time line.
I looked into it briefly, but haven't had time to get it working.  The
tricky part is dividing up the input file equally given the variable
block size and the (potential) lack of a seektable block.  The best
solution might just be to scan the output file and push (frame number,
byte offset, byte size) onto a queue which the decoding threads would
pop off, but that's a little more complicated.  I'm still mulling it
over. :)
>  I agree that it would be useful to have a parallel file- or block-based
> API.  It would be more effort to maintain, but there is a gain for the
added
> work, now that multiple processors is more common, even on laptops.  The
> flac-mt command-line also seems like a good idea.
Cool!  I'll start working on something then.

-- 
Frederick Akalin
http://www.akalin.cx

Frederick Akalin

2008-May-06 23:32 UTC

head link

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

On Tue, May 6, 2008 at 3:51 PM, Harry Sack <tranzedude at gmail.com>
wrote:> to Brian: i remember you were laughing at me when i told multiple
>  threads could speed it up a lot. you even called it 'nearly no
>  speedup'
>
>  here is the proof now ;)
>
>  i think this proves how much you know about audio encoding ...
>
>  ps: i had been waiting for this day to tell you you were wrong based
>  on someone's multi-threaded encoder ;)
>
>  Harry
>
Okay, I really don't want to turn this into a pissing contest,
especially since Brian ended up agreeing with you, didn't he?  Let's
try to keep personal attacks out of this.

-- 
Frederick Akalin
http://www.akalin.cx

Seemingly Similar Threads

Search for more maybe matching threads

flac dev - May 2008 - Proof-of-concept multithreaded FLAC encoder

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

[Flac-dev] Proof-of-concept multithreaded FLAC encoder

Seemingly Similar Threads