thr3ads.net - ogg dev - [ogg-dev] OggPCM2: channel map [Nov 2005]

Erik de Castro Lopo

2005-Nov-17 02:00 UTC

[ogg-dev] OggPCM2 : chunked vs interleaved data

Sampo Syreeni wrote:
> Secondly, I'd like to see the channel map fleshed out in more detail. 
Sampo,

I did flesh out the wiki a **little** more. Is the intent clearer now?
> (Beware of the pet peeve...)
What is that pet peeve?
> IMO the mapping should cover at least the 
> channel assignments possible in WAVE files, the most common Ambisonic 
> ones, and perhaps some added channel interpretations like "surround"
> which are commonly used but lacking in most file formats.
I haven't enumerated them all, but we should be able to without too
much trouble.

> (For example, THX does not treat surround as a directional source, so the
> correct semantics cannot be captured e.g. by WAVE files.
Do you have any more info about THX? I've searched the web and found
little of any worth.
> (As a further idea prompted by ambisonic compatibility encodings, I'd 
> also like to explore the possibility of multiple tagging. For example, 
> Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all 
> designed to be stereo compatible so that a legacy decoder can play them 
> as-is.
Does the Channel Conversion Header cover this?

Cheers,
Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"The lusers I know are so clueless, that if they were dipped in
 clue musk and dropped in the middle of pack of horny clues, on
 clue prom night during clue happy hour, they still couldn't get
 a clue."   --Michael Girdwood, in the monastery

Sampo Syreeni

2005-Nov-17 04:55 UTC

[ogg-dev] OggPCM2: channel map

On 2005-11-17, Erik de Castro Lopo wrote:
> I did flesh out the wiki a **little** more. Is the intent clearer now?
Yes. Channel map type tells us what the primary interpretation of the 
stored signals is. Channel definitions are there to tell which stored 
channel corresponds to which abstract channel in the type. Channel 
conversions define downmixes to secondary formats, as they do in MLP, 
and might end up being ignored unlike the channel map.

In theory the channel conversion header suffices for compatibility 
coding, but in practice I'm not quite sure that the primary target of 
such codings -- legacy players -- will implement the feature. In that 
case the compatibility might prove illusory.

I'm also not entirely sure that the coding chosen for the channel 
definitions is the best one. Typically we'd expect each type of channel 
map to contain all and nothing but the channel definitions typically 
used with that map type, in some order. For example L, C, R, Ls, Rs and 
LFE for 5.1. If so, all we're really trying to encode is the 
interleaving order. After that we have to ask whether that option is 
really necessary (fixed channel orders are a real possibility, 
especially since we're not encapsulating an existing format but defining 
a new streamable one which will necessitate some copying around in any 
case, and because some unnecessary options were already dropped for 
simplicity; plus of course the channel conversion headers enable channel 
permutations as well) and whether this is the best encoding for it 
(permutations can be coded with less redundancy and room for error). If 
the idea is to enable subsetting (e.g. 5.1 with a missing LFE equals 
5.0) then something like WAVE's channel mask seems a better alternative. 
The format also doesn't stop us from defining two left channels for 
stereo, while it does seem to be trying to limit possibilities of error 
by defining the channel types separately for each map (e.g. no 
OGG_CHANNEL_AMBISONIC_W inside a stereo channel map). Unfortunately, in 
the process it could end up with combinatorial explosion in the channel 
type enumerations (i.e. we might end up redefining L, R, C, etc. for 
each multichannel map type, of which there are a lot).
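
To make the subsetting idea concrete, here is a minimal sketch of a WAVE-style channel mask. The bit values follow the dwChannelMask convention of WAVE_FORMAT_EXTENSIBLE; the names SPEAKERS and channels_in_mask are hypothetical helpers for illustration and are not part of any OggPCM draft.

```python
# Speaker bits in the order WAVE_FORMAT_EXTENSIBLE defines them.
# Only the six positions needed for 5.1 are listed here.
SPEAKERS = [
    ("FRONT_LEFT", 0x01),
    ("FRONT_RIGHT", 0x02),
    ("FRONT_CENTER", 0x04),
    ("LOW_FREQUENCY", 0x08),
    ("BACK_LEFT", 0x10),
    ("BACK_RIGHT", 0x20),
]


def channels_in_mask(mask):
    """Return the names of the speakers whose bit is set, lowest bit first.

    With a mask, the stored channel order is implied by bit order, so a
    mask can express a subset but never a permutation or a duplicate.
    """
    return [name for name, bit in SPEAKERS if mask & bit]


MASK_5_1 = 0x3F                # all six bits set
MASK_5_0 = MASK_5_1 & ~0x08    # 5.1 with the LFE bit cleared

print(channels_in_mask(MASK_5_1))
print(channels_in_mask(MASK_5_0))
```

Because each channel is a single bit, a mask cannot describe "two left channels for stereo", which is one of the error cases mentioned above.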

So, how about a slight change in emphasis? Currently we have two types 
of channel semantics headers, one for the primary interpretation of the 
stream and one for downmixing to secondary formats. Why not redefine 
them so that both are bona fide channel maps and many such maps are 
allowed (say, in descending preferential order), but only one type comes 
with a conversion matrix (can't handle linear algebra? just skip to the 
next map; matrices can also implement arbitrary channel selections and 
permutations so in this case a separate channel map is not needed) and 
each map carries an assignment array with the precise number of channels 
the map expects (6 for 5.1, 2 for stereo, etc.), used to refer to the 
physical channels by order number. In pseudocode,

#def MAP_MONO:=1;
#def N_MONO:=1;
#def MONO_FRONT:=1;

#def MAP_STEREO:=2;
#def N_STEREO:=2;
#def STEREO_L:=1;
#def STEREO_R:=2;

#def MAP_UHJ:=3;
#def N_UHJ:=4;
#def UHJ_SIGMA:=1;
#def UHJ_DELTA:=2;
#def UHJ_T:=3;
#def UHJ_Q:=4;

header:  (n_chn:=3;                    // three channels are stored
   maps:    (
     (map_type:=simple;         // no matrix, the most preferred choice
      channel_type:=MAP_UHJ;    // implies that the map has N_UHJ==4 entries
      map[UHJ_SIGMA]:=1;        // SIGMA is stored in the first physical channel
      map[UHJ_DELTA]:=2;
      map[UHJ_T]:=3;
      map[UHJ_Q]:=0;),          // the fourth channel is not physically present
     (map_type:=complex;        // matrixing needed to go from m/s to l/r
      channel_type:=MAP_STEREO;
      matrix:=                  // dimensions come from n_chn and N_STEREO
       (1,  1,                  // run of the mill sum/diff matrix
        1, -1,
        0,  0)),
     (map_type:=simple;
      channel_type:=MAP_MONO;   // mono fallback; support could be mandatory
      map[MONO_FRONT]:=1;)      // seems stupid but comes in handy if the
                                // single mono compatible channel happens
                                // to not be the first one stored
    )
  )
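
As a rough illustration of the complex map above, the following sketch applies the sum/difference matrix to one frame of stored samples. It assumes that matrix rows correspond to the n_chn stored channels and columns to the N_STEREO output channels; the pseudocode does not spell out the orientation, and apply_matrix is a hypothetical helper.

```python
def apply_matrix(frame, matrix):
    """Mix one frame of stored samples into output channels.

    frame  -- one sample per stored channel (length n_chn)
    matrix -- n_chn rows, one column per output channel (assumed layout)
    """
    if len(frame) != len(matrix):
        raise ValueError("frame length must equal the number of matrix rows")
    n_out = len(matrix[0])
    return [
        sum(frame[i] * matrix[i][j] for i in range(len(frame)))
        for j in range(n_out)
    ]


# The sum/difference matrix from the pseudocode: SIGMA and DELTA are
# combined into left and right, and the T channel is ignored.
SUM_DIFF = [
    [1, 1],
    [1, -1],
    [0, 0],
]

sigma, delta, t = 0.5, 0.25, 0.9
left, right = apply_matrix([sigma, delta, t], SUM_DIFF)
print(left, right)
```

The coefficients are taken as written in the pseudocode; a real encoder would presumably pick gains that avoid clipping.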

Decoding such a structure is trivial: just skip to the first map you 
understand. Simple decoders need to know nothing about matrices, but 
compatibility encodings will still work. Some stupid assignments can 
still be made, but not as easily. If we want to be even stricter, we can 
drop the channel map from simple encodings and require a fixed channel 
order in this case; this would ease up implementation (cf. your comments 
on generating code on the fly). No functionality is lost, even while it 
can be argued that the structure is simplified conceptually. Unknown 
primary interpretations (say, channel maps with angle-elevation 
specified sources) can be added without compromising compatibility (they 
will simply be skipped whereas in the current format they would cause 
naïve decoders to reject the file). How does it sound?
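
The "skip to the first map you understand" rule could be sketched roughly as follows. The dictionary layout, the pick_map function and its parameters are assumptions made for illustration; only the map types, the preference order and the example values come from the pseudocode.

```python
MAP_MONO, MAP_STEREO, MAP_UHJ = 1, 2, 3

# The example header from the pseudocode, maps in descending preference.
HEADER = {
    "n_chn": 3,
    "maps": [
        {"map_type": "simple", "channel_type": MAP_UHJ, "map": [1, 2, 3, 0]},
        {"map_type": "complex", "channel_type": MAP_STEREO,
         "matrix": [[1, 1], [1, -1], [0, 0]]},
        {"map_type": "simple", "channel_type": MAP_MONO, "map": [1]},
    ],
}


def pick_map(header, known_types, can_matrix):
    """Return the first map this decoder can handle, or None.

    known_types -- set of channel_type values the decoder understands
    can_matrix  -- False for simple decoders that skip matrixed maps
    """
    for candidate in header["maps"]:
        if candidate["map_type"] not in ("simple", "complex"):
            continue  # unknown future map kind: skip, do not reject
        if candidate["channel_type"] not in known_types:
            continue  # unknown interpretation: skip to the next map
        if candidate["map_type"] == "complex" and not can_matrix:
            continue  # no linear algebra available
        return candidate
    return None


# A stereo player without matrix support falls through to mono.
chosen = pick_map(HEADER, {MAP_STEREO, MAP_MONO}, can_matrix=False)
print(chosen["channel_type"])
```

A decoder that supports matrixing would stop at the stereo map instead, and an Ambisonic-aware one at the UHJ map.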
>> (Beware of the pet peeve...)
>
> What is that pet peeve?
Umm... Roughly file formats which prove rigid in practice even after 
they've been declared extensible.
> I haven't enumerated them all, but we should be able to without too
> much trouble.
Want me to start a list in the Wiki?
> Do you have any more info about THX? I've searched the web and found 
> little of any worth.
I used to have, but I may have misplaced the specs. The main idea is 
that in THX, the surround channel is supposed to be spatially diffuse. 
It is not recreated with directional sources at the back, but by 
utilizing dipole speakers, room reflections, multiple sources or even 
explicit allpass decorrelators. But I'm not quite sure what the overall 
spatial distribution of the surround field is supposed to be. Hopefully 
I can find something concrete on it.
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Jean-Marc Valin

2005-Nov-17 21:20 UTC

[ogg-dev] OggPCM2: channel map

> Yes. Channel map type tells us what the primary interpretation of the 
> stored signals is. Channel definitions are there to tell which stored 
> channel corresponds to which abstract channel in the type. Channel 
> conversions define downmixes to secondary formats, as they do in MLP, 
> and might end up being ignored unlike the channel map.
I think the channel conversion will not be used "most of the time", but
it's probably going to be useful for at least a couple of apps. The fact
that it's optional means that it doesn't hurt anyway and those who don't
understand it will probably ignore it instead of putting wrong data in it.
> In theory the channel conversion header suffices for compatibility 
> coding, but in practice I'm not quite sure that the primary target of 
> such codings -- legacy players -- will implement the feature. In that 
> case the compatibility might prove illusory.
It's not that much about "legacy players". It can be useful for any
player. Say you have a 5.1 PCM OggPCM stream. If that file has a
conversion header, it means that xmms (or any other player) will be able
to play it in stereo without doing anything stupid.
> I'm also not entirely sure that the coding chosen for the channel 
> definitions is the best one. Typically we'd expect each type of channel
> map to contain all and nothing but the channel definitions typically 
> used with that map type, in some order. For example L, C, R, Ls, Rs and 
> LFE for 5.1. If so, all we're really trying to encode is the 
> interleaving order. After that we have to ask whether that option is 
> really necessary (fixed channel orders are a real possibility, 
> especially since we're not encapsulating an existing format but defining
> a new streamable one which will necessitate some copying around in any 
> case, and because some unnecessary options were already dropped for 
> simplicity; plus of course the channel conversion headers enable channel 
> permutations as well) and whether this is the best encoding for it 
> (permutations can be coded with less redundancy and room for error). If 
> the idea is to enable subsetting (e.g. 5.1 with a missing LFE equals 
> 5.0) then something like WAVE's channel mask seems a better alternative.
> The format also doesn't stop us from defining two left channels for 
> stereo, while it does seem to be trying to limit possibilities of error 
> by defining the channel types separately for each map (e.g. no 
> OGG_CHANNEL_AMBISONIC_W inside a stereo channel map). Unfortunately, in 
> the process it could end up with combinatorial explosion in the channel 
> type enumerations (i.e. we might end up redefining L, R, C, etc. for 
> each multichannel map type, of which there are a lot).
One idea I had for this is that there should be a default channel order for
each channel map type and a default channel map for the most commonly used
channel counts. For example, we would say that unless you include a
mapping header, then a 2-channel file is stereo with left channel
interleaved before right. I've added the idea at the bottom of
http://wiki.xiph.org/index.php/OggPCM2 . Any thoughts on this?
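
A minimal sketch of how such defaults might be resolved is given below. Only the 2-channel entry comes from the text above; the mono entry, the DEFAULT_MAPS table and the resolve_map function are assumptions for illustration.

```python
# Default interpretation per channel count, used only when the stream
# carries no mapping header.
DEFAULT_MAPS = {
    2: ("stereo", ["L", "R"]),   # from the text: left interleaved before right
    1: ("mono", ["front"]),      # assumption, not stated in the thread
}


def resolve_map(n_channels, mapping_header=None):
    """Return (map_type, channel_order) for a stream.

    An explicit mapping header always wins; otherwise fall back to the
    default for this channel count, if one is defined.
    """
    if mapping_header is not None:
        return mapping_header
    if n_channels not in DEFAULT_MAPS:
        raise ValueError(
            "no default map for %d channels; a mapping header is required"
            % n_channels
        )
    return DEFAULT_MAPS[n_channels]


print(resolve_map(2))
print(resolve_map(6, ("5.1", ["L", "C", "R", "Ls", "Rs", "LFE"])))
```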

	Jean-Marc
