When we talk about real time, the decoder must run in real time; the encoder need not, although it should still be much faster, so optimization is needed. Since the future lies in parallelism, developers should take it into account. The reference code should be written in terms of parallel (vector-style) operations that can be easily ported to an arbitrary scalar or vector processor. For integer data we use MMX; for floating-point data we use SSE. When no vector instructions are available, we apply a scalar instruction to each item. This keeps maintainability high, because separate optimized code branches are not needed. Optimization then means only adding a few functions that are written in asm and are a few lines long. The rest (the reference code) stays in C. If the code was written without parallel processing in mind, it should first be rewritten in C in a parallel-friendly form, and only after that does asm optimization step in. Scalar asm optimization should be left to the compiler. Spirit
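A minimal sketch of the idea, under stated assumptions: the names `add4f` and `add_block` are hypothetical and do not come from any existing codebase, and SSE intrinsics stand in for the few-line asm function the post describes. The reference loop in C is written around one 4-wide operation; only that small kernel has a vector and a scalar variant, so the calling code needs no separate optimized branch.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics, only when the target supports them */
#endif

/* Hypothetical 4-wide kernel: dst[0..3] = a[0..3] + b[0..3].
 * This is the only piece that has a vector version. */
static void add4f(float *dst, const float *a, const float *b)
{
#if defined(__SSE__)
    /* Vector path: one SSE add handles all four floats (unaligned loads). */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(dst, _mm_add_ps(va, vb));
#else
    /* Scalar fallback: the same operation applied to each item. */
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
#endif
}

/* Reference code in plain C, written in terms of the 4-wide operation. */
void add_block(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: process four items per step through the kernel. */
    for (; i + 4 <= n; i += 4)
        add4f(dst + i, a + i, b + i);

    /* Tail: remaining items that do not fill a whole vector. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Porting to another vector unit would then mean replacing only the body of `add4f`, while `add_block` stays untouched.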