> I got the usage of --abr wrong before, (28 = 28 bits per second :)
> I'm also assuming quality=0/comp=0 produces a reasonable output.

Actually, --abr overrides --quality. Otherwise, --quality 0 would be bad quality.

> 3 mins, 39 seconds is still a way off realtime, for a 60 second clip,
> but it's a lot closer than 1.1.1 got.
>
> What still worries me though, is that even if I forget the 2 mins spent
> in the kernel, emulating the floating point, there's 1 min 29 secs of
> userspace time, which is still greater than 60 secs clip. Are compiler
> optimisations going to shave off those 29 (or more) secs?

Well, once the kernel time is removed, we're only slightly too slow. Then I'm sure there are many possible optimizations. For example, it wouldn't be that hard to convert the code to use the multiply-add functions provided by the chip. Also, the code is written to use only 16-bit multiplication, so all 32-bit by 16-bit multiplications are "emulated" (using two 16-bit multiplications and one add). This could be changed to make use of the wider multiplier.

At this point, if you want to help, the best way would probably be to try tracking down what part of the code is responsible for the high system time. Once this is identified, we'll have a much better idea.

> I will of course try without ffast-math and funroll-loops, as they can
> decrease speed in some circumstances, but I'm open for further suggestions.

I don't think it's worth playing with the compile switches yet. There's still lots of other stuff to do.

> The machine it's running on has an XScale-PXA255 processor at 400MHz,
> with a 200MHz bus.

I managed to log on to the XSCALE 400 on handhelds.org. It helped, but I can't do everything with it (and my attempts at profiling failed).

Jean-Marc

-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
Jean-Marc Valin wrote:
> Hi,
>
> I have replaced most (but not all) of the float operations by integer
> operations, but it seems like the remaining ones take a long long time
> when emulated in kernel space (hence high system time). The other
> problem is that I don't have access to an ARM-based device (anyone wants
> to send me one?), so I'm doing all this blind... If you'd like to help,
> it can also accelerate things.
>
> The other thing is that you're probably pushing a bit too much with 44.1
> kHz, which would probably require around 250 MIPS even with the
> fixed-point completed. It could probably run in real-time but it would
> take lots of CPU and require asm optimizations. I suggest you try
> something like 16 kHz. I'm sure the quality will still be enough for you
> (much better than MP3 at 24 kbps anyway :)
>
> Last thing, can you try the code that's in CVS right now? I removed many
> float ops since 1.1.1, so it may already work better.

Current CVS, on a 16kHz, 16 bit, mono, 60 second sample:

# time speexenc -w --quality 0 --abr 28800 --comp 0 -V test-16kHz-60sec.wav test-16kHz-60sec.spx
Encoding 16000 Hz audio using wideband (sub-band CELP) mode (mono)
Bitrate is use: 35800 bps     (average 28564 bps)

real    3m39.017s
user    1m29.130s
sys     2m0.910s

I got the usage of --abr wrong before, (28 = 28 bits per second :)
I'm also assuming quality=0/comp=0 produces a reasonable output.

3 mins, 39 seconds is still a way off realtime, for a 60 second clip,
but it's a lot closer than 1.1.1 got.

What still worries me though, is that even if I forget the 2 mins spent
in the kernel, emulating the floating point, there's 1 min 29 secs of
userspace time, which is still greater than 60 secs clip. Are compiler
optimisations going to shave off those 29 (or more) secs?
My current configure command is:

CFLAGS="-O3 -funroll-loops -ffast-math" CC=arm-linux-gcc CPP=arm-linux-cpp \
  LD=arm-linux-ld RANLIB=arm-linux-ranlib STRIP=arm-linux-strip \
  ./autogen.sh --host=arm-linux --prefix=/usr \
  --with-ogg-dir=/opt/arcom/arm-linux \
  --with-ogg-includes=/opt/arcom/arm-linux/include \
  --with-ogg-lib=/opt/arcom/arm-linux/lib \
  --enable-fixed-point

I will of course try without ffast-math and funroll-loops, as they can
decrease speed in some circumstances, but I'm open for further suggestions.

The machine it's running on has an XScale-PXA255 processor at 400MHz,
with a 200MHz bus.

Regards,
MAL

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
Jean-Marc Valin wrote:
> At this point, if you want to help, the best way would probably be to try
> tracking down what part of the code is responsible for the high system time.
> Once this is identified, we'll have a much better idea.
>
> I managed to log on to the XSCALE 400 on handhelds.org. It helped, but I
> can't do everything with it (and my attempts at profiling failed).

I don't know how to profile code (yet), but I'm about to go find out.
Is it possible to profile the code on my x86 workstation, or does it
absolutely have to be run on the machine?

ARM emulator anyone? :)

MAL