search for: oc_state_loop_filter_frag_rows

Displaying 8 results from an estimated 8 matches for "oc_state_loop_filter_frag_rows".

2005 Aug 17
2
MMX loop filter for theora-exp
Hello, I would like to announce the semi-optimized oc_state_loop_filter_frag_rows. It gains about a 7% speedup. Unfortunately it has some issues: 1) it won't compile on 64-bit (I will hopefully fix that later) 2) it is not yet fully optimized (instruction stalls). Here are the results. CPU: Athlon, speed 1466.91 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state)...
2008 Jul 07
2
GSoC - Theora multithread decoder
...tation. Let T1 be the time to decode a video with the parallel implementation. T1 should be at most 0.66To. I will use the pthread implementation to try a pipelined version and see if we obtain more gains. This version will run the functions (c_dec_dc_unpredict_mcu_plane + oc_dec_frags_recon_mcu_plane) and (oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) in parallel. The upper bound for the gain is 60%; that is, let T2 be the time to decode a video with the pipelined implementation. T2 should be at most 0.4To. Here is the branch for the OpenMP implementation: http://svn.xiph.org/branches/theora_multithread_decode_omp/ Here is th...
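The relation between the quoted gain and the quoted time bound can be checked with a little arithmetic. The sketch below assumes that "gain" means the fraction of the baseline decode time To that is saved, so a gain g gives a time of (1 - g)To. The function names are hypothetical and do not come from the Theora sources.

```c
/* Hypothetical helpers, not part of theora-exp. "Gain" is assumed to mean
   the fraction of the baseline decode time t0 that is saved. */

/* Decode time that corresponds to a fractional gain over baseline t0. */
static double time_for_gain(double t0,double gain){
  return t0*(1.0-gain);
}

/* Fractional gain that corresponds to a decode time t against baseline t0. */
static double gain_for_time(double t0,double t){
  return (t0-t)/t0;
}
```

Under that reading, time_for_gain(To, 0.60) gives the 0.4To figure quoted for the pipelined version, and gain_for_time(To, 0.66To) gives a gain of 0.34 for the parallel version.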
2008 Aug 15
1
GSoC - Theora multithread decoder
Hi, This email is to report what I have been doing since the mid-term. After the mid-term I worked on a pipeline implementation with OpenMP. As I said before, I did a pipelined implementation of these functions: (c_dec_dc_unpredict_mcu_plane + oc_dec_frags_recon_mcu_plane) and (oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) as explained in my previous email. But the results were not good. They were equal to those of the implementation without the pipeline. http://lampiao.lsc.ic.unicamp.br/~piga/gsoc_2008/comparison.png http://lampiao.lsc.ic.unicamp.br/~piga/gsoc_2008/speedup.png http://lampiao.lsc.ic.un...
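A two-stage pipeline of the kind described here can be sketched with OpenMP sections. This is only an illustration under stated assumptions, not the code from the GSoC branch: stage_recon and stage_filter are hypothetical stand-ins for the two function groups named above, NROWS is an arbitrary row count, and the dependencies between neighbouring rows that a real loop filter has are ignored. Built without OpenMP support, the pragmas are ignored and the loop runs serially with the same result.

```c
#define NROWS 8                /* hypothetical number of MCU rows */

static int reconstructed[NROWS];
static int filtered[NROWS];

/* Stand-in for the DC unprediction + fragment reconstruction stage. */
static void stage_recon(int row){
  reconstructed[row]=1;
}

/* Stand-in for the loop filter + border fill stage. It records whether
   its input row had already been reconstructed. */
static void stage_filter(int row){
  filtered[row]=reconstructed[row];
}

/* At each step, reconstruct row `step` while filtering row `step-1`.
   The implicit barrier at the end of the sections construct keeps the
   filter stage exactly one row behind the reconstruction stage. */
static void decode_pipelined(void){
  int step;
  for(step=0;step<=NROWS;step++){
    #pragma omp parallel sections
    {
      #pragma omp section
      {
        if(step<NROWS)stage_recon(step);
      }
      #pragma omp section
      {
        if(step>0)stage_filter(step-1);
      }
    }
  }
}
```

The two sections touch different rows in every step, so they share no data within a step. A fork and barrier per row adds overhead, which is one possible reason (an assumption, not something stated in the email) for a pipeline like this showing no gain over the non-pipelined version.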
2008 Mar 25
0
No subject
...>> The results above show that it is not the case. For coarse grain jobs they
>> are equivalent
>>
>>>
>>>
>>> > These version will run the functions (c_dec_dc_unpredict_mcu_plane +
>>> > oc_dec_frags_recon_mcu_plane) and
>>> > (oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) in
>>> > parallel. The upper bound for the gain is 60%, that is, let T2 be a
>>> > video decoded with the pipelined implementation. T2 should be at most
>>> > 0.4To.
>>>
>>> I think you mean "at least". Le...
2005 Aug 20
0
[PATCH] remove some FZIGZAG
...g_recon)(oc_theora_state *_state,const oc_fragment *_frag,
- int _pli,ogg_int16_t _dct_coeffs[64],int _last_zzi,int _ncoefs,
+ int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
  ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]);
 void (*restore_fpu)(void);
 void (*oc_state_loop_filter_frag_rows)(oc_theora_state *_state,int *_bv,
@@ -409,7 +409,7 @@
 void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
  int _nfragis,int _dst_frame,int _src_frame,int _pli);
 void oc_state_frag_recon(oc_theora_state *_state,const oc_fragment *_frag,
- int _pli,ogg_int16_t _dct_coeffs[64...
2005 Mar 23
0
[PATCH]
...tate,const int *_fragis,
+
+/*void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
  int _nfragis,int _dst_frame,int _src_frame,int _pli);
+*/
 int oc_state_loop_filter_init(oc_theora_state *_state,int *_bv);
 void oc_state_loop_filter(oc_theora_state *_state,int _frame);
 void oc_state_loop_filter_frag_rows(oc_theora_state *_state,int *_bv,
@@ -379,4 +405,45 @@
 const char *_suf);
 #endif
+void oc_frag_recon_intra__c(unsigned char *_dst,int _dst_ystride,
+ const ogg_int16_t *_residue);
+void oc_frag_recon_inter__c(unsigned char *_dst,int _dst_ystride,
+ const unsigned char *_src,int _src_ystride,co...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...tate,const int *_fragis,
+
+/*void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
  int _nfragis,int _dst_frame,int _src_frame,int _pli);
+*/
 int oc_state_loop_filter_init(oc_theora_state *_state,int *_bv);
 void oc_state_loop_filter(oc_theora_state *_state,int _frame);
 void oc_state_loop_filter_frag_rows(oc_theora_state *_state,int *_bv,
@@ -379,4 +405,45 @@
 const char *_suf);
 #endif
+void oc_frag_recon_intra__c(unsigned char *_dst,int _dst_ystride,
+ const ogg_int16_t *_residue);
+void oc_frag_recon_inter__c(unsigned char *_dst,int _dst_ystride,
+ const unsigned char *_src,int _src_ystride,co...
2005 Jul 20
1
MMX IDCT for theora-exp
...nt 2000
samples  %        samples  %        image name       symbol name
124337   22.0173  91089    23.4683  dump             theora_decode_packetin
83446    14.7764  114246   29.4345  libc-2.3.2.so    (no symbols)
74011    13.1057  33746    8.6944   dump             oc_state_loop_filter_frag_rows
57706    10.2185  9204     2.3713   libogg.so.0.5.2  (no symbols)
39182    6.9383   10146    2.6140   dump             oc_state_frag_recon_mmx
31095    5.5062   38650    9.9578   dump             oc_frag_recon_inter2_mmx
24133    4.2734   12945    3.3352   dump...