Displaying 8 results from an estimated 8 matches for "oc_state_loop_filter_frag_row".
2005 Aug 17
2
MMX loop filter for theora-exp
Hello,
I would like to announce the semi-optimized oc_state_loop_filter_frag_rows.
It gives a speedup of roughly 7%. Unfortunately it has some issues:
1) it won't compile on 64-bit (I will hopefully fix that later)
2) it is not yet fully optimized (instruction stalls)
Here are the results.
CPU: Athlon, speed 1466.91 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state)...
2008 Jul 07
2
GSoC - Theora multithread decoder
...tation. Let T1 be a video decoded with the parallel implementation.
T1 should be at most 0.66To.
I will use the pthread implementation to try a pipelined version and see if
we obtain more gains.
This version will run the functions (oc_dec_dc_unpredict_mcu_plane +
oc_dec_frags_recon_mcu_plane) and
(oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) in parallel.
The upper bound for the gain is 60%, that is, let T2 be a video decoded with
the pipelined implementation. T2 should be at most 0.4To.
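One way to read this bound, as a sketch only: assume "gain" means the fraction of decode time saved (the email does not define it), write $T_0$ for the sequential decode time the email calls To, and let $g$ be a symbol introduced here for the gain.

```latex
g = \frac{T_0 - T_2}{T_0} \le 0.6
\quad\Longleftrightarrow\quad
T_2 \ge (1 - 0.6)\,T_0 = 0.4\,T_0
```

Under that assumption, 0.4To is the smallest decode time the pipelined version could reach, not the largest.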
Here is the branch for the OpenMP implementation:
http://svn.xiph.org/branches/theora_multithread_decode_omp/
Here is t...
2008 Aug 15
1
GSoC - Theora multithread decoder
Hi,
This email is to report what I have been doing since the mid-term.
After the mid-term I worked on a pipelined implementation with OpenMP.
As I said before, I pipelined these functions:
(oc_dec_dc_unpredict_mcu_plane + oc_dec_frags_recon_mcu_plane) and
(oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows), as
explained in my previous email.
But the results were not good. They were equal to those of the implementation
without the pipeline.
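For readers unfamiliar with the idea, the two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration with pthreads, not the code from the GSoC branch: the `pipeline` struct, `stage_recon`, `stage_filter`, `run_pipeline` and the integer "rows" are all hypothetical stand-ins, and the real loop filter also depends on neighbouring rows, which this sketch ignores.

```c
#include <assert.h>
#include <pthread.h>

#define NROWS 16

/* Hypothetical shared state: stage 1 publishes how many rows it has finished. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t  ready;
  int             rows_done;       /* rows completed by stage 1 */
  int             recon[NROWS];    /* stand-in for reconstructed rows */
  int             filtered[NROWS]; /* stand-in for loop-filtered rows */
} pipeline;

/* Stage 1: stand-in for DC unprediction + fragment reconstruction. */
static void *stage_recon(void *arg) {
  pipeline *p = arg;
  for (int row = 0; row < NROWS; row++) {
    p->recon[row] = row * 2;            /* "reconstruct" the row */
    pthread_mutex_lock(&p->lock);
    p->rows_done = row + 1;             /* publish progress */
    pthread_cond_signal(&p->ready);
    pthread_mutex_unlock(&p->lock);
  }
  return NULL;
}

/* Stage 2: stand-in for loop filtering + border fill. It only touches
   a row after stage 1 has published it. */
static void *stage_filter(void *arg) {
  pipeline *p = arg;
  for (int row = 0; row < NROWS; row++) {
    pthread_mutex_lock(&p->lock);
    while (p->rows_done <= row) pthread_cond_wait(&p->ready, &p->lock);
    assert(p->rows_done > row);
    pthread_mutex_unlock(&p->lock);
    p->filtered[row] = p->recon[row] + 1; /* "filter" the row */
  }
  return NULL;
}

/* Runs both stages concurrently; returns the sum of the filtered rows,
   or -1 if a thread could not be created. */
int run_pipeline(void) {
  pipeline p = { .rows_done = 0 };
  pthread_t t1, t2;
  int sum = 0;
  pthread_mutex_init(&p.lock, NULL);
  pthread_cond_init(&p.ready, NULL);
  if (pthread_create(&t1, NULL, stage_recon, &p) != 0) return -1;
  if (pthread_create(&t2, NULL, stage_filter, &p) != 0) {
    pthread_join(t1, NULL);
    return -1;
  }
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  pthread_cond_destroy(&p.ready);
  pthread_mutex_destroy(&p.lock);
  for (int row = 0; row < NROWS; row++) sum += p.filtered[row];
  return sum;
}
```

In a layout like this, the second stage can never run ahead of the first, so the achievable overlap is limited by whichever stage is slower and by the cost of the per-row synchronization.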
http://lampiao.lsc.ic.unicamp.br/~piga/gsoc_2008/comparison.png
http://lampiao.lsc.ic.unicamp.br/~piga/gsoc_2008/speedup.png
http://lampiao.lsc.ic.u...
2008 Mar 25
0
No subject
...>> The results above show that it is not the case. For coarse grain jobs they
>> are equivalent
>>
>>>
>>>
>>> > These version will run the functions (c_dec_dc_unpredict_mcu_plane +
>>> > oc_dec_frags_recon_mcu_plane) and
>>> > (oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) in
>>> > parallel. The upper bound for the gain is 60%, that is, let T2 be a
>>> > video decoded with the pipelined implementation. T2 should be at most
>>> > 0.4To.
>>>
>>> I think you mean "at least". L...
2005 Aug 20
0
[PATCH] remove some FZIGZAG
...g_recon)(oc_theora_state *_state,const oc_fragment *_frag,
- int _pli,ogg_int16_t _dct_coeffs[64],int _last_zzi,int _ncoefs,
+ int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]);
void (*restore_fpu)(void);
void (*oc_state_loop_filter_frag_rows)(oc_theora_state *_state,int *_bv,
@@ -409,7 +409,7 @@
void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli);
void oc_state_frag_recon(oc_theora_state *_state,const oc_fragment *_frag,
- int _pli,ogg_int16_t _dct_coeffs[6...
2005 Mar 23
0
[PATCH]
...tate,const int *_fragis,
+
+/*void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli);
+*/
int oc_state_loop_filter_init(oc_theora_state *_state,int *_bv);
void oc_state_loop_filter(oc_theora_state *_state,int _frame);
void oc_state_loop_filter_frag_rows(oc_theora_state *_state,int *_bv,
@@ -379,4 +405,45 @@
const char *_suf);
#endif
+void oc_frag_recon_intra__c(unsigned char *_dst,int _dst_ystride,
+ const ogg_int16_t *_residue);
+void oc_frag_recon_inter__c(unsigned char *_dst,int _dst_ystride,
+ const unsigned char *_src,int _src_ystride,c...
2005 Mar 23
3
[PATCH] promised MMX patches rc1
...tate,const int *_fragis,
+
+/*void oc_state_frag_copy(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli);
+*/
int oc_state_loop_filter_init(oc_theora_state *_state,int *_bv);
void oc_state_loop_filter(oc_theora_state *_state,int _frame);
void oc_state_loop_filter_frag_rows(oc_theora_state *_state,int *_bv,
@@ -379,4 +405,45 @@
const char *_suf);
#endif
+void oc_frag_recon_intra__c(unsigned char *_dst,int _dst_ystride,
+ const ogg_int16_t *_residue);
+void oc_frag_recon_inter__c(unsigned char *_dst,int _dst_ystride,
+ const unsigned char *_src,int _src_ystride,c...
2005 Jul 20
1
MMX IDCT for theora-exp
...nt 2000
samples  %        samples  %        image name       symbol name
124337   22.0173  91089    23.4683  dump             theora_decode_packetin
83446    14.7764  114246   29.4345  libc-2.3.2.so    (no symbols)
74011    13.1057  33746    8.6944   dump             oc_state_loop_filter_frag_rows
57706    10.2185  9204     2.3713   libogg.so.0.5.2  (no symbols)
39182    6.9383   10146    2.6140   dump             oc_state_frag_recon_mmx
31095    5.5062   38650    9.9578   dump             oc_frag_recon_inter2_mmx
24133    4.2734   12945    3.3352   dump...