Stéphane Letz
2007-Jun-16 20:00 UTC
[LLVMdev] Runtime optimization of C++ code with virtual functions
Let's say we have the following scheme using C++ and virtual functions:

    class DSP {
    public:
        DSP() {}
        virtual ~DSP() {}
        virtual int Compute(int count, float** in, float** out) = 0;
    };

    class CONCRETE_DSP : public DSP {
    public:
        CONCRETE_DSP():fValue() {}
        virtual ~CONCRETE_DSP() {}
        virtual int Compute(int count, float** in, float** out)
        {
            DoSomeProcess();
        }
    };

    class SEQ_DDSP : public DSP {
    private:
        DSP* fArg1;
        DSP* fArg2;
    public:
        SEQ_DDSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
        virtual ~SEQ_DDSP() {delete fArg1; delete fArg2;}
        virtual int Compute(int count, float** in, float** out)
        {
            // Some code that uses:
            fArg1->Compute(count, in, out);
            fArg2->Compute(count, in, out);
        }
    };

    class PAR_DSP : public DSP {
    private:
        DSP* fArg1;
        DSP* fArg2;
    public:
        PAR_DSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
        virtual ~PAR_DSP() {delete fArg1; delete fArg2;}
        virtual int Compute(int count, float** in, float** out)
        {
            // Some code that uses:
            fArg1->Compute(count, in, out);
            fArg2->Compute(count, in, out);
        }
    };

    void ProcessGraph (float** in, float** out)
    {
        DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()), new CONCRETE_DSP());
        graph->Compute(512, in, out);
        delete graph;
    }

At runtime, after a graph is created, one could imagine optimizing it by resolving the calls to the virtual Compute and possibly getting a more efficient Compute method for the entire graph, so that we could write:

    DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()), new CONCRETE_DSP());

    graph->Optimize();

    graph->Compute(512, in, out);   // possibly called many times

Is there any possible method using LLVM that would help in this case?

Thanks

Stephane Letz
Chris Lattner
2007-Jun-19 05:43 UTC
[LLVMdev] Runtime optimization of C++ code with virtual functions
On Sat, 16 Jun 2007, Stéphane Letz wrote:
> At runtime after a graph is created, one could imagine optimizing by
> resolving call to "virtual Compute" and possibly get a more
> efficient Compute method for the entire graph, so that we could write:
>
> DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new
> CONCRETE_DSP()), new CONCRETE_DSP());
>
> graph->Optimize();
>
> graph->Compute(512, in, out); possibly a lot of time.
>
> Is there any possible method using LLVM that would help in this case?

LLVM won't help in this case. However, I'd strongly recommend dropping the virtual functions and using template instantiation to get this. That way you'd do something like:

    PAR_DSP<SEQ_DDSP<CONCRETE_DSP, CONCRETE_DSP>, CONCRETE_DSP> X;
    X.Compute(512, in, out);

This will be efficient even when statically compiled.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
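A minimal sketch of the template approach suggested above, reusing the class names from the original post. The leaf body (a gain of 0.5 on channel 0) and the channel layout of the combinators are assumptions made only so the example does something observable; the thread does not say what DoSomeProcess() or the composite Compute bodies actually do.

```cpp
#include <cassert>

// Hypothetical leaf: scales channel 0 by 0.5 (stand-in for DoSomeProcess()).
struct CONCRETE_DSP {
    int Compute(int count, float** in, float** out) {
        assert(count >= 0);
        for (int i = 0; i < count; ++i) {
            out[0][i] = 0.5f * in[0][i];
        }
        return count;
    }
};

// Sequential composition (assumed semantics): A runs first, then B
// processes A's output in place.
template <class A, class B>
struct SEQ_DDSP {
    A fArg1;
    B fArg2;
    int Compute(int count, float** in, float** out) {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, out, out);
    }
};

// Parallel composition (assumed semantics): A handles channel 0,
// B handles channel 1.
template <class A, class B>
struct PAR_DSP {
    A fArg1;
    B fArg2;
    int Compute(int count, float** in, float** out) {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, in + 1, out + 1);
    }
};
```

With these definitions, `PAR_DSP<SEQ_DDSP<CONCRETE_DSP, CONCRETE_DSP>, CONCRETE_DSP> graph; graph.Compute(512, in, out);` involves no virtual dispatch: every callee is known at compile time, so an ordinary static compiler is free to inline the whole graph. The trade-off is that the graph shape must be fixed at compile time.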
Maurizio Vitale
2007-Jun-20 23:01 UTC
[LLVMdev] Runtime optimization of C++ code with virtual functions
On Jun 19, 2007, at 1:43 AM, Chris Lattner wrote:
> On Sat, 16 Jun 2007, Stéphane Letz wrote:
>> At runtime after a graph is created, one could imagine optimizing by
>> resolving call to "virtual Compute" and possibly get a more
>> efficient Compute method for the entire graph, so that we could
>> write:
>>
>> DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new
>> CONCRETE_DSP()), new CONCRETE_DSP());
>>
>> graph->Optimize();
>>
>> graph->Compute(512, in, out); possibly a lot of time.
>>
>> Is there any possible method using LLVM that would help in this case?
>
> LLVM won't help in this case.

Is that so, or does it mean that LLVM doesn't have a prebuilt solution?

I'm asking because (without having ever looked seriously into LLVM) I was thinking of experimenting along these lines:

    class Source {
        void send (T data) {
            invoke_jit_magic();
            transport (data);
        }
    };

transport() would be a virtual method, as in the original posting. In my case send() would be part of the framework, so it is not a problem to add the invoke_jit_magic. In other cases it might be trickier.

On the first call, invoke_jit_magic gains control and traverses the binary, converting (a subset of) what it finds to LLVM IR, until it gets to the concrete target. It may have to do a bit of work to understand how parameters are passed to the transport code (it is a virtual function call and might be messy in the presence of multiple/virtual inheritance). After that, the LLVM JIT can be used to replace the original binary fragment with something faster.

I agree with the suggestion of using templates when possible. In my case it is not doable because transport would be proprietary and the code containing it distributed only as a binary.

I understand that the disassembling portion needs to be rewritten. Is there anything else that would prevent this approach from working?

Again, I haven't looked into LLVM yet, so I can imagine there might be problems in describing physical registers in the IR, and at some point stuff must be exactly where the pre-existing code expects it.

I don't want to take your time, but if you could elaborate a bit it might prevent me from going down the wrong path.

Best regards,

Maurizio
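The control flow described above (intercept the first call, resolve the concrete target once, then route later calls through a faster path) can be sketched without any LLVM involvement. Everything below is hypothetical: `Transport`, `ConcreteTransport`, and `FastTransport` are illustrative names, `int` stands in for the unspecified type `T`, and the "JIT" step is just a pointer swap to a hand-written function, not generated code. It shows only the first-call hook, not the binary-to-IR translation.

```cpp
#include <cassert>

// Hypothetical stand-in for the proprietary, binary-only transport code.
struct Transport {
    virtual ~Transport() {}
    virtual int transport(int data) = 0;
};

struct ConcreteTransport : Transport {
    int transport(int data) override { return data + 1; }
};

// Direct, non-virtual entry point. A real JIT would generate this at
// runtime; here it is written by hand purely to show the control flow.
static int FastTransport(Transport*, int data) { return data + 1; }

class Source {
public:
    explicit Source(Transport* t)
        : fTarget(t), fSend(&Source::FirstSend), fFast(nullptr), fSlowCalls(0) {
        assert(t != nullptr);
    }

    // Framework entry point: always dispatches through the current slot.
    int send(int data) { return (this->*fSend)(data); }

    int slow_calls() const { return fSlowCalls; }

private:
    // First call only: the place where invoke_jit_magic() would run.
    int FirstSend(int data) {
        ++fSlowCalls;
        int result = fTarget->transport(data);  // virtual dispatch, once
        fFast = &FastTransport;                 // pretend this came from a JIT
        fSend = &Source::FastSend;              // patch the slot
        return result;
    }

    // Every later call: no virtual dispatch on the transport object.
    int FastSend(int data) { return fFast(fTarget, data); }

    Transport* fTarget;
    int (Source::*fSend)(int);
    int (*fFast)(Transport*, int);
    int fSlowCalls;
};
```

After the first `send()`, `slow_calls()` stays at 1 and all later calls bypass the virtual call. The hard parts raised in the message (recovering the concrete target's code from a binary and matching its calling convention) are exactly what this sketch leaves out.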