thr3ads.net - llvm dev - [LLVMdev] Runtime optimization of C++ code with virtual functions [Jun 2007]


Stéphane Letz

2007-Jun-16 20:00 UTC

[LLVMdev] Runtime optimization of C++ code with virtual functions

Let's say we have the following scheme using C++ and virtual functions:


class DSP {

  public:

	DSP() {}
	virtual ~DSP() {}

	virtual int Compute(int count, float** in, float** out) = 0;
};

class CONCRETE_DSP : public DSP {

  public:

	CONCRETE_DSP() {}
	virtual ~CONCRETE_DSP() {}

	virtual int Compute(int count, float** in, float** out)
	{
		DoSomeProcess();	// placeholder for the actual signal processing
		return 0;
	}
};
			
		
class SEQ_DDSP : public DSP {

   private:
	
	DSP* 	fArg1;
	DSP* 	fArg2;
	
   public:

	SEQ_DDSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
	virtual ~SEQ_DDSP() {delete fArg1; delete fArg2;}
	
	virtual int Compute(int count, float** in, float** out)
	{
		// Some code that uses:
		fArg1->Compute(count, in, out);
		fArg2->Compute(count, in, out);
		return 0;
	}
};

class PAR_DSP : public DSP {

   private:
	
	DSP* 	fArg1;
	DSP* 	fArg2;
	
   public:

	PAR_DSP(DSP* a1, DSP* a2):fArg1(a1), fArg2(a2) {}
	virtual ~PAR_DSP() {delete fArg1; delete fArg2;}
	
	virtual int Compute(int count, float** in, float** out)
	{
		// Some code that uses:
		fArg1->Compute(count, in, out);
		fArg2->Compute(count, in, out);
		return 0;
	}

};


void ProcessGraph (float** in, float** out)
{
	DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()),
	                         new CONCRETE_DSP());
	graph->Compute(512, in, out);
	delete graph;
}

Once a graph has been created at runtime, one could imagine optimizing
it by resolving the virtual calls to Compute, and possibly obtaining a
more efficient Compute method for the entire graph, so that we could
write:

DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new CONCRETE_DSP()),
                         new CONCRETE_DSP());

graph->Optimize();

graph->Compute(512, in, out);  // possibly called many times

Is there any possible method using LLVM that would help in this case?

Thanks

Stephane Letz

Chris Lattner

2007-Jun-19 05:43 UTC

On Sat, 16 Jun 2007, Stéphane Letz wrote:
> At runtime after a graph is created, one could imagine optimizing by
> resolving call to "virtual  Compute" and possibly get a more
> efficient Compute method for the entire graph, so that we could write:
>
> DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new
> CONCRETE_DSP()), new CONCRETE_DSP());
>
> graph->Optimize();
>
> graph->Compute(512, in, out); possibly a lot of time.
>
> Is there any possible method using LLVM that would help in this case?
LLVM won't help in this case.  However, I'd strongly recommend dropping 
the virtual functions and using template instantiation to get this.  That 
way you'd do something like:

   PAR_DSP<SEQ_DDSP<CONCRETE_DSP, CONCRETE_DSP>, CONCRETE_DSP> X;
   X.Compute(512, in, out);

This will be efficient even when statically compiled.
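
A minimal sketch of the template approach suggested above, under stated
assumptions: the struct names (ConcreteDsp, SeqDsp, ParDsp), the
accumulate-into-output leaf body, and the helper run_static_graph() are
hypothetical and do not come from the original post. The point it
illustrates is that every Compute call is bound to a concrete type at
compile time, so an ordinary static compiler is free to inline the
whole graph.

```cpp
#include <cassert>
#include <vector>

// Hypothetical leaf node: adds the input into the output buffer.
struct ConcreteDsp {
    int Compute(int count, float** in, float** out) {
        for (int i = 0; i < count; ++i) out[0][i] += in[0][i];
        return count;
    }
};

// Sequential composition: the operand types are template parameters,
// so both calls below are resolved statically (no vtable lookup).
template <class A, class B>
struct SeqDsp {
    A fArg1;
    B fArg2;
    int Compute(int count, float** in, float** out) {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, in, out);
    }
};

// Parallel composition, same structure as in the original classes.
template <class A, class B>
struct ParDsp {
    A fArg1;
    B fArg2;
    int Compute(int count, float** in, float** out) {
        fArg1.Compute(count, in, out);
        return fArg2.Compute(count, in, out);
    }
};

// Builds the same graph shape as ProcessGraph and runs it once.
// Returns the first output sample so the effect can be checked.
float run_static_graph() {
    std::vector<float> inBuf(512, 1.0f), outBuf(512, 0.0f);
    float* in[] = { inBuf.data() };
    float* out[] = { outBuf.data() };

    ParDsp<SeqDsp<ConcreteDsp, ConcreteDsp>, ConcreteDsp> graph;
    int processed = graph.Compute(512, in, out);
    assert(processed == 512);
    return outBuf[0];  // three leaves each added 1.0f
}
```

The trade-off is that the graph shape must be known at compile time;
a graph assembled from user input at runtime cannot be expressed this
way without enumerating the shapes in advance.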

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Maurizio Vitale

2007-Jun-20 23:01 UTC

On Jun 19, 2007, at 1:43 AM, Chris Lattner wrote:
> On Sat, 16 Jun 2007, Stéphane Letz wrote:
>> At runtime after a graph is created, one could imagine optimizing by
>> resolving call to "virtual  Compute" and possibly get a more
>> efficient Compute method for the entire graph, so that we could write:
>>
>> DSP* graph = new PAR_DSP(new SEQ_DDSP(new CONCRETE_DSP(), new
>> CONCRETE_DSP()), new CONCRETE_DSP());
>>
>> graph->Optimize();
>>
>> graph->Compute(512, in, out); possibly a lot of time.
>>
>> Is there any possible method using LLVM that would help in this case?
>
> LLVM won't help in this case.
Is that really so, or does it mean that LLVM has no prebuilt solution?
I'm asking because (without ever having looked seriously into LLVM) I
was thinking of experimenting along these lines:

template <typename T>
class Source {
	void send (T data) {
		invoke_jit_magic();
		transport (data);
	}
};

transport() would be a virtual method, as in the original posting. In
my case send() would be part of the framework, so adding the call to
invoke_jit_magic() is not a problem. In other cases it might be
trickier.

On the first call, invoke_jit_magic gains control and traverses the
binary, converting (a subset of) what it finds to LLVM IR, until it
gets to the concrete target. It may have to do a bit of work to
understand how parameters are passed to the transport code (it is a
virtual function call and might be messy in the presence of
multiple/virtual inheritance). After that, the LLVM JIT can be used to
replace the original binary fragment with something faster.
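
The first-call control flow described above can be sketched without
LLVM at all. This is only an illustration under assumptions: Transport,
FastTransport, the function-pointer slot fSend, and resolveCount() are
hypothetical names, int stands in for T, and the "faster" path is just
a non-virtual qualified call selected after an exact typeid match. It
stands in for the code a JIT would generate and does no binary
translation.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for the proprietary, binary-only interface.
struct Transport {
    virtual ~Transport() {}
    virtual int transport(int data) = 0;
};

// Hypothetical concrete target that the first call discovers.
struct FastTransport : Transport {
    int transport(int data) override { return data * 2; }
};

class Source {
    using SendFn = int (*)(Source&, int);

    Transport* fTarget;
    SendFn fSend;            // current implementation behind send()
    int fResolveCount = 0;   // how often the slow first-call path ran

    // First-call path: plays the role of invoke_jit_magic(). It
    // inspects the dynamic type once, then rebinds fSend so that
    // later calls skip this step entirely.
    static int resolveThenSend(Source& self, int data) {
        ++self.fResolveCount;
        if (typeid(*self.fTarget) == typeid(FastTransport))
            self.fSend = &sendDirect;
        else
            self.fSend = &sendVirtual;
        return self.fSend(self, data);
    }

    // Specialized path: the qualified call bypasses virtual dispatch.
    static int sendDirect(Source& self, int data) {
        return static_cast<FastTransport*>(self.fTarget)
            ->FastTransport::transport(data);
    }

    // Fallback path: ordinary virtual call for unknown targets.
    static int sendVirtual(Source& self, int data) {
        return self.fTarget->transport(data);
    }

  public:
    explicit Source(Transport* t) : fTarget(t), fSend(&resolveThenSend) {}
    int send(int data) { return fSend(*this, data); }
    int resolveCount() const { return fResolveCount; }
};
```

The sketch still needs to know the candidate concrete types when it is
compiled, which is exactly the knowledge a binary-only transport()
withholds; it shows the rebinding mechanism, not the hard part.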

I agree with the suggestion of using templates when possible. In my
case it is not doable, because transport would be proprietary and the
code containing it would be distributed only as a binary.

I understand that the disassembling portion would need to be written.
Is there anything else that would prevent this approach from working?
Again, I haven't looked into LLVM yet, so I can imagine there might be
problems in describing physical registers in the IR, and at some point
values must be exactly where the pre-existing code expects them.
I don't want to take your time, but if you could elaborate a bit it
might prevent me from going down the wrong path.

Best regards,

		Maurizio
