OvermindDL1
2009-Aug-26 11:20 UTC
[LLVMdev] [Stackless] [C++-sig] [Boost] Trouble optimizing Boost.Python integration for game development (it seems too slow)
On Wed, Aug 26, 2009 at 4:41 AM, Dan Sanduleac<sanduleac.dan at gmail.com> wrote:
> Oh, I see. Didn't think of this, thanks!
>
> So, just to be clear, there's no binding overhead in Cython because the
> functions defined there are pure python, right? (The function objects I
> mean). Whereas the ones I defined in Boost are more expensive to call.
>
> On Wed, Aug 26, 2009 at 1:02 PM, OvermindDL1 <overminddl1 at gmail.com> wrote:
>>
>> On Wed, Aug 26, 2009 at 3:31 AM, Dan Sanduleac<sanduleac.dan at gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I'm trying to compare different Python-C wrapping techniques to see which
>> > would be faster and also more suited to game development.
>> > I'm using Stackless Python 2.6.2 at 74550 / GCC 4.3.3, and boost 1.39.0 on
>> > Ubuntu 9.04. I implemented a simple Vec3 (3D vector) class in C++ and
>> > wrapped it with boost::python. All it does is multiplications and additions,
>> > so it implements just two operators for python.
>> > The thing is, it proves to be kind of slow compared to an equivalent
>> > Cython/Pyrex code. I think it should run faster than Cython code.
>> > (Note: Cython is not an abbreviation for C/Python API)
>> > I compiled the python library from Boost, in release mode, and then linked
>> > the vec3 module, whose code is provided below, to the compiled boost::python
>> > library. (I used -O2 when compiling the vec3 module)
>> >
>> > The testing goes like this: each "tick", 10000 objects update their
>> > position, according to their velocity and timedelta since last "tick", and
>> > I'm measuring the average time a tick takes to complete.
>> > On my machine doing this with Cython takes ~0.026 sec/tick, while doing it
>> > with boost.python takes like 0.052 sec/tick
>> > (The overhead introduced by python's iterating through the list of objects
>> > each tick is about 0.01 sec)
>> > During one tick, for each object, python runs this:
>> > "self.position += self.velocity * time_delta",
>> > where position and velocity are instances of Vec3.
>> >
>> > I was hoping for better results than with Cython, by using Boost. Am I doing
>> > something wrong?
>> >
>> >
>> >
>> > Source code:
>> > vec3.cpp
>> > ==========
>> > #include <boost/python.hpp>
>> > using namespace boost::python;
>> >
>> > class Vec3 {
>> >
>> >     float x, y, z;
>> >
>> > public:
>> >     Vec3(float x, float y, float z);
>> >     Vec3 &operator*=(float scalar);
>> >     Vec3 operator*(float scalar) const;
>> >     Vec3 &operator+=(const Vec3 &who);
>> >     // that `const Vec3` is REALLY needed, unless you want error monsoon to come down
>> > };
>> >
>> > // === boost:python wrapper ===
>> > // publish just += and * to python
>> >
>> > BOOST_PYTHON_MODULE(vec3)
>> > {
>> >     class_<Vec3>("Vec3", init<float, float, float>())
>> >         .def(self += self)
>> >         .def(self * float())
>> >     ;
>> > }
>> >
>> > // === implementation ===
>> >
>> > Vec3::Vec3(float x, float y, float z) {
>> >     this->x = x;
>> >     this->y = y;
>> >     this->z = z;
>> > }
>> >
>> > Vec3 & Vec3::operator*=(float scalar) {
>> >     this->x *= scalar;
>> >     this->y *= scalar;
>> >     this->z *= scalar;
>> >     return *this;
>> > }
>> >
>> > Vec3 Vec3::operator*(float scalar) const {
>> >     return Vec3(*this) *= scalar;
>> > }
>> >
>> > Vec3 & Vec3::operator+=(const Vec3 &who) {
>> >     this->x += who.x;
>> >     this->y += who.y;
>> >     this->z += who.z;
>> >     return *this;
>> > }
>> >
>> > ==============================================
>> > vec3.pyx (cython code, for reference)
>> > ===========================
>> > cdef class Vec3:
>> >
>> >     cdef readonly double x, y, z
>> >
>> >     def __cinit__(Vec3 self, double x, double y, double z):
>> >         self.x, self.y, self.z = x, y, z
>> >
>> >     # operator *
>> >     def __mul__(Vec3 self, double arg):
>> >         return Vec3(self.x*arg, self.y*arg, self.z*arg)
>> >
>> >     # operator +=
>> >     def __iadd__(Vec3 self, Vec3 arg):
>> >         #if not isinstance(arg, Vec3):
>> >         #    return NotImplemented
>> >         self.x += arg.x
>> >         self.y += arg.y
>> >         self.z += arg.z
>> >         return self
>>
>> That is because Boost.Python is designed to be an easy to use, safe,
>> and powerful binder, not fast. It is good to bind things that are
>> long lived, not something as quick as simple operations like an
>> addition and so forth, the overhead of the call will be so much higher
>> than the actual operation. Something like Cython does not have that
>> limitation, although it produced slower code, the binding overhead
>> does not exist.

Boost.Python's binding layer includes an exception mechanism (which converts between C++ and Python exceptions) and a registry system (to ensure everything is type-safe); both have some overhead that more direct calls will not have. Basically, if something that is called through Boost.Python has a longer lifespan, that is, it does a fair amount of work per call, it is good to use, and certainly easier. But if it is only a short call like the above, it is not the best thing to use. Everything has its place. :)
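
As a rough back-of-the-envelope check on the numbers quoted above, here is a small sketch that turns the per-tick timings into an approximate per-call cost. It rests on assumptions that are not stated in the thread: that each object makes exactly two wrapped calls per tick (one `*` and one `+=`), that the ~0.01 sec list-iteration overhead is the same in both runs, and that all the remaining time belongs to those calls (including creating the temporary Vec3). The constant and function names are made up for illustration.

```python
# Figures quoted in the thread (seconds per tick).
BOOST_TICK = 0.052
CYTHON_TICK = 0.026
LOOP_OVERHEAD = 0.01      # Python iterating over the list of objects

OBJECTS_PER_TICK = 10000
CALLS_PER_OBJECT = 2      # assumption: one __mul__ and one __iadd__ per object


def per_call_seconds(tick_seconds):
    """Average time per wrapped call, after subtracting the loop overhead."""
    calls = OBJECTS_PER_TICK * CALLS_PER_OBJECT
    return (tick_seconds - LOOP_OVERHEAD) / calls


boost_call = per_call_seconds(BOOST_TICK)
cython_call = per_call_seconds(CYTHON_TICK)

print("Boost.Python: %.2f us per call" % (boost_call * 1e6))
print("Cython:       %.2f us per call" % (cython_call * 1e6))
print("ratio:        %.1fx" % (boost_call / cython_call))
```

Under those assumptions this comes out to roughly 2.1 microseconds per call for Boost.Python against roughly 0.8 for Cython. Either figure is far more than three float multiplies or adds should cost, which fits the point that the call overhead, not the arithmetic, dominates here.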