Lang Hames
2014-Jul-18 00:07 UTC
[LLVMdev] [cfe-dev] Use of Smart Pointers in LLVM Projects
Hi Alp, A clear place that smart pointers *shouldn't* be used are 'create'> functions. > > There's already a strong convention in the thousands of 'create' functions > across the codebase and they have the clear, well-defined behaviour of > returning a new pointer. > > We all know what to expect from these functions, much like new/delete in > C++ and malloc/free. They can be easily adopted by std::shared_ptr or > std::unique_ptr as needed, so there's no benefit in enforcing smart pointer > usage at creation. >I strongly disagree with this: If knowing that new'ed things must be deleted was enough we would never have to worry about memory leaks. It's a huge benefit to have the compiler catch our mistakes for us - it's one less thing to worry about.> Apart from the complexity and visual clutter, it's pretty annoying to have > to decide whether to get(), release() or somehow else acquire the object > you just created. The result of using smart pointers as returns isn't > pretty -- you end up with awkward code like createFoo().get()->func() or > other.reset(createFoo().release()). >I think if you were ever writing createFoo()->func() before you were leaking memory. And actually you can now safely use the simpler syntax with std::unique_ptr: createFoo()->func() will work fine, there's no need for the .get(). Ditto other.reset(createFoo.release()). I think this would just be: other createFoo(), (or a = std::move(b);). There may be occasions where you need to call .get() to pass a pointer as an argument to a function that isn't taking ownership, but that doesn't seem odious to me. Cheers, Lang. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140717/7bc2545a/attachment.html>
Alp Toker
2014-Jul-18 00:21 UTC
[LLVMdev] [cfe-dev] Use of Smart Pointers in LLVM Projects
On 18/07/2014 03:07, Lang Hames wrote:> Hi Alp, > > A clear place that smart pointers *shouldn't* be used are 'create' > functions. > > There's already a strong convention in the thousands of 'create' > functions across the codebase and they have the clear, > well-defined behaviour of returning a new pointer. > > We all know what to expect from these functions, much like > new/delete in C++ and malloc/free. They can be easily adopted by > std::shared_ptr or std::unique_ptr as needed, so there's no > benefit in enforcing smart pointer usage at creation. > > > I strongly disagree with this: If knowing that new'ed things must be > deleted was enough we would never have to worry about memory leaks. > It's a huge benefit to have the compiler catch our mistakes for us - > it's one less thing to worry about.I don't remember the last time we had a leak because of a 'create' function. Tricky ownership is going to continue to be tricky either way so I'd tend towards keeping the status quo for these functions. The bad patterns seem come about mostly when smart pointers are used as return values. We already avoid returning structures by value, so perhaps any policy can be expressed as a continuation of that?> Apart from the complexity and visual clutter, it's pretty annoying > to have to decide whether to get(), release() or somehow else > acquire the object you just created. The result of using smart > pointers as returns isn't pretty -- you end up with awkward code > like createFoo().get()->func() or other.reset(createFoo().release()). >Creating an object for single use and having it destroyed implicitly doesn't sound like a great pattern. I almost wonder if it isn't better to write the ownership explicitly in such cases: unique_ptr<T> t = createFoo(); t->func(); A createFoo() statement with an unused pointer new return can be identified by leak detectors, whereas an unused smart pointer return will be created and deleted silently, which isn't that desirable. If you really want to delete an object as soon as it's created that seems to be worth keeping explicit.> > I think if you were ever writing createFoo()->func() before you were > leaking memory. And actually you can now safely use the simpler syntax > with std::unique_ptr: createFoo()->func() will work fine, there's no > need for the .get(). > > Ditto other.reset(createFoo.release()). I think this would just be: > other = createFoo(), (or a = std::move(b);).The nice thing about a traditional 'create' function is that there's one clear way to do that -- by assignment, instead of deciding between two or three. It should be equally safe.> > There may be occasions where you need to call .get() to pass a pointer > as an argument to a function that isn't taking ownership, but that > doesn't seem odious to me. > Cheers, > Lang.-- http://www.nuanti.com the browser experts
David Blaikie
2014-Jul-18 00:35 UTC
[LLVMdev] [cfe-dev] Use of Smart Pointers in LLVM Projects
On Thu, Jul 17, 2014 at 5:21 PM, Alp Toker <alp at nuanti.com> wrote:> > On 18/07/2014 03:07, Lang Hames wrote: >> >> Hi Alp, >> >> A clear place that smart pointers *shouldn't* be used are 'create' >> functions. >> >> There's already a strong convention in the thousands of 'create' >> functions across the codebase and they have the clear, >> well-defined behaviour of returning a new pointer. >> >> We all know what to expect from these functions, much like >> new/delete in C++ and malloc/free. They can be easily adopted by >> std::shared_ptr or std::unique_ptr as needed, so there's no >> benefit in enforcing smart pointer usage at creation. >> >> >> I strongly disagree with this: If knowing that new'ed things must be >> deleted was enough we would never have to worry about memory leaks. It's a >> huge benefit to have the compiler catch our mistakes for us - it's one less >> thing to worry about. > > > I don't remember the last time we had a leak because of a 'create' function. > Tricky ownership is going to continue to be tricky either way so I'd tend > towards keeping the status quo for these functions.I've explained a way to make pretty simple code even simpler. It doesn't seem like factory functions returning raw pointers is helping us.> The bad patterns seem come about mostly when smart pointers are used as > return values.Which bad patterns are you referring to?> We already avoid returning structures by value, so perhaps > any policy can be expressed as a continuation of that? > > > > >> Apart from the complexity and visual clutter, it's pretty annoying >> to have to decide whether to get(), release() or somehow else >> acquire the object you just created. The result of using smart >> pointers as returns isn't pretty -- you end up with awkward code >> like createFoo().get()->func() or other.reset(createFoo().release()). >> > > Creating an object for single use and having it destroyed implicitly doesn't > sound like a great pattern. I almost wonder if it isn't better to write the > ownership explicitly in such cases: > > unique_ptr<T> t = createFoo();This would have to be: unique_ptr<T> t(createFoo()); if createFoo returns a raw pointer.> t->func(); > > A createFoo() statement with an unused pointer new return can be identified > by leak detectors,Seems like a bit of a stretch to detect unused variables by using a leak checker.> whereas an unused smart pointer return will be created > and deleted silently, which isn't that desirable. If you really want to > delete an object as soon as it's created that seems to be worth keeping > explicit. > > >> >> I think if you were ever writing createFoo()->func() before you were >> leaking memory. And actually you can now safely use the simpler syntax with >> std::unique_ptr: createFoo()->func() will work fine, there's no need for the >> .get(). >> >> Ditto other.reset(createFoo.release()). I think this would just be: other >> = createFoo(), (or a = std::move(b);). > > > The nice thing about a traditional 'create' function is that there's one > clear way to do that -- by assignment, instead of deciding between two or > three. It should be equally safe.Actually there isn't and at least one (raw pointer) is casually unsafe: 1) std::unique_ptr<T> foo(createFoo()); 2) foo.reset(createFoo()); 3) T *foo = createFoo(); ... delete foo; Just for the initialization - for assignment with raw pointers, that's where it gets scarier - if you're just using raw assignment, now you've got two raw pointers pointing to the same object, one of which, presumably, owns the object. with unique_ptr, it's more like: 1) std::unique_ptr<T> foo = createFoo(); 2) foo = createFoo(); and optionally: auto foo = createFoo(); - David
Lang Hames
2014-Jul-18 01:36 UTC
[LLVMdev] [cfe-dev] Use of Smart Pointers in LLVM Projects
Hi Alp,> I don't remember the last time we had a leak because of a 'create' function.I'm not sure I follow you here. Are you suggesting that we've never leaked memory returned by a create function? I'll wager we have. I'm also certain that we've double deleted such memory, which is something unique_ptr should also prevent.> The bad patterns seem come about mostly when smart pointers are used as return values. We already avoid returning structures by value, so perhaps any policy can be expressed as a continuation of that?Is there an official injunction against returning structs by value? Perhaps we should reconsider that in light of move semantics? In any case I think the only reason to avoid returning structs by value is the overhead of the copies, which doesn't apply to unique_ptr. Any policy regarding smart pointers use in return types should be separate from the struct usage policy.> Creating an object for single use and having it destroyed implicitly doesn't sound like a great pattern. I almost wonder if it isn't better to write the ownership explicitly in such cases: > > unique_ptr<T> t = createFoo(); > t->func(); > > A createFoo() statement with an unused pointer new return can be identified by leak detectors, whereas an unused smart pointer return will be created and deleted silently, which isn't that desirable. If you really want to delete an object as soon as it's created that seems to be worth keeping explicit.I only mentioned it since you offered it as an example of something that would be uncomfortable to write, which it turns out not to be. As you say - this is very unlikely to come up at all. The idea of using memory leaks to diagnose dead code sounds less than ideal.>> >> I think if you were ever writing createFoo()->func() before you were leaking memory. And actually you can now safely use the simpler syntax with std::unique_ptr: createFoo()->func() will work fine, there's no need for the .get(). >> >> Ditto other.reset(createFoo.release()). I think this would just be: other = createFoo(), (or a = std::move(b);). > > The nice thing about a traditional 'create' function is that there's one clear way to do that -- by assignment, instead of deciding between two or three. It should be equally safe.The only difference between regular pointers and smart pointers is precisely the difference you want: Smart pointers keep the ownership explicit. Consider: Foo *A = createFoo(); Foo *B = A; // Who owns the object now? vs std::unique_ptr<Foo> A = createFoo(); std::unique_ptr<Foo> B = std::move(A); // Clear transfer of ownership. David mentioned other smart pointer types (e.g. shared_ptr) in his email, and I should be clear: I'm advocating for std::unique_ptr usage where appropriate, as that's what I have experience with. I'm less familiar with other smart pointers, and happy to hear from the experts on those. Cheers, Lang.
Hi All, I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic. The 20,000 foot overview of the existing scheme (implemented in llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I understand it, is as follows: We have two symbols in MCJIT that act as fixed points for the debugger to latch on to: __jit_debug_register_code is a no-op function that the debugger can set a breakpoint on. MCJIT will call this function to notify the debugger when an object file is loaded. __jit_debug_descriptor is the head of a C linked list data structure that contains pointers to in-memory object files. The ELF/MachO headers of the in memory object files will have had their vaddrs fixed up by the JIT to point to where each of the linked sections reside in memory. There are a couple of problems with this system: (1) Modifying object-file headers in-place violates some internal LLVM contracts. In particular, the object files may be backed by read-only memory. This has caused crashes in the JIT that have forced me to revert support for debugger registration on the MachO side (We really want to replace this on the ELF side soon too). (2) The JIT has no way of knowing whether a debugger is attached, which means keeping object files in memory even if they're not being used, just in case there an attached debugger that needs them. We'd really like to come up with a system that doesn't have these drawbacks. That is, a system where the object files remain unmodified, and the JIT knows if/when a debugger attaches so that it can generate the relevant information on the fly. It would be great if the debugger experts (and particularly anyone who has experience on both the debugger and the JIT side of things) could weigh in on these issues. In particular: (1) Is there a reason we bake the vmaddrs into the object file headers, or could they just as easily be passed in a side-table so as to keep the object untouched? (2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call in to to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures. Regards, Lang.