Duncan P. N. Exon Smith via llvm-dev
2015-Oct-08 00:57 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
I've been digging into some undefined behaviour stemming from how ilist is typically configured. r247937, r247944, and r247978 caused a UBSan failure to start firing on our Green Dragon bots, and after an IRC conversation between David and Nick and Mehdi, we added a blacklist: -- $echo "src:$WORKSPACE/llvm/include/llvm/CodeGen/MachineFunction.h" >> sanitize.blacklist -- ilist/iplist is a pretty strange list, and the more I dig into it (to fix the UB) the more broken I think it is. I want to change a few things about it, but it'll be somewhat intrusive (pun not really intended), so I want to get some buy-in before really diving in. I've CC'ed the people in the IRC conversation and a couple of others that seem to care about ADT and/or UB. "Normal" intrusive lists ======================= First, here's a simple ("normal") intrusive doubly-linked list: struct ListNodeBase { ListNodeBase *next = nullptr; ListNodeBase *prev = nullptr; }; struct ListBase { struct iterator { ListNodeBase *I = nullptr; iterator &operator++() { I = I->next; return *this; } iterator &operator--() { I = I->prev; return *this; } ListNodeBase &operator*() const { return *I; } }; ListNodeBase L; ListBase() { clear(); } void clear() { L.next = L.prev = &L; } iterator begin() { return iterator{L.next}; } iterator end() { return iterator{&L}; } bool empty() const { return begin() == end(); } void insert(iterator P, ListNodeBase &N) { assert(!N.next && !N.prev); N.next = &*P; N.prev = P->prev; N.next->prev = &N; N.prev->next = &N; } void erase(iterator N) { assert(N != end()); N->next->prev = N->prev; N->prev->next = N->next; N->next = N->prev = nullptr; } }; template <class T> class List : ListBase { public: class iterator : ListBase::iterator { friend class List; typedef ListBase::iterator Super; iterator &operator++() { Super::operator++(); return *this; } iterator &operator--() { Super::operator--(); return *this; } T &operator*() const { static_cast<T &>(Super::operator*()); } }; using ListBase::clear; using ListBase::empty; iterator begin() { return iterator(ListBase::begin()); } iterator end() { return iterator(ListBase::end()); } void insert(iterator P, ListNodeBase &N) { ListBase::insert(P, N); } void erase(iterator N) { ListBase::erase(N); } }; In case it's not clear, to use `List<SomeType>`, `SomeType` has to inherit from `ListNodeBase`. There are a few nice properties here. 1. Traversal logic requires no knowledge of the downcast nodes. 2. `List<T>` owns none of the nodes (clear ownership semantics). 3. There are no branches (outside of asserts). 4. We never touch the heap. As a result it's fairly simple. It's also fairly easy to wrap it to provide extra features if desired (e.g., adding ownership semantics, or the 32 variations of `insert()` that we seem to like having). Our ilist/iplist =============== Our ilist/iplist implementation is far more complicated, has none of the above properties (except sometimes #4, and only with extra configuration that exposes UB), and provides a few extra features that I don't think are really worth paying for. The default configuration effectively does the following: template <class T> ListNode { T *prev, T *next; }; template <class T> struct List { T *Head = nullptr; void ensureHead() { if (!Head) { Head = new T(); Head->next = nullptr; Head->prev = Head; } } // complex insertion/removal logic. }; Every list operation (even begin()/end()) starts by somehow calling `ensureHead()`. The key structural differences here: - Instead of the list containing the sentinel, it contains a pointer to the head of the list, and the sentinel is created on demand. - The sentinel is the full `T` (rather than ListNodeBase). - While "prev" pointers are circular, the sentinel's "next" pointer points at `nullptr` (it's "snipped off"). - All pointers are to the downcast type, `T`. (If you look at the code, you'll see it's a little more complicated. There are also a number of arbitrary "hooks" for managing external state. Yay?) Benefits -------- What do we get in return? - Nodes "know" if they are the sentinel, since `next == nullptr`. This lets you do cute things like `NodeTy *getNextNode()` and `NodeTy *getPrevNode()`, which do the obvious, returning nullptr instead of the sentinel. A naive list would need to compare against `end()`. - Iterating forward will eventually dereference `nullptr` (can't infinite loop, even in release builds). - A whole lot of branches :/. UB from sentinel hack --------------------- The memory management is particularly wasteful, so it's not surprising that almost every user overrides it. They effectively do this: template <class T> ListNode { T *prev, *next; }; template <class T> struct List { ListNode<T> Sentinel; T *Head = static_cast<T *>(&Sentinel); void ensureHead() {} }; Because of iterators have pointers to `T` instead of `ListNodeBase` -- this exposes UB. It's *probably* harmless? More UB, broken code from half-node hack ---------------------------------------- It gets worse though :(. This still wastes a pointer compared to the naive list. Many uses of iplist/ilist make a further optimization to win back that pointer, splitting `ListNode<T>` into two, and using only half the node for the sentinel. template <class T> struct ListHalfNode { T *prev; }; template <class T> struct ListNode : ListHalfNode<T> { T *next; }; template <class T> struct List { ListHalfNode<T> Sentinel; T *Head = static_cast<T *>(&Sentinel); void ensureHead() {} }; Besides exposing the same UB as above, this drops the "next" pointer from the sentinel. This is cute, but it's horribly wrong. In particular, the getPrevNode/getNextNode() methods, which use `N->next == nullptr` to check for sentinels, are instead poking into "Head" in the list itself (which is never null). For these lists, `getNextNode()` will never return `nullptr`, because the "snipped" head has been "reattached". Ironically, this means we now have a "naive" doubly-linked circular list in memory, but we have a whack-ton of logic that has no idea. And we have broken API. What's broken? -------------- Unless I'm misreading the code horribly, BasicBlock/Instruction/Argument::getPrevNode/getNextNode() will *never* return `nullptr`. Instead they'll just return the sentinel. Which has been `static_cast`ed to a completely different type. And if someone dereferences it, *bang*. There are other cases, too, but these are the obviously scary ones. Why am I writing this email instead of just fixing it? ===================================================== I don't feel totally comfortable just going ahead and fixing this without buy-in, since it'll touch some things. Here's my basic plan: - Fix `getNextNode()` and `getPrevNode()` by taking in the parent list and checking against `end()` instead of checking for `nullptr`. - Change the lists to be normal circular linked lists (possibly I'll stick an "is-sentinel" bit in the prev pointer in asserts builds so we can have assertions instead of infinite loops). I'll layer our weird partial ownership semantics on top to keep this part NFC. - Re-layer the external state hooks so that they're on top of the basic list instead of inside of it (so that someone in the future can understand the code with far less studying). Anyone want to back me on this before I dive in? Even better, did I miss something fundamental?
Justin Bogner via llvm-dev
2015-Oct-08 18:00 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
"Duncan P. N. Exon Smith" <dexonsmith at apple.com> writes:> I don't feel totally comfortable just going ahead and fixing this > without buy-in, since it'll touch some things. Here's my basic plan: > > - Fix `getNextNode()` and `getPrevNode()` by taking in the parent list > and checking against `end()` instead of checking for `nullptr`. > - Change the lists to be normal circular linked lists (possibly I'll > stick an "is-sentinel" bit in the prev pointer in asserts builds so > we can have assertions instead of infinite loops). I'll layer our > weird partial ownership semantics on top to keep this part NFC. > - Re-layer the external state hooks so that they're on top of the > basic list instead of inside of it (so that someone in the future > can understand the code with far less studying). > > Anyone want to back me on this before I dive in? Even better, did I > miss something fundamental?+1 Other than having to update pretty well the entire code base, I don't see any drawbacks to replacing ilist/iplist with something sane. The current thing is complicated, tries to do too many things (and not many of them well), and broken in several ways. In addition to the UB and re-capitation you mention, I've also heard of issues using the reverse iterators.
escha via llvm-dev
2015-Oct-08 20:56 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
> On Oct 8, 2015, at 11:00 AM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > "Duncan P. N. Exon Smith" <dexonsmith at apple.com> writes: >> I don't feel totally comfortable just going ahead and fixing this >> without buy-in, since it'll touch some things. Here's my basic plan: >> >> - Fix `getNextNode()` and `getPrevNode()` by taking in the parent list >> and checking against `end()` instead of checking for `nullptr`. >> - Change the lists to be normal circular linked lists (possibly I'll >> stick an "is-sentinel" bit in the prev pointer in asserts builds so >> we can have assertions instead of infinite loops). I'll layer our >> weird partial ownership semantics on top to keep this part NFC. >> - Re-layer the external state hooks so that they're on top of the >> basic list instead of inside of it (so that someone in the future >> can understand the code with far less studying). >> >> Anyone want to back me on this before I dive in? Even better, did I >> miss something fundamental? > > +1 > > Other than having to update pretty well the entire code base, I don't > see any drawbacks to replacing ilist/iplist with something sane. The > current thing is complicated, tries to do too many things (and not many > of them well), and broken in several ways. In addition to the UB and > re-capitation you mention, I've also heard of issues using the reverse > iterators.Specifically, I discovered (the hard and painful way) last night that if you reverse-iterator over a MachineBasicBlock, removing/moving instructions as you go — even pre-incrementing the iterator at the start of the loop isn’t enough to prevent the list from being corrupted in unexpected ways. A regular iterator, used backwards (-- instead of ++) works correctly, nothing else different. —escha
Duncan P. N. Exon Smith via llvm-dev
2015-Oct-20 18:13 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
> On 2015-Oct-07, at 17:57, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: > > I've been digging into some undefined behaviour stemming from how ilist > is typically configured. r247937, r247944, and r247978 caused a UBSan > failure to start firing on our Green Dragon bots, and after an IRC > conversation between David and Nick and Mehdi, we added a blacklist: > -- > $echo "src:$WORKSPACE/llvm/include/llvm/CodeGen/MachineFunction.h" >> sanitize.blacklist > -- > > ilist/iplist is a pretty strange list, and the more I dig into it (to > fix the UB) the more broken I think it is. > > I want to change a few things about it, but it'll be somewhat > intrusive (pun not really intended), so I want to get some buy-in before > really diving in. I've CC'ed the people in the IRC conversation and a > couple of others that seem to care about ADT and/or UB.A quick update on this. The first problem I hit was that there are callers that *rely* on `getNextNode()` returning the sentinel instead of `nullptr`. If you look at the implementation of `getNextNode()`, that's kind of insane. The only way I could think to root out all the similar issues was to look at every call to the implicit conversions and confirm they aren't doing anything strange. Most easily done by applying the attached patch, and getting this compiling again. I have some more commentary in, e.g., r249767 and r249782. Some of the problems I've uncovered include r249758, r249763, r249764, and more scary cases like r249925 and r250211. I've almost finished LLVM proper, but I haven't touched clang yet, or other LLVM projects. Is there any contention about this? Do we eventually want to commit this patch, or should we go back to our old implicit ways once I've cleaned up ilist and iplist? (Basically, should we make clang clean too and commit this patch, or should I just fix the bugs?) -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-WIP-HACK-NOT-READY-DO-NOT-COMMIT-explicit-ilist.patch Type: application/octet-stream Size: 1120 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151020/e59d1025/attachment.obj> -------------- next part --------------
Reid Kleckner via llvm-dev
2015-Oct-20 18:23 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
I think the implicit iterator conversions are much less important now that we have range based for loops, but I still like having them. On Tue, Oct 20, 2015 at 11:13 AM, Duncan P. N. Exon Smith via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > On 2015-Oct-07, at 17:57, Duncan P. N. Exon Smith <dexonsmith at apple.com> > wrote: > > > > I've been digging into some undefined behaviour stemming from how ilist > > is typically configured. r247937, r247944, and r247978 caused a UBSan > > failure to start firing on our Green Dragon bots, and after an IRC > > conversation between David and Nick and Mehdi, we added a blacklist: > > -- > > $echo "src:$WORKSPACE/llvm/include/llvm/CodeGen/MachineFunction.h" >> > sanitize.blacklist > > -- > > > > ilist/iplist is a pretty strange list, and the more I dig into it (to > > fix the UB) the more broken I think it is. > > > > I want to change a few things about it, but it'll be somewhat > > intrusive (pun not really intended), so I want to get some buy-in before > > really diving in. I've CC'ed the people in the IRC conversation and a > > couple of others that seem to care about ADT and/or UB. > > A quick update on this. > > The first problem I hit was that there are callers that *rely* on > `getNextNode()` returning the sentinel instead of `nullptr`. If you > look at the implementation of `getNextNode()`, that's kind of insane. > > The only way I could think to root out all the similar issues was to > look at every call to the implicit conversions and confirm they aren't > doing anything strange. Most easily done by applying the attached > patch, and getting this compiling again. I have some more commentary > in, e.g., r249767 and r249782. > > Some of the problems I've uncovered include r249758, r249763, r249764, > and more scary cases like r249925 and r250211. > > I've almost finished LLVM proper, but I haven't touched clang yet, or > other LLVM projects. > > Is there any contention about this? Do we eventually want to commit > this patch, or should we go back to our old implicit ways once I've > cleaned up ilist and iplist? (Basically, should we make clang clean > too and commit this patch, or should I just fix the bugs?) > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151020/f9a42202/attachment.html>
Duncan P. N. Exon Smith via llvm-dev
2015-Nov-11 02:34 UTC
[llvm-dev] ilist/iplist are broken (maybe I'll fix them?)
> On 2015-Oct-07, at 17:57, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: > > What's broken? > -------------- > > Unless I'm misreading the code horribly, > BasicBlock/Instruction/Argument::getPrevNode/getNextNode() will *never* > return `nullptr`. Instead they'll just return the sentinel. Which has > been `static_cast`ed to a completely different type. And if someone > dereferences it, *bang*. There are other cases, too, but these are the > obviously scary ones. >This part -- the actively broken code -- should be fixed as of r252694. Once that bakes for a day or so, I'll follow up and fix the rest of the UB (I think the hard part is done).