Mehdi AMINI via llvm-dev
2020-Nov-17 00:47 UTC
[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote:> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote: > > > > > > > > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com> > wrote: > >> > >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com> > wrote: > >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com> > wrote: > >> >> > >> >> I will say I'm not a huge fan of adding even more names for things in > >> >> this fairly core/common use case (now we'll have even more vector > >> >> names to pick from) - can we use default template arguments so we can > >> >> write SmallVector<T> instead of SmallVector<T, N> and would that > >> >> address some of the use cases proposed here? > >> > > >> > > >> > I won't claim it is perfect, but the added names are a compromise > over rounds of reviews with the folks in the revision. In particular I'm > quite concerned that a default value for N on the SmallVector does not > carry the intent the same way, and is too easy to miss in review (or while > reading code). > >> > >> I'm not quite following here - what sort of problems do you anticipate > >> by readers missing the default value for N? > > > > > > Reading code `SmallVector<SomeType>` does not express that it is > *intentional* to leave of the value for N. It can easily be just forgotten, > and it could easily be implicitly `0`, and as a reviewer (or code reader > later) I don't know if this is was intentional or not. This is why I am > quite opposed to "loosen" the current requirement on SmallVector: a > different name for a different intent is better fitting to me. > > I don't know why it being implicitly zero is noteworthy - anymore than > it being implicitly 1 or 3, etc. > > As for whether it's intentional: I think if we're moving in this > direction that's proposed, the idea is that by default we don't want > to be explicit about the N - so in general we won't be. And sometimes > someone will want to be explicit and add an N, but otherwise the > reasonable default is not to have an N. I think the intent is clear > enough - consider other default template parameters like customized > deleters on a unique_ptr: I don't wonder if someone intended to have a > customized deleter on every unique_ptr, one assumes that wasn't > required/intended unless it's added. It seems like the idea is for > non-explicit N to be a safe/common default, and explicit N to be > noteworthy/require some justification or scrutiny. >I get your point, but I don't share the conclusion here: I don't believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.> > >> > To me the drawbacks are outweighing the benefits too much. > >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, > without an easy way to figure it out that there is no actual inline storage > while reading the code. > >> > >> Why would it be necessary to figure that out? If we generally feel the > >> right default inline storage happens to be zero in that case and that > >> SmallVector will pick a good default (including possibly 0) that seems > >> like a good thing to me. (why would we call out zero specially, if we > >> want to avoid calling out other values explicitly) > > > > > > I think this is an API that is "easy to misuse", I don't see any > advantage to it while it has drawbacks: why would we do that? > > What kind of misuse do you have in mind?It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don't have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.> To me it seems like it would > be consistent with the idea that we have built rules about what good > default inline sizes are - and there's no need for us to think about > it on every use, we just let the default do what it's meant to do. > > >> > An alternative was to reserve the default to only "small object" so > that N isn't zero, but there isn't a platform independent way of doing that > and keep the code portable I believe. So `SVec<T>` is really saying: "I am > willing to pay for some limite inline storage if possible but I don't have > a N in mind". > >> > >> That quoted statement still sounds like it could encapsulate the > >> possibility of 0 too, though. "limited inline storage if possible" > >> could be "and the answer to that constraint is zero in this case - but > >> if you happen to make the T smaller, it could become non-zero > >> organically/without the need to touch this code" (as the N could vary > >> organically between non-zero values as struct sizes change too) > > > > Yes the quoted statement says exactly that! > > Sorry, it seems we're jumbling up the two issues: Whether the type > name should be different when asking for implicit default inline > storage length and, separately, whether explicit zero length storage > should use a different name. The discussion above, I thought, was > about the latter, not the former - but your response seeems to be > about the former rather than the latter.You're answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn't clear to me how you link this to the 0 case (which does not have inline storage)?>> Two separate discussions/threads/patches may help keep the > communication more clear. > > > This is why I don't like having this behavior on SmallVector: this is a > different semantics / intent that the programmers have and this is > something that I like being able to read differently. > > SVec does not accept a `N`: if I read SVec<Type> there can't be a > possible accidental omission of N here. It has to be a choice between > `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not > `SmallVector<SomeType>`. > > To discuss this issue of whether accepting a default size versus > having an explicit size is a different semantic - as above, I think > it's worth considering/comparing this to other templates and their > default template arguments. Things like std::vector's customizable > allocators (you. don't have to use a different name for the container > when you specify a custom allocation scheme for std::vector - but we > don't wonder every time we see std::vector<T> whether the user meant > to customize the allocator - we accept the default is usualy intended > (as the default buffer size would be usually intended) unless it's > specified - if during review we think a custom allocator (or > non-default buffer size) might be merited, we could ask about it - we > probably would even if the user had written SVec<T> the same way we > ask about other data structures today "did you mean SmallVector<T, > 3>"?), unique_ptr's custom deleters, etc. >I don't necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me. I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and `std::unordered_map` which have different class names and aren't just differentiated with a trait passed as template argument.> > >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` > is because it is frequently preferable to `std::vector` but still isn't > readable or immediately intuitive and so is rarely used in practice (see > https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h > for the documented points on N=0). > >> > >> I'm not sure llvm::Vector/Vec would necessarily make it more often > >> used in practice. Maybe a bunch of cleanup and more specific wording > >> in the style guide that would ban std::vector might help more there. > >> > >> Though I agree that it may be suitable to have a clear name for > >> SmallVector<T, 0> since the "Small" is slightly misleading (though not > >> entirely - it's important for the "SmallVectorImpl" benefits of > >> SmallVector - so renaming it to Vector and still using it via > >> SmallVectorImpl<T>& might also be confusing). > >> > >> >> Got any sense of the total value here? Major savings to be had? > >> >> > >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that > leaves > >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0> > >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm > >> >> opportunities? Though that's rarely needed compared to ArrayRef.) > >> >> If SmallVector<T> would suffice then maybe Vec<T> could be > >> >> ZeroSmallVector<T>? Not sure. > >> > > >> > > >> > ZeroSmallVector<T> does not really address your "more vector names to > pick from" concerns, > >> > >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we > >> have one fewer name and less convention to base the Vec<T> on (since > >> it won't have the SVec<T> sibling), so picking a name closer to the > >> existing SmallVector might be more viable/helpful. > >> > >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the > "default case" to be the easiest to reach / most intuitive to pick? > `llvm::Vec` looks like "just a vector". > >> > >> Somewhat - but Vec is an abbreviation that isn't really used otherwise > >> (if we consider SVec as well, I'd still push back on the introduction > >> of the abbreviation for both cases) > > > > One aspect of the naming is to avoid one not being a prefix of the other > (IDE completion, etc.) or them being too close to differentiate. > > > >> and llvm::Vector would be only a > >> case separation away form a standard name (when considering the > >> unqualified name - which is likely to come up a fair bit, as we'll see > >> "Vector" used unqualified a lot). > > > > Note: this is why we converged with llvm::Vec and not llvm::Vector in > the revision > > Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity > with standard names, etc). Perhaps a bit of a discussion of what other > libraries do around this (no doubt there are a bunch of C++ libraries > that have "here's the default vector you shuold use instead of > std::vector for these reasons" situations). >I had looked into this when the revision was sent, but I couldn't find a library with a "vector with inlined storage" and a default N value. Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71 Folly has a default hard-coded to 1: https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430 Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don't think Abseil has anything else. Do you have other ideas of related work to look for?> > >> I wouldn't entirely object to SmallVector<T> getting a smart default > >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I > >> don't feel /super/ great about it, but yeah, if we're going to push > >> the Programmer's Manual encouragement further in terms of > >> reducing/removing use of std::vector, then maybe llvm::Vector isn't > >> the worst way to do that. > >> > >> (it might be that this would benefit from being two separate > >> discussions, though - one on how to provide smart defaults for > >> SmallVector<T> and one on if/how to push std::vector deprecation in > >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's > >> spelled) further along) > > > > > > For historical purpose: the review was actually originally only adding a > default for SmallVector and nothing else :) > > We only converged to the current proposal after a few rounds of reviews > trying to tradeoff amongst folks in the review. > > Perhaps you could summarize some of those discussions/decisions in > more detail here? As the RFC stands, these are my comments - I know > it's always a tradeoff of which audiences are brought in at what > stages (though often folks send ADT/Support changes my way during > review - I should probably just set up a Herald rule for these > things). > > Sounds like maybe I am in agreement with the original version of the > review, then. >Likely: the review evolved because I opposed to changing the status quo in this direction I guess. Best, -- Mehdi> > - Dave > > > >> > > >> > Cheers, > >> > > >> > -- > >> > Mehdi > >> > > >> > > >> >> > >> >> > >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev > >> >> <llvm-dev at lists.llvm.org> wrote: > >> >> > > >> >> > We've pretty happy now with a patch that adds two wrappers around > SmallVector that make it 1) more convenient to use and 2) will tend to > mitigate misuse of SmallVector. We think it's ready for wider discussion: > https://reviews.llvm.org/D90884 > >> >> > > >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N chosen > automatically to keep its size under 64 Bytes (that heuristic is easy to > change though). The recommendation in the patch is to use this "on the > stack, where a "small" number of elements are expected". > >> >> > > >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us > get the (little-known?) benefits of SmallVector even when it has no inline > elements (see > https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The > recommendation in the patch is to use this when the SmallVector is on the > heap. > >> >> > > >> >> > A lot of this is boiled out from the discussion in > https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1 > >> >> > > >> >> > The goals here are twofold: > >> >> > > >> >> > 1. convenience: not having to read/write "N", or do an extra > edit/recompile cycle if you forgot it > >> >> > > >> >> > 2. avoiding pathological cases: The choice of N is usually > semi-arbitrary in our experience, and if one isn't careful, can result in > sizeof(SmallVector) becoming huge, especially in the case of nested > SmallVectors. This patch avoids pathological cases in two ways: > >> >> > A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which > prevents pathological size amplifications like in `SmallVector<struct > {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply > together. Of course, users can always write SmallVector<T, N> explicitly to > bypass this, but the need for that seems rare. > >> >> > B. SmallVector<T, 0> feels "weird to write" for most folks, even > though it is frequently the right choice. Vec<T> mitigates that by "looking > natural". > >> >> > > >> >> > I'm surfacing this as an RFC to get feedback on a couple > higher-level points: > >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to have? > >> >> > - get wider consensus around suggesting these as "defaults" (see > my updates to ProgrammersManual.rst in the patch) > >> >> > - how much we want to bulk migrate code vs let it organically > grow. Replacing SmallVector<T, 0> with Vec<T> should be completely > mechanical. Replacing SmallVector<T, N> for general N would be a lot more > work. > >> >> > - of course: naming. SVector/Vector were floated in the patch as > well and seem ok. SmallVec was rejected as it was a prefix of SmallVector > (messes up autocomplete). > >> >> > > >> >> > Looking forward to a world with fewer guessed SmallVector sizes, > >> >> > > >> >> > -- Sean Silva > >> >> > _______________________________________________ > >> >> > LLVM Developers mailing list > >> >> > llvm-dev at lists.llvm.org > >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/119b711e/attachment.html>
Sean Silva via llvm-dev
2020-Nov-17 02:14 UTC
[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
On Mon, Nov 16, 2020 at 4:48 PM Mehdi AMINI <joker.eph at gmail.com> wrote:> > > On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote: > >> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote: >> > >> > >> > >> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com> >> wrote: >> >> >> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com> >> wrote: >> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com> >> wrote: >> >> >> >> >> >> I will say I'm not a huge fan of adding even more names for things >> in >> >> >> this fairly core/common use case (now we'll have even more vector >> >> >> names to pick from) - can we use default template arguments so we >> can >> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that >> >> >> address some of the use cases proposed here? >> >> > >> >> > >> >> > I won't claim it is perfect, but the added names are a compromise >> over rounds of reviews with the folks in the revision. In particular I'm >> quite concerned that a default value for N on the SmallVector does not >> carry the intent the same way, and is too easy to miss in review (or while >> reading code). >> >> >> >> I'm not quite following here - what sort of problems do you anticipate >> >> by readers missing the default value for N? >> > >> > >> > Reading code `SmallVector<SomeType>` does not express that it is >> *intentional* to leave of the value for N. It can easily be just forgotten, >> and it could easily be implicitly `0`, and as a reviewer (or code reader >> later) I don't know if this is was intentional or not. This is why I am >> quite opposed to "loosen" the current requirement on SmallVector: a >> different name for a different intent is better fitting to me. >> >> I don't know why it being implicitly zero is noteworthy - anymore than >> it being implicitly 1 or 3, etc. >> >> As for whether it's intentional: I think if we're moving in this >> direction that's proposed, the idea is that by default we don't want >> to be explicit about the N - so in general we won't be. And sometimes >> someone will want to be explicit and add an N, but otherwise the >> reasonable default is not to have an N. I think the intent is clear >> enough - consider other default template parameters like customized >> deleters on a unique_ptr: I don't wonder if someone intended to have a >> customized deleter on every unique_ptr, one assumes that wasn't >> required/intended unless it's added. It seems like the idea is for >> non-explicit N to be a safe/common default, and explicit N to be >> noteworthy/require some justification or scrutiny. >> > > I get your point, but I don't share the conclusion here: I don't believe > that universally not specifying the N on non-trivial types is a good > default. It only makes sense to me for small types, hence it is error prone. > > > >> >> >> > To me the drawbacks are outweighing the benefits too much. >> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, >> without an easy way to figure it out that there is no actual inline storage >> while reading the code. >> >> >> >> Why would it be necessary to figure that out? If we generally feel the >> >> right default inline storage happens to be zero in that case and that >> >> SmallVector will pick a good default (including possibly 0) that seems >> >> like a good thing to me. (why would we call out zero specially, if we >> >> want to avoid calling out other values explicitly) >> > >> > >> > I think this is an API that is "easy to misuse", I don't see any >> advantage to it while it has drawbacks: why would we do that? >> >> What kind of misuse do you have in mind? > > > It does not convey the intent, it is too easy to misuse: i.e. you expect a > small storage but you don't have one. This is enough to me to outweigh the > benefits of the direction and prefer the status quo. > > > >> To me it seems like it would >> be consistent with the idea that we have built rules about what good >> default inline sizes are - and there's no need for us to think about >> it on every use, we just let the default do what it's meant to do. >> >> >> > An alternative was to reserve the default to only "small object" so >> that N isn't zero, but there isn't a platform independent way of doing that >> and keep the code portable I believe. So `SVec<T>` is really saying: "I am >> willing to pay for some limite inline storage if possible but I don't have >> a N in mind". >> >> >> >> That quoted statement still sounds like it could encapsulate the >> >> possibility of 0 too, though. "limited inline storage if possible" >> >> could be "and the answer to that constraint is zero in this case - but >> >> if you happen to make the T smaller, it could become non-zero >> >> organically/without the need to touch this code" (as the N could vary >> >> organically between non-zero values as struct sizes change too) >> > >> > Yes the quoted statement says exactly that! >> >> Sorry, it seems we're jumbling up the two issues: Whether the type >> name should be different when asking for implicit default inline >> storage length and, separately, whether explicit zero length storage >> should use a different name. The discussion above, I thought, was >> about the latter, not the former - but your response seeems to be >> about the former rather than the latter. > > > You're answering my quote about SVec, which is the type that is about > implicit N, Vec (no S!) is about the N=0 case. The quote is also about the > inline storage implicit N, it isn't clear to me how you link this to the 0 > case (which does not have inline storage)? > > > >> > > >> Two separate discussions/threads/patches may help keep the >> communication more clear. >> >> > This is why I don't like having this behavior on SmallVector: this is >> a different semantics / intent that the programmers have and this is >> something that I like being able to read differently. >> > SVec does not accept a `N`: if I read SVec<Type> there can't be a >> possible accidental omission of N here. It has to be a choice between >> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not >> `SmallVector<SomeType>`. >> >> To discuss this issue of whether accepting a default size versus >> having an explicit size is a different semantic - as above, I think >> it's worth considering/comparing this to other templates and their >> default template arguments. Things like std::vector's customizable >> allocators (you. don't have to use a different name for the container >> when you specify a custom allocation scheme for std::vector - but we >> don't wonder every time we see std::vector<T> whether the user meant >> to customize the allocator - we accept the default is usualy intended >> (as the default buffer size would be usually intended) unless it's >> specified - if during review we think a custom allocator (or >> non-default buffer size) might be merited, we could ask about it - we >> probably would even if the user had written SVec<T> the same way we >> ask about other data structures today "did you mean SmallVector<T, >> 3>"?), unique_ptr's custom deleters, etc. >> > > I don't necessarily connect with the custom allocator / custom delete > analogy right now, they seem too different to me. > I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and > `std::unordered_map` which have different class names and aren't just > differentiated with a trait passed as template argument. > > >> >> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T, >> 0>` is because it is frequently preferable to `std::vector` but still isn't >> readable or immediately intuitive and so is rarely used in practice (see >> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h >> for the documented points on N=0). >> >> >> >> I'm not sure llvm::Vector/Vec would necessarily make it more often >> >> used in practice. Maybe a bunch of cleanup and more specific wording >> >> in the style guide that would ban std::vector might help more there. >> >> >> >> Though I agree that it may be suitable to have a clear name for >> >> SmallVector<T, 0> since the "Small" is slightly misleading (though not >> >> entirely - it's important for the "SmallVectorImpl" benefits of >> >> SmallVector - so renaming it to Vector and still using it via >> >> SmallVectorImpl<T>& might also be confusing). >> >> >> >> >> Got any sense of the total value here? Major savings to be had? >> >> >> >> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that >> leaves >> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0> >> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm >> >> >> opportunities? Though that's rarely needed compared to ArrayRef.) >> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be >> >> >> ZeroSmallVector<T>? Not sure. >> >> > >> >> > >> >> > ZeroSmallVector<T> does not really address your "more vector names >> to pick from" concerns, >> >> >> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we >> >> have one fewer name and less convention to base the Vec<T> on (since >> >> it won't have the SVec<T> sibling), so picking a name closer to the >> >> existing SmallVector might be more viable/helpful. >> >> >> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the >> "default case" to be the easiest to reach / most intuitive to pick? >> `llvm::Vec` looks like "just a vector". >> >> >> >> Somewhat - but Vec is an abbreviation that isn't really used otherwise >> >> (if we consider SVec as well, I'd still push back on the introduction >> >> of the abbreviation for both cases) >> > >> > One aspect of the naming is to avoid one not being a prefix of the >> other (IDE completion, etc.) or them being too close to differentiate. >> > >> >> and llvm::Vector would be only a >> >> case separation away form a standard name (when considering the >> >> unqualified name - which is likely to come up a fair bit, as we'll see >> >> "Vector" used unqualified a lot). >> > >> > Note: this is why we converged with llvm::Vec and not llvm::Vector in >> the revision >> >> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity >> with standard names, etc). Perhaps a bit of a discussion of what other >> libraries do around this (no doubt there are a bunch of C++ libraries >> that have "here's the default vector you shuold use instead of >> std::vector for these reasons" situations). >> > > I had looked into this when the revision was sent, but I couldn't find a > library with a "vector with inlined storage" and a default N value. > Both Abseil and Folly have a SmallVector equivalent, Abseil does not have > a default: > https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71 > Folly has a default hard-coded to 1: > https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430 > Folly has also another class, FBVector, which is really an alternative to > std::vector with a different memory management strategy (growth factor, > etc.). I don't think Abseil has anything else. > > Do you have other ideas of related work to look for? >Actually, the reason I originally pursued this was that I used a SmallDenseMap and happily found that it had a default number of inline elements: https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877 I thought "why not SmallVector, since I use that a lot more often"? But based on historical misuse of SmallVector, using a hardcoded number like "4" (like SmallDenseMap does) as the default doesn't make much sense, so I pursued a bounded-sizeof approach.> > > >> >> >> I wouldn't entirely object to SmallVector<T> getting a smart default >> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I >> >> don't feel /super/ great about it, but yeah, if we're going to push >> >> the Programmer's Manual encouragement further in terms of >> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't >> >> the worst way to do that. >> >> >> >> (it might be that this would benefit from being two separate >> >> discussions, though - one on how to provide smart defaults for >> >> SmallVector<T> and one on if/how to push std::vector deprecation in >> >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's >> >> spelled) further along) >> > >> > >> > For historical purpose: the review was actually originally only adding >> a default for SmallVector and nothing else :) >> > We only converged to the current proposal after a few rounds of reviews >> trying to tradeoff amongst folks in the review. >> >> Perhaps you could summarize some of those discussions/decisions in >> more detail here? As the RFC stands, these are my comments - I know >> it's always a tradeoff of which audiences are brought in at what >> stages (though often folks send ADT/Support changes my way during >> review - I should probably just set up a Herald rule for these >> things). >> >> Sounds like maybe I am in agreement with the original version of the >> review, then. >> > > Likely: the review evolved because I opposed to changing the status quo in > this direction I guess. >It sounds like there is pretty good consensus here to just let `SmallVector<T>` "just work" and choose a default N. The one exception seems to be that Mehdi has a strong objection having the default N possibly be 0 (to ensure the bounded-sizeof property). So let's shift the discussion with a laser focus to specifically that point. Responding to the arguments I've seen presented so far: 1. I'm not convinced that SmallVector<T> with defaulted N==0 is really error prone. Mehdi, I like your definition of "I am willing to pay for some limite inline storage if possible but I don't have a N in mind" and I think we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed for that. 2. I don't understand the concern about "forgetting the N". In the "new world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be treated as "I profiled this code and this explicit N is the right choice". That is, adding an explicit `N` is going to be rare and deliberate (possibly with reviewers insisting on a comment!) -- this is not something that one can "forget". For point 2., I think we will have some transition period where reviewers will organically start to propagate the new advice (and patch authors enjoying not having to write N), but I think the direction is effectively what I stated. -- Sean Silva> > Best, > > -- > Mehdi > > > > >> >> - Dave >> >> >> >> > >> >> > Cheers, >> >> > >> >> > -- >> >> > Mehdi >> >> > >> >> > >> >> >> >> >> >> >> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev >> >> >> <llvm-dev at lists.llvm.org> wrote: >> >> >> > >> >> >> > We've pretty happy now with a patch that adds two wrappers around >> SmallVector that make it 1) more convenient to use and 2) will tend to >> mitigate misuse of SmallVector. We think it's ready for wider discussion: >> https://reviews.llvm.org/D90884 >> >> >> > >> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N >> chosen automatically to keep its size under 64 Bytes (that heuristic is >> easy to change though). The recommendation in the patch is to use this "on >> the stack, where a "small" number of elements are expected". >> >> >> > >> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us >> get the (little-known?) benefits of SmallVector even when it has no inline >> elements (see >> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). >> The recommendation in the patch is to use this when the SmallVector is on >> the heap. >> >> >> > >> >> >> > A lot of this is boiled out from the discussion in >> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1 >> >> >> > >> >> >> > The goals here are twofold: >> >> >> > >> >> >> > 1. convenience: not having to read/write "N", or do an extra >> edit/recompile cycle if you forgot it >> >> >> > >> >> >> > 2. avoiding pathological cases: The choice of N is usually >> semi-arbitrary in our experience, and if one isn't careful, can result in >> sizeof(SmallVector) becoming huge, especially in the case of nested >> SmallVectors. This patch avoids pathological cases in two ways: >> >> >> > A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which >> prevents pathological size amplifications like in `SmallVector<struct >> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply >> together. Of course, users can always write SmallVector<T, N> explicitly to >> bypass this, but the need for that seems rare. >> >> >> > B. SmallVector<T, 0> feels "weird to write" for most folks, >> even though it is frequently the right choice. Vec<T> mitigates that by >> "looking natural". >> >> >> > >> >> >> > I'm surfacing this as an RFC to get feedback on a couple >> higher-level points: >> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to have? >> >> >> > - get wider consensus around suggesting these as "defaults" (see >> my updates to ProgrammersManual.rst in the patch) >> >> >> > - how much we want to bulk migrate code vs let it organically >> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely >> mechanical. Replacing SmallVector<T, N> for general N would be a lot more >> work. >> >> >> > - of course: naming. SVector/Vector were floated in the patch as >> well and seem ok. SmallVec was rejected as it was a prefix of SmallVector >> (messes up autocomplete). >> >> >> > >> >> >> > Looking forward to a world with fewer guessed SmallVector sizes, >> >> >> > >> >> >> > -- Sean Silva >> >> >> > _______________________________________________ >> >> >> > LLVM Developers mailing list >> >> >> > llvm-dev at lists.llvm.org >> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/8ae25c3b/attachment-0001.html>
Mehdi AMINI via llvm-dev
2020-Nov-17 02:44 UTC
[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
On Mon, Nov 16, 2020 at 6:14 PM Sean Silva <chisophugis at gmail.com> wrote:> > > On Mon, Nov 16, 2020 at 4:48 PM Mehdi AMINI <joker.eph at gmail.com> wrote: > >> >> >> On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote: >> >>> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote: >>> > >>> > >>> > >>> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com> >>> wrote: >>> >> >>> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com> >>> wrote: >>> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com> >>> wrote: >>> >> >> >>> >> >> I will say I'm not a huge fan of adding even more names for things >>> in >>> >> >> this fairly core/common use case (now we'll have even more vector >>> >> >> names to pick from) - can we use default template arguments so we >>> can >>> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that >>> >> >> address some of the use cases proposed here? >>> >> > >>> >> > >>> >> > I won't claim it is perfect, but the added names are a compromise >>> over rounds of reviews with the folks in the revision. In particular I'm >>> quite concerned that a default value for N on the SmallVector does not >>> carry the intent the same way, and is too easy to miss in review (or while >>> reading code). >>> >> >>> >> I'm not quite following here - what sort of problems do you anticipate >>> >> by readers missing the default value for N? >>> > >>> > >>> > Reading code `SmallVector<SomeType>` does not express that it is >>> *intentional* to leave of the value for N. It can easily be just forgotten, >>> and it could easily be implicitly `0`, and as a reviewer (or code reader >>> later) I don't know if this is was intentional or not. This is why I am >>> quite opposed to "loosen" the current requirement on SmallVector: a >>> different name for a different intent is better fitting to me. >>> >>> I don't know why it being implicitly zero is noteworthy - anymore than >>> it being implicitly 1 or 3, etc. >>> >>> As for whether it's intentional: I think if we're moving in this >>> direction that's proposed, the idea is that by default we don't want >>> to be explicit about the N - so in general we won't be. And sometimes >>> someone will want to be explicit and add an N, but otherwise the >>> reasonable default is not to have an N. I think the intent is clear >>> enough - consider other default template parameters like customized >>> deleters on a unique_ptr: I don't wonder if someone intended to have a >>> customized deleter on every unique_ptr, one assumes that wasn't >>> required/intended unless it's added. It seems like the idea is for >>> non-explicit N to be a safe/common default, and explicit N to be >>> noteworthy/require some justification or scrutiny. >>> >> >> I get your point, but I don't share the conclusion here: I don't believe >> that universally not specifying the N on non-trivial types is a good >> default. It only makes sense to me for small types, hence it is error prone. >> >> >> >>> >>> >> > To me the drawbacks are outweighing the benefits too much. >>> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, >>> without an easy way to figure it out that there is no actual inline storage >>> while reading the code. >>> >> >>> >> Why would it be necessary to figure that out? If we generally feel the >>> >> right default inline storage happens to be zero in that case and that >>> >> SmallVector will pick a good default (including possibly 0) that seems >>> >> like a good thing to me. (why would we call out zero specially, if we >>> >> want to avoid calling out other values explicitly) >>> > >>> > >>> > I think this is an API that is "easy to misuse", I don't see any >>> advantage to it while it has drawbacks: why would we do that? >>> >>> What kind of misuse do you have in mind? >> >> >> It does not convey the intent, it is too easy to misuse: i.e. you expect >> a small storage but you don't have one. This is enough to me to outweigh >> the benefits of the direction and prefer the status quo. >> >> >> >>> To me it seems like it would >>> be consistent with the idea that we have built rules about what good >>> default inline sizes are - and there's no need for us to think about >>> it on every use, we just let the default do what it's meant to do. >>> >>> >> > An alternative was to reserve the default to only "small object" so >>> that N isn't zero, but there isn't a platform independent way of doing that >>> and keep the code portable I believe. So `SVec<T>` is really saying: "I am >>> willing to pay for some limite inline storage if possible but I don't have >>> a N in mind". >>> >> >>> >> That quoted statement still sounds like it could encapsulate the >>> >> possibility of 0 too, though. "limited inline storage if possible" >>> >> could be "and the answer to that constraint is zero in this case - but >>> >> if you happen to make the T smaller, it could become non-zero >>> >> organically/without the need to touch this code" (as the N could vary >>> >> organically between non-zero values as struct sizes change too) >>> > >>> > Yes the quoted statement says exactly that! >>> >>> Sorry, it seems we're jumbling up the two issues: Whether the type >>> name should be different when asking for implicit default inline >>> storage length and, separately, whether explicit zero length storage >>> should use a different name. The discussion above, I thought, was >>> about the latter, not the former - but your response seeems to be >>> about the former rather than the latter. >> >> >> You're answering my quote about SVec, which is the type that is about >> implicit N, Vec (no S!) is about the N=0 case. The quote is also about the >> inline storage implicit N, it isn't clear to me how you link this to the 0 >> case (which does not have inline storage)? >> >> >> >>> >> >> >>> Two separate discussions/threads/patches may help keep the >>> communication more clear. >>> >>> > This is why I don't like having this behavior on SmallVector: this is >>> a different semantics / intent that the programmers have and this is >>> something that I like being able to read differently. >>> > SVec does not accept a `N`: if I read SVec<Type> there can't be a >>> possible accidental omission of N here. It has to be a choice between >>> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not >>> `SmallVector<SomeType>`. >>> >>> To discuss this issue of whether accepting a default size versus >>> having an explicit size is a different semantic - as above, I think >>> it's worth considering/comparing this to other templates and their >>> default template arguments. Things like std::vector's customizable >>> allocators (you. don't have to use a different name for the container >>> when you specify a custom allocation scheme for std::vector - but we >>> don't wonder every time we see std::vector<T> whether the user meant >>> to customize the allocator - we accept the default is usualy intended >>> (as the default buffer size would be usually intended) unless it's >>> specified - if during review we think a custom allocator (or >>> non-default buffer size) might be merited, we could ask about it - we >>> probably would even if the user had written SVec<T> the same way we >>> ask about other data structures today "did you mean SmallVector<T, >>> 3>"?), unique_ptr's custom deleters, etc. >>> >> >> I don't necessarily connect with the custom allocator / custom delete >> analogy right now, they seem too different to me. >> I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and >> `std::unordered_map` which have different class names and aren't just >> differentiated with a trait passed as template argument. >> >> >>> >>> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T, >>> 0>` is because it is frequently preferable to `std::vector` but still isn't >>> readable or immediately intuitive and so is rarely used in practice (see >>> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h >>> for the documented points on N=0). >>> >> >>> >> I'm not sure llvm::Vector/Vec would necessarily make it more often >>> >> used in practice. Maybe a bunch of cleanup and more specific wording >>> >> in the style guide that would ban std::vector might help more there. >>> >> >>> >> Though I agree that it may be suitable to have a clear name for >>> >> SmallVector<T, 0> since the "Small" is slightly misleading (though not >>> >> entirely - it's important for the "SmallVectorImpl" benefits of >>> >> SmallVector - so renaming it to Vector and still using it via >>> >> SmallVectorImpl<T>& might also be confusing). >>> >> >>> >> >> Got any sense of the total value here? Major savings to be had? >>> >> >> >>> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that >>> leaves >>> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0> >>> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm >>> >> >> opportunities? Though that's rarely needed compared to ArrayRef.) >>> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be >>> >> >> ZeroSmallVector<T>? Not sure. >>> >> > >>> >> > >>> >> > ZeroSmallVector<T> does not really address your "more vector names >>> to pick from" concerns, >>> >> >>> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we >>> >> have one fewer name and less convention to base the Vec<T> on (since >>> >> it won't have the SVec<T> sibling), so picking a name closer to the >>> >> existing SmallVector might be more viable/helpful. >>> >> >>> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the >>> "default case" to be the easiest to reach / most intuitive to pick? >>> `llvm::Vec` looks like "just a vector". >>> >> >>> >> Somewhat - but Vec is an abbreviation that isn't really used otherwise >>> >> (if we consider SVec as well, I'd still push back on the introduction >>> >> of the abbreviation for both cases) >>> > >>> > One aspect of the naming is to avoid one not being a prefix of the >>> other (IDE completion, etc.) or them being too close to differentiate. >>> > >>> >> and llvm::Vector would be only a >>> >> case separation away form a standard name (when considering the >>> >> unqualified name - which is likely to come up a fair bit, as we'll see >>> >> "Vector" used unqualified a lot). >>> > >>> > Note: this is why we converged with llvm::Vec and not llvm::Vector in >>> the revision >>> >>> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity >>> with standard names, etc). Perhaps a bit of a discussion of what other >>> libraries do around this (no doubt there are a bunch of C++ libraries >>> that have "here's the default vector you shuold use instead of >>> std::vector for these reasons" situations). >>> >> >> I had looked into this when the revision was sent, but I couldn't find a >> library with a "vector with inlined storage" and a default N value. >> Both Abseil and Folly have a SmallVector equivalent, Abseil does not have >> a default: >> https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71 >> Folly has a default hard-coded to 1: >> https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430 >> Folly has also another class, FBVector, which is really an alternative to >> std::vector with a different memory management strategy (growth factor, >> etc.). I don't think Abseil has anything else. >> >> Do you have other ideas of related work to look for? >> > > Actually, the reason I originally pursued this was that I used a > SmallDenseMap and happily found that it had a default number of inline > elements: > https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877 > I thought "why not SmallVector, since I use that a lot more often"? But > based on historical misuse of SmallVector, using a hardcoded number like > "4" (like SmallDenseMap does) as the default doesn't make much sense, so I > pursued a bounded-sizeof approach. > > >> >> >> >>> >>> >> I wouldn't entirely object to SmallVector<T> getting a smart default >>> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I >>> >> don't feel /super/ great about it, but yeah, if we're going to push >>> >> the Programmer's Manual encouragement further in terms of >>> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't >>> >> the worst way to do that. >>> >> >>> >> (it might be that this would benefit from being two separate >>> >> discussions, though - one on how to provide smart defaults for >>> >> SmallVector<T> and one on if/how to push std::vector deprecation in >>> >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's >>> >> spelled) further along) >>> > >>> > >>> > For historical purpose: the review was actually originally only adding >>> a default for SmallVector and nothing else :) >>> > We only converged to the current proposal after a few rounds of >>> reviews trying to tradeoff amongst folks in the review. >>> >>> Perhaps you could summarize some of those discussions/decisions in >>> more detail here? As the RFC stands, these are my comments - I know >>> it's always a tradeoff of which audiences are brought in at what >>> stages (though often folks send ADT/Support changes my way during >>> review - I should probably just set up a Herald rule for these >>> things). >>> >>> Sounds like maybe I am in agreement with the original version of the >>> review, then. >>> >> >> Likely: the review evolved because I opposed to changing the status quo >> in this direction I guess. >> > > It sounds like there is pretty good consensus here to just let > `SmallVector<T>` "just work" and choose a default N. The one exception > seems to be that Mehdi has a strong objection having the default N possibly > be 0 (to ensure the bounded-sizeof property). So let's shift the discussion > with a laser focus to specifically that point. > > Responding to the arguments I've seen presented so far: > 1. I'm not convinced that SmallVector<T> with defaulted N==0 is really > error prone. Mehdi, I like your definition of "I am willing to pay for some > limite inline storage if possible but I don't have a N in mind" and I think > we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed > for that. >I understand your point, I still strongly disagree for the reason previously discussed: in particular I don't understand why applying it to SmallVector would be better than what we converged on with SVec.> 2. I don't understand the concern about "forgetting the N". In the "new > world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be > treated as "I profiled this code and this explicit N is the right choice". > That is, adding an explicit `N` is going to be rare and deliberate > (possibly with reviewers insisting on a comment!) -- this is not something > that one can "forget". >> For point 2., I think we will have some transition period where reviewers > will organically start to propagate the new advice (and patch authors > enjoying not having to write N), but I think the direction is effectively > what I stated. > > -- Sean Silva > > >> >> Best, >> >> -- >> Mehdi >> >> >> >> >>> >>> - Dave >>> >>> >>> >> > >>> >> > Cheers, >>> >> > >>> >> > -- >>> >> > Mehdi >>> >> > >>> >> > >>> >> >> >>> >> >> >>> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev >>> >> >> <llvm-dev at lists.llvm.org> wrote: >>> >> >> > >>> >> >> > We've pretty happy now with a patch that adds two wrappers >>> around SmallVector that make it 1) more convenient to use and 2) will tend >>> to mitigate misuse of SmallVector. We think it's ready for wider >>> discussion: https://reviews.llvm.org/D90884 >>> >> >> > >>> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N >>> chosen automatically to keep its size under 64 Bytes (that heuristic is >>> easy to change though). The recommendation in the patch is to use this "on >>> the stack, where a "small" number of elements are expected". >>> >> >> > >>> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us >>> get the (little-known?) benefits of SmallVector even when it has no inline >>> elements (see >>> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). >>> The recommendation in the patch is to use this when the SmallVector is on >>> the heap. >>> >> >> > >>> >> >> > A lot of this is boiled out from the discussion in >>> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1 >>> >> >> > >>> >> >> > The goals here are twofold: >>> >> >> > >>> >> >> > 1. convenience: not having to read/write "N", or do an extra >>> edit/recompile cycle if you forgot it >>> >> >> > >>> >> >> > 2. avoiding pathological cases: The choice of N is usually >>> semi-arbitrary in our experience, and if one isn't careful, can result in >>> sizeof(SmallVector) becoming huge, especially in the case of nested >>> SmallVectors. This patch avoids pathological cases in two ways: >>> >> >> > A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which >>> prevents pathological size amplifications like in `SmallVector<struct >>> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply >>> together. Of course, users can always write SmallVector<T, N> explicitly to >>> bypass this, but the need for that seems rare. >>> >> >> > B. SmallVector<T, 0> feels "weird to write" for most folks, >>> even though it is frequently the right choice. Vec<T> mitigates that by >>> "looking natural". >>> >> >> > >>> >> >> > I'm surfacing this as an RFC to get feedback on a couple >>> higher-level points: >>> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to >>> have? >>> >> >> > - get wider consensus around suggesting these as "defaults" (see >>> my updates to ProgrammersManual.rst in the patch) >>> >> >> > - how much we want to bulk migrate code vs let it organically >>> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely >>> mechanical. Replacing SmallVector<T, N> for general N would be a lot more >>> work. >>> >> >> > - of course: naming. SVector/Vector were floated in the patch as >>> well and seem ok. SmallVec was rejected as it was a prefix of SmallVector >>> (messes up autocomplete). >>> >> >> > >>> >> >> > Looking forward to a world with fewer guessed SmallVector sizes, >>> >> >> > >>> >> >> > -- Sean Silva >>> >> >> > _______________________________________________ >>> >> >> > LLVM Developers mailing list >>> >> >> > llvm-dev at lists.llvm.org >>> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/3b93a2ad/attachment.html>
Apparently Analagous Threads
- RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
- RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
- RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
- RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.
- RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.