Zachary Turner via llvm-dev
2016-Nov-02 22:54 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
* UDL Syntax is removed in the latest version of the patch <https://reviews.llvm.org/D25587>. * Name changed to `formatv` since `format_string` is too much to type. * Added conversion operators for `std::string` and `llvm::SmallString`. I had some feedback offline (not on this thread, unfortunately) that it might be worth using a printf style syntax instead of this Python-esque syntax. FTR, I actually somewhat object to this, for a couple of reasons: 1) It makes back-reference syntax ugly. "{0} {1} {0}" is much clearer to me than "%0$ %1$ %0$". The latter syntax is also not a very well known feature of printf and so unlikely to be used by people with a printf-style implementation, whereas it's un-missable with the python-style syntax. 2) I don't see why we should need to specify the type of the argument with %d if the compiler knows it's an integer. Even if the we can add compile-time checking to make it error, it seems unnecessary to even encounter this situation in the first place. I believe the compiler should simply format what you give it. 3) One of the most useful aspects of the current approach is the ability to plug in custom formatters for application specific data types. This is not straightforward with a printf-style syntax. You might be able to hook up a template-specialization like mechanic to the processing of %s (similar to my current approach), but it's not obvious how you proceed from there to get custom format strings for individual types. For example, a formatter which can print a TimeSpan in different units depending on style options you pass in. This is especially useful when trying to print ranges where you often want to be able to specify a different separator, or control the formatting of the underlying type. (e.g. it's not clear how you would elegantly format a range of integers in hex using this style of approach). I'm open to feedback here, so if you have an opinion one way or the other, please LMK. On Tue, Nov 1, 2016 at 5:39 AM Zachary Turner <zturner at google.com> wrote:> The big problem i see is that to get compile time checking without the UDL > we're going to have to do something like FORMAT_STRING("{0}") where this is > a macro. It just seems really gross. It is true that it is harder to find > the documentation, but that could be alleviated by putting all of this in > its own namespace like llvm::formatv, then one could search the namespace > On Mon, Oct 31, 2016 at 11:41 PM Sean Silva <chisophugis at gmail.com> wrote: > > On Mon, Oct 31, 2016 at 5:21 PM, Chandler Carruth via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > On Mon, Oct 31, 2016 at 3:46 PM Zachary Turner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi all, > > Tentatively final version is up here: https://reviews.llvm.org/D25587 > > It has a verbal LGTM, but I plan to wait a bit longer just in case anyone > has some additional thoughts. It's a large patch, but if you're > interested, one way you can help without doing a full-blown review is to > look at the large comment blocks in FormatVariadic.h and > FormatProviders.h. Here I provide a formal description of the grammar of > the replacement sequences and format syntax. So you can look at this > without looking at the code behind it and see if you have comments just on > the format language. > > Here's a summary of (most) everything contained in this patch: > > 1) UDL Syntax for outputting to a stream or converting to a string. > outs() << "{0}"_fmt.stream(1) > std::string S = "{0}"_fmt.string(1); > > > I continue to have a strong objection to using UDLs for this (or anything > else in LLVM). > > I think this feature is poorly known by many programmers. I think it will > produce error messages that are confusing and hard to debug. I think it > will have a significant negative impact on compile time. I also think that > it will exercise substantially less well tested parts of every host > compiler for LLVM and subject us to an increased rate of mysterious host > compiler bugs. > > I also think it forces programmers to be aware of a "magical" construct > that doesn't really fit with the rest of the language. > > It isn't that any of these issues in isolation cannot be overcome, it is > that I think the value provided by the UDL specifically is substantially > smaller than the cost. > > I would *very strongly* prefer that this is accomplished with "normal" C++ > syntax, and that compile time checking is done with constexpr when > available. I think that will give the overwhelming majority of the benefit > with dramatically lower cost. > > > +1, the UDL seems a bit too automagical. > `format_string("{0}", 1)` is not that much longer than > `"{0}"_fmt.string(1)`, but significantly less magical. > > Simple example: what should I type into a search engine to find the LLVM > doxygen for the UDL? I know to search "llvm format_string" for the format > string, but just from looking at a use of the UDL syntax it might not be > clear that format_string is even called. > > -- Sean Silva > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161102/3f9650a0/attachment.html>
Zachary Turner via llvm-dev
2016-Nov-07 17:05 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
Wanted to ping again on this now that the LLVM Developer Conference is over and people are (presumably) back to normal. chandlerc@: Are your concerns sufficiently addressed? Anyone: Does anyone feel strongly that the proposed syntax is or is not the way to go, based on the arguments outlined previously? On Wed, Nov 2, 2016 at 3:54 PM Zachary Turner <zturner at google.com> wrote:> * UDL Syntax is removed in the latest version of the patch > <https://reviews.llvm.org/D25587>. > * Name changed to `formatv` since `format_string` is too much to type. > * Added conversion operators for `std::string` and `llvm::SmallString`. > > I had some feedback offline (not on this thread, unfortunately) that it > might be worth using a printf style syntax instead of this Python-esque > syntax. FTR, I actually somewhat object to this, for a couple of reasons: > > 1) It makes back-reference syntax ugly. "{0} {1} {0}" is much clearer to > me than "%0$ %1$ %0$". The latter syntax is also not a very well known > feature of printf and so unlikely to be used by people with a printf-style > implementation, whereas it's un-missable with the python-style syntax. > > 2) I don't see why we should need to specify the type of the argument with > %d if the compiler knows it's an integer. Even if the we can add > compile-time checking to make it error, it seems unnecessary to even > encounter this situation in the first place. I believe the compiler should > simply format what you give it. > > 3) One of the most useful aspects of the current approach is the ability > to plug in custom formatters for application specific data types. This is > not straightforward with a printf-style syntax. > > You might be able to hook up a template-specialization like mechanic to > the processing of %s (similar to my current approach), but it's not obvious > how you proceed from there to get custom format strings for individual > types. For example, a formatter which can print a TimeSpan in different > units depending on style options you pass in. This is especially useful > when trying to print ranges where you often want to be able to specify a > different separator, or control the formatting of the underlying type. > (e.g. it's not clear how you would elegantly format a range of integers in > hex using this style of approach). > > I'm open to feedback here, so if you have an opinion one way or the other, > please LMK. > > > On Tue, Nov 1, 2016 at 5:39 AM Zachary Turner <zturner at google.com> wrote: > > The big problem i see is that to get compile time checking without the UDL > we're going to have to do something like FORMAT_STRING("{0}") where this is > a macro. It just seems really gross. It is true that it is harder to find > the documentation, but that could be alleviated by putting all of this in > its own namespace like llvm::formatv, then one could search the namespace > On Mon, Oct 31, 2016 at 11:41 PM Sean Silva <chisophugis at gmail.com> wrote: > > On Mon, Oct 31, 2016 at 5:21 PM, Chandler Carruth via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > On Mon, Oct 31, 2016 at 3:46 PM Zachary Turner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi all, > > Tentatively final version is up here: https://reviews.llvm.org/D25587 > > It has a verbal LGTM, but I plan to wait a bit longer just in case anyone > has some additional thoughts. It's a large patch, but if you're > interested, one way you can help without doing a full-blown review is to > look at the large comment blocks in FormatVariadic.h and > FormatProviders.h. Here I provide a formal description of the grammar of > the replacement sequences and format syntax. So you can look at this > without looking at the code behind it and see if you have comments just on > the format language. > > Here's a summary of (most) everything contained in this patch: > > 1) UDL Syntax for outputting to a stream or converting to a string. > outs() << "{0}"_fmt.stream(1) > std::string S = "{0}"_fmt.string(1); > > > I continue to have a strong objection to using UDLs for this (or anything > else in LLVM). > > I think this feature is poorly known by many programmers. I think it will > produce error messages that are confusing and hard to debug. I think it > will have a significant negative impact on compile time. I also think that > it will exercise substantially less well tested parts of every host > compiler for LLVM and subject us to an increased rate of mysterious host > compiler bugs. > > I also think it forces programmers to be aware of a "magical" construct > that doesn't really fit with the rest of the language. > > It isn't that any of these issues in isolation cannot be overcome, it is > that I think the value provided by the UDL specifically is substantially > smaller than the cost. > > I would *very strongly* prefer that this is accomplished with "normal" C++ > syntax, and that compile time checking is done with constexpr when > available. I think that will give the overwhelming majority of the benefit > with dramatically lower cost. > > > +1, the UDL seems a bit too automagical. > `format_string("{0}", 1)` is not that much longer than > `"{0}"_fmt.string(1)`, but significantly less magical. > > Simple example: what should I type into a search engine to find the LLVM > doxygen for the UDL? I know to search "llvm format_string" for the format > string, but just from looking at a use of the UDL syntax it might not be > clear that format_string is even called. > > -- Sean Silva > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161107/367bc3a5/attachment.html>
Chandler Carruth via llvm-dev
2016-Nov-07 17:38 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
My high-level design concerns are addressed. I've made some nit-picky code comments on the patch. On Mon, Nov 7, 2016 at 9:05 AM Zachary Turner via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Wanted to ping again on this now that the LLVM Developer Conference is > over and people are (presumably) back to normal. > > chandlerc@: Are your concerns sufficiently addressed? > Anyone: Does anyone feel strongly that the proposed syntax is or is not > the way to go, based on the arguments outlined previously? > > On Wed, Nov 2, 2016 at 3:54 PM Zachary Turner <zturner at google.com> wrote: > > * UDL Syntax is removed in the latest version of the patch > <https://reviews.llvm.org/D25587>. > * Name changed to `formatv` since `format_string` is too much to type. > * Added conversion operators for `std::string` and `llvm::SmallString`. > > I had some feedback offline (not on this thread, unfortunately) that it > might be worth using a printf style syntax instead of this Python-esque > syntax. FTR, I actually somewhat object to this, for a couple of reasons: > > 1) It makes back-reference syntax ugly. "{0} {1} {0}" is much clearer to > me than "%0$ %1$ %0$". The latter syntax is also not a very well known > feature of printf and so unlikely to be used by people with a printf-style > implementation, whereas it's un-missable with the python-style syntax. > > 2) I don't see why we should need to specify the type of the argument with > %d if the compiler knows it's an integer. Even if the we can add > compile-time checking to make it error, it seems unnecessary to even > encounter this situation in the first place. I believe the compiler should > simply format what you give it. > > 3) One of the most useful aspects of the current approach is the ability > to plug in custom formatters for application specific data types. This is > not straightforward with a printf-style syntax. > > You might be able to hook up a template-specialization like mechanic to > the processing of %s (similar to my current approach), but it's not obvious > how you proceed from there to get custom format strings for individual > types. For example, a formatter which can print a TimeSpan in different > units depending on style options you pass in. This is especially useful > when trying to print ranges where you often want to be able to specify a > different separator, or control the formatting of the underlying type. > (e.g. it's not clear how you would elegantly format a range of integers in > hex using this style of approach). > > I'm open to feedback here, so if you have an opinion one way or the other, > please LMK. > > > On Tue, Nov 1, 2016 at 5:39 AM Zachary Turner <zturner at google.com> wrote: > > The big problem i see is that to get compile time checking without the UDL > we're going to have to do something like FORMAT_STRING("{0}") where this is > a macro. It just seems really gross. It is true that it is harder to find > the documentation, but that could be alleviated by putting all of this in > its own namespace like llvm::formatv, then one could search the namespace > On Mon, Oct 31, 2016 at 11:41 PM Sean Silva <chisophugis at gmail.com> wrote: > > On Mon, Oct 31, 2016 at 5:21 PM, Chandler Carruth via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > On Mon, Oct 31, 2016 at 3:46 PM Zachary Turner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Hi all, > > Tentatively final version is up here: https://reviews.llvm.org/D25587 > > It has a verbal LGTM, but I plan to wait a bit longer just in case anyone > has some additional thoughts. It's a large patch, but if you're > interested, one way you can help without doing a full-blown review is to > look at the large comment blocks in FormatVariadic.h and > FormatProviders.h. Here I provide a formal description of the grammar of > the replacement sequences and format syntax. So you can look at this > without looking at the code behind it and see if you have comments just on > the format language. > > Here's a summary of (most) everything contained in this patch: > > 1) UDL Syntax for outputting to a stream or converting to a string. > outs() << "{0}"_fmt.stream(1) > std::string S = "{0}"_fmt.string(1); > > > I continue to have a strong objection to using UDLs for this (or anything > else in LLVM). > > I think this feature is poorly known by many programmers. I think it will > produce error messages that are confusing and hard to debug. I think it > will have a significant negative impact on compile time. I also think that > it will exercise substantially less well tested parts of every host > compiler for LLVM and subject us to an increased rate of mysterious host > compiler bugs. > > I also think it forces programmers to be aware of a "magical" construct > that doesn't really fit with the rest of the language. > > It isn't that any of these issues in isolation cannot be overcome, it is > that I think the value provided by the UDL specifically is substantially > smaller than the cost. > > I would *very strongly* prefer that this is accomplished with "normal" C++ > syntax, and that compile time checking is done with constexpr when > available. I think that will give the overwhelming majority of the benefit > with dramatically lower cost. > > > +1, the UDL seems a bit too automagical. > `format_string("{0}", 1)` is not that much longer than > `"{0}"_fmt.string(1)`, but significantly less magical. > > Simple example: what should I type into a search engine to find the LLVM > doxygen for the UDL? I know to search "llvm format_string" for the format > string, but just from looking at a use of the UDL syntax it might not be > clear that format_string is even called. > > -- Sean Silva > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161107/37608066/attachment.html>
James Y Knight via llvm-dev
2016-Nov-07 21:57 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
On Wed, Nov 2, 2016 at 3:54 PM, Zachary Turner via llvm-dev < llvm-dev at lists.llvm.org> wrote:> * UDL Syntax is removed in the latest version of the patch > <https://reviews.llvm.org/D25587>. > * Name changed to `formatv` since `format_string` is too much to type. > * Added conversion operators for `std::string` and `llvm::SmallString`. > > I had some feedback offline (not on this thread, unfortunately) that it > might be worth using a printf style syntax instead of this Python-esque > syntax. FTR, I actually somewhat object to this, for a couple of reasons: > > 1) It makes back-reference syntax ugly. "{0} {1} {0}" is much clearer to > me than "%0$ %1$ %0$". The latter syntax is also not a very well known > feature of printf and so unlikely to be used by people with a printf-style > implementation, whereas it's un-missable with the python-style syntax >Don't forget the trailing "s" (or whatever) -- "%0$s %1$s %0$s". (The possibility to forget the trailing letter is a downside of the printf syntax, certainly.) 2) I don't see why we should need to specify the type of the argument with> %d if the compiler knows it's an integer. Even if the we can add > compile-time checking to make it error, it seems unnecessary to even > encounter this situation in the first place. I believe the compiler should > simply format what you give it. >Well, it would seem fine to me if "%s" converted integers/floats/etc to a string for you as well. What %d really gives you is ability to specify the numeric formatting options instead of the string formatting options.> 3) One of the most useful aspects of the current approach is the ability > to plug in custom formatters for application specific data types. This is > not straightforward with a printf-style syntax. >It is true, printf style formatting doesn't allow something like this: formatv("{0:DD/MM/YYYY hh:mm:ss}", std::chrono::system_clock::now()); On the other hand, what printf does have is a single description of what formatting strings are valid. With your proposed system, you need to know what type the argument is, in order to know what the D in "{0:D}" is going to mean. IMO, that the printf syntax is both well-known and NOT extensible seems an advantage to me, not a disadvantage. You might be able to hook up a template-specialization like mechanic to the> processing of %s (similar to my current approach), but it's not obvious how > you proceed from there to get custom format strings for individual types. > For example, a formatter which can print a TimeSpan in different units > depending on style options you pass in. This is especially useful when > trying to print ranges where you often want to be able to specify a > different separator, or control the formatting of the underlying type. > (e.g. it's not clear how you would elegantly format a range of integers in > hex using this style of approach). >It is entirely unclear to me that putting everything in a format string: formatv("Here's a range: {0:$[ + ]@[x]}", range); is better than composing functions with usual function-call syntax, e.g. something like this: format("Here's a range: %s", Join(range, " + ", Formatter("%x"))); I think that's the meat of the disagreement: I'd prefer to just see a safe printf-replacement; something that's able to basically drop-in replace C printf, nothing super fancy. I don't see the justification for being able to specify everything you'd ever want to be able to do directly in a complex format string language. On the other hand, you see value in being able to specify the entirety of the output in the format string, and aren't concerned about the syntax being new and complicated. That said -- it doesn't appear that my point of view is widely held, and that's fine -- this is a matter of opinion, not right or wrong. So, continue on. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161107/e3322b64/attachment.html>
Justin Bogner via llvm-dev
2016-Nov-07 22:47 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> writes:>> You might be able to hook up a template-specialization like mechanic to the >> processing of %s (similar to my current approach), but it's not obvious how >> you proceed from there to get custom format strings for individual types. >> For example, a formatter which can print a TimeSpan in different units >> depending on style options you pass in. This is especially useful when >> trying to print ranges where you often want to be able to specify a >> different separator, or control the formatting of the underlying type. >> (e.g. it's not clear how you would elegantly format a range of integers in >> hex using this style of approach). >> > > It is entirely unclear to me that putting everything in a format string: > formatv("Here's a range: {0:$[ + ]@[x]}", range); > is better than composing functions with usual function-call syntax, e.g. > something like this: > format("Here's a range: %s", Join(range, " + ", Formatter("%x"))); > > I think that's the meat of the disagreement: I'd prefer to just see a safe > printf-replacement; something that's able to basically drop-in replace C > printf, nothing super fancy. I don't see the justification for being able > to specify everything you'd ever want to be able to do directly in a > complex format string language. > > On the other hand, you see value in being able to specify the entirety of > the output in the format string, and aren't concerned about the syntax > being new and complicated. > > That said -- it doesn't appear that my point of view is widely held, and > that's fine -- this is a matter of opinion, not right or wrong. So, > continue on. :)FWIW, I'm also not entirely sold that we need a complex formatting language here. The printf modifiers are easy to remember and are good enough 90% of the time, whereas with something like this I feel like I'd need to look up the syntax every time I used it. Like James though, I'm fine with conceding to the majority on this one.