Zachary Turner via llvm-dev
2016-Oct-12 01:22 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
A while back llvm::format() was introduced that made it possible to combine printf-style formatting with llvm streams. However, this still comes with all the risks and pitfalls of printf. Everyone is no doubt familiar with these problems, but here are just a few anyway:

1. *Not type-safe.* Not all compilers warn when you mess up the format specifier. And when you're writing your own printf-like functions, you need to tag them with __attribute__((format(printf, ...))), which again not all compilers have. If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!

2. *Not security safe.* Functions like sprintf() will happily smash your stack for you if you're not careful.

3. *Not portable (well, kinda).* Quick, how do you print a size_t? You probably said %z. Well, MSVC didn't even support %z until 2015, which we aren't even officially requiring yet. So you've gotta write (uint64_t)x and then use PRIx64. Ugh.

4. *Redundant.* If you're giving it an integer, why do you need to specify %d? It's an integer! We should be able to use the type system to our advantage.

5. *Not flexible.* How do you print a std::chrono::time_point with llvm::format()? You can't. You have to resort to providing an overloaded streaming operator or formatting it some other way.

So I've been working on a library that will solve all of these problems and more.

The high level design of my library is borrowed heavily from C#. But if you're not familiar with C#, I believe boost has something similar in spirit. The best way to show it off is with some examples:

1. os << format_string("Test"); // writes "Test"
2. os << format_string("{0}", 7); // writes "7"

Immediately we can see one big difference between this and llvm::format() / printf. You don't have to specify the type. If you pass in an int, it formats it as an int.

3. os << format_string("{0} {0}", 7); // writes "7 7"

#3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.

4. os << format_string("{0:X}", 255); // writes "0xFF"
5. os << format_string("{0:X7}", 255); // writes "0x000FF"
6. os << format_string("{0}", foo_object); // fails to compile!

Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.

However, you can always define custom formatters for your own types. If you write:

  namespace llvm {
  template<>
  struct format_provider<Foo> {
    static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {
    }
  };
  }

Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I'll keep going with some more examples:

7. os << format_string("{0:N}", -1234567); // writes "-1,234,567". Note the commas.
8. os << format_string("{0:P}", 0.76); // writes "76.00%"

You can also left justify and right justify. For example:

9. os << format_string("{0,8:P}", 0.76); // writes "  76.00%"
10. os << format_string("{0,-8:P}", 0.76); // writes "76.00%  "

And you can also format complicated types. For example:

11. os << format_string("{0:MM/DD/YYYY hh:mm:ss}", std::chrono::system_clock::now()); // writes "10/11/2016 18:19:11"

I already have a working proof of concept that supports most of the fundamental data types and formatting options such as percents, exponents, comma grouping, fixed point, hex, etc.

To summarize, the advantages of this approach are:

1) *Safe.* If it can't format your type, it won't even compile.
2) *Concise.* You can re-use parameters multiple times without re-specifying them.
3) *Simple.* You don't have to remember whether to use %llu or PRIx64 or %z, because format specifiers don't exist!
4) *Flexible.* You can format types in a multitude of different ways while still having the nice format-string style syntax.
5) *Extensible.* If you don't like the behavior of a built-in formatter, you can override it with your own. If you have your own type which you'd like to be able to format, you can add formatting support for it in multiple different ways.

I am hoping to have something ready for submitting later this week. If this interests you, please help me out by reviewing my patch! And if you think this would not be helpful for LLVM and I should not worry about this, let me know as well!

Thanks,
Zach
Mehdi Amini via llvm-dev
2016-Oct-12 03:59 UTC
Hi,

On Oct 11, 2016, at 6:22 PM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> 1. Not type-safe. Not all compilers warn when you mess up the format specifier. And when you're writing your own printf-like functions, you need to tag them with __attribute__((format(printf, ...))), which again not all compilers have.

I’m not very sensitive to the “not all compilers have” argument; however, it is worth mentioning that the format may not be a string literal, which defeats the “sanitizer”.

> If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!

llvm::format now fails to compile as well :)
However, this does not address other issues, like `format("%d", float_var)`.

> [...]
> 5. Not flexible. How do you print a std::chrono::time_point with llvm::format()? You can't. You have to resort to providing an overloaded streaming operator or formatting it some other way.

It seems to me that there is no silver bullet for that: whether for llvm::format() or your new proposal, there is some sort of glue/helper that needs to be provided for each and every non-standard type.

> So I've been working on a library that will solve all of these problems and more.

Great! I appreciate the effort, and when I talked about this with Duncan last week, he mentioned that we should do it :)

> [...]
> 3. os << format_string("{0} {0}", 7); // writes "7 7"
>
> #3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.

What about: printf("%0$ %0$", 7);

> [...]
> 7. os << format_string("{0:N}", -1234567); // writes "-1,234,567". Note the commas.

Why add commas? Because of the ":N"? This seems localization-dependent: how do you handle that?

What happens with the following?

os << format_string("{0:N}", -123.455);

> [...]
> 11. os << format_string("{0:MM/DD/YYYY hh:mm:ss}", std::chrono::system_clock::now()); // writes "10/11/2016 18:19:11"

11 looks pretty cool in terms of flexibility :)

> [...]
> I am hoping to have something ready for submitting later this week. If this interests you, please help me out by reviewing my patch! And if you think this would not be helpful for LLVM and I should not worry about this, let me know as well!

Feel free to add me as a reviewer!
— Mehdi
Zachary Turner via llvm-dev
2016-Oct-12 04:18 UTC
On Tue, Oct 11, 2016 at 8:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
> [...]
> > 3. os << format_string("{0} {0}", 7); // writes "7 7"
> >
> > #3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.
>
> What about: printf("%0$ %0$", 7);

Well, umm... I didn't even know about that. And I wonder how many others also don't. How does it choose the type? It seems there is no d in there.

> > 7. os << format_string("{0:N}", -1234567); // writes "-1,234,567". Note the commas.
>
> Why add commas? Because of the ":N"?
> This seems localization-dependent: how do you handle that?

Yes, it is localization-dependent. That being said, LLVM has zero existing support for localization. We already print floating point numbers with '.' as the decimal separator, messages in English, etc.

The purpose of this example was to illustrate that each formatter can have its own custom set of options. For integral arithmetic types, those would be:

X  : uppercase hex
X- : uppercase hex without the 0x prefix
x  : lowercase hex
x- : lowercase hex without the 0x prefix
N  : comma grouped digits
E  : scientific notation with uppercase E
e  : scientific notation with lowercase e
P  : percent
F  : fixed point

But for floating point types, a different set of format specifiers would be valid (for example, it doesn't make sense to print a floating point number as hex).

If you wrote your own formatter (as described earlier in #6), the field following the : would be passed in as the `Options` parameter, and the implementation is free to use it however it wants. The std::chrono formatter takes strings similar to those described in #11, for example.

> What happens with the following?
>
> os << format_string("{0:N}", -123.455);

You would get "-123.46" (the default precision of floating point types is 2 decimal places). If you had -1234.566 it would print "-1,234.57". You could change the precision by specifying an integer after the N, so {0:N3} would print "-1,234.566".
For integral types the "precision" is the number of digits, so if it's greater than the length of the number it would pad left with 0s. For floating point types it's the number of decimal places, so it would pad right with 0s. Of course, all these details are open for debate, that's just my initial plan.
Zachary Turner via llvm-dev
2016-Oct-12 04:26 UTC
On Tue, Oct 11, 2016 at 8:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
> > 5. *Not flexible.* How do you print a std::chrono::time_point with llvm::format()? You can't. You have to resort to providing an overloaded streaming operator or formatting it some other way.
>
> It seems to me that there is no silver bullet for that: whether for llvm::format() or your new proposal, there is some sort of glue/helper that needs to be provided for each and every non-standard type.

I only half agree with this. For llvm::format() there is no glue or helper that can fit into the existing model. It's a wrapper around snprintf, so you get what snprintf gives you. You can go *around* llvm::format() and overload an operator to print your std::chrono::time_point, but there's no way to integrate it into llvm::format. So with my proposed library you could write:

os << format_string("Start: {0}, End: {1}, Elapsed: {2:ms}", start, end, end - start);

Or you could write:

os << "Start: " << format_time_point(start) << ", End: " << format_time_point(end) << ", Elapsed: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
Sean Silva via llvm-dev
2016-Oct-12 06:15 UTC
This is awesome. +1

Copying a time-tested design like C#'s (which Python also uses) seems like a really sound approach.

Do you have any particular plans w.r.t. converting existing uses of the other formatting constructs? At the very least we can hopefully get rid of format_hex/format_hex_no_prefix, since I don't think there are too many uses of those functions.

Also, since the format string can already embed the surrounding literal strings, do you anticipate the use case where you would want to use `OS << format_string(...) << ...something else...`? Would `print(OS, "....", ....)` make more sense?

-- Sean Silva
Chandler Carruth via llvm-dev
2016-Oct-12 06:29 UTC
I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility. Somewhat minor comments below:

On Tue, Oct 11, 2016 at 6:22 PM Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> The high level design of my library is borrowed heavily from C#.

My only big hesitation here is that the substitution specifier seems heavily influenced by C#. I'd prefer to model this after a format string syntax folks are fairly familiar with. IMO, Python's is probably the best bet here and has had a lot of hammering on it over the years. So I'd suggest that the pattern syntax be mapped to be as similar to Python's as possible, or at least built on top of it.

> 1. os << format_string("Test"); // writes "Test"
> 2. os << format_string("{0}", 7); // writes "7"

The "<< format_string(..." is ... really verbose for me. It also makes me strongly feel like this produces a string rather than a streamable entity.

I'm not a huge fan of streaming, but if we want to go this route, I'd very much like to keep the syntax short and sweet. "format" is pretty great for that. If this is going to fully subsume its use cases, can we eventually get that to be the name?

(While I don't like streaming, I'm not trying to fight that battle here...)

Also, you should probably look at what is quickly becoming a popular C++ library in this space: https://github.com/fmtlib/fmt
David Chisnall via llvm-dev
2016-Oct-12 08:23 UTC
On 12 Oct 2016, at 07:29, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility

I’m also generally in favour, but I wonder what the key motivations are for designing our own, rather than importing something like FastFormat, fmtlib, or one of the other tried-and-tested C++ typesafe I/O libraries. Has someone done an analysis of why these designs are a bad fit for LLVM, or are we just reinventing the wheel because we feel like it?

David
Aaron Ballman via llvm-dev
2016-Oct-12 13:43 UTC
>> 1. os << format_string("Test"); // writes "Test"
>> 2. os << format_string("{0}", 7); // writes "7"
>
> The "<< format_string(..." is ... really verbose for me. It also makes me
> strongly feel like this produces a string rather than a streamable entity.

I wonder if we could use UDLs instead?

os << "Test" << "{0}"_fs << 7;

~Aaron
Zachary Turner via llvm-dev
2016-Oct-12 15:53 UTC
On Tue, Oct 11, 2016 at 11:15 PM Sean Silva <chisophugis at gmail.com> wrote:
> This is awesome. +1
>
> Copying a time-tested design like C#'s (which Python also uses) seems like a really sound approach.
>
> Do you have any particular plans w.r.t. converting existing uses of the other formatting constructs? At the very least we can hopefully get rid of format_hex/format_hex_no_prefix, since I don't think there are too many uses of those functions.

I can certainly try, although when I did a quick grep I found 1,637 uses of llvm::format(). It's something we can work towards slowly, but I don't imagine I have the capacity to convert all of these by myself. Getting rid of format_hex() could be a worthy first step, though.

> Also, since the format string can already embed the surrounding literal strings, do you anticipate the use case where you would want to use `OS << format_string(...) << ...something else...`?
> Would `print(OS, "....", ....)` make more sense?

Perhaps. I would argue that the whole reason we use << in the first place is *because* we don't have a real formatting function. And when we do have one -- assuming it's designed correctly -- streaming operators become unnecessary / a thing of the past.

I can imagine a couple of different syntaxes:

os.format(format_str, args...);           // format() is an instance method of raw_ostream.
T format_string<T>(format_str, args...);  // returns a T (e.g. a std::string, or SmallString<N>)
T &formatf(T &t, format_str, args...);    // formats to the location specified by T, which could be a stream, std::string, SmallString, etc.

In practice this could be implemented by having the raw_ostream overload call os.format(format_str, args...), and having the other versions create a raw_string_ostream or raw_svector_ostream and delegate to the stream version.
Zachary Turner via llvm-dev
2016-Oct-12 16:04 UTC
On Tue, Oct 11, 2016 at 11:29 PM Chandler Carruth <chandlerc at google.com> wrote:
> I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility. Somewhat minor comments below:
>
> On Tue, Oct 11, 2016 at 6:22 PM Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > The high level design of my library is borrowed heavily from C#.
>
> My only big hesitation here is that the substitution specifier seems heavily influenced by C#. I'd prefer to model this after a format string syntax folks are fairly familiar with. IMO, Python's is probably the best bet here and has had a lot of hammering on it over the years. So I'd suggest that the pattern syntax be mapped to be as similar to Python's as possible, or at least built on top of it.

A lot of Python's substitution rules only make sense in the context of a language with reflection. For example, you can write "{0.x}".format(obj) in Python, which means to print obj.x. If you take all of that out of the equation, Python's and C#'s formatting syntaxes are honestly very similar. They both use curly brace delimiters, they both index by number, and they both use a : separator. The biggest difference is that Python smashes ALL of the formatting info into a single field (i.e. everything after the colon), whereas C# separates this into two fields as follows:

{index[,align][:options]}

I prefer this approach because it draws a firm line between the type-specific formatting (e.g. the options field) and universal formatting (e.g. the alignment field).

I did find some tidbits in Python's specification that seem useful and which C# does not support, though. For example, the ability to center a field, and the ability to specify the padding character rather than always using spaces. We could possibly integrate some of those ideas.

> > 1. os << format_string("Test"); // writes "Test"
> > 2. os << format_string("{0}", 7); // writes "7"
>
> The "<< format_string(..." is ... really verbose for me. It also makes me strongly feel like this produces a string rather than a streamable entity.
>
> I'm not a huge fan of streaming, but if we want to go this route, I'd very much like to keep the syntax short and sweet. "format" is pretty great for that. If this is going to fully subsume its use cases, can we eventually get that to be the name?
>
> (While I don't like streaming, I'm not trying to fight that battle here...)

Just for the record, I'm not a fan either. See my response to Sean Silva for some alternatives.
James Y Knight via llvm-dev
2016-Oct-12 17:13 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
On Tue, Oct 11, 2016 at 9:22 PM, Zachary Turner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> A while back llvm::format() was introduced that made it possible to
> combine printf-style formatting with llvm streams. However, this still
> comes with all the risks and pitfalls of printf. Everyone is no-doubt
> familiar with these problems, but here are just a few anyway:
>
> 1. *Not type-safe.* Not all compilers warn when you mess up the format
> specifier. And when you're writing your own Printf-like functions, you
> need to tag them with __attribute__(format, printf) which again not all
> compilers have. If you change a const char * to a StringRef, it can
> silently succeed while passing your StringRef object to printf. It should
> fail to compile!
>
> 2. *Not security safe.* Functions like sprintf() will happily smash your
> stack for you if you're not careful.
>
> 3. *Not portable (well kinda).* Quick, how do you print a size_t? You
> probably said %z. Well MSVC didn't even support %z until 2015, which we
> aren't even officially requiring yet. So you've gotta write (uint64_t)x
> and then use PRIx64. Ugh.
>
> 4. *Redundant.* If you're giving it an integer, why do you need to
> specify %d? It's an integer! We should be able to use the type system to
> our advantage.
>
> 5. *Not flexible.* How do you print a std::chrono::time_point with
> llvm::format()? You can't. You have to resort to providing an overloaded
> streaming operator or formatting it some other way.
>
> So I've been working on a library that will solve all of these problems
> and more.

I wonder what use cases you envision for this? Why does LLVM need a super extensible flexible formatting library? I mean -- if you were developing this as a standalone project, that seems like maybe a nice feature. But I see no rationale as to why LLVM should include it.
That is to say: wouldn't a much-simpler printf replacement, implemented with variadic templates instead of C varargs (and which therefore doesn't require size/signedness prefixes on %d) be sufficient for LLVM?

You can do that as a drop-in improvement for llvm::format, replacing the call to snprintf inside the implementation with a new implementation that actually uses the type information.
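James's alternative is a printf-style function whose output is driven by each argument's static type rather than by size/signedness prefixes. It might look roughly like the following. This is a minimal C++17 sketch: `tformat` and `formatImpl` are hypothetical names, this is not how `llvm::format` is implemented, and flags, widths and precision are not supported.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>
#include <string_view>

// Base case: no arguments left. Copy the remaining text, turning "%%" into
// '%'; any other conversion means there were too few arguments.
inline void formatImpl(std::ostringstream &OS, std::string_view Fmt) {
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '%') {
      OS << Fmt[I];
      continue;
    }
    if (I + 1 < Fmt.size() && Fmt[I + 1] == '%') {
      OS << '%';
      ++I;
      continue;
    }
    throw std::invalid_argument("more conversions than arguments");
  }
}

// Consumes the first argument at the first conversion, then recurses on the
// rest of the format string with the remaining arguments.
template <typename T, typename... Rest>
void formatImpl(std::ostringstream &OS, std::string_view Fmt, const T &Arg,
                const Rest &...Args) {
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '%') {
      OS << Fmt[I];
      continue;
    }
    if (I + 1 >= Fmt.size())
      throw std::invalid_argument("dangling '%'");
    char Conv = Fmt[I + 1];
    if (Conv == '%') {
      OS << '%';
      ++I;
      continue;
    }
    // The static type of Arg decides how it is printed; the letter only
    // selects hex vs. default output, so no l/ll/z/PRIx64 is needed.
    if (Conv == 'x')
      OS << std::hex << Arg << std::dec;
    else
      OS << Arg;
    formatImpl(OS, Fmt.substr(I + 2), Args...);
    return;
  }
  throw std::invalid_argument("more arguments than conversions");
}

// Hypothetical entry point; not the real llvm::format.
template <typename... Ts>
std::string tformat(std::string_view Fmt, const Ts &...Args) {
  std::ostringstream OS;
  formatImpl(OS, Fmt, Args...);
  return OS.str();
}
```

Because the argument type picks the formatting, `%d` works for `int`, `long` or `unsigned long long` alike. One wrinkle of streaming through `operator<<` is that 8-bit integer types such as `uint8_t` print as characters. The sketch also has no hook for user-defined options, which is the gap Zach raises in his reply.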
Zachary Turner via llvm-dev
2016-Oct-12 17:28 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
On Wed, Oct 12, 2016 at 10:13 AM James Y Knight <jyknight at google.com> wrote:

> I wonder what use cases you envision for this? Why does LLVM need a super
> extensible flexible formatting library? I mean -- if you were developing
> this as a standalone project, that seems like maybe a nice feature. But I
> see no rationale as to why LLVM should include it.

We were discussing this on IRC the other night, but I believe many people underestimate the need for string formatting. Here are some data points:

1. There are currently 1,637 calls to llvm::format() across the codebase, and this doesn't include calls to format_hex(), format_decimal(), and the other variants.
2. LLVM consists of a large number (20+ at a minimum) of focused tools (llc, lli, llvm-dwarfdump, llvm-objdump, etc.) whose sole purpose is to output formatted text. Consider the use case of printing a verbose disassembly listing which is fed into FileCheck.
3. Even the "flagship" tools such as clang need string formatting when writing diagnostic messages.
4. LLDB in particular has this kind of thing *everywhere*. I'm talking about anywhere from 3-50+ times *per function* (and that's not an exaggeration) for logging purposes.

That said, LLVM already includes a formatting library: llvm::format(). So what would be the rationale *against* a better, safer, and easier version of the same thing?

> That is to say: wouldn't a much-simpler printf replacement, implemented
> with variadic templates instead of C varargs (and which therefore doesn't
> require size/signedness prefixes on %d) be sufficient for LLVM?
>
> You can do that as a drop-in improvement for llvm::format, replacing the
> call to snprintf inside the implementation with a new implementation that
> actually uses the type information.

How would you format user-defined types using this? I gave an example earlier: suppose you have a start time and an end time in std::chrono types, and you want to print the start, end, and duration. The code to do this using llvm::format() or stream operators is horrible.
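Zach's std::chrono example depends on a per-type formatter that interprets its own options. Below is a minimal C++17 sketch of such a formatter for `std::chrono::duration`. The `format_provider` name echoes the RFC, but the signature here (returning `std::string`, with no alignment parameter), the `formatValue` helper and the option strings "s"/"ms"/"us" are assumptions for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical customization point. The primary template is declared but
// never defined, so formatting an unsupported type is a compile-time error.
template <typename T> struct format_provider;

// Formats any std::chrono::duration. The options string picks the unit:
// "s", "us", or milliseconds by default. Values are truncated toward zero.
template <typename Rep, typename Period>
struct format_provider<std::chrono::duration<Rep, Period>> {
  static std::string format(const std::chrono::duration<Rep, Period> &D,
                            std::string_view Options) {
    using namespace std::chrono;
    std::ostringstream OS;
    if (Options == "s")
      OS << duration_cast<seconds>(D).count() << " s";
    else if (Options == "us")
      OS << duration_cast<microseconds>(D).count() << " us";
    else
      OS << duration_cast<milliseconds>(D).count() << " ms";
    return OS.str();
  }
};

// Convenience wrapper that deduces the argument type.
template <typename T>
std::string formatValue(const T &V, std::string_view Options = {}) {
  return format_provider<T>::format(V, Options);
}
```

With a provider like this, the duration is a single call such as `formatValue(End - Start, "ms")` instead of a `duration_cast` written out at every call site. Printing the start and end time points themselves would need a similar specialization for `std::chrono::time_point`.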
Duncan P. N. Exon Smith via llvm-dev
2016-Oct-12 18:50 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
> On 2016-Oct-11, at 18:22, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> The high level design of my library is borrowed heavily from C#. But if you're not familiar with C#, I believe boost has something similar in spirit.

Boost.Format:
http://www.boost.org/doc/libs/1_62_0/libs/format/doc/format.html

I used it extensively in a past gig. IIRC, it's type safe, more convenient than usual operator<<, and faster than printf.

I would love for something like this to be in tree... I don't really care which one as long as it's convenient enough that it's "obviously better". (IOW, +1.)
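The RFC's core ideas can be combined in a small engine: reusing an argument as in `{0} {0}`, C#-style alignment, and a `format_provider` specialization that makes a user type formattable. This is a minimal C++17 sketch. `format_string`, `format_provider`, `formatArg` and `Point` are illustrative only, and the provider signature here (returning `std::string`, without an alignment argument) is a simplification of the `format(raw_ostream &, const Foo &, int Align, StringRef Options)` shape shown in the RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical customization point. Only specializations are defined, so
// formatting a type with no provider fails to compile.
template <typename T> struct format_provider;

template <> struct format_provider<int> {
  // Options: "" for decimal, "X" for upper-case hex with a 0x prefix.
  static std::string format(const int &V, std::string_view Options) {
    if (Options == "X") {
      std::ostringstream OS;
      OS << "0x" << std::uppercase << std::hex << V;
      return OS.str();
    }
    return std::to_string(V);
  }
};

template <> struct format_provider<std::string> {
  static std::string format(const std::string &V, std::string_view) {
    return V;
  }
};

// A user-defined type opts in by adding its own specialization.
struct Point {
  int X, Y;
};
template <> struct format_provider<Point> {
  static std::string format(const Point &P, std::string_view) {
    return "(" + std::to_string(P.X) + ", " + std::to_string(P.Y) + ")";
  }
};

// Formats argument number Index; each recursive step peels off one argument.
inline std::string formatArg(std::size_t, std::string_view) {
  throw std::out_of_range("replacement index out of range");
}
template <typename T, typename... Rest>
std::string formatArg(std::size_t Index, std::string_view Options,
                      const T &First, const Rest &...Tail) {
  if (Index == 0)
    return format_provider<T>::format(First, Options);
  return formatArg(Index - 1, Options, Tail...);
}

// Expands "{index[,align][:options]}" fields; indices may repeat. Error
// handling is minimal (an unterminated field or an out-of-range index
// throws), and there is no escape for literal braces.
template <typename... Ts>
std::string format_string(std::string_view Fmt, const Ts &...Args) {
  std::string Out;
  std::size_t I = 0;
  while (I < Fmt.size()) {
    if (Fmt[I] != '{') {
      Out += Fmt[I++];
      continue;
    }
    std::size_t Close = Fmt.find('}', I);
    if (Close == std::string_view::npos)
      throw std::invalid_argument("unterminated replacement field");
    std::string_view Spec = Fmt.substr(I + 1, Close - I - 1);
    I = Close + 1;

    std::string_view Options; // type-specific part after ':'
    std::size_t Colon = Spec.find(':');
    if (Colon != std::string_view::npos) {
      Options = Spec.substr(Colon + 1);
      Spec = Spec.substr(0, Colon);
    }
    int Align = 0; // universal part after ','
    std::size_t Comma = Spec.find(',');
    if (Comma != std::string_view::npos) {
      Align = std::stoi(std::string(Spec.substr(Comma + 1)));
      Spec = Spec.substr(0, Comma);
    }
    std::size_t Index = std::stoul(std::string(Spec));

    std::string Field = formatArg(Index, Options, Args...);
    std::size_t Width = static_cast<std::size_t>(Align < 0 ? -Align : Align);
    if (Field.size() < Width) {
      std::string Pad(Width - Field.size(), ' ');
      Field = Align > 0 ? Pad + Field : Field + Pad;
    }
    Out += Field;
  }
  return Out;
}
```

Formatting a type with no specialization, e.g. `format_string("{0}", 1.5)`, fails to compile because `format_provider<double>` is incomplete. That is the "Safe" property the RFC lists.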
Chris Bieneman via llvm-dev
2016-Oct-12 21:44 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
+1 I really like this proposal. Throughout LLVM sub-projects there is a lot of string formatting that we do and it would be great to have a more modern, flexible, portable, and safe string formatting API.

-Chris
Joerg Sonnenberger via llvm-dev
2016-Oct-12 21:50 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
On Wed, Oct 12, 2016 at 11:50:25AM -0700, Duncan P. N. Exon Smith via llvm-dev wrote:
> Boost.Format:
> http://www.boost.org/doc/libs/1_62_0/libs/format/doc/format.html

It's quite heavy, e.g.:
https://github.com/fmtlib/fmt#compile-time-and-code-bloat

I've been using that library (fmt) for a couple of projects in an older version; I think the newer version would primarily be quite a bit less verbose. It has a modern BSD license.

Joerg
Nicolai Hähnle via llvm-dev
2016-Oct-14 07:35 UTC
[llvm-dev] RFC: General purpose type-safe formatting library
On 12.10.2016 05:59, Mehdi Amini via llvm-dev wrote:
>> If you change a const char * to a StringRef, it can silently succeed
>> while passing your StringRef object to printf. It should fail to compile!
>
> llvm::format now fails to compile as well :)
>
> However this does not address other issues, like: `format("%d", float_var)`

This may be a good time to point at https://reviews.llvm.org/D25018

But if someone ends up doing a full overhaul of the formatting that makes that patch unnecessary, I'm happy too.

Cheers,
Nicolai
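Mehdi's remaining case, `format("%d", float_var)`, is a mismatch between a conversion letter and the argument's type, and that type is visible to a variadic template. Below is a minimal C++17 sketch of a runtime check. `formatMatchesArgs`, `checkFormat`, `nextConversion` and `matchesConversion` are hypothetical helpers, not taken from D25018, and they ignore `*` widths and less common conversions such as `%n`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <type_traits>

// True if an argument of type T is acceptable for printf conversion Conv.
template <typename T> bool matchesConversion(char Conv) {
  using U = std::decay_t<T>;
  switch (Conv) {
  case 'd': case 'i': case 'u': case 'o': case 'x': case 'X': case 'c':
    return std::is_integral_v<U>;
  case 'f': case 'F': case 'e': case 'E': case 'g': case 'G': case 'a': case 'A':
    return std::is_floating_point_v<U>;
  case 's':
    return std::is_same_v<U, const char *> || std::is_same_v<U, char *>;
  case 'p':
    return std::is_pointer_v<U>;
  default:
    return false; // unknown or unsupported conversion
  }
}

// Returns the next conversion letter at or after Pos and moves Pos past it,
// skipping "%%", flags, width, precision and length modifiers.
// Returns '\0' when no conversion is left.
inline char nextConversion(std::string_view Fmt, std::size_t &Pos) {
  while ((Pos = Fmt.find('%', Pos)) != std::string_view::npos) {
    ++Pos;
    if (Pos < Fmt.size() && Fmt[Pos] == '%') {
      ++Pos; // literal percent sign
      continue;
    }
    while (Pos < Fmt.size() && Fmt[Pos] != '\0' &&
           std::strchr("-+ #0123456789.hlLzjt", Fmt[Pos]))
      ++Pos;
    return Pos < Fmt.size() ? Fmt[Pos++] : '\0';
  }
  return '\0';
}

// Base case: all arguments consumed, so no conversions may remain.
inline bool checkFormat(std::string_view Fmt, std::size_t Pos) {
  return nextConversion(Fmt, Pos) == '\0';
}

// Matches the first argument against the next conversion, then recurses.
template <typename T, typename... Rest>
bool checkFormat(std::string_view Fmt, std::size_t Pos, const T &,
                 const Rest &...Tail) {
  char Conv = nextConversion(Fmt, Pos);
  if (Conv == '\0' || !matchesConversion<T>(Conv))
    return false;
  return checkFormat(Fmt, Pos, Tail...);
}

template <typename... Ts>
bool formatMatchesArgs(std::string_view Fmt, const Ts &...Args) {
  return checkFormat(Fmt, 0, Args...);
}
```

A real implementation would more likely reject the mismatch at compile time. The runtime form just keeps the sketch short while showing that the argument types carry enough information to catch `%d` with a `float`.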