On 28.08.2011 16:02, geovanisouza92 at gmail.com wrote:
> Well, do you have any idea how I can implement Unicode rightly in C/C++?

What do you mean by "implement in C/C++"?

If you mean adding libraries to C/C++ that correctly deal with Unicode: that's nothing you do with a compiler infrastructure. And it's probably duplicate work, since Unicode libraries already exist.

If you mean making the C/C++ compiler understand Unicode string literals: either that's in the language standard and already implemented by conformant compilers, or it's not in the language standard, in which case implementing it would deviate from the standard and would not be "rightly" implemented. (I'm not a C/C++ guy, so I don't know whether it's actually in the standard, but if it isn't, I'd guess every compiler already has extensions for it.)

Hm.

Maybe we're talking at the wrong level here, so: what's the problem or need that you wish to address?

Regards,
Jo
Hi, Jo!

I'm trying to create a new programming language, and I want it to have Unicode support (support for correctly reading and manipulating the source code and string literals).

In addition, my programming language supports string interpolation: string literals can contain tiny snippets of code, like expressions or variable names. So I need to read each character and separate out the interpolations.

However, if you have another suggestion, I would be grateful to hear it.

Regards,

2011/8/28 Joachim Durchholz <jo at durchholz.org>
> Maybe we're talking at the wrong level here, so:
> What's the problem/need that you wish to address?

-- 
@geovanisouza92 - Geovani de Souza
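A minimal C++ sketch of the splitting step described above, assuming UTF-8 source text and a hypothetical `${expr}` interpolation syntax (the thread never says which delimiters the language uses). Because every byte of a multi-byte UTF-8 character is 0x80 or above, the scanner can look for the ASCII bytes `$`, `{` and `}` directly, without decoding characters first, and copy everything else through untouched. `Segment` and `splitInterpolations` are illustrative names, and the brace counting does not handle braces inside string literals nested within an expression.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One piece of an interpolated string literal (hypothetical representation).
struct Segment {
    bool isExpr;       // true: source text of an embedded expression
    std::string text;  // UTF-8 bytes, copied unchanged from the source
};

// Splits the body of a string literal (quotes already stripped) into
// literal and expression segments, assuming a "${expr}" syntax.
// Nested braces inside the expression are balanced by counting.
std::vector<Segment> splitInterpolations(const std::string& body) {
    std::vector<Segment> out;
    std::string literal;
    std::size_t i = 0;
    while (i < body.size()) {
        // Bytes of a multi-byte UTF-8 character are all >= 0x80, so they can
        // never be mistaken for the ASCII bytes '$', '{' or '}'.
        if (body[i] == '$' && i + 1 < body.size() && body[i + 1] == '{') {
            if (!literal.empty()) {
                out.push_back({false, literal});
                literal.clear();
            }
            std::size_t depth = 1;
            std::size_t j = i + 2;  // first byte of the expression
            while (j < body.size() && depth > 0) {
                if (body[j] == '{') ++depth;
                else if (body[j] == '}') --depth;
                ++j;
            }
            if (depth != 0)
                throw std::runtime_error("unterminated interpolation");
            // body[j - 1] is the matching '}', so the expression spans
            // the half-open range [i + 2, j - 1).
            out.push_back({true, body.substr(i + 2, j - 1 - (i + 2))});
            i = j;
        } else {
            literal += body[i];
            ++i;
        }
    }
    if (!literal.empty()) out.push_back({false, literal});
    return out;
}
```

For example, the body `Olá, ${nome}!` yields the literal segment `Olá, `, the expression segment `nome`, and the literal segment `!`; the expression text would then be handed to the language's own expression parser.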
LLVM cannot solve these problems for you; it's only a compiler framework. Asking whether LLVM can help you support Unicode in your language is like asking whether x86 machine code can help you support Unicode. You can use either to generate code that handles Unicode correctly, but it's up to you to generate that code.

Reid

On Sun, Aug 28, 2011 at 2:02 PM, geovanisouza92 at gmail.com <geovanisouza92 at gmail.com> wrote:
> Hi, Jo!
> I'm trying to create a new programming language, and I want it to have
> Unicode support (support for correctly reading and manipulating the source
> code and string literals).
On 28.08.2011 20:02, geovanisouza92 at gmail.com wrote:
> Hi, Jo!
>
> I'm trying to create a new programming language, and I want it to have
> Unicode support (support for correctly reading and manipulating the source
> code and string literals).
>
> In addition, my programming language supports string interpolation:
> string literals can contain tiny snippets of code, like expressions or
> variable names.

As Reid said, this probably isn't the right list for questions about the runtime system. Still, it's marginally relevant, and I happen to have done a bit with Unicode lately, so here goes:

In that case, you have a multitude of design and implementation choices. You won't be able to explore them properly until you have done some more reading.

I'd suggest reading the Unicode standard, available for free at http://unicode.org. You'll have to read the material there more than once, I fear; at least I had to before I was able to roughly determine which parts of the standard were relevant for what I wanted to do.

For starters, you'll want to know about the various encodings (UTF-8 and UTF-16 are the most relevant ones) and about surrogate pairs, which UTF-16 uses to represent code points above U+FFFF.

With that in mind, you can start thinking about writing (or using) a library. For practical use, I have been sticking with the ICU library. (Be warned that you still need to know a good deal about Unicode before you can properly determine which ICU options actually do what you want.)

Hope this helps, and good luck!

Regards,
Jo
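To make the encoding discussion concrete, here is a small C++ sketch (hand-written, not ICU and not from the thread) that decodes UTF-8 into code points and re-encodes them as UTF-16, splitting code points above U+FFFF into surrogate pairs. The function names `decodeUtf8` and `encodeUtf16` are hypothetical; in production, a library such as ICU covers this and much more.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes UTF-8 bytes into code points. Rejects invalid lead bytes,
// truncated sequences, bad continuation bytes, overlong forms,
// surrogate code points and values above U+10FFFF.
std::vector<char32_t> decodeUtf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80) {
            cp = lead; extra = 0;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3;
        } else {
            throw std::runtime_error("invalid UTF-8 lead byte");
        }
        if (s.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        // Smallest value that legitimately needs each sequence length.
        static const char32_t minValue[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < minValue[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8");
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Encodes code points as UTF-16. Code points above U+FFFF become a
// surrogate pair: a high surrogate (D800-DBFF), then a low one (DC00-DFFF).
std::u16string encodeUtf16(const std::vector<char32_t>& cps) {
    std::u16string out;
    for (char32_t cp : cps) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;  // 20 significant bits
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

For instance, U+1F600 is the four bytes F0 9F 98 80 in UTF-8 and the surrogate pair D83D DE00 in UTF-16, which is why the length of a UTF-16 string in code units can differ from its number of code points.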
geovanisouza92 at gmail.com wrote:
> I'm trying to create a new programming language, and I want it to have
> Unicode support (support for correctly reading and manipulating the source
> code and string literals).

LLVM IR itself has no separate string type; its string constants (the c"..." syntax) are arrays of i8 (8-bit integers). In your compiler you can use UTF-8, and any UTF-8 string literal can be stored in an i8 array in the LLVM IR. For example, the LLVM backend for the DDC compiler [0] does this:

    @str = internal constant [4 x i8] c"bar\00", align 8

HTH,
Erik

[0] http://disciple.ouroborus.net/
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
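The following C++ sketch shows one way a front end that emits textual IR might produce a constant like the DDC one from an arbitrary UTF-8 literal. Printable ASCII bytes other than `"` and `\` are written as-is; every other byte, including each byte of a multi-byte UTF-8 character, becomes a `\XX` hex escape; and a NUL terminator is appended as in `c"bar\00"`. The helper name `irStringConstant` is hypothetical, and the `align 8` from the DDC output is left out.

```cpp
#include <cstdio>
#include <string>

// Builds a textual LLVM IR global holding the UTF-8 bytes of `utf8` plus a
// terminating NUL, e.g. @str = internal constant [4 x i8] c"bar\00".
std::string irStringConstant(const std::string& name, const std::string& utf8) {
    std::string body;
    for (unsigned char c : utf8) {
        if (c >= 0x20 && c < 0x7F && c != '"' && c != '\\') {
            body += static_cast<char>(c);  // printable ASCII, written literally
        } else {
            char buf[4];  // backslash + two hex digits + NUL
            std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(c));
            body += buf;
        }
    }
    body += "\\00";  // NUL terminator
    // Array length counts bytes (not characters), plus one for the NUL.
    return "@" + name + " = internal constant [" + std::to_string(utf8.size() + 1) +
           " x i8] c\"" + body + "\"";
}
```

Note that the array length is the number of UTF-8 bytes, so a literal like "é" becomes `[3 x i8] c"\C3\A9\00"`. If the front end builds IR through LLVM's C++ API instead of printing text, `llvm::ConstantDataArray::getString` creates the same kind of i8 array constant from a byte string, and no escaping step is needed.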