thr3ads.net - llvm dev - [LLVMdev] LLVM supports Unicode? [Aug 2011]

Joachim Durchholz

2011-Aug-28 17:44 UTC

[LLVMdev] LLVM supports Unicode?

On 28.08.2011 16:02, geovanisouza92 at gmail.com wrote:
> Well, do you have any idea how I can correctly implement Unicode in C/C++?

What do you mean with "implement in C/C++"?

If you mean adding libraries to C/C++ that correctly deal with Unicode: 
that's nothing you do with a compiler infrastructure. And probably 
duplicate work, since Unicode libraries already exist.

If you mean making the C/C++ compiler understand Unicode string 
literals: either that's in the language standard and implemented by 
conformant compilers already, or it's not in the language standard, 
implementing it would deviate from the standard, and it would not be 
"rightly" implemented. (I'm not a C/C++ guy so I don't know whether it's
actually in the standard, but if it isn't, I guess any compiler has
extensions already.)

Hm.

Maybe we're talking at the wrong level here, so:
What's the problem/need that you wish to address?

Regards,
Jo

geovanisouza92 at gmail.com

2011-Aug-28 18:02 UTC

[LLVMdev] LLVM supports Unicode?

Hi, Jo!

I'm trying to create a new programming language, and I want it to have
Unicode support (reading and correctly handling source code and
string literals).

In addition, my programming language supports string interpolation:
strings can contain tiny snippets of code, such as expressions or
variable names.

So I need to read each character and separate out the interpolations.

However, if you have another suggestion, I would be grateful to hear it.
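
For illustration, the splitting step described above might look like the following C++ sketch. The `${...}` delimiter syntax, the `Segment` struct, and the `split_interpolations` name are assumptions made up for this example, not something from the thread. It relies on one property of UTF-8: bytes below 0x80 never occur inside a multi-byte sequence, so ASCII delimiters can be found with a plain byte search, without decoding every character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Segment {
    bool is_code;      // true for the inside of ${...}, false for literal text
    std::string text;  // raw UTF-8 bytes, not decoded
};

// Split a UTF-8 string into literal text and ${...} interpolations.
// "${" and "}" are a hypothetical syntax chosen for this sketch.
// Nested braces and escaped delimiters are deliberately not handled.
std::vector<Segment> split_interpolations(const std::string& s) {
    std::vector<Segment> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        std::size_t open = s.find("${", pos);
        if (open == std::string::npos) {
            // No more interpolations: the rest is literal text.
            out.push_back({false, s.substr(pos)});
            break;
        }
        if (open > pos) {
            out.push_back({false, s.substr(pos, open - pos)});
        }
        std::size_t close = s.find('}', open + 2);
        if (close == std::string::npos) {
            // Unterminated interpolation: keep the rest as literal text.
            out.push_back({false, s.substr(open)});
            break;
        }
        out.push_back({true, s.substr(open + 2, close - open - 2)});
        pos = close + 1;
    }
    return out;
}
```

A real lexer would hand each code segment to the expression parser and would need to track nested braces; this only shows that the separation itself does not require full Unicode decoding.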


Regards,



-- 
@geovanisouza92 - Geovani de Souza

Reid Kleckner

2011-Aug-28 18:44 UTC

[LLVMdev] LLVM supports Unicode?

LLVM cannot solve these problems for you; it is only a compiler
framework.  Asking whether LLVM can help you support Unicode in your
language is like asking whether x86 machine code can help you support
Unicode.  You can use either to generate code that handles Unicode
correctly, but it's up to you to generate that code.

Reid

Joachim Durchholz

2011-Aug-28 18:55 UTC

[LLVMdev] LLVM supports Unicode?

On 28.08.2011 20:02, geovanisouza92 at gmail.com wrote:
> Hi, Jo!
>
> I'm trying to create a new programming language, and I want it to have
> Unicode support (reading and correctly handling source code and
> string literals).
>
> In addition, my programming language supports string interpolation:
> strings can contain tiny snippets of code, such as expressions or
> variable names.

As Reid said, this probably isn't the right list to ask questions about 
the runtime system.
Still, it's marginally relevant, and I happen to have done a bit with 
Unicode lately, so here goes:

In that case, you have a multitude of design and implementation choices. 
You won't be able to properly explore these until you have done some 
more reading.

I'd suggest reading the Unicode standard, available for free at 
http://unicode.org. You'll have to read the material there more than 
once, I fear; at least I had to before I was able to roughly determine 
which parts of the standard were relevant for what I wanted to do.
For starters, you'll want to know about the various encodings (UTF-8 and 
UTF-16 are the most relevant ones), and about surrogate pairs. With that 
in mind, you can start thinking about writing (or using) a library.
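
To make the encodings and surrogate pairs concrete, here is a minimal C++ sketch that encodes a single Unicode scalar value as UTF-8 and as UTF-16. The function names `encode_utf8` and `encode_utf16` are made up for this illustration; a library such as ICU provides tested equivalents, and this sketch does no error recovery beyond asserting that the input is a valid scalar value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp) {
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one Unicode scalar value as UTF-16 (1 or 2 code units).
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        // Code points above U+FFFF are split into a surrogate pair.
        char32_t v = cp - 0x10000;                           // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

For instance, U+20AC (the euro sign) takes three bytes in UTF-8 and one code unit in UTF-16, while U+1F600 takes four bytes in UTF-8 and a surrogate pair in UTF-16. This is why "one character" is not the same thing as one `char` or one `char16_t`.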

For practical usage, I have been sticking with the ICU library.
(Be warned that you still need to know a good deal about Unicode before 
you can properly determine what options of ICU actually do what you want.)

Hope this helps, and good luck!
Regards,
Jo

Erik de Castro Lopo

2011-Aug-28 22:41 UTC

[LLVMdev] LLVM supports Unicode?

geovanisouza92 at gmail.com wrote:
> I'm trying to create a new programming language, and I want it to have
> Unicode support (reading and correctly handling source code and
> string literals).

LLVM IR itself only supports one string type, which is an array of
i8 (8-bit integers). In your compiler you can use UTF-8, and any
UTF-8 string literal can be stored in an i8 array in the LLVM IR.

For example, the LLVM backend for the DDC compiler [0] does this:

   @str = internal constant [4 x i8] c"bar\00", align 8
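
As a rough sketch of how a front end might produce such a line for an arbitrary UTF-8 literal, the following C++ function renders the bytes in LLVM IR's `c"..."` syntax, writing every non-printable or non-ASCII byte as a two-digit hex escape. The function name `emit_ir_string` and the choice to append a terminating NUL are assumptions for this example, and it omits the `align` attribute shown above.

```cpp
#include <cstdio>
#include <string>

// Render a UTF-8 byte string as an LLVM IR constant array definition.
// `name` is a hypothetical global name chosen by the front end.
std::string emit_ir_string(const std::string& name, const std::string& utf8) {
    std::string body;
    for (unsigned char b : utf8) {
        if (b >= 0x20 && b < 0x7F && b != '"' && b != '\\') {
            // Printable ASCII other than quote and backslash is emitted as-is.
            body += static_cast<char>(b);
        } else {
            // Everything else becomes a backslash followed by two hex digits.
            char buf[4];
            std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(b));
            body += buf;
        }
    }
    // The array length counts bytes, not characters, plus one for the NUL.
    return "@" + name + " = internal constant [" +
           std::to_string(utf8.size() + 1) + " x i8] c\"" + body + "\\00\"";
}
```

Note that the array length is a byte count: a two-byte UTF-8 character such as U+00E9 yields a `[3 x i8]` array once the NUL is included. A language that allows embedded NUL characters in strings would need to store an explicit length instead of relying on the terminator.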


HTH,
Erik

[0] http://disciple.ouroborus.net/
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
