thr3ads.net - llvm dev - [LLVMdev] request for windows unicode support [Nov 2010]


Török Edwin

2010-Nov-26 14:38 UTC

[LLVMdev] request for windows unicode support

On Fri, 26 Nov 2010 09:28:17 -0500
Michael Spencer <bigcheesegs at gmail.com> wrote:
> On Fri, Nov 26, 2010 at 4:00 AM, Jochen Wilhelmy
> <j.wilhelmy at arcor.de> wrote:
> > No, this post was prompted since I switched to boost::filesystem
> > version 3 in my own code and llvm/clang 2.8
> > was the only lib with no unicode support on windows.
> > Will your code be api compatible to boost::filesystem?
> 
> No. boost::filesystem makes extensive use of exceptions, which LLVM is
> compiled without, and it does a lot of string allocation and copying.
> 
> > The reason for this
> > is that maybe boost::filesystem
> > will become part of the standard and it is possible to imbue() a
> > locale on boost::filesystem.
> > While this feature is not needed on unix/macos it gives you global
> > control whether you want to use ansi or
> > unicode on windows.
> > If you implement your own code with always utf-8 this may break
> > compatibility with windows ansi
> > encoding if you don't take care and why reinvent the wheel? maybe
> > you could even copy/paste the
> > boost implementation and use the #ifdef HAVE_BOOST approach.
> >
> > -Jochen
> 
> The conversion only has to be written once. And while I do like the
> way boost::filesystem handles locale issues, the API is not suited for
> LLVM for the above reasons. However, if you have a better design than
> what I proposed, I'd love to see it. I'm not that familiar with
> dealing with Unicode under Windows.
Can't you just store filenames as UTF-8 (as you do on Linux) and
convert UTF-8 to wide char only when calling the Windows APIs?
The same goes for directory listings: you get wide chars back
and convert them to UTF-8.
All you would need to do is implement that conversion in System/Win32.
I think MultiByteToWideChar supports UTF-8, doesn't it?

Best regards,
--Edwin
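
To make the suggestion above concrete (keep paths as UTF-8 and convert only at the Windows API boundary), here is a minimal sketch of the conversion step. On Windows, MultiByteToWideChar does accept CP_UTF8 as its code page, so real code would most likely call it directly. The hand-written decoder below is only a portable illustration of what that conversion involves. The name utf8ToUtf16 is hypothetical and not part of LLVM, and the sketch assumes C++11 (std::u16string).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLVM API): decode UTF-8 into UTF-16 code units.
// Returns false on truncated or invalid sequences. For brevity, this sketch
// does not reject overlong encodings.
bool utf8ToUtf16(const std::string &in, std::u16string &out) {
  out.clear();
  for (std::size_t i = 0; i < in.size();) {
    unsigned char b = static_cast<unsigned char>(in[i]);
    std::uint32_t cp;
    std::size_t len;
    // The lead byte determines the sequence length and the initial bits.
    if (b < 0x80)                { cp = b;        len = 1; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
    else return false;
    if (i + len > in.size()) return false; // truncated sequence
    // Each continuation byte must look like 10xxxxxx and adds 6 bits.
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject values outside Unicode and lone surrogate code points.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp >= 0x10000) {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      assert(cp <= 0xFFFFF);
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
    i += len;
  }
  return true;
}
```

The reverse direction (UTF-16 from a directory listing back to UTF-8) follows the same pattern; on Windows, WideCharToMultiByte with CP_UTF8 covers it.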

Jochen Wilhelmy

2010-Nov-26 14:45 UTC


[LLVMdev] request for windows unicode support

> Can't you just store filenames as UTF-8 (as you do on Linux) and
> convert UTF-8 to wide char only when calling the Windows APIs?
> The same goes for directory listings: you get wide chars back
> and convert them to UTF-8.
> All you would need to do is implement that conversion in System/Win32.
> I think MultiByteToWideChar supports UTF-8, doesn't it?

I would think the most efficient approach is to use UTF-16 (i.e. wchar_t)
internally on Windows (otherwise UTF-8). Then if a path is used multiple
times, no conversion takes place. The conversion only takes place at
creation time, when you create a path from UTF-8.

Even if you have reasons not to use it, you should have a look at
www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/index.htm
www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/v3_design.html

-Jochen

Michael Spencer

2010-Nov-26 16:47 UTC


[LLVMdev] request for windows unicode support

2010/11/26 Jochen Wilhelmy <j.wilhelmy at arcor.de>:
>> Can't you just store filenames as UTF-8 (as you do on Linux) and
>> convert UTF-8 to wide char only when calling the Windows APIs?
>> The same goes for directory listings: you get wide chars back
>> and convert them to UTF-8.
>> All you would need to do is implement that conversion in System/Win32.
>> I think MultiByteToWideChar supports UTF-8, doesn't it?
>>
>
> I would think the most efficient approach is to use UTF-16 (i.e. wchar_t)
> internally on Windows (otherwise UTF-8). Then if a path is used multiple
> times, no conversion takes place. The conversion only takes place at
> creation time, when you create a path from UTF-8.

The current API is stateless, meaning that the user is responsible for
the storage and format of paths. Thus there is no internal storage.
However, we could cache the conversion using a thread-local,
limited-size LRU cache, depending on how long the conversion takes.
Storing strings as UTF-16 would require converting them to UTF-8 whenever
the client wanted to look at them, incurring lots of memory allocations
and copying.

> Even if you have reasons not to use it, you should have a look at
> www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/index.htm
> www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/v3_design.html
>
> -Jochen

My design is based exactly on that.

- Michael Spencer
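
The thread-local, limited-size LRU cache mentioned above could look roughly like the sketch below. This is an illustration under assumptions, not the design that was proposed for LLVM: the names ConversionCache and threadCache, the capacity of 64, and the use of C++11 (thread_local, std::u16string) are all made up for the example. It only caches results; the conversion itself would be done elsewhere on a miss.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch (not LLVM code): a fixed-capacity LRU cache mapping
// UTF-8 path strings to their already-converted UTF-16 form.
class ConversionCache {
  using Entry = std::pair<std::string, std::u16string>;
  std::size_t capacity_;
  std::list<Entry> entries_; // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;

public:
  explicit ConversionCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0 && "cache must hold at least one entry");
  }

  // Returns the cached conversion, or nullptr on a miss.
  // A hit moves the entry to the front (most recently used).
  const std::u16string *lookup(const std::string &utf8) {
    auto it = index_.find(utf8);
    if (it == index_.end())
      return nullptr;
    entries_.splice(entries_.begin(), entries_, it->second);
    return &it->second->second;
  }

  // Stores a conversion, evicting the least recently used entry when full.
  void insert(const std::string &utf8, std::u16string utf16) {
    auto it = index_.find(utf8);
    if (it != index_.end()) {
      it->second->second = std::move(utf16);
      entries_.splice(entries_.begin(), entries_, it->second);
      return;
    }
    if (entries_.size() == capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(utf8, std::move(utf16));
    index_[utf8] = entries_.begin();
  }

  std::size_t size() const { return entries_.size(); }
};

// One cache per thread, so no locking is needed. The size is arbitrary.
ConversionCache &threadCache() {
  thread_local ConversionCache cache(64);
  return cache;
}
```

A caller would try threadCache().lookup(path) first and convert and insert only on a miss. Whether this pays off depends, as noted above, on how long the conversion takes compared with the hash lookup and the string copy made on insert.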
