thr3ads.net - llvm dev - [llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm [Jan 2018]

Adrian Prantl via llvm-dev

2018-Jan-30 15:41 UTC

[llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm

> On Jan 30, 2018, at 7:35 AM, Pavel Labath <labath at google.com> wrote:
> 
> Hello all,
> 
> I am looking for feedback regarding implementation of the case folding
> algorithm for .debug_names hashes.
> 
> Unlike the apple tables, the .debug_names hashes are computed from
> case-folded names (to enable case-insensitive lookups for languages
> where that makes sense). The dwarf5 document specifies that the case
> folding should be done according to the "Caseless matching" section
> of the Unicode standard (whose implementation is basically a long list
> of special cases). While certainly possible, implementing this would
> be much more complicated (and would probably make the code a bit
> slower) than a simple tolower(3) call. And the benefits of this are
> not really clear to me.

Assuming a UTF-8 encoding, will tolower(3) destroy any non-ASCII characters in
the process? In Swift, for example, we allow a wide range of unicode characters
in identifiers and I want to make sure that this doesn't cause any problems.

-- adrian

>
> Do you know if we already make any promises or assumptions about the
> encoding and/or locale of the symbol names (and here I mainly mean the
> names in the debug info metadata, not llvm symbols).
> 
> If we don't already have a policy about this, then I propose to
> implement the case folding via tolower() (which is compatible with the
> full case folding algorithm, as long as one sticks to basic latin
> characters).
> 
> What do you think?

Pavel Labath via llvm-dev

2018-Jan-30 15:49 UTC

[llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm

On 30 January 2018 at 15:41, Adrian Prantl <aprantl at apple.com> wrote:
>
>
>> On Jan 30, 2018, at 7:35 AM, Pavel Labath <labath at google.com> wrote:
>>
>> Hello all,
>>
>> I am looking for feedback regarding implementation of the case folding
>> algorithm for .debug_names hashes.
>>
>> Unlike the apple tables, the .debug_names hashes are computed from
>> case-folded names (to enable case-insensitive lookups for languages
>> where that makes sense). The dwarf5 document specifies that the case
>> folding should be done according to the "Caseless matching" section
>> of the Unicode standard (whose implementation is basically a long list
>> of special cases). While certainly possible, implementing this would
>> be much more complicated (and would probably make the code a bit
>> slower) than a simple tolower(3) call. And the benefits of this are
>> not really clear to me.
>
> Assuming a UTF-8 encoding, will tolower(3) destroy any non-ASCII characters
> in the process? In Swift, for example, we allow a wide range of unicode
> characters in identifiers and I want to make sure that this doesn't cause
> any problems.
>
I'm not sure what it will do out-of-the-box, but I could certainly
implement it such that it does not touch the fancy characters.
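
A minimal sketch of what "does not touch the fancy characters" could look like, assuming the names are UTF-8. The helper names `foldAsciiByte` and `foldAscii` are hypothetical, not existing LLVM functions. It avoids tolower(3) altogether, since that function's result depends on the current locale and passing it a negative `char` value is undefined behaviour. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so lowercasing only 'A'..'Z' leaves such sequences intact.

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): lowercase a single ASCII letter.
// Any other byte, including every byte >= 0x80 that is part of a
// multi-byte UTF-8 sequence, is returned unchanged.
inline char foldAsciiByte(char C) {
  return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}

// Hypothetical helper: apply the ASCII-only folding to a whole name.
inline std::string foldAscii(const std::string &Name) {
  std::string Out;
  Out.reserve(Name.size());
  for (char C : Name)
    Out.push_back(foldAsciiByte(C));
  return Out;
}
```

With this, `foldAscii("FooBar")` gives `"foobar"`, while the two bytes encoding a non-ASCII letter such as "É" come out exactly as they went in. This matches the full Unicode folding only for names made of basic Latin characters.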

However, if we already have unicode characters in the input, then it
may make sense to go all the way and implement the full folding
algorithm. Because, once we start producing hashes like this, it will
be hard to switch to being fully standard-compliant (as that would
invalidate the existing hashes).
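
To illustrate why the hashes would be invalidated, here is a sketch of the DJB hash that, as far as I understand the DWARF v5 document, .debug_names applies to the case-folded name (start at 5381, then hash * 33 + byte for each byte). The function name `djbHash` is chosen for illustration, and the folding is assumed to have been done already by the caller.

```cpp
#include <cstdint>
#include <string>

// DJB hash over the bytes of an already case-folded name.
// Assumption: the caller has applied whichever folding is in use.
inline std::uint32_t djbHash(const std::string &FoldedName) {
  std::uint32_t H = 5381;
  for (unsigned char C : FoldedName)
    H = H * 33 + C; // unsigned arithmetic wraps modulo 2^32
  return H;
}
```

Take a name consisting of "É" (U+00C9, UTF-8 bytes C3 89). An ASCII-only folding leaves those bytes alone, whereas the full Unicode folding maps the letter to "é" (U+00E9, bytes C3 A9). The two folded forms differ in their last byte, so `djbHash` returns different values for them, and a table produced with one folding cannot be searched by a consumer that uses the other.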

But the question then is: can I assume the input names will be unicode
(w/utf8 encoding)?

Adrian Prantl via llvm-dev

2018-Jan-30 16:20 UTC

[llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm

> On Jan 30, 2018, at 7:49 AM, Pavel Labath <labath at google.com> wrote:
> 
> On 30 January 2018 at 15:41, Adrian Prantl <aprantl at apple.com> wrote:
>> 
>> 
>>> On Jan 30, 2018, at 7:35 AM, Pavel Labath <labath at google.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I am looking for feedback regarding implementation of the case folding
>>> algorithm for .debug_names hashes.
>>> 
>>> Unlike the apple tables, the .debug_names hashes are computed from
>>> case-folded names (to enable case-insensitive lookups for languages
>>> where that makes sense). The dwarf5 document specifies that the case
>>> folding should be done according to the "Caseless matching" section
>>> of the Unicode standard (whose implementation is basically a long list
>>> of special cases). While certainly possible, implementing this would
>>> be much more complicated (and would probably make the code a bit
>>> slower) than a simple tolower(3) call. And the benefits of this are
>>> not really clear to me.
>> 
>> Assuming a UTF-8 encoding, will tolower(3) destroy any non-ASCII
>> characters in the process? In Swift, for example, we allow a wide range of
>> unicode characters in identifiers and I want to make sure that this doesn't
>> cause any problems.
>> 
> 
> I'm not sure what it will do out-of-the-box, but I could certainly
> implement it such that it does not touch the fancy characters.
> 
> However, if we already have unicode characters in the input, then it
> may make sense to go all the way and implement the full folding
> algorithm. Because, once we start producing hashes like this, it will
> be hard to switch to being fully standard-compliant (as that would
> invalidate the existing hashes).
> 
> But the question then is: can I assume the input names will be unicode
> (w/utf8 encoding)?

We can make that happen and encode it explicitly in each compile unit:

> 3.1.1 Full and Partial Compilation Unit Entries
> ...
> A DW_AT_use_UTF8 attribute, which is a flag whose presence indicates that
> all strings (such as the names of declared entities in the source program, or
> filenames in the line number table) are represented using the UTF-8
> representation.

-- adrian
