thr3ads.net - llvm dev - [llvm-dev] Linking Linux kernel with LLD [Jan 2017]

Rafael Avila de Espindola via llvm-dev

2017-Jan-27 19:17 UTC

[llvm-dev] Linking Linux kernel with LLD

> Hmm..., the crux of not being able to lex arithmetic expressions seems to
> be due to lack of context sensitivity. E.g. consider `foo*bar`. Could be a
> multiplication, or could be a glob pattern.
>
> Looking at the code more closely, adding context sensitivity wouldn't be
> that hard. In fact, our ScriptParserBase class is actually a lexer (look at
> the interface; it is a lexer's interface). It shouldn't be hard to change
> from an up-front tokenization to a more normal lexer approach of scanning
> the text for each call that wants the next token. Roughly speaking, just
> take the body of the for loop inside ScriptParserBase::tokenize and add a
> helper which does that on the fly and is called by consume/next/etc.
> Instead of an index into a token vector, just keep a `const char *` pointer
> that we advance.
>
> Once that is done, we can easily add a `nextArithmeticToken` or something
> like that which just lexes with different rules.

I like that idea. I first thought of always having '*' as a token, but
then space has to be a token, which is an incredible pain.

I then thought of having a "setLexMode" method, but the lex mode can
always be implicit from where we are in the parser. The parser should
always know if it should call next or nextArithmetic.

And I agree we should probably implement this. Even if it is not common,
it looks pretty silly to not be able to handle 2*5.

Cheers,
Rafael
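
The on-demand lexing idea discussed in this message can be sketched roughly
as follows. This is a minimal illustration and not LLD's actual code: the
class name SketchLexer, the method bodies, and the two punctuation sets are
assumptions made up for the example. It only shows a `const char *` cursor
that is advanced on each call, with the parser choosing the lexing rules by
calling either next or nextArithmetic.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical on-demand lexer (illustrative only, not LLD's implementation).
class SketchLexer {
public:
  explicit SketchLexer(const char *text) : pos(text) {
    assert(text != nullptr && "input must be a NUL-terminated string");
  }

  // Default rules: '*' is an ordinary character, so "foo*bar" is a single
  // token (for example a glob pattern).
  std::string next() { return lex("{}();,:="); }

  // Arithmetic rules: operators end a token, so "foo*bar" becomes three
  // tokens. The operator set here is an assumption for the sketch.
  std::string nextArithmetic() { return lex("{}();,:=*/+-"); }

  // True once only whitespace (or nothing) remains.
  bool atEnd() {
    skipSpace();
    return *pos == '\0';
  }

private:
  void skipSpace() {
    while (std::isspace(static_cast<unsigned char>(*pos)))
      ++pos;
  }

  // Scans one token starting at the cursor; returns "" at end of input.
  std::string lex(const char *punct) {
    skipSpace();
    if (*pos == '\0')
      return "";
    // A punctuation character is a token by itself.
    if (std::strchr(punct, *pos))
      return std::string(1, *pos++);
    const char *start = pos;
    while (*pos != '\0' &&
           !std::isspace(static_cast<unsigned char>(*pos)) &&
           !std::strchr(punct, *pos))
      ++pos;
    return std::string(start, pos);
  }

  const char *pos; // cursor into the script text, advanced as tokens are read
};
```

With these assumed rules, calling next() on the input `foo*bar` yields the
single token `foo*bar`, while calling nextArithmetic() on `2*5` yields `2`,
`*`, and `5` in turn. No mode flag is stored; the caller decides per token.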

Rui Ueyama via llvm-dev

2017-Jan-27 21:31 UTC

[llvm-dev] Linking Linux kernel with LLD

Sean,

As you noticed, the linker script tokenization rule is not trivial -- it is
context sensitive. The current lexer is extremely simple and almost always
works well. Improving "almost always" to "perfect" is not a high priority
because we have many higher-priority things, but I'm fine if someone improves
it. If you are interested, please take it. Or maybe I'll take a look at it.
It shouldn't be hard. It's probably just half a day of work.

As far as I know, the grammar is LL(1), so it needs only one push-back
buffer. Handling the INCLUDE directive can be a bit tricky though.

Maybe we should rename ScriptParserBase to ScriptLexer.
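
The single push-back buffer mentioned above could look roughly like the
sketch below. It is only an illustration under stated assumptions: the class
name PushbackTokens and its methods are hypothetical, tokens are simply split
on whitespace, and the INCLUDE directive is not handled at all.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical token stream with exactly one push-back slot
// (illustrative only, not LLD's implementation).
class PushbackTokens {
public:
  explicit PushbackTokens(const std::string &text) : in(text) {}

  // Returns the pushed-back token if there is one, otherwise reads the next
  // whitespace-separated token. Returns "" at end of input.
  std::string next() {
    if (hasPushed) {
      hasPushed = false;
      return std::move(pushed);
    }
    std::string tok;
    in >> tok;
    return tok;
  }

  // Un-reads a single token. One slot is enough when the parser never needs
  // more than one token of lookahead.
  void unget(std::string tok) {
    assert(!hasPushed && "only one token of push-back is supported");
    pushed = std::move(tok);
    hasPushed = true;
  }

  // Looks at the next token without consuming it.
  std::string peek() {
    std::string tok = next();
    unget(tok);
    return tok;
  }

private:
  std::istringstream in;
  std::string pushed;
  bool hasPushed = false;
};
```

A parser built on this would call peek() to decide which rule to apply and
then next() to consume the token; it never has two tokens pending at once.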


Sean Silva via llvm-dev

2017-Jan-28 03:48 UTC

[llvm-dev] Linking Linux kernel with LLD

On Fri, Jan 27, 2017 at 1:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> Sean,
>
> As you noticed, the linker script tokenization rule is not trivial -- it is
> context sensitive. The current lexer is extremely simple and almost always
> works well. Improving "almost always" to "perfect" is not a high priority
> because we have many higher-priority things, but I'm fine if someone
> improves it. If you are interested, please take it. Or maybe I'll take a
> look at it. It shouldn't be hard. It's probably just half a day of work.
>

Yeah. To be clear, I wasn't saying that this was high priority. Since I'm
complaining so much about it, maybe I should take a look this weekend :)

> As far as I know, the grammar is LL(1), so it needs only one push-back
> buffer. Handling the INCLUDE directive can be a bit tricky though.
>
> Maybe we should rename ScriptParserBase to ScriptLexer.

That sounds like a good idea.

-- Sean Silva

