thr3ads.net - llvm dev - [llvm-dev] Linking Linux kernel with LLD [Feb 2017]


Sean Silva via llvm-dev

2017-Jan-28 03:48 UTC

[llvm-dev] Linking Linux kernel with LLD

On Fri, Jan 27, 2017 at 1:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> Sean,
>
> So as you noticed that linker script tokenization rule is not very trivial
> -- it is context sensitive. The current lexer is extremely simple and
> almost always works well. Improving "almost always" to "perfect" is not
> high priority because we have many more high priority things, but I'm fine
> if someone improves it. If you are interested, please take it. Or maybe
> I'll take a look at it. It shouldn't be hard. It's probably just a half day
> work.
>
Yeah. To be clear, I wasn't saying that this was high priority. Since I'm
complaining so much about it maybe I should take a look this weekend :)

>
> As far as I know, the grammar is LL(1), so it needs only one push-back
> buffer. Handling INCLUDE directive can be a bit tricky though.
>
> Maybe we should rename ScriptParserBase ScriptLexer.
>
That sounds like a good idea.

-- Sean Silva

>
> On Fri, Jan 27, 2017 at 11:17 AM, Rafael Avila de Espindola <
> rafael.espindola at gmail.com> wrote:
>
>> > Hmm..., the crux of not being able to lex arithmetic expressions seems to
>> > be due to lack of context sensitivity. E.g. consider `foo*bar`. Could be a
>> > multiplication, or could be a glob pattern.
>> >
>> > Looking at the code more closely, adding context sensitivity wouldn't be
>> > that hard. In fact, our ScriptParserBase class is actually a lexer (look at
>> > the interface; it is a lexer's interface). It shouldn't be hard to change
>> > from an up-front tokenization to a more normal lexer approach of scanning
>> > the text for each call that wants the next token. Roughly speaking, just
>> > take the body of the for loop inside ScriptParserBase::tokenize and add a
>> > helper which does that on the fly and is called by consume/next/etc.
>> > Instead of an index into a token vector, just keep a `const char *` pointer
>> > that we advance.
>> >
>> > Once that is done, we can easily add a `nextArithmeticToken` or something
>> > like that which just lexes with different rules.
>>
>> I like that idea. I first thought of always having '*' as a token, but
>> then space has to be a token, which is an incredible pain.
>>
>> I then thought of having a "setLexMode" method, but the lex mode can
>> always be implicit from where we are in the parser. The parser should
>> always know if it should call next or nextArithmetic.
>>
>> And I agree we should probably implement this. Even if it is not common,
>> it looks pretty silly to not be able to handle 2*5.
>>
>> Cheers,
>> Rafael
>>
>
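
A rough sketch of the approach Rafael and Rui describe, written for this archive and not taken from LLD: the lexer keeps a `const char *` into the script and lexes one token per call, the parser picks the rule set by calling `next()` or `nextArithmetic()`, and a single saved position provides the one-token push-back an LL(1) grammar needs. The class name `ScriptLexer` follows Rui's renaming suggestion; the member names and character sets are assumptions, and comments, quoted strings, INCLUDE and multi-character operators such as `<<` are deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string_view>

// Hypothetical on-demand lexer for linker scripts; a sketch, not LLD code.
// It keeps a pointer into the script text and produces one token per call,
// so the parser decides which rules apply by choosing next() or
// nextArithmetic(). No separate "lex mode" state is stored.
class ScriptLexer {
public:
  explicit ScriptLexer(std::string_view text)
      : pos(text.data()), end(text.data() + text.size()), prev(text.data()) {}

  // Default rules: '*', '?', '.', '-' and similar characters stay inside a
  // word, so "foo*bar" and ".rodata.*" are single (glob-like) tokens.
  std::string_view next() { return lex(/*arithmetic=*/false); }

  // Arithmetic rules: operator characters are tokens on their own, so
  // "2*5" becomes "2", "*", "5".
  std::string_view nextArithmetic() { return lex(/*arithmetic=*/true); }

  // One-token push-back: rewind to the start of the last token returned.
  // Only a position is saved, so the text can be re-lexed with either rules.
  void unread() { pos = prev; }

  bool atEOF() {
    skipSpace();
    return pos == end;
  }

private:
  const char *pos;   // next unread character
  const char *end;   // one past the last character
  const char *prev;  // start of the most recently returned token

  void skipSpace() {
    while (pos != end && std::isspace(static_cast<unsigned char>(*pos)))
      ++pos;
  }

  // True if c always forms a one-character token under the chosen rules.
  static bool isSingleCharToken(char c, bool arithmetic) {
    if (c == '\0')
      return false;  // strchr would otherwise match the string terminator
    if (std::strchr("{}();,:=", c))
      return true;
    return arithmetic && std::strchr("*/+-<>&|!~%?", c) != nullptr;
  }

  std::string_view lex(bool arithmetic) {
    skipSpace();
    prev = pos;
    if (pos == end)
      return {};  // empty token means end of input
    const char *start = pos;
    if (isSingleCharToken(*pos, arithmetic)) {
      ++pos;
      return std::string_view(start, 1);
    }
    while (pos != end && !std::isspace(static_cast<unsigned char>(*pos)) &&
           !isSingleCharToken(*pos, arithmetic))
      ++pos;
    return std::string_view(start, static_cast<std::size_t>(pos - start));
  }
};
```

With this split, `foo*bar` read through `next()` stays one glob-like word, while `2*5` read through `nextArithmetic()` becomes three tokens. Rewinding a position instead of buffering a token also lets the parser peek with one rule set and then re-lex the same text with the other.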

Dmitry Golovin via llvm-dev

2017-Jan-28 17:57 UTC


[llvm-dev] Linking Linux kernel with LLD

At this point I'm able to link the Linux kernel with LLD, and objcopy doesn't give me any errors.

The versions are:

Linux 4.10.0-rc5 (+ applied the patch from my previous message)
LLD 5.0.0 (https://github.com/llvm-mirror/lld db83a5cc3968b3aac1dbe3270190bd3282862e74) (+ applied D28612)
GNU objcopy (GNU Binutils) 2.27

The problem is that the resulting kernel doesn't boot. Does anybody have any suggestions on how to debug it, or any guesses about what went wrong while linking?

Regards,
Dmitry

George Rimar via llvm-dev

2017-Jan-29 08:18 UTC


[llvm-dev] Linking Linux kernel with LLD

>At this point I'm able to link Linux kernel with LLD and objcopy doesn't give me any errors.
>
>The versions are:
>
>Linux 4.10.0-rc5 (+ applied the patch from my previous message)
>LLD 5.0.0 (https://github.com/llvm-mirror/lld db83a5cc3968b3aac1dbe3270190bd3282862e74) (+ applied D28612)
>GNU objcopy (GNU Binutils) 2.27
>
>The problem is that the resulting kernel doesn't boot. Does anybody have any suggestions on how to debug it or any guesses what did go wrong while linking?
>
>Regards,
>Dmitry

It should not boot at the moment, I believe.
As I mentioned earlier, LLD currently generates wrong output for scripts like:


.rodata : {
  *(.rodata)
  *(.rodata.*)
  . = ALIGN(16);
  video_cards = .;
  *(.videocards)
  video_cards_end = .;
}

That is a sample from the kernel realmode script. We produce wrong values for
video_cards/video_cards_end.
A reduced sample is D29217, and the thread with a possible patch for that is D27415,
which is under discussion now.
(There are probably other issues too, but that one is the obvious one at the moment.)
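
To make "wrong values" concrete, the values the script asks for can be worked out by hand. The sketch below models the location counter for the .rodata snippet above in absolute addresses, assuming the matching input sections are placed back to back with no alignment padding of their own; the function and field names are made up for illustration. Whatever the real layout, `video_cards_end - video_cards` should equal the combined size of the .videocards input sections.

```cpp
#include <cassert>
#include <cstdint>

// Rounds x up to the next multiple of align (a power of two). This is the
// (. + align - 1) & ~(align - 1) computation behind ". = ALIGN(16);".
constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Hypothetical result type: the two symbol values plus the final counter.
struct RodataLayout {
  uint64_t videoCards;     // value assigned by "video_cards = .;"
  uint64_t videoCardsEnd;  // value assigned by "video_cards_end = .;"
  uint64_t end;            // location counter at the end of .rodata
};

// Simplified model of the .rodata output section: `base` is the section's
// start address and each *Size argument is the total size of the matching
// input sections, assumed to be laid out back to back without padding.
RodataLayout layoutRodata(uint64_t base, uint64_t rodataSize,
                          uint64_t rodataDotStarSize, uint64_t videocardsSize) {
  uint64_t dot = base;
  dot += rodataSize;          // *(.rodata)
  dot += rodataDotStarSize;   // *(.rodata.*)
  dot = alignTo(dot, 16);     // . = ALIGN(16);
  uint64_t videoCards = dot;  // video_cards = .;
  dot += videocardsSize;      // *(.videocards)
  uint64_t videoCardsEnd = dot;  // video_cards_end = .;
  return {videoCards, videoCardsEnd, dot};
}
```

For example, with .rodata starting at 0x1000, 0x23 bytes of .rodata and 5 bytes of .rodata.* bring the counter to 0x1028, ALIGN(16) moves it to 0x1030 (video_cards), and 0x40 bytes of .videocards put video_cards_end at 0x1070.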

I also have a question. You added -m elf_i386 to work around an emulation conflict
issue in LLD; do you know whether the output produced by BFD still boots fine
after that change?

George.

Sean Silva via llvm-dev

2017-Feb-01 22:01 UTC


[llvm-dev] Linking Linux kernel with LLD

On Sat, Jan 28, 2017 at 9:57 AM, Dmitry Golovin <dima at golovin.in> wrote:
> At this point I'm able to link Linux kernel with LLD and objcopy doesn't
> give me any errors.
>
> The versions are:
>
> Linux 4.10.0-rc5 (+ applied the patch from my previous message)
> LLD 5.0.0 (https://github.com/llvm-mirror/lld
> db83a5cc3968b3aac1dbe3270190bd3282862e74) (+ applied D28612)
> GNU objcopy (GNU Binutils) 2.27
>
> The problem is that the resulting kernel doesn't boot. Does anybody have
> any suggestions on how to debug it or any guesses what did go wrong while
> linking?
>
Based on our experience getting FreeBSD working, most of our time went into
getting the bootloader to accept the kernel.

To debug this, we mostly used two approaches:
- printf debugging in the bootloader (this requires rebuilding the
bootloader multiple times)
- using objdump-like tools to look at the differences between a good
(BFD- or gold-linked) kernel and the failing (LLD-linked) kernel (e.g.
different program headers, different contents in certain sections that
the bootloader looks at, etc.)
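
For the program-header part of that comparison, `readelf -l` or `objdump -p` on the BFD-linked and LLD-linked images is the quickest route. If a scriptable check helps, here is a minimal sketch that lists the PT_LOAD program headers of a 64-bit ELF file using the glibc `<elf.h>` definitions. It assumes the file is ELF64 in host byte order (true for an x86-64 vmlinux examined on an x86 host); 32-bit pieces would need the `Elf32_*` types instead. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// One PT_LOAD program header, reduced to the fields worth comparing.
struct LoadSegment {
  uint64_t offset, vaddr, paddr, filesz, memsz;
  uint32_t flags;
};

// Hypothetical helper: returns the PT_LOAD segments of a 64-bit ELF file, or
// an empty vector if the file can't be read or is not ELF64. Fields are read
// in host byte order, so the file is assumed to match the host's endianness.
std::vector<LoadSegment> readLoadSegments(const std::string &path) {
  std::vector<LoadSegment> segments;
  std::ifstream in(path, std::ios::binary);
  Elf64_Ehdr ehdr;
  if (!in.read(reinterpret_cast<char *>(&ehdr), sizeof(ehdr)))
    return segments;
  if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
      ehdr.e_ident[EI_CLASS] != ELFCLASS64)
    return segments;
  for (unsigned i = 0; i < ehdr.e_phnum; ++i) {
    Elf64_Phdr phdr;
    std::streamoff off = ehdr.e_phoff + uint64_t(i) * ehdr.e_phentsize;
    if (!in.seekg(off, std::ios::beg) ||
        !in.read(reinterpret_cast<char *>(&phdr), sizeof(phdr)))
      break;
    if (phdr.p_type == PT_LOAD)
      segments.push_back({phdr.p_offset, phdr.p_vaddr, phdr.p_paddr,
                          phdr.p_filesz, phdr.p_memsz, phdr.p_flags});
  }
  return segments;
}

// Prints one line per PT_LOAD segment (in hex), suitable for diffing.
void printLoadSegments(const std::string &path) {
  for (const LoadSegment &s : readLoadSegments(path))
    std::cout << std::hex << "offset=0x" << s.offset << " vaddr=0x" << s.vaddr
              << " paddr=0x" << s.paddr << " filesz=0x" << s.filesz
              << " memsz=0x" << s.memsz << " flags=0x" << s.flags << '\n';
}
```

Running `printLoadSegments` on both kernels and diffing the two outputs shows at a glance whether a segment is missing, has shrunk, or was placed at a different physical or virtual address.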

As for the setup, I would recommend using QEMU to run the LLD-linked kernel
and custom bootloader, because then you can have a single script that
rebuilds the bootloader and kernel and copies the files to the VM. This
reduces iteration time significantly.
Davide is the one who set that up and could probably provide more details,
but the QEMU docs might be good enough for you to set things up without much
effort (not sure, though).

-- Sean Silva


Sean Silva via llvm-dev

2017-Feb-01 22:10 UTC


[llvm-dev] Linking Linux kernel with LLD

On Sat, Jan 28, 2017 at 9:57 AM, Dmitry Golovin <dima at golovin.in> wrote:
> At this point I'm able to link Linux kernel with LLD and objcopy doesn't
> give me any errors.
>
> The versions are:
>
> Linux 4.10.0-rc5 (+ applied the patch from my previous message)
> LLD 5.0.0 (https://github.com/llvm-mirror/lld
> db83a5cc3968b3aac1dbe3270190bd3282862e74) (+ applied D28612)
> GNU objcopy (GNU Binutils) 2.27
>
> The problem is that the resulting kernel doesn't boot. Does anybody have
> any suggestions on how to debug it or any guesses what did go wrong while
> linking?
>
>
As for the different things that can go wrong, some to consider:

- Does LLD's output binary have the same (or similar) data as the BFD/gold
output binary? E.g. if the LLD binary is only half as big (or the PT_LOAD
segments that the bootloader looks at are half as big), LLD might not be
putting things into the output, or not into the right output sections.

- The bootloader may be looking for something in the dynamic symbol table
that isn't there. LLD might be resolving symbols differently.

- Section contents may differ between LLD and BFD/gold. E.g. for FreeBSD
there is a "linker set" section which contains pointers to a bunch of
metadata structs that are needed. LLD was not relocating these correctly
because the symbols were not ending up in the output, or something like that
(I forget exactly; Michael might remember better).


-- Sean Silva


George Rimar via llvm-dev

2017-Feb-08 09:56 UTC


[llvm-dev] Linking Linux kernel with LLD

>On Fri, Jan 27, 2017 at 1:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>>Sean,
>>
>>So as you noticed that linker script tokenization rule is not very trivial -- it is context sensitive. The current lexer is extremely
>>simple and almost always works well. Improving "almost always" to "perfect" is not high priority because we have many more
>>high priority things, but I'm fine if someone improves it. If you are interested, please take it. Or maybe I'll take a look at it. It
>>shouldn't be hard. It's probably just a half day work.
>
>Yeah. To be clear, I wasn't saying that this was high priority. Since I'm complaining so much about it maybe I should take a look this weekend :)
>
>>As far as I know, the grammar is LL(1), so it needs only one push-back buffer. Handling INCLUDE directive can be a bit tricky though.
>>
>>Maybe we should rename ScriptParserBase ScriptLexer.
>
>That sounds like a good idea.
>
>-- Sean Silva

Just in case: the patch implementing these ideas is D29576. It works fine.

IMHO it looks fine too, except for the part that switches lexer modes. I can
probably improve that if the overall direction is OK.

George.


