>
>
> One of the standard reasons to prefer refactoring, even though it appears
> to take longer or be more difficult, is that it allows you to always keep
> all tests green. It is very easy for things to slip through the cracks and
> not promptly return to being green on a "from-scratch" version. This
> ultimately turns into bug reports later and the feature needs to be
> reimplemented; the apparent simplicity of the "from-scratch" version can
> disappear very rapidly.
>
Hmm, why can't the from-scratch version use the existing tests to make sure
major features are not regressed?
Refactoring requires a good foundation. If the foundation is broken, a
rewrite is preferable. There are many success stories of complete rewrites.
>
> In the refactoring approach you are forced to incorporate a holistic
> understanding of the necessary features into your simplification efforts,
> since the tests keep you from accidentally disregarding necessary features.
>
Features are protected by good tests. This has nothing to do with the
approach taken.
> It is very easy to accidentally buy simplicity at the cost of losing
> features; if you eventually need the features back then the apparent
> simplicity is an illusion.
>
It is probably not very useful to debate this in the abstract. Rui already
has the initial implementation ready, which shows very promising results ...
just my 2c.
David
>
> -- Sean Silva
>
>
>>
>> I understand what you are saying, because as you might have noticed, I'm
>> probably the person who has spent the most time refactoring it to do what
>> you are saying. I wanted to make it more readable, easier to add features
>> to, and faster. I actually worked really hard. Although I partly
>> succeeded, I was disappointed in myself because of the (lack of) progress.
>> After all, I had to conclude that it was not going to work -- they are so
>> different that it's not reasonable to spend time in that direction. A
>> better approach is to set up a new foundation and move existing code to
>> it, instead of doing the rework in-place. It may also be worth mentioning
>> that the new approach worked well. I got a self-hosting linker working in
>> only two weeks, which supports dead-stripping and is more than 4x faster.
>>
>>
>>>> Besides that, I'd say from my experience of working on the atom model
>>>> that the new model's capabilities are not that different from the atom
>>>> model's. They are different, there are pros and cons, and I don't agree
>>>> that the atom model is more flexible or conceptually better.
>>>>
>>>
>>> I don't understand this focus on "the atom model". "The atom model" is
>>> not any particular thing. We can generalize the meaning of atom, we can
>>> make it narrower, we can remove responsibilities from Atom, we can add
>>> responsibilities to Atom, we can do whatever is needed. As you yourself
>>> admit, the "new model" is not that different from "the atom model".
>>> Think of "the atom model" like SSA. LLVM IR is SSA; there is a very
>>> large amount of freedom to decide on the exact design within that
>>> scope. "The atom model" AFAICT just means that a core abstraction
>>> inside the linker is the notion of an indivisible chunk. Our current
>>> design might need to be changed, but starting from scratch only to
>>> arrive at the same basic idea, but now having to effectively maintain
>>> two codebases, doesn't seem worth it.
>>>
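For what it's worth, the "indivisible chunk" idea Sean describes fits in a
few lines of C++. The sketch below is only illustrative (the names are made
up, not lld's actual classes): a chunk is a named, aligned blob of bytes
plus edges to other chunks, and whether those chunks come from Mach-O atoms
or from ELF sections is a front-end detail.

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // An edge from one chunk to another (i.e. a relocation).
    struct Reference {
      std::size_t targetIndex; // which chunk it points at
      uint64_t offset;         // where in the source chunk the fixup lives
      int64_t addend;
    };

    // The "indivisible chunk" itself.
    struct Chunk {
      std::string name;
      std::vector<uint8_t> data;   // the payload the linker never splits
      uint32_t alignment = 1;
      bool live = false;           // used by dead-stripping
      std::vector<Reference> refs; // outgoing edges
    };

Passes like dead stripping, reordering, or ICF are then traversals over
this graph, regardless of which object file format produced it.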
>>
>> A large part of the difficulty in the development of the current LLD
>> comes from over-generalization to share code between quite different
>> file formats. My observation is that we ended up having to write a large
>> amount of code to share a small core which doesn't really fit any
>> platform well (an example is the virtual archive file I mentioned above
>> -- that was invented to hide platform-specific atom creation behind
>> something platform-neutral, and archive files were chosen because they
>> are supported by all three platforms.) Different things are different;
>> we need to get the right balance, and I don't think the current balance
>> is right.
>>
>>> A lot of the issue here is that we are falsely distinguishing
>>> "section-based" and "atom-based". A suitable generalization of the
>>> notion of "indivisible chunks" and what you can do with them covers
>>> both cases, but traditional usage of sections makes the "indivisible
>>> chunks" a lot larger (and loses more information in doing so). But as
>>> -ffunction-sections/-fdata-sections shows, there is not really any
>>> fundamental difference.
>>>
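As a concrete illustration of that last point (assuming clang/gcc-style
flags and Itanium name mangling), a small C++ file compiled with
-ffunction-sections/-fdata-sections already hands a section-based linker
atom-sized pieces:

    // example.cpp -- compile with:
    //   clang++ -ffunction-sections -fdata-sections -c example.cpp
    //
    // Instead of one big .text and one big .data, the object file gets a
    // section per definition, e.g. .text._Z4usedv, .text._Z6unusedv and
    // .data.table. A linker run with --gc-sections can then drop unused()
    // and table individually, which is the granularity the atom model is
    // after.
    int table[1024] = {1};      // -> its own .data section (.data.table)
    int used()   { return 1; }  // -> its own .text section
    int unused() { return 2; }  // -> its own .text section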
>>> -- Sean Silva
>>>
>>>
>>>>
>>>> On Thu, May 28, 2015 at 8:22 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Thu, May 28, 2015 at 6:25 PM, Nick Kledzik <kledzik at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On May 28, 2015, at 5:42 PM, Sean Silva <chisophugis at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I guess, looking back at Nick's comment:
>>>>>>
>>>>>> "The atom model is a good fit for the llvm
compiler model for all
>>>>>> architectures. There is a one-to-one mapping between
llvm::GlobalObject
>>>>>> (e.g. function or global variable) and
lld:DefinedAtom."
>>>>>>
>>>>>> it seems that the primary issue on the ELF/COFF side is that
>>>>>> currently the LLVM backends are taking a finer-grained atomicity
>>>>>> that is present inside LLVM, and losing information by converting
>>>>>> that to a coarser-grained atomicity that is the typical "section"
>>>>>> in ELF/COFF. But doesn't -ffunction-sections -fdata-sections
>>>>>> already fix this, basically?
>>>>>>
>>>>>> On the Mach-O side, the issue seems to be that Mach-O's notion of
>>>>>> section carries more hard-coded meaning than e.g. ELF, so at the
>>>>>> very least another layer of subdivision below what Mach-O calls
>>>>>> "section" would be needed to preserve this information; currently
>>>>>> symbols are used as a bit of a hack as this "sub-section" layer.
>>>>>>
>>>>>> I’m not sure what you mean here.
>>>>>>
>>>>>>
>>>>>> So the problem seems to be that the transport format between the
>>>>>> compiler and linker varies by platform, and each one has a
>>>>>> different way to represent things, some can't represent everything
>>>>>> we want to do, apparently.
>>>>>>
>>>>>> Yes!
>>>>>>
>>>>>>
>>>>>> BUT it sounds like at least relocatable ELF semantics can, in
>>>>>> principle, represent everything that we can imagine an "atom-based
>>>>>> file format"/"native format" to want to represent. Just to play
>>>>>> devil's advocate here, let's start out with the "native format"
>>>>>> being relocatable ELF - on *all platforms*. Relocatable object
>>>>>> files are just a transport format between compiler and linker,
>>>>>> after all; who cares what we use? If the alternative is a
>>>>>> completely new format, then bootstrapping from relocatable ELF is
>>>>>> strictly less churn/tooling cost.
>>>>>>
>>>>>> People on the "atom side of the fence", what do you think? Is there
>>>>>> anything that we cannot achieve by saying "native"="relocatable
>>>>>> ELF"?
>>>>>>
>>>>>> 1) Turns out .o files are written once but read many times by the
>>>>>> linker. Therefore, the design goal of .o files should be that they
>>>>>> are as fast to read/parse in the linker as possible. Slowing down
>>>>>> the compiler to make a .o file that is faster for the linker to
>>>>>> read is a good trade off. This is the motivation for the native
>>>>>> format - not that it is a universal format.
>>>>>>
>>>>>
>>>>> I don't think that switching from ELF to something new can make
>>>>> linkers significantly faster. We need to handle ELF files carefully
>>>>> so as not to waste time on the initial load, but if we do, reading
>>>>> the data required for symbol resolution from an ELF file should be
>>>>> satisfactorily fast (I did that for COFF -- the current "atom-based
>>>>> ELF" linker is doing too many things on the initial load, like
>>>>> reading all relocation tables, splitting indivisible chunks of data
>>>>> and connecting them with "indivisible" edges, etc.) It looks like we
>>>>> read the symbol table pretty quickly in the new implementation, and
>>>>> the bottleneck is now the time to insert symbols into the symbol
>>>>> hash table -- which you cannot make faster by changing the object
>>>>> file format.
>>>>>
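To make Rui's point about the bottleneck concrete, here is a rough sketch
of the hot loop (hypothetical names, not the actual lld code): whatever the
on-disk format, resolution ends up hashing every symbol name and inserting
it into one global table, and that cost is format-independent.

    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Symbol { std::string name; bool isDefined; int fileIndex; };

    using SymbolTable = std::unordered_map<std::string, Symbol>;

    void addObjectFile(SymbolTable &table, const std::vector<Symbol> &syms) {
      for (const Symbol &s : syms) {
        auto it = table.find(s.name);  // hash + probe: the dominant cost
        if (it == table.end())
          table.emplace(s.name, s);    // first sighting claims the slot
        else if (!it->second.isDefined && s.isDefined)
          it->second = s;              // a definition resolves an undef
        // (weak symbols and duplicate-definition errors omitted)
      }
    }

A faster object file format can shrink the parsing that happens before this
loop, but not the loop itself, which is Rui's argument.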
>>>>> Speaking of performance, if I wanted to make a significant
>>>>> difference, I'd focus on introducing new symbol resolution
>>>>> semantics. In particular, the Unix linker semantics are pretty bad
>>>>> for performance because we have to visit files one by one, serially
>>>>> and possibly repeatedly. That's not only bad for parallelism but
>>>>> also for the single-threaded case, because it increases the amount
>>>>> of data to be processed. This is, I believe, the true bottleneck of
>>>>> Unix linkers. Tackling that problem seems most important to me, and
>>>>> "ELF as a file format is slow" is still an unproven claim to me.
>>>>>
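For readers not familiar with them, the semantics Rui refers to look
roughly like this (hypothetical types, heavily simplified): an archive
member is pulled in only if it defines a symbol that is undefined at the
moment the archive is visited, so inputs must be walked in command-line
order, and with --start-group/--end-group the walk repeats until nothing
changes.

    #include <set>
    #include <string>
    #include <vector>

    struct Member  { std::vector<std::string> defs; bool loaded = false; };
    struct Archive { std::vector<Member> members; };

    void resolveGroup(std::vector<Archive> &group,
                      std::set<std::string> &undefined) {
      bool changed = true;
      while (changed) {            // --start-group: repeat to a fixed point
        changed = false;
        for (Archive &a : group)   // strictly in link-line order
          for (Member &m : a.members) {
            if (m.loaded)
              continue;
            bool needed = false;
            for (const std::string &d : m.defs)
              needed = needed || undefined.count(d) != 0;
            if (!needed)
              continue;
            m.loaded = true;       // pull the member in
            changed = true;
            for (const std::string &d : m.defs)
              undefined.erase(d);  // (a real linker also adds the member's
                                   //  own undefined symbols here)
          }
      }
    }

The serial, possibly repeated walk is inherent in these semantics, which is
why changing the resolution rules could buy more than changing the file
format.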
>>>>>
>>>>>>
>>>>>> 2) I think the ELF camp still thinks that linkers are “dumb” --
>>>>>> that they just collate .o files into executable files. The darwin
>>>>>> linker does a lot of processing/optimizing of the content (e.g.
>>>>>> Objective-C optimization, dead stripping, function/data
>>>>>> re-ordering). This is why atom-level granularity is needed.
>>>>>>
>>>>>
>>>>> I think that all these things are doable (and are being done) using
>>>>> -ffunction-sections.
>>>>>
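A sketch of what that looks like at section granularity (hypothetical
types): with -ffunction-sections every function is its own node, so the
usual mark-and-sweep over relocation edges -- essentially what
--gc-sections does -- gives the same result as atom-level dead stripping.

    #include <cstddef>
    #include <vector>

    struct Section {
      std::vector<std::size_t> outgoingEdges; // targets of its relocations
      bool live = false;
    };

    // Mark everything reachable from the roots (entry point, exported
    // symbols, ...); the sweep then drops every section still !live.
    void markLive(std::vector<Section> &sections,
                  const std::vector<std::size_t> &roots) {
      std::vector<std::size_t> worklist(roots);
      while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        if (sections[i].live)
          continue;
        sections[i].live = true;
        for (std::size_t t : sections[i].outgoingEdges)
          worklist.push_back(t);
      }
    }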
>>>>>
>>>>>>
>>>>>> For darwin, ELF-based .o files are not interesting. It won’t be
>>>>>> faster, and it will take a bunch of effort to figure out how to
>>>>>> encode all the mach-o info into ELF. We’d rather wait for a new
>>>>>> native format.
>>>>>>
>>>>>
>>>>>
>>>>>> -Nick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>