thr3ads.net - llvm dev - [llvm-dev] [frontend-dev][beginner] Allocation of structures [May 2017]

Dimitri Racordon via llvm-dev

2017-May-30 01:14 UTC

[llvm-dev] [frontend-dev][beginner] Allocation of structures

Hi all,

I’m pretty new to the list, and to LLVM in general, so please excuse my extreme
newbieness.

I’m trying to figure out what would be the appropriate way to implement move
semantics.
I’ve been trying to dump the IR produced by clang with some basic C++ snippet,
but I’m afraid it didn’t help me much.

Here’s the example I’ve been playing with (in C++):

#include <utility>  // for std::move

struct S {
  int* x;

  S() noexcept: x(new int) {}
  S(S&& other) {
    x = other.x;
    other.x = nullptr;
  }
  ~S() {
    delete x;
  }
};

S f1() {
  auto s = S();
  return s;
}

S f2() {
  auto s = S();
  return std::move(s);
}

This of course produces a lot of LLVM code (with -O0), but I think I may have
figured out most of what’s what. In particular, I’ve been able to identify the
IR code for `f1` and `f2`, but to my surprise, neither of them returns a value.
Both take a pointer to `S` as parameter, which in turn gets passed to the
constructor of `S`, and return void.

This leaves me with two main questions:

  *   First, is the use of a pointer to S as parameter specific to clang,
or generally the way to go? I’ve seen in the language reference that one could
return a struct with a simple ret instruction, so I’m surprised not to see it
for the version that doesn’t use move semantics.
  *   Second, if I used a non-void ret instruction to return the result of an
alloca, when would the latter be destroyed? Would that involve a copy from the
runtime stack of the callee to that of the caller?

Thank you very much for your time and your answer,

Best,


Dimitri Racordon
CUI, Université de Genève
7, route de Drize, CH-1227 Carouge - Switzerland
Phone: +41 22 379 01 24





Davide Italiano via llvm-dev

2017-May-30 01:20 UTC

[llvm-dev] [frontend-dev][beginner] Allocation of structures

On Mon, May 29, 2017 at 6:14 PM, Dimitri Racordon via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> I’m pretty new to the list, and to LLVM in general, so please excuse my
> extreme newbieness.
>
> I’m trying to figure out what would be the appropriate way to implement
> move semantics.
> I’ve been trying to dump the IR produced by clang with some basic C++
> snippet, but I’m afraid it didn’t help me much.
>
> Here’s the example I’ve been playing with (in C++):
>
> #include <utility>  // for std::move
>
> struct S {
>   int* x;
>
>   S() noexcept: x(new int) {}
>   S(S&& other) {
>     x = other.x;
>     other.x = nullptr;
>   }
>   ~S() {
>     delete x;
>   }
> };
>
> S f1() {
>   auto s = S();
>   return s;
> }
>
> S f2() {
>   auto s = S();
>   return std::move(s);
> }
>
> This of course produces a lot of LLVM code (with -O0), but I think I may
> have figured out most of what’s what. In particular, I’ve been able to
> identify the IR code for `f1` and `f2`, but to my surprise, neither of
> them returns a value. Both take a pointer to `S` as parameter, which in
> turn gets passed to the constructor of `S`, and return void.
>
See https://en.wikipedia.org/wiki/Return_value_optimization

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Sean Silva via llvm-dev

2017-May-31 21:32 UTC

[llvm-dev] [frontend-dev][beginner] Allocation of structures

On Mon, May 29, 2017 at 6:14 PM, Dimitri Racordon via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> I’m pretty new to the list, and to LLVM in general, so please excuse my
> extreme newbieness.
>
> I’m trying to figure out what would be the appropriate way to implement
> move semantics.
> I’ve been trying to dump the IR produced by clang with some basic C++
> snippet, but I’m afraid it didn’t help me much.
>
Move semantics in C++ are just a mechanism to use overload resolution to
select a different overload (see
http://en.cppreference.com/w/cpp/language/move_constructor). For example,
if you think about how to map your example C++ code down to equivalent C,
you'll see that in that process you have fully resolved the "move
semantics". At the LLVM level (or the C level), there are no "move
semantics". std::move is basically just a cast that creates a `S&&` which
will then select the right overload.
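
To make the "just a cast" point concrete, here is a minimal sketch. The type `Tracked` and the three helper functions are hypothetical names invented for illustration; they are not from the original example. The type only records which constructor overload resolution picked.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type for illustration: it remembers which constructor built it.
struct Tracked {
  enum class Origin { Default, Copy, Move };
  Origin origin = Origin::Default;

  Tracked() = default;
  Tracked(const Tracked&) : origin(Origin::Copy) {}
  Tracked(Tracked&&) noexcept : origin(Origin::Move) {}
};

// An lvalue argument selects the copy constructor.
inline Tracked::Origin construct_from_lvalue() {
  Tracked a;
  Tracked b(a);
  return b.origin;
}

// std::move(a) only changes the value category of the expression.
inline Tracked::Origin construct_from_std_move() {
  Tracked a;
  Tracked b(std::move(a));
  return b.origin;
}

// The explicit cast selects the same overload as std::move.
inline Tracked::Origin construct_from_cast() {
  Tracked a;
  Tracked b(static_cast<Tracked&&>(a));
  return b.origin;
}
```

Nothing is "moved" by the cast itself; whatever moving happens is ordinary code inside the constructor that overload resolution selected, which is why no trace of move semantics survives at the IR level.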

> Here’s the example I’ve been playing with (in C++):
>
> #include <utility>  // for std::move
>
> struct S {
>   int* x;
>
>   S() noexcept: x(new int) {}
>   S(S&& other) {
>     x = other.x;
>     other.x = nullptr;
>   }
>   ~S() {
>     delete x;
>   }
> };
>
> S f1() {
>   auto s = S();
>   return s;
> }
>
> S f2() {
>   auto s = S();
>   return std::move(s);
> }
>
> This of course produces a lot of LLVM code (with -O0), but I think I may
> have figured out most of what’s what. In particular, I’ve been able to
> identify the IR code for `f1` and `f2`, but to my surprise, neither of
> them returns a value. Both take a pointer to `S` as parameter, which in
> turn gets passed to the constructor of `S`, and return void.
>
> This leaves me with two main questions:
>
>    - First, is the use of a pointer to S as parameter specific to clang,
>    or generally the way to go? I’ve seen in the language reference that
>    one could return a struct with a simple ret instruction, so I’m
>    surprised not to see it for the version that doesn’t use move
>    semantics.
>

This is a language ABI question. There's really nothing LLVM-specific
about it. I would recommend thinking about it in terms of how to map C++ to
C. As Davide pointed out, RVO is one reason to choose this particular
lowering. These decisions are made inside Clang (not LLVM) and are mandated
by the ABI (all compilers must implement them the same way for code to be
able to be linked together and work). If you want all the gory details, the
Itanium C++ ABI is documented at
https://itanium-cxx-abi.github.io/cxx-abi/abi.html (this is the C++ ABI used
on basically all platforms except MSVC; there's a historical connection to
Itanium but nothing specific to that processor about it).

In particular, the description of how to lower return values is
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#return-value
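
As a rough illustration of what "a pointer parameter plus a void return" means, the lowering can be written out by hand in C++. This is only a sketch under assumptions: `Box`, `make_box`, `make_box_lowered`, `Slot`, and `call_lowered` are hypothetical names, and the real lowering is done by Clang according to the ABI, not by code like this.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the struct S from the original example.
struct Box {
  int* x;
  Box() noexcept : x(new int(0)) {}
  Box(Box&& other) noexcept : x(other.x) { other.x = nullptr; }
  ~Box() { delete x; }
};

// What the programmer writes: a function returning a Box by value.
inline Box make_box() {
  Box b;
  *b.x = 42;
  return b;
}

// Hand-written approximation of the lowered form: the caller passes the
// address of uninitialized storage, the callee constructs the result
// directly in it and returns nothing.
inline void make_box_lowered(Box* result) {
  new (result) Box();
  *result->x = 42;
}

// Raw storage for one Box whose lifetime is managed manually.
union Slot {
  Box value;
  Slot() {}   // leaves `value` unconstructed
  ~Slot() {}  // the caller destroys `value` explicitly
};

// Caller side: reserve space in its own frame, pass the address down,
// then run the destructor once it is done with the object.
inline int call_lowered() {
  Slot slot;
  make_box_lowered(&slot.value);
  int v = *slot.value.x;
  slot.value.~Box();
  return v;
}
```

Because the object is built directly in the caller's storage, no copy or move from the callee's frame to the caller's frame is needed, which is the effect RVO is after.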

Note that the C++ ABI is phrased in terms of the underlying C ABI (which is
processor-specific (and generally not OS-specific unless you care about
Windows vs non-Windows)), so familiarity with the C ABI is useful too;
documents for the C ABI of different processors can be found in
http://llvm.org/docs/CompilerWriterInfo.html e.g. the "X86 and X86-64 SysV
psABI". The C ABI may seem overwhelming (lots of processor details, corner
cases, etc.), but the basic gist is that there's a list of registers and
each argument is assigned in turn from that list (if the register list is
exhausted, the rest are passed through memory). This is easy as long as
everything (return value and argument types) is int, long, pointer, or
some other primitive type that fits in a register. Struct types are
decomposed into multiple primitive types according to certain rules, until
everything is in terms of primitive types.

e.g. if you understand the examples in https://godbolt.org/g/qSdzHj then
you pretty much have a basic understanding of the C parameter passing ABI.
The register lists for x86-64 are rdi, rsi, ... for arguments and rax, rdx,
... for return values. (Also look at the corresponding LLVM IR.)
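
For a sense of the kind of example worth compiling and inspecting, here is a small sketch. The types and functions are hypothetical and are not taken from the Godbolt link above; the register assignments in the comments describe what the x86-64 SysV psABI specifies for these shapes, and other targets will differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example types.
struct Pair { std::int64_t first; std::int64_t second; };  // 16 bytes
struct Quad { std::int64_t a, b, c, d; };                  // 32 bytes

static_assert(sizeof(Pair) == 16, "two 64-bit fields");
static_assert(sizeof(Quad) == 32, "four 64-bit fields");

// x86-64 SysV: a and b arrive in rdi and rsi; the 16-byte result fits in
// two integer registers and comes back in rax (first) and rdx (second).
inline Pair make_two(std::int64_t a, std::int64_t b) {
  return Pair{a, b};
}

// x86-64 SysV: a 32-byte struct is too large for the return registers, so
// the caller passes a hidden pointer to result storage as the first
// argument and the callee writes the fields through it.
inline Quad make_quad(std::int64_t a) {
  return Quad{a, a + 1, a + 2, a + 3};
}

// Plain use of both functions; the source looks identical either way,
// only the generated code differs.
inline std::int64_t sum_results() {
  Pair p = make_two(1, 2);
  Quad q = make_quad(10);
  assert(p.first == 1 && q.a == 10);
  return p.second + q.d;
}
```

Compiling this with clang and comparing the assembly with the output of `-S -emit-llvm` is a quick way to check these assumptions on a given target.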

Very few people actually know the precise rules in detail. However,
basically all LLVM developers (or more generally toolchain developers:
compiler, linker, debugger, etc.) will know the basic rules (like the
examples I linked above) and will know how to look up specifics in the
psABI document as needed (and probably 90% of the time just looking at
Clang's output on an example will answer a particular question; for
example, I forgot that rdx is the second register for return values;
all I remembered was that there were two return registers and the first was
rax).

If you're interested, these are the notes I took when I was initially
learning about this:
https://github.com/chisophugis/x64-Forth/blob/master/abi.txt
(woah this takes me back)
That Forth implementation doesn't actually call any external libraries, so
the only external ABI it cares about is the Linux syscall ABI, which I
recorded in this comment for my own memory
https://github.com/chisophugis/x64-Forth/blob/master/compile4.asm#L13
(yes, this was before I learned to use version control...)
The internal "ABI" used by this small Forth implementation is described in
https://github.com/chisophugis/x64-Forth/blob/master/compile4.asm#L6
(Forth is a very low-level language so it needs its own processor-specific
ABI; most languages essentially just piggy-back on the C ABI for
processor-specific stuff)

LLVM handles all of this C ABI stuff for you (and there are other aspects of
the C ABI like stack alignment/layout, TLS access, relocations, etc.).
There's quite a bit of essential complexity, but as long as you stick to
the simple cases where the mapping from C is trivial, it's easy to
understand. There's a pretty deep rabbit hole though if you start getting
into complicated cases, but it's all just a relatively simple extension of
the cases that map trivially to C. For the most part, your only
interaction with the rabbit hole will be needing to debug when things go
wrong, which will require understanding the basic concepts (which can be
understood via simple C examples) and then drilling down as needed (e.g.
looking at what clang does, looking at the standard docs, etc.). This is in
some sense simple even for complex cases in that it doesn't require any
complex insight to understand harder cases; you're just verifying
assumptions against a list of rules.

That may sound scary, but as long as the IR your frontend generates is
internally consistent at the LLVM IR level it will be ABI compatible with
itself (modulo bugs in LLVM), so you can basically ignore it. You'll have
to have some familiarity with the C ABI in order to e.g. call an external
function like malloc in libc, but again, as long as the parameter types are
"simple" then the mapping between C and LLVM IR is very simple; you'll
still need a basic understanding in order to verify this and debug any
think-o's.
>
>    - Second, if I used a non-void ret instruction to return the result
>    of an alloca, when would the latter be destroyed? Would that involve a
>    copy from the runtime stack of the callee to that of the caller?
>

The result of an alloca is a pointer into the stack frame of the current
function, so returning it doesn't make sense (like returning the address of
a local variable in C). Again, this problem isn't really related to LLVM
per se and the easiest way to think about it is in terms of how to lower to
C.
Decisions about object destruction are up to the frontend. At the C or LLVM
level they just turn into function calls or whatever you use to implement
the semantics that the frontend requires. In other words, if you want an
object to be destroyed, it's up to your frontend to emit code to perform
the destruction wherever the destruction is expected to occur.
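
A minimal sketch of what "lowering to C" looks like for construction and destruction, written as C-style C++. The names `Obj`, `Obj_init`, `Obj_destroy`, and `use_obj` are hypothetical; the point is only that the calls a frontend emits are ordinary function calls placed where the language semantics require them.

```cpp
#include <cassert>

// Hypothetical lowered form of a class that owns a heap integer.
struct Obj {
  int* x;
};

// Stands in for the constructor: an ordinary function taking `this`.
inline void Obj_init(Obj* self) { self->x = new int(7); }

// Stands in for the destructor: nothing calls it unless the frontend
// emits the call.
inline void Obj_destroy(Obj* self) {
  delete self->x;
  self->x = nullptr;
}

inline int use_obj() {
  Obj o;             // roughly an alloca in this function's frame
  Obj_init(&o);      // emitted at the point of declaration
  int v = *o.x;      // copy the value out while `o` is still alive
  Obj_destroy(&o);   // emitted at the end of the scope
  assert(o.x == nullptr);
  return v;          // returns the value, never the address of `o`
}
```

The stack slot for `o` disappears when `use_obj` returns, exactly like an alloca; only the value copied into `v` survives the return.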

-- Sean Silva

>
> Thank you very much for your time and your answer,
>
> Best,
>
>
> Dimitri Racordon
> CUI, Université de Genève
> 7, route de Drize, CH-1227 Carouge - Switzerland
> Phone: +41 22 379 01 24

Dimitri Racordon via llvm-dev

2017-Jun-01 13:47 UTC

[llvm-dev] [frontend-dev][beginner] Allocation of structures

Thanks for this comprehensive answer!
I’m slowly but surely starting to understand all this better.

> That may sound scary,

Yes it does! That said, I’m not sure I need to care that much about the C++ ABI,
as I don’t plan on linking my IR with C++ directly. It is very interesting to
see how it works though, as I’ll probably need to solve the same kind of
problems for my own frontend language.

> The result of an alloca is a pointer to the stack frame of the current
> function, so returning it doesn't make sense (like returning the address
> of a local variable in C). Again, this problem isn't really related to
> LLVM per se and the easiest way to think about it is in terms of how to
> lower to C.
> Decisions about object destruction are up to the frontend. At the C or
> LLVM level they just turn into function calls or whatever you use to
> implement the semantics that the frontend requires. In other words, if you
> want an object to be destroyed, it's up to your frontend to emit code to
> perform the destruction wherever the destruction is expected to occur.

I understand my second question was probably very unclear. When I said
“destroyed”, I wasn’t talking about calls to C++ (or any other frontend
language) destructors, but more about what is happening to the memory that was
stack allocated. Let’s consider the following IR:

define i32 @f() {
entry:
  %0 = alloca i32
  store i32 123, i32* %0
  %1 = load i32, i32* %0
  ret i32 %1
}

define i32 @main() {
entry:
  %0 = alloca i32
  %1 = call i32 @f()
  store i32 %1, i32* %0
  %2 = load i32, i32* %0
  ret i32 %2
}

As I understand it, f will allocate a new 32-bit memory slot in its stack
frame, and store the value 123 in it. But what I’m not sure about is what
happens to that alloca when f returns. If the alloca is in the frame of f,
then the value stored at that location should be deallocated (which is what I
meant by "destroyed") when f returns, and so I don’t understand what makes
the `store i32 %1, i32* %0` successfully store 123 in the alloca of the
main function. If I had to make a guess, I would say the value pointed to by
%0 in f is copied to the alloca pointed to by %0 in main (which is what I
meant by “copy from the runtime stack of the callee to that of the caller”).
With primitive types such as i32, this probably doesn't matter, since they
would fit in a CPU register, but what if %0 was an alloca for a very large
struct? I guess that’s what RVO is all about optimising, but I’d like to
confirm that assumption.

Once again, I apologise for my poor knowledge of this kind of low-level
semantics.

Best,

Dimitri Racordon
CUI, Université de Genève
7, route de Drize, CH-1227 Carouge - Switzerland
Phone: +41 22 379 01 24
