thr3ads.net - llvm dev - [llvm-dev] High Performance containers [Aug 2017]


Francois Fayard via llvm-dev

2017-Aug-16 10:24 UTC

[llvm-dev] High Performance containers

Hi,

Let me introduce myself: I work in High Performance Computing, mainly on number
crunching, where the main languages are Fortran and C++. The field is
moving more and more from pure number crunching, where Fortran shines, to a mix of
numbers, text and other data that is not easy to handle in Fortran.
Unfortunately, the C++ standard library suffers from its age and the fact that
it has never been thought for performance.

As a consequence, I am developing an open source library of containers and
utilities that could be useful to HPC people. But the more I look at LLVM, the
more I find that our problems are very similar. Moreover, I find it disappointing
that when Chandler Carruth gives a talk about LLVM containers, people cannot ask
for a standalone library that they can use in their own code. It turns out that
the world I live in is very close to the LLVM world:
- No exceptions (painful with highly multithreaded applications, and when mixing
languages)
- Need an efficient array container with small size optimization
- Need an efficient hash map, hash set (with open addressing)
- Need an efficient string that plays smoothly with UTF-8 and its folklore
(filenames being byte arrays on Linux, UCS-2/UTF-16 on Windows, etc.)
- Need an easy-to-use formatting library (the way Python does with format)
- Need an easy way to instrument the containers (such as collecting statistics on
malloc sizes, on the number of copies vs. moves, etc.)
- Need an efficient way to “return errors” that must be checked, with rich type
information
- Using the same Open Source licence

In the end, I believe that LLVM's performance problems are common to many
people. My background is mainly in x86-64, ILP, vectorization, multithreading
and memory layout optimizations. I am sure that mixing different backgrounds can
produce a great library that would be useful to many C++ developers. What I am
looking for is developers with LLVM experience in core container design to
share ideas for building such a library. Since an API can sometimes kill
performance, it is very important to share experience when designing it.

The project is still young and is available here:
https://github.com/insideloop/InsideLoop

Let me know if you think such a library could be useful to you and if you
would like to contribute. And if it is a success, in a few years, why not use
it in some parts of LLVM...

François Fayard

Dean Michael Berris via llvm-dev

2017-Aug-17 08:23 UTC

[llvm-dev] High Performance containers

Hi Francois,

Have you looked at the ADT library in LLVM, and have you considered contributing
to LLVM directly (and improving the available data structures / algorithms in
the codebase)?

I understand that this might not meet the goal of something that is released and
supported by the LLVM project (i.e. a standalone containers/adapters library),
but I suspect it would be something that developers working on LLVM passes
and/or the compilers can use.

Good luck with the project, BTW. :)

Cheers
-- Dean


Francois Fayard via llvm-dev

2017-Aug-17 11:39 UTC

[llvm-dev] High Performance containers

Hi Dean. Thanks for your reply.

The ADT library is exactly what I ended up replicating. I started my library 2
years ago and, at the beginning, what I needed was very different from LLVM. My
first containers were:

- A custom std::vector that does not initialize elements to 0 for int, double,
etc. This is very important in HPC because of the first-touch policy used on
NUMA architectures. It also allows alignment to the vector width, which can be
important for performance with vectorization.
- A small-size-optimized vector
- Multidimensional arrays
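
As an illustration of the first item only: the post describes a custom vector
class, which is not shown here. A different, well-known way to get a similar
effect with the standard std::vector is an allocator adaptor that turns
value-initialization into default-initialization. The names
default_init_allocator and uninit_vector below are hypothetical and are not
taken from InsideLoop. This is a minimal sketch, not the library's approach.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor (not from InsideLoop): when the container
// asks for an element to be constructed with no arguments, the element is
// default-initialized, so int/double are left untouched instead of zeroed.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
  using traits = std::allocator_traits<A>;

 public:
  template <typename U>
  struct rebind {
    using other =
        default_init_allocator<U, typename traits::template rebind_alloc<U>>;
  };

  using A::A;  // reuse the underlying allocator's constructors

  // No-argument construction: default-initialize (no zero fill for scalars).
  template <typename U>
  void construct(U* p) noexcept(
      std::is_nothrow_default_constructible<U>::value) {
    assert(p != nullptr);
    ::new (static_cast<void*>(p)) U;
  }

  // Any other construction is forwarded to the underlying allocator.
  template <typename U, typename... Args>
  void construct(U* p, Args&&... args) {
    traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
  }
};

// Hypothetical alias: a std::vector whose sized constructor and resize()
// do not write to scalar elements.
template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;
```

With this, uninit_vector<double> v(n) allocates n elements without writing to
them, so the first write to each page can be done by the thread that will later
use it. The element values are indeterminate until they are written, and
reading them before that is not allowed. Alignment to the vector width is not
addressed by this sketch.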

It was only later, when I discovered Chandler Carruth’s talks, that I realized
I was not the only one having issues with the STL. My hash map is almost the
same as LLVM's. I also have a hash set, a view on a string, and a Unicode-friendly
string which can keep UTF-8 as an invariant and which is implemented
the same way std::string is implemented in libc++.

Where I might be able to help is with the probing method in the map and with the
default hashing functions. There are some hashing strategies, such as Robin Hood
hashing, that might be worth trying. I also know that the hashing strategy for
integers in LLVM is suboptimal. But I am not sure it would help much, as
I don’t think LLVM hashes integers a lot.
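
The message does not say which integer hash it would propose instead. Purely as
an illustration of what a stronger integer mixer for an open-addressing table
can look like, here is the 64-bit finalizer from MurmurHash3, followed by a
power-of-two bucket computation. The names mix64 and bucket_of are hypothetical,
and this is an assumption about the kind of function meant, not the author's or
LLVM's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit finalizer from MurmurHash3 (fmix64). Every step (xor-shift,
// multiplication by an odd constant) is invertible, so distinct keys
// always map to distinct 64-bit values.
inline std::uint64_t mix64(std::uint64_t k) {
  k ^= k >> 33;
  k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33;
  k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}

// Bucket index for a table whose size is a power of two. Mixing first means
// that keys differing only in their high bits can still land in different
// buckets after masking.
inline std::size_t bucket_of(std::uint64_t key, std::size_t table_size) {
  assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
  return static_cast<std::size_t>(mix64(key)) & (table_size - 1);
}
```

The design point is that a power-of-two table keeps only the low bits of the
hash, so a hash that leaves structured keys (multiples of a large power of two,
aligned addresses) mostly unchanged in those bits produces long probe
sequences. Whether this matters for a given workload has to be measured.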

François Fayard

Joerg Sonnenberger via llvm-dev

2017-Aug-17 11:54 UTC

[llvm-dev] High Performance containers

On Wed, Aug 16, 2017 at 12:24:33PM +0200, Francois Fayard via llvm-dev wrote:
> Unfortunately, the C++ standard library suffers from its age and the
> fact that it has never been thought for performance.

Just a friendly comment: opening remarks like that have a high chance of
annoying people into ignoring the rest of the post. Claiming that the STL
design doesn't care about performance is not only ignorant, but
blatantly false. One of the major contributions of the STL from the
beginning was requiring algorithmic complexity guarantees for many important
algorithms. The specific implementation choices might not agree
with your specific environment, but that doesn't mean they haven't been
carefully made. That's exactly where many LLVM ADT entries came from: a
specific problem with a big enough impact on the total design and/or
runtime that can be optimized for those constraints.

Joerg

Francois Fayard via llvm-dev

2017-Aug-17 12:10 UTC

[llvm-dev] High Performance containers

Thanks for your friendly comment. I agree that I was a bit rough with the STL.
To be more specific, all of the following can be performance issues in the STL,
and I thought that was a given in the LLVM community.

- There is no small array optimization in the STL.
- The use of unsigned integers (hello std::size_t) prevents many optimizations.
Vectorization can be one of them.
- std::unordered_map cannot be implemented with open addressing. The same goes
for std::unordered_set.
- Some APIs seem designed to make you shoot yourself in the foot
performance-wise, such as + for concatenating strings, whereas an API such as
concat(s0, s1, …, sn) never needs to create temporaries.
- Default initialization of elements in containers such as std::vector makes it
impossible to tune for NUMA architectures, unless you are willing to use custom
allocators, which will give you more pain than solutions.
- Mathematical functions such as std::pow are a joke. I am sure it does not
affect STL people, but do you realize that the C++11 standard obliges
implementers to treat std::pow(x, 3), with x a float, with the following
conversions: first convert x to double, then compute its cube, then convert it
back to float. With C++03, the problem was not there. (
http://en.cppreference.com/w/cpp/numeric/math/pow )
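
To make the string concatenation item concrete, here is a minimal sketch of a
variadic function of that shape. The message only names such an API; the
implementation, the detail namespace and its helpers are assumptions made for
illustration. It measures the total length first, reserves once, then appends
each piece, so no intermediate std::string is built along the way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace detail {
// Length of one piece; only std::string and C strings are handled here.
inline std::size_t length_of(const std::string& s) { return s.size(); }
inline std::size_t length_of(const char* s) {
  return std::char_traits<char>::length(s);
}

// Sum of the lengths of all pieces.
inline std::size_t total_length() { return 0; }
template <typename T, typename... Rest>
std::size_t total_length(const T& first, const Rest&... rest) {
  return length_of(first) + total_length(rest...);
}

// Append every piece to `out`, in order.
inline void append_all(std::string&) {}
template <typename T, typename... Rest>
void append_all(std::string& out, const T& first, const Rest&... rest) {
  out.append(first);
  append_all(out, rest...);
}
}  // namespace detail

// Hypothetical concat: a single reserve for the final size, then appends.
template <typename... Parts>
std::string concat(const Parts&... parts) {
  const std::size_t total = detail::total_length(parts...);
  std::string out;
  out.reserve(total);
  detail::append_all(out, parts...);
  assert(out.size() == total);  // the reserve above was exact
  return out;
}
```

By comparison, s0 + s1 + s2 builds an intermediate string for each +. Standard
libraries do reuse the buffer of an rvalue operand, so the real cost depends on
the implementation and on the sizes involved, but the single reserve above puts
an upper bound of one allocation on the result.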

The LLVM team seems to have done a great job reimplementing all those containers
in a more efficient way, with optimizations which are not at all specific to the
STL. I am sure all the great ideas you had could be useful to many people in
other fields.

François Fayard
