Chris,

> I'm not sure what you mean: do you have a pointer to these, or do you
> have an array of pointers? Do you know the size of the elements? The
> compiler can't lay out a structure or array without knowing the size
> (not just the alignment) of the elements.

I can think of three cases that are important to my research, detailed
below. [1] is solved by global variable alignment; [2] is hacked and
working, though type alignment (rather than storage-location
alignment) would make it easier; [3] is unsolved, and we use a
sub-optimal work-around.
One could summarize all three cases succinctly with a simple question:
should alignment be an aspect of a storage location, or of a type? In
llvm, it appears to be an aspect of a storage location. But I think
it should be an aspect of the type.
(Case 1) There is some set of values V that we wish to transfer from
one thread to another. The size and types of V can only be known at
compile time. We want to create a corresponding type struct.V to
carry those values. In our situation, several threads will each be
writing to their own structure simultaneously. By aligning the structures to
cacheline boundaries, we can ensure that no two of these structures
occupy the same cacheline, and so no two threads are competing for
that cacheline.
As you noted, this case can be handled by setting an alignment on a
global variable.
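
To make Case 1 concrete, here is a minimal C++ sketch. It is a source-level analogy, not LLVM IR and not our actual code. The 128-byte cacheline size, the fields of V, and the names v_thread0, v_thread1, cacheline_of and check_case1 are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cacheline size in bytes; nothing here checks it against the hardware.
constexpr std::size_t kCacheLine = 128;

// Hypothetical payload standing in for struct.V.
struct V {
    int    a;
    double b;
};

// The alignment is attached to each storage location (the globals),
// while the type V keeps its natural alignment.
alignas(kCacheLine) V v_thread0;
alignas(kCacheLine) V v_thread1;

// Index of the cacheline that contains address p.
inline std::uintptr_t cacheline_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Sanity check: the two structures start on different cachelines.
inline void check_case1() {
    assert(reinterpret_cast<std::uintptr_t>(&v_thread0) % kCacheLine == 0);
    assert(cacheline_of(&v_thread0) != cacheline_of(&v_thread1));
}
```

Because each global starts on its own cacheline boundary, two writers never touch the same line, and the type V itself needs no special alignment.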
(Case 2) We have a group of closely related threads, all running the
same code, but parameterized with some integer instance number. Just
like the previous case, these threads need to communicate some set of
values. When they write their values, they can use their instance
number to index into an array of struct.V's.
But, say two threads i and i+1 are both trying to write to the array.
I would like to ensure that array[i] and array[i+1] lie in different
cachelines. The only way to guarantee that is if each /element/ of
the array is cacheline aligned. In other words, I am not trying to
align the global variable (the array) to a cacheline, rather its
elements.
I accomplish this by first making an assumption about cacheline size,
and then adding a char[] padding element to the structure type. I'd
rather set alignment on the element type.
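
The difference between the two approaches in Case 2 can be sketched in C++ as follows. Again this is only a source-level analogy under assumed values: a 128-byte cacheline, four threads, and hypothetical names (PaddedV, AlignedV, padded_slots, aligned_slots, check_case2).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 128;  // assumed cacheline size in bytes
constexpr std::size_t kThreads   = 4;    // hypothetical number of instances

// Hypothetical payload standing in for struct.V.
struct V {
    int    a;
    double b;
};

// Current hack: a char[] member rounds the element size up to whole cachelines.
// The expression always yields 1..kCacheLine bytes, so the array bound is never zero.
struct PaddedV {
    V    value;
    char pad[kCacheLine - sizeof(V) % kCacheLine];
};
static_assert(sizeof(PaddedV) % kCacheLine == 0,
              "padding must round the element up to whole cachelines");

// Padding fixes only the element size; the array itself still needs
// storage-location alignment to start on a cacheline boundary.
alignas(kCacheLine) PaddedV padded_slots[kThreads];

// Preferred: the alignment is a property of the element type, so every
// array of AlignedV is laid out correctly without manual padding.
struct alignas(kCacheLine) AlignedV {
    int    a;
    double b;
};
static_assert(alignof(AlignedV) == kCacheLine, "type carries the alignment");
static_assert(sizeof(AlignedV) % kCacheLine == 0, "size is rounded up by the compiler");
AlignedV aligned_slots[kThreads];

// Index of the cacheline that contains address p.
inline std::uintptr_t cacheline_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Sanity check: neighbouring elements never start in the same cacheline.
inline void check_case2() {
    for (std::size_t i = 0; i + 1 < kThreads; ++i) {
        assert(cacheline_of(&padded_slots[i]) != cacheline_of(&padded_slots[i + 1]));
        assert(cacheline_of(&aligned_slots[i]) != cacheline_of(&aligned_slots[i + 1]));
    }
}
```

With the padded version, the cacheline size is baked into two places (the pad expression and the alignment on the array); with the type-aligned version it appears once, on the element type.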
(Case 3) We have a group of closely related threads, all running the
same code, but parameterized with some integer instance number.
Additionally, we use some external library to provide an inter-thread
communication facility. Each thread must have a distinct handle
object for that library.
The handle objects are complicated: many fields, many alignment rules,
etc. Also, as our research has progressed, the internal structure of
the handle object has been revised several times. For these reasons,
we would like to maintain the handle-type in llvm as an Opaque type.
Also, initialization of these handle objects is relatively slow. A
comparison could be made to a thread pool: you initialize a pool when
you start your program so you don't have to pay the thread-spawn
penalty inside a hot loop. For this reason, we initialize the
handles within static constructors. Therefore, we cannot easily place
them in thread-local storage. A global variable is the natural place
to put them.
I would like to create an array of handles so that the threads can
simply index into the array by their instance number. Like before, I
want to guarantee that if two threads i and i+1 try to access their
handles in the handles array, then handles[i] and handles[i+1] do not
share a cacheline.
In this case, I cannot even manually pad the arrays, because I cannot
determine the size of the OpaqueType handle. Ideally, I want to tell
llvm that, no matter how big or small a handle is, it should ensure
that the size of each array element is a multiple of the cacheline size.
Again, this is an alignment property of the array /element/ type, not
of the array storage location.
To work around this, we must use pointers to handles, and store those
pointers within structures that have been padded to a cacheline size.
Again, that's 8 bytes of pointer and 120 bytes of padding per element.
Additionally, this forces us to dynamically allocate storage for the
handles, even though I'd prefer to have them statically allocated.
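
A rough C++ sketch of this work-around, under stated assumptions: a 128-byte cacheline, four threads, and a made-up stand-in for the library (the Handle definition and handle_create are hypothetical, as are HandleSlot, handle_slots, HandleInit and handle_for). The real handle type stays opaque to our code; it is only defined here so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine = 128;  // assumed cacheline size in bytes
constexpr std::size_t kThreads   = 4;    // hypothetical number of instances

// Opaque at this point: size and layout are unknown to our code.
struct Handle;

// Each slot holds only a pointer, padded out to one full cacheline.
struct HandleSlot {
    Handle* handle;
    char    pad[kCacheLine - sizeof(Handle*)];
};
static_assert(sizeof(HandleSlot) == kCacheLine, "one slot per cacheline");

// Zero-initialized before any static constructor runs.
alignas(kCacheLine) HandleSlot handle_slots[kThreads];

// --- Hypothetical stand-in for the external library ---
struct Handle {
    int fields[37];  // arbitrary contents; the real layout is not known
};
Handle* handle_create() {
    return new Handle{};  // dynamic allocation, intentionally never freed here
}

// Static constructor: pay the slow initialization once, before main().
struct HandleInit {
    HandleInit() {
        for (HandleSlot& slot : handle_slots) {
            slot.handle = handle_create();
        }
    }
};
static HandleInit handle_init;

// Each thread looks up its handle by instance number.
inline Handle* handle_for(std::size_t instance) {
    assert(instance < kThreads);
    return handle_slots[instance].handle;
}
```

Note that only the pointers are kept apart here. The handle objects themselves live wherever the allocator puts them, which is part of why this is sub-optimal.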
I'd really appreciate hearing your thoughts on this matter. Or, maybe
I'm missing some obvious solution...
--
Nick Johnson