thr3ads.net - llvm dev - [LLVMdev] Casting between address spaces and address space semantics [Jul 2008]

If this information is useful, please help other people find it:
Share via:

Mon P Wang

2008-Jul-17 16:48 UTC

[LLVMdev] Casting between address spaces and address space semantics

In ISO/IEC WG14 n1169 on the C extensions to support embedded  
processors, any two address spaces must be disjoint, must be  
equivalent, or must be nested.  As Eli indicated, the actual  
relationship is platform specific depending on what makes the most  
sense for your hardware and how the program will behave will depend on  
that relationship.

   -- Mon Ping

On Jul 17, 2008, at 7:25 AM, Eli Friedman wrote:
> On Thu, Jul 17, 2008 at 5:08 AM, Matthijs Kooijman  
> <matthijs at stdin.nl> wrote:
>> Now, we are using a function which reads a value from one of these  
>> memories
>> and does some processing. Since we want to execute this function  
>> for multiple
>> memories, we make it accept a pointer in the generic address space  
>> (ie, no
>> address space attribute):
>>
>>       void do_stuff(char* mem);
>
> The "generic address space" you're referring to is really
just address
> space 0, at least in the current implementation.  Assuming alternate
> address spaces are actually separate, passing a pointer from a
> different address space to this function is undefined behavior.
>
>> However, this brings me to my actual question: How are address spaces
>> semantically defined? I see two options here:
>>
>> a) Every address space has the full range of addresses and they  
>> completely
>> live side by side. This means that, for example, i32 addrspace(1) *  
>> 100 points
>> to a different piece of memory than i32 addrspace(2) * 100. Also,  
>> this means
>> that a bitcast from one address space to another (possibly 0),  
>> makes the
>> pointer point to something different when loaded.
>>
>> b) Every address space is really a subspace of the full range of  
>> addresses,
>> but always disjoint. This means that, for example, i32 addrspace(1)  
>> * 100
>> points to the same memory as i32 addrspace(2) * 100, though one, or  
>> possibly
>> both of them can be invalid (since the pointer lies outside of that  
>> address
>> space). This also means that bitcasting a pointer from one address  
>> space to
>> another doesn't change it's meaning, though it can potentially
>> become invalid.
>
> Address spaces are platform-specific, so you can define them any way
> you want, I suppose.  But if you can map the alternate address spaces
> into address space 0, is there really any point to keeping the other
> address spaces around?
>
> -Eli
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Matthijs Kooijman

2008-Jul-18 21:04 UTC

head link

[LLVMdev] Casting between address spaces and address space semantics

Hi Eli, Mon Ping,

> In ISO/IEC WG14 n1169 on the C extensions to support embedded  
> processors, any two address spaces must be disjoint, must be  
> equivalent, or must be nested.  
Ah, that standard is a lot clearer on this subject than the DSP-C one I read
was.
> As Eli indicated, the actual  relationship is platform specific depending
on
> what makes the most  sense for your hardware and how the program will
behave
> will depend on  that relationship.If I read the standard correctly, the properties of these address spaces can
be fully captured by defining the relationship between every pair of address
spaces (disjoint, identical, subset/superset).

I think it would make sense to make these relationships backend/platform
specific, but for clang and the optimization passes to properly work with
address spaces, these relationships should be known. It seems to me that the
proper way to handle this, is to include this information in the TargetData
class (or whatever platform-specific class is appropriate, I'm not really
into
this currently).

When this information is known, we can define a "any memory" address
space,
which is a superset of the specific address spaces. This will allow a function
to be (temporarily) defined to work with any memory, while still properly
propagating information about the different memories through the program (and
preventing mixup, since you can cast to the "any memory" address
space, but
not back).

This requires clang to know about the relationships between address spaces, so
it knows when to insert implicit casts and when to error out.

This also requires the transformation passes to know about these
relationships, so they know when a bitcast can be removed and types
propagated.


So, is it a plan to make these relationships available to the
frontend/transformations in some way? If so, in what way?

Gr.

Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080718/5bf51108/attachment.sig>

Cédric Venet

2008-Jul-18 21:40 UTC

head link

[LLVMdev] Casting between address spaces and address space semantics

Hi, 
> When this information is known, we can define a "any memory"
address
> space, which is a superset of the specific address spaces. This will
The two uses I see for memory space have completely separate memory space
(even different instruction to access). You can't define an 'any memory
space', if your memory space is already mapped on a global memory, why
bother to use memory space  anyways...

[ the two use cas I see are:
 - on small µC, you have acces to the RAM and to the program Flash as a ROM
and sometimes to EEPROM, each with there own address (not event all the same
size) and own instructions for load/store.
 - on one embedded µC I use, the I/O port isn't mapped in the RAM space and
use different load/store instructions as RAM
]
> allow a function to be (temporarily) defined to work with any memory,
> while still properly propagating information about the different
> memories through the program (and preventing mixup, since you can cast
> to the "any memory" address space, but not back).
I think that what you want is a basic form of C++ template allowing a
function to take any address space. Like template, these function would be
codegened as many time as there is of different (used) combination of
address space in there parameter (since load and store are different for
each address space).


Regards,

-- 
Cédric

Matthijs Kooijman

2008-Jul-21 15:05 UTC

head link

[LLVMdev] Casting between address spaces and address space semantics

Hi all,
> If I read the standard correctly, the properties of these address spaces
can
> be fully captured by defining the relationship between every pair of
address
> spaces (disjoint, identical, subset/superset).
> 
> I think it would make sense to make these relationships backend/platform
> specific, but for clang and the optimization passes to properly work with
> address spaces, these relationships should be known. It seems to me that
the
> proper way to handle this, is to include this information in the TargetData
> class (or whatever platform-specific class is appropriate, I'm not
really into
> this currently).
To have something to actually talk about, I thought I'd create some code.
I've
got a patch attached which consists roughly of two parts: it adds the concept
of address space relations to TargetData, and teaches InstructionCombining to
remove casts based on those relations.

The instcombine is pretty straightforward: If a bitcast changes the address
space of a pointer in a way that is redundant, and all of the users of the
bitcasts can be rewritten to use the original pointer as well (currently this
means loads, stores and geps only), this is done.

The TargetData part is a lot more tricky. I originally planned to simply add a
virtual method to TargetData and create a specific subclass for our compiler,
when I discovered that this is not how TargetData is supposed to work.
Currently, TargetData consists only of a TargetDescription, which is basically
only sizes and alignment info for different types.

I solved this by creating a new class AddrspacesDescriptor, which has a single
virtual method that defines the relationship between address spaces. This
class contains a default implementation and can be subclassed to get a
different implementation.

A reference to a AddrspaceDescriptor is passed to TargetData's constructor,
with a default. This means that all of the current code uses the default
implementation. I'm not so sure what would be the correct way to make the
various tool select other configurations, currently.

Also, I'm quite positive that TargetData is not really the right place to
put
something like this. At least not in the way it is currently written, passing
around references to AddrspacesDescriptor is far from ideal. However, I do not
think we should be trying to define these relations between address spaces in
a text-based format similar to the current TargetDescription. In that way, we
would either loose a lot of flexibility, or end up with a huge
TargetDescription (with one element for each combination of address
spaces...).

So, any comments on the patch and/or hints on how to do things better? Any
thoughts on how selecting a address spaces configuration should work in the
different tools?

Gr.

Matthijs

Index: lib/Transforms/Scalar/InstructionCombining.cpp
==================================================================---
lib/Transforms/Scalar/InstructionCombining.cpp	(revision 53716)
+++ lib/Transforms/Scalar/InstructionCombining.cpp	(working copy)
@@ -400,6 +400,8 @@
     unsigned GetOrEnforceKnownAlignment(Value *V,
                                         unsigned PrefAlign = 0);
 
+    /// Propagate address spaces through bitcasts
+    Instruction *PropagateAddressSpace(BitCastInst &CI);
   };
 }
 
@@ -7841,6 +7843,81 @@
   return 0;
 }
 
+/// Try to propagate the address space from the source of the bitcast into all
+/// of its uses. CI casts between two pointer types with different address
spaces.
+Instruction *InstCombiner::PropagateAddressSpace(BitCastInst &CI) {
+  // Determine the new type. This is the old dest type with the address space
+  // changed. We can't simply use the source type, since the element type
can be
+  // changed in the cast in addition to the address space.
+  const PointerType *DestTy = cast<PointerType>(CI.getDestTy());
+  const PointerType *SrcTy  = cast<PointerType>(CI.getSrcTy());
+  const PointerType *NTy = PointerType::get(DestTy->getElementType(),
SrcTy->getAddressSpace());
+    
+  // Only propagate the address space when the source and dest address spaces
+  // are equivalent, or the source address space is a subset of the target
+  // address space.
+  AddrspacesDescriptor::AddrspaceRelation Rel;
+  Rel = TD->getAddrspaceRelation(SrcTy->getAddressSpace(),
+                                 DestTy->getAddressSpace());
+  if (Rel != AddrspacesDescriptor::Equivalent 
+      && Rel != AddrspacesDescriptor::Subset)
+    return 0;
+
+  // We can propagate the address space if we can modify the type of CI. This
+  // requires that any uses of it should be modifiable. For now, we'll just
+  // propagate if all uses are loads and stores.
+  for (User::use_iterator I = CI.use_begin(), E = CI.use_end(); I != E; ++I) {
+    if (isa<StoreInst>(*I)) {
+      if (I.getOperandNo() != StoreInst::getPointerOperandIndex())
+        return 0; // Can only handle store to this pointer
+    } else if (isa<GetElementPtrInst>(*I)) {
+      if (I.getOperandNo() != GetElementPtrInst::getPointerOperandIndex())
+        return 0;
+    } else if (!isa<LoadInst>(*I)) {
+      // Can't handle other instructions
+      return 0;
+    }
+  }
+  // Insert a new bitcast, if needed
+  Value *NewVal = InsertBitCastBefore(CI.getOperand(0), NTy, CI);
+  
+  for (User::use_iterator I = CI.use_begin(), E = CI.use_end(); I != E; ++I) {
+    Instruction *NewInst;
+    Instruction *OldInst = cast<Instruction>(*I);
+    // Declare this here, so we can reuse the dyn_cast
+    GetElementPtrInst *GEP = NULL;
+    // Create a new instruction
+    if (isa<StoreInst>(*I))
+      NewInst = new StoreInst(I->getOperand(0), NewVal);
+    else if ((GEP = dyn_cast<GetElementPtrInst>(*I))) {
+      // Insert a new GEP
+      std::vector<Value*> Indices(GEP->idx_begin(),
GEP->idx_end());
+      NewInst = GetElementPtrInst::Create(NewVal, Indices.begin(),
Indices.end());
+    } else // Must be LoadInst
+      NewInst = new LoadInst(NewVal);
+    
+    // Replace the old instruction
+    InsertNewInstBefore(NewInst, *OldInst);
+    NewInst->takeName(OldInst);
+    
+    // Put this in a Value*, since InsertBitCastBefore could return a Value*
+    // below.
+    Value *ReplaceWith = NewInst;
+    if (GEP) {
+      // Now insert a bitcast as well. We couldn't do this before, because
the
+      // above two lines don't hold for the GEPs
+      const Type* GEPTy =
PointerType::get(GEP->getType()->getElementType(),
DestTy->getAddressSpace());
+      ReplaceWith = InsertBitCastBefore(NewInst, GEPTy, *GEP);
+    }
+    ReplaceInstUsesWith(*OldInst, ReplaceWith);
+    EraseInstFromFunction(*OldInst);
+
+  }
+  
+  // Return the old instruction, so it can be removed.
+  return &CI;
+}
+
 Instruction *InstCombiner::visitBitCast(BitCastInst &CI) {
   // If the operands are integer typed then apply the integer transforms,
   // otherwise just apply the common ones.
@@ -7873,7 +7950,9 @@
     // If the address spaces don't match, don't eliminate the bitcast,
which is
     // required for changing types.
     if (SrcPTy->getAddressSpace() != DstPTy->getAddressSpace())
-      return 0;
+      // If we're casting to address space zero, see if we can't just
propagate
+      // the address space into the uses of CI.
+      return PropagateAddressSpace(CI);
     
     // If we are casting a malloc or alloca to a pointer to a type of the same
     // size, rewrite the allocation instruction to allocate the
"right" type.
Index: lib/Target/TargetData.cpp
==================================================================---
lib/Target/TargetData.cpp	(revision 53716)
+++ lib/Target/TargetData.cpp	(working copy)
@@ -231,8 +231,8 @@
   }
 }
 
-TargetData::TargetData(const Module *M) 
-  : ImmutablePass((intptr_t)&ID) {
+TargetData::TargetData(const Module *M, const AddrspacesDescriptor
&Addrspaces)
+  : ImmutablePass((intptr_t)&ID), Addrspaces(Addrspaces) {
   init(M->getDataLayout());
 }
 
@@ -604,3 +604,5 @@
 unsigned TargetData::getPreferredAlignmentLog(const GlobalVariable *GV) const {
   return Log2_32(getPreferredAlignment(GV));
 }
+
+const AddrspacesDescriptor TargetData::DefaultAddrspaces;
Index: include/llvm/Target/TargetData.h
==================================================================---
include/llvm/Target/TargetData.h	(revision 53716)
+++ include/llvm/Target/TargetData.h	(working copy)
@@ -63,12 +63,41 @@
   std::ostream &dump(std::ostream &os) const;
 };
 
+/// This class is used to describe relations between different address spaces.
+/// The default implementation defines all different address spaces as
+/// equivalent.
+///
+/// Create a subclass and pass it to TargetData's constructors to define
+/// different behaviour.
+class AddrspacesDescriptor {
+public:
+  typedef enum {
+    Equivalent,
+    Disjoint,
+    Subset,
+    Superset
+  } AddrspaceRelation;
+
+  /// getAddrspaceRelation - Returns the relation between the given address
+  /// spaces. Should be read left to right, for example Subset means that
+  /// Addrspace1 is a subset of Addrspace2.
+  virtual AddrspaceRelation getAddrspaceRelation(
+    int Addrspace1,
+    int Addrspace2) const {
+      return Equivalent;
+  }
+
+  virtual ~AddrspacesDescriptor() {}
+};
+
 class TargetData : public ImmutablePass {
 private:
   bool          LittleEndian;          ///< Defaults to false
   unsigned char PointerMemSize;        ///< Pointer size in bytes
   unsigned char PointerABIAlign;       ///< Pointer ABI alignment
   unsigned char PointerPrefAlign;      ///< Pointer preferred alignment
+  const AddrspacesDescriptor &Addrspaces; ///< Address spaces descriptor
+  static const AddrspacesDescriptor DefaultAddrspaces; ///< Default address
spaces descriptor
 
   //! Where the primitive type alignment data is stored.
   /*!
@@ -109,20 +138,23 @@
   ///
   /// @note This has to exist, because this is a pass, but it should never be
   /// used.
-  TargetData() : ImmutablePass(intptr_t(&ID)) {
+  TargetData() : ImmutablePass(intptr_t(&ID)),
Addrspaces(DefaultAddrspaces) {
     assert(0 && "ERROR: Bad TargetData ctor used.  "
            "Tool did not specify a TargetData to use?");
     abort();
   }
     
   /// Constructs a TargetData from a specification string. See init().
-  explicit TargetData(const std::string &TargetDescription) 
-    : ImmutablePass(intptr_t(&ID)) {
+  explicit TargetData(const std::string &TargetDescription,
+                      const AddrspacesDescriptor &Addrspaces
+                      = DefaultAddrspaces)
+    : ImmutablePass(intptr_t(&ID)), Addrspaces(Addrspaces) {
     init(TargetDescription);
   }
 
   /// Initialize target data from properties stored in the module.
-  explicit TargetData(const Module *M);
+  explicit TargetData(const Module *M, const AddrspacesDescriptor
&Addrspaces
+                                       = DefaultAddrspaces);
 
   TargetData(const TargetData &TD) : 
     ImmutablePass(intptr_t(&ID)),
@@ -130,6 +162,7 @@
     PointerMemSize(TD.PointerMemSize),
     PointerABIAlign(TD.PointerABIAlign),
     PointerPrefAlign(TD.PointerPrefAlign),
+    Addrspaces(TD.Addrspaces),
     Alignments(TD.Alignments)
   { }
 
@@ -240,6 +273,15 @@
   /// requested alignment (if the global has one).
   unsigned getPreferredAlignmentLog(const GlobalVariable *GV) const;
 
+  /// getAddrspaceRelation - Returns the relation between the given address
+  /// spaces. Should be read left to right, for example Subset means that
+  /// Addrspace1 is a subset of Addrspace2.
+  AddrspacesDescriptor::AddrspaceRelation getAddrspaceRelation(int Addrspace1,
+                                                               int Addrspace2)
{
+                            
+    return Addrspaces.getAddrspaceRelation(Addrspace1, Addrspace2);
+  }
+
   static char ID; // Pass identification, replacement for typeid
 };
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080721/e5fb5120/attachment.sig>

Mon P Wang

2008-Jul-21 16:48 UTC

head link

[LLVMdev] Casting between address spaces and address space semantics

Hi Matthijs,

I was going to check in a change into clang assuming that by default,  
address spaces are separate unless there was some additional  
information (which since we didn't have any was always true) so that  
implicit cast will always generate an error.  I'll hold off on this  
until we resolved what we want to do here.

   -- Mon Ping

On Jul 18, 2008, at 2:04 PM, Matthijs Kooijman wrote:
> When this information is known, we can define a "any memory"
address
> space,
> which is a superset of the specific address spaces. This will allow  
> a function
> to be (temporarily) defined to work with any memory, while still  
> properly
> propagating information about the different memories through the  
> program (and
> preventing mixup, since you can cast to the "any memory" address
> space, but
> not back).
>
> This requires clang to know about the relationships between address  
> spaces, so
> it knows when to insert implicit casts and when to error out.

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Jul 2008 - [LLVMdev] Casting between address spaces and address space semantics

[LLVMdev] Casting between address spaces and address space semantics

[LLVMdev] Casting between address spaces and address space semantics

[LLVMdev] Casting between address spaces and address space semantics

[LLVMdev] Casting between address spaces and address space semantics

[LLVMdev] Casting between address spaces and address space semantics

Maybe Matching Threads