Rodney M. Bates
2014-Apr-11 01:40 UTC
[LLVMdev] Advice on field access, adding a Modula-3 front end
I am doing some preliminary investigation into splicing the Modula-3 compiler front end onto llvm. I have a number of questions and will no doubt have more, but will start by asking for advice on this one.

The M3 front end has lowered things farther than the llvm IR expects. Whereas llvm accesses fields/data members of records/structs by field number, M3 has already laid out the format of records, and its IR accesses fields by bit offsets.

I could probably create llvm IR in this style by generating explicit address arithmetic, but I suspect that might hurt the optimization possibilities, perhaps a lot. It looks like re-raising the level to field numbers would not be horribly difficult, but it would require using information in the M3 IR that is apparently intended to be debug info only. Also, it looks like M3 IR follows the same principle that llvm does, i.e., that debug information should not affect translation.

I presume llvm does its own memory layout for structs?

It is worse with global variables and constants. Here, in the M3 IR, for each compilation unit, these have been collected into two records, one for constants and one for variables, with the memory layout within them already done. These are accessed with byte offsets within the two records. What makes it more complicated is that some of the fields are in a fixed layout that the runtime system expects. So to use field number access, I would still need to force llvm to accept the memory layout I supply. Can I do that?

Local variables come through at a matching level, so are not a problem.

Any advice would be greatly appreciated.

-- 
Rodney Bates
rodney.m.bates at acm.org
Krzysztof Parzyszek
2014-Apr-11 02:02 UTC
On 4/10/2014 8:40 PM, Rodney M. Bates wrote:
>
> I could probably create llvm IR in this style by generating explicit
> address arithmetic, but I suspect that might hurt the optimization
> possibilities, perhaps a lot. It looks like re-raising the level to
> field numbers would not be horribly difficult, but it would require
> using information in the M3 IR that is apparently intended to be debug
> info only.

You don't have to "re-raise" it, you may simply manufacture struct types that correspond to the data being used, which shouldn't be too hard if the data accesses to a specific member are always of the same size and type. To avoid problems with the layout differing between targets, you could make the type "packed" and make the padding explicit. This does not solve problems with unions, for which address arithmetic and type casting may be necessary.

> I presume llvm does its own memory layout for structs?

It uses the data layout that is provided when you create the TargetMachine for a given target. In other words, it can be different for each supported target.

> It is worse with global variables and constants. Here, in the M3 IR,
> for each compilation unit, these have been collected into two records,
> one for constants and one for variables, with the memory layout within
> them already done. These are accessed with byte offsets within the
> two records. What makes it more complicated is that some of the
> fields are in a fixed layout that the runtime system expects. So to
> use field number access, I would still need to force llvm to accept
> the memory layout I supply. Can I do that?

Yes. Make it "packed" and add explicit padding.

The only problem may be with translating the debug information. If all global variables became members of some aggregate, then I'm not sure how to generate debug information for them that would preserve original names and other relevant information.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
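The "packed plus explicit padding" suggestion can be sketched roughly as follows. This is only an illustration, not code from the thread or from any front end: the function name packed_struct_type and the (byte_offset, byte_size, type) tuples are hypothetical, and it assumes the front end already knows each field's offset, size, and a matching llvm type. It produces the textual form of a packed struct type, filling every gap with an [N x i8] array, and records which element index each original offset ended up at.

```python
def packed_struct_type(fields, total_size):
    """Build the text of a packed LLVM struct type from a precomputed layout.

    fields:     list of (byte_offset, byte_size, llvm_type_string) tuples,
                as already laid out by the front end (hypothetical input shape).
    total_size: size in bytes of the whole record.
    Returns (type_string, index_of), where index_of maps each field's byte
    offset to its element index in the manufactured struct.
    """
    elements = []
    index_of = {}
    cursor = 0  # first byte not yet covered by an element
    for offset, size, ty in sorted(fields):
        if offset < cursor:
            raise ValueError(f"field at offset {offset} overlaps an earlier field")
        if offset > cursor:
            # Gap in the front end's layout: make the padding explicit.
            elements.append(f"[{offset - cursor} x i8]")
        index_of[offset] = len(elements)
        elements.append(ty)
        cursor = offset + size
    if total_size < cursor:
        raise ValueError("fields extend past the stated record size")
    if total_size > cursor:
        # Tail padding, so the struct has exactly the size the runtime expects.
        elements.append(f"[{total_size - cursor} x i8]")
    return "<{ " + ", ".join(elements) + " }>", index_of


ty, idx = packed_struct_type([(0, 4, "i32"), (8, 8, "i64"), (16, 1, "i8")], 24)
print(ty)   # <{ i32, [4 x i8], i64, i8, [7 x i8] }>
print(idx)  # {0: 0, 8: 2, 16: 3}
```

Because the struct is packed, no target-specific padding is inserted between elements, so the element offsets should match the front end's layout on every target. Overlapping fields (unions) are rejected here, in line with the caveat above that they may still need address arithmetic and casts.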
Krzysztof Parzyszek
2014-Apr-11 02:07 UTC
Check out these files. There is a class StructLayout there, whose constructor generates the member offset information.

  include/llvm/IR/DataLayout.h
  lib/IR/DataLayout.cpp

-Krzysztof
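For readers without the sources at hand, the member offset computation can be modelled roughly like this. It is a simplified sketch of the general rule (each member placed in declaration order at the next offset that satisfies its ABI alignment, with the total size rounded up to the struct's alignment), not a transcription of DataLayout.cpp. The names align_to and struct_layout are hypothetical, and the per-member sizes and alignments are assumed inputs that in llvm would come from the target's data layout.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_layout(members, packed=False):
    """Compute member offsets for a struct, keeping declaration order.

    members: list of (size_in_bytes, abi_alignment_in_bytes) per member;
             these values are assumptions supplied by the caller.
    Returns (offsets, struct_size, struct_alignment).
    """
    offsets = []
    offset = 0
    struct_align = 1
    for size, abi_align in members:
        # A packed struct treats every member as byte-aligned.
        member_align = 1 if packed else abi_align
        offset = align_to(offset, member_align)
        struct_align = max(struct_align, member_align)
        offsets.append(offset)
        offset += size
    # Tail padding so that arrays of the struct keep members aligned.
    return offsets, align_to(offset, struct_align), struct_align


# i8, i32, i64 with typical 1/4/8-byte alignments (assumed, target dependent)
print(struct_layout([(1, 1), (4, 4), (8, 8)]))               # ([0, 4, 8], 16, 8)
print(struct_layout([(1, 1), (4, 4), (8, 8)], packed=True))  # ([0, 1, 5], 13, 1)
```

The non-packed offsets depend on the alignments the target supplies, which is why a layout fixed by the front end is safer expressed as a packed type.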
Rodney M. Bates
2014-Apr-11 23:23 UTC
On 04/10/2014 09:02 PM, Krzysztof Parzyszek wrote:
> On 4/10/2014 8:40 PM, Rodney M. Bates wrote:
>>
>> I could probably create llvm IR in this style by generating explicit
>> address arithmetic, but I suspect that might hurt the optimization
>> possibilities, perhaps a lot. It looks like re-raising the level to
>> field numbers would not be horribly difficult, but it would require
>> using information in the M3 IR that is apparently intended to be debug
>> info only.
>
> You don't have to "re-raise" it, you may simply manufacture struct types that correspond to the data being used, which shouldn't be too hard if the data accesses to a specific member are always of the same size and type. To avoid problems with the layout differing between targets, you could make the type "packed" and make the padding explicit. This does not solve problems with unions, for which address arithmetic and type casting may be necessary.

Yeah, I think that's kind of what I meant by "re-raise". The field access operators I get have no field-identifying information other than the offset, so I have to go backwards somewhere to find a field sequence number.

>> I presume llvm does its own memory layout for structs?
>
> It uses the data layout that is provided when you create the TargetMachine for a given target. In other words, it can be different for each supported target.

So it looks like StructLayout::StructLayout does _not_ reorder non-packed fields. Can I rely on this? I thought I remembered reading something to the contrary somewhere in the documentation.

>> It is worse with global variables and constants. Here, in the M3 IR,
>> for each compilation unit, these have been collected into two records,
>> one for constants and one for variables, with the memory layout within
>> them already done. These are accessed with byte offsets within the
>> two records. What makes it more complicated is that some of the
>> fields are in a fixed layout that the runtime system expects. So to
>> use field number access, I would still need to force llvm to accept
>> the memory layout I supply. Can I do that?
>
> Yes. Make it "packed" and add explicit padding.
>
> The only problem may be with translating the debug information. If all global variables became members of some aggregate, then I'm not sure how to generate debug information for them that would preserve original names and other relevant information.

I think I can handle that eventually, in the debugger itself. We already have a modified gdb that, among many other things, unscrambles access to a global so that it looks normal to the source programmer, using a horribly cobbled-up stabs variant. Getting better debug info, in DWARF, is one of my personal motives for this idea.

> -Krzysztof

-- 
Rodney Bates
rodney.m.bates at acm.org
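The "go backwards from an offset to a field sequence number" step might look something like the sketch below. Everything here is hypothetical illustration: the function field_for_offset and its three parallel lists are made-up names, and the lists are assumed to describe the elements of a struct type the front end manufactured itself, padding elements included. M3 record accesses are in bit offsets, so this assumes they have already been reduced to whole bytes.

```python
from bisect import bisect_right


def field_for_offset(element_offsets, element_sizes, is_padding, byte_offset):
    """Map a byte offset to (element_index, offset_within_element).

    element_offsets: ascending byte offsets of the struct's elements.
    element_sizes:   byte size of each element.
    is_padding:      True for elements that are explicit padding.
    Returns None if the offset falls in padding or outside the struct, in
    which case the caller would fall back to explicit address arithmetic.
    """
    # Last element whose start is <= byte_offset.
    i = bisect_right(element_offsets, byte_offset) - 1
    if i < 0:
        return None
    within = byte_offset - element_offsets[i]
    if within >= element_sizes[i] or is_padding[i]:
        return None
    return i, within


# Elements: i32, 4 bytes padding, i64, i8, 7 bytes padding
offsets = [0, 4, 8, 16, 17]
sizes = [4, 4, 8, 1, 7]
padding = [False, True, False, False, True]

print(field_for_offset(offsets, sizes, padding, 8))   # (2, 0)
print(field_for_offset(offsets, sizes, padding, 10))  # (2, 2)
print(field_for_offset(offsets, sizes, padding, 5))   # None
```

A nonzero offset within an element means the access does not start at a field boundary, so a plain field-number access would not be enough on its own there.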