I realize I am one of the few who uses the segment registers (especially CS and DS) on the ia32 chips, for example, and one of a definite few with complete segregation models that rival specialized physical processors... GCC still fails to use these correctly, and if your LLVM still depends on either Generic or some of the RTL models used in various processor definitions, I express concern for optimization and compilation. Please at least hint that you intend to optimize and compile using the full functionality of the processor; gcc has historically compiled binaries that are 2-3x slower in many projects due to this asinine problem. On machines with far more advanced registers, this is debilitating. Please inform.

-Wilfred
WilfredGuerin at gmail.com
> if your LLVM still depends on either Generic or some of the
> RTL models they use in various processor definitions, I
> express concern for optimization and compilation.

Thank you for your concerns.

However, LLVM is in the first place an environment for writing compilers. As an example, LLVM can use the GCC frontend for compilation. "clang" is another compiler, one that can compile programs without any call into any GCC part.

To learn more about code generation for x86 targets inside LLVM (i.e. without the help of GCC), look at these files:

http://llvm.org/svn/llvm-project/llvm/trunk/lib/Target/X86

Also, can you provide an example of the same program, once compiled with the use of CS/DS and once without? I mean: show us the assembly code. How far apart is the performance of those two test programs? For which ABI can you compile using CS/DS? AFAIK a Linux environment disables this.

Somehow your mail reminds me of the times when I compiled under MSDOS and had several memory models to select from, e.g. "tiny", "medium", "large", "flat" ...
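[For readers wondering what "using CS/DS" looks like at the instruction level: a load through an alternate segment is just a one-byte segment-override prefix on the instruction. Below is a minimal sketch with GCC inline assembly — our illustration, not code from the thread. It uses FS because on Linux FS/GS are the segments user code can realistically repoint (via set_thread_area or modify_ldt), while DS is the default data segment and CS cannot be written through at all, which bears on Holger's ABI question:]

    #include <stdint.h>

    /* Read a 32-bit value at `offset' relative to the FS segment base.
       The movl %fs:(%reg) form is plain ia32; only setting up the FS
       descriptor beforehand is OS-specific. */
    static inline uint32_t load_via_fs(uint32_t offset)
    {
        uint32_t value;
        __asm__ volatile("movl %%fs:(%1), %0"
                         : "=r"(value)
                         : "r"(offset)
                         : "memory");
        return value;
    }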
fucking hell, listserv...

---------- Forwarded message ----------
From: "Wilfred L. Guerin" <wilfredguerin at gmail.com>
Date: Wed, 25 Jul 2007 10:54:46 -0500
Subject: Re: [LLVMdev] Segment Register Use
To: Holger Schurig <hs4233 at mail.mn-solutions.de>

I was very much expecting this style of response ;)

I believe the following characteristics and class of example should demonstrate the concerns in all models, as significant research on massive data tables has been undertaken for decades...

First:

In favouring linear processing paths, it is more common to use a computational sequencing method and never use a push/pop stack under any circumstance (a 4:1 optimization).

Partitioning via the segment registers is good for security but irrelevant in this example; both the code space and the data space use identical management models.

Methods:

A sequential lookup table (ternary tree, reference table, "nibbler") is used for memory lookup, with conventional infinite-bitlength identifiers, any partition size, and any ternary size...

A table has id1 and 256 blocks (memory pointers or similar IDs); the value of the ternary digit is n (8 bits here) and the block size is static per table. Whether the id-based lookup resolves data or memory locations, the process is identical:

shift by the ternary scope, AND with the scope mask, mul (shl) by the block size, fetch the offset value, (typically confirm the need to recurse further,) then repeat with the next ternary digit until done. (A C sketch of this walk follows below.)

The final memory position is commonly assigned either as the base pointer for the requested data structure or as a code offset.

Code is compiled for local memory values (static) relative to CS, and for known memory locations of the data in the modular process (DS).

.....

Here the model becomes critical:

In complex vector comparators, high-dimension math, and dense data sets (with potentially distributed media storage being loaded), a limited processor has few options (ia for now).

To compare characteristics of 2 or more "vectors" and write the result to a new data container, the code needs obvious access to all these locations.

The ia flat memory model makes this easy, and a well-partitioned CS sector (isolated like the buffers) gives maximum available speed.

The static-seeded tables are recursed, and the result location is set into the offset register accordingly.

Let's assume the code does something stupid like basic math, or propagates the larger value from the sources.

Using DS is obvious, but thanks to inlined control methods and predictable flow routing and modeling, the stack offset register as well as a few others are also available.

Depending on the processor, the computational registers can use these segments the way they were designed. ia32 requires one operand of a compare to be in a register; other processors can handle an offset-offset compare in one cycle.

compreg = ds + numofblock * width

When you are using TWO stack pointers, DS, and whatever else is available, you end up with all the data needed for the vector comparison within one flop, and an offset within the vector costs one rol on the control register.

Most accumulators can handle this easily.

...

There are many names for this, but it differs from a common inlined vector-math loop because of the processing method. I forget the common textbook term.

...

The named-variable convention in LLVM is desirable because it allows selecting which base pointer to use for each table reference based on what computations land there. One of the ia32 registers is grand for writing to*, but can't be used to move memory to a register.

Obviously this is an intentional design.
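[To make the table walk above concrete, here is a minimal C sketch under one reading of the description. The struct layout, the names, and the 32-bit id width are our assumptions, not code from the thread:]

    #include <stdint.h>
    #include <stddef.h>

    #define DIGIT_BITS 8
    #define FANOUT (1u << DIGIT_BITS)     /* "256 blocks" per table level */

    struct table {
        struct table *block[FANOUT];      /* mem ptr or similar ID; NULL = absent */
        void *leaf;                       /* payload when the walk ends here */
    };

    /* Shift by the digit scope, mask, index (the implicit mul/shl by the
       static block width), confirm the need to recurse, repeat until done. */
    void *resolve(const struct table *t, uint32_t id)
    {
        for (int shift = 32 - DIGIT_BITS; shift >= 0; shift -= DIGIT_BITS) {
            unsigned digit = (id >> shift) & (FANOUT - 1);
            if (t->block[digit] == NULL)
                return NULL;              /* no entry for this id */
            t = t->block[digit];
        }
        return t->leaf;                   /* base pointer for the requested data */
    }

    /* Once a base is resolved, element n sits at base + n * width -- the
       "compreg = ds + numofblock * width" step -- which ia32 encodes in a
       single scaled-index address. */
    static inline void *element(void *base, size_t n, size_t width)
    {
        return (char *)base + n * width;
    }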
The optimizer should handle this based on the machine type and a model of hardware capabilities (again back to HDL and the ancient ISDL specs for modeling... but avoiding the new UML except at higher layers).

So: a table that resolves to various data blocks; set the base pointers accordingly, then process. (A sketch of that processing step follows below.)

Morphic code, and code copied to a new CS for runtime (loaded onto the chip), can have its own buffers; stacks are idiotic. (This directs post-operation routing and allows distributed processing when using the extra table with IDs.)

In short, trying to do such a thing on complex data sets, especially when the result set or value dictates where to go next, simply can't work with the linear sequence lookup into a register as defined in most current compilers.

Doing so costs (assuming 2 usable registers for data) (lookupflops * numdatablocks)^2 for the math processing.

Even linearized, you need an end-of-list test using the stack, and... well, stacks break with recursion.

I'm sure there are documents describing this and other similar models that function transparently on any hardware host.

Obviously, preparing large data sets in this manner is required to optimize loading of limited-memory processors like GPU boards when driven by conventional CPU-dependent controllers. (A good example is ia64 with an nvidia quad GPU.)

Hopefully someone can give a link to a simple overview of this issue, but I'm sure the concept is easy enough to understand.

-Wilfred L. Guerin
WilfredGuerin at gmail.com
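[A sketch of that processing step as we read it: with the source and result base pointers already resolved, each held in its own register (which is what the segment/offset pairs are wanted for), "propagate the larger value" is one compare and one store per element. Our illustration with assumed names; resolve() is from the sketch above:]

    #include <stdint.h>
    #include <stddef.h>

    /* Elementwise "propagate the larger value from the sources" into a
       freshly resolved result block. */
    void propagate_max(const int32_t *a, const int32_t *b,
                       int32_t *out, size_t nblocks)
    {
        for (size_t i = 0; i < nblocks; i++)
            out[i] = (a[i] > b[i]) ? a[i] : b[i];
    }

    /* Hypothetical usage, after three table walks:
       int32_t *a = resolve(tbl, id_a);
       int32_t *b = resolve(tbl, id_b);
       int32_t *r = resolve(tbl, id_r);
       propagate_max(a, b, r, nblocks); */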
On Wed, 25 Jul 2007, Wilfred L. Guerin wrote:
> fucking hell, listserv...

Wilfred,

Your emails are not in the spirit of the list. This list is for friendly discussion of LLVM-related topics. If you don't intend to contribute in a positive way, please take your attitude and language elsewhere.

I've set your account to require moderation before your posts go through. If you would like to contribute constructively to the list, and demonstrate it with your future posts, I'd be happy to reinstate your full access. If not, I will eventually block you from posting to the list entirely.

Thanks for your understanding,

-Chris
"tiny", "medium", "large", "flat" ... >> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-Chris -- http://nondot.org/sabre/ http://llvm.org/