thr3ads.net - search: "quadword"

[LLVMdev] Register Dependencies and Register Allocation

2008 Dec 23

3

I'm writing a back-end for an architecture that supports multi-word loads. As a concrete example, "ldqw r0, [addr]" would load a quadword (4 words) into 4 registers starting with r0 (implicit writes to r1, r2, and r3). First, is there any currently supported architecture that has anything like this? I suspect not. If not, I hope someone might help me figure out how to make this work, particularly with the cooperation of the regist...
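
To make the register-allocation problem concrete, here is a minimal C model of the `ldqw` semantics described in the post. It assumes a hypothetical machine with 16 32-bit registers; `RegFile` and `ldqw` are illustrative names, not any back-end's API. The returned mask is the full set of registers the instruction defines. The allocator has to see all of them, even though only `r0` appears in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: 16 registers of one 32-bit word each. */
#define NUM_REGS 16
#define QUAD_WORDS 4

typedef struct {
    uint32_t r[NUM_REGS];
} RegFile;

/* Models "ldqw rN, [addr]": one explicit destination (rN) plus implicit
 * writes to rN+1 .. rN+3. Returns a bitmask of every register written,
 * i.e. the def set a register allocator must account for. */
static uint32_t ldqw(RegFile *rf, unsigned base, const uint32_t *mem)
{
    assert(base + QUAD_WORDS <= NUM_REGS); /* all four registers must exist */
    uint32_t defs = 0;
    for (unsigned i = 0; i < QUAD_WORDS; i++) {
        rf->r[base + i] = mem[i];
        defs |= UINT32_C(1) << (base + i);
    }
    return defs;
}
```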

[LLVMdev] Custom GEP lowering

2007 Aug 29

3

...hat CellSPU does not know how to natively perform byte-level addressing. For example, here's an indexed stack instruction to load register $3:

    ldq $3, 4($sp)

In reality, the "4($sp)" doesn't mean what you think it means in the PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four zero bits to the right of the offset. To get at the 4th byte requires loading from 0($sp) and some vector shuffling. (Dan: Think about older Cray hardware... you'll immediately understand!) I could try custom lowering loads and stores as an interim step and detect if...
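
The addressing behaviour described above can be sketched in C. This assumes, as the post states, that the `ldq` displacement is counted in 16-byte units and that only aligned quadwords can be loaded. The function names are hypothetical. The byte extraction stands in for the vector shuffle a real SPU sequence would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Effective address of "ldq $rt, off($ra)" as described in the post:
 * the displacement is in quadword units, i.e. shifted left by 4 bits. */
static uint32_t ldq_effective_addr(uint32_t base, int32_t qword_offset)
{
    return base + ((uint32_t)qword_offset << 4);
}

/* Reads one byte at an arbitrary byte address on a machine that can only
 * load aligned 16-byte quadwords: load the enclosing quadword, then pick
 * the wanted byte out of it. */
static uint8_t load_byte_via_quadword(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    uint8_t qword[16];
    uint32_t aligned = addr & ~UINT32_C(15);       /* quadword containing addr */
    memcpy(qword, mem + aligned, sizeof qword);    /* the only legal load */
    return qword[addr & 15];                       /* byte within the quadword */
}
```

For example, `ldq_effective_addr(sp, 4)` is `sp + 64`, not `sp + 4`, which is why reaching byte 4 needs the second function.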

[LLVMdev] Inconsistent naming of SSE intrinsics?

2012 Jun 22

1

Hey guys, Is there a reason for the following naming quirk in the x86 SSE intrinsics: int_x86_sse2_pcmpeq_b int_x86_sse2_pcmpeq_w int_x86_sse2_pcmpeq_d int_x86_sse41_pcmpeqq I anticipated a "_q" suffix for the quadword variant, but was surprised to see the intrinsic named above. Just FYI..., Cameron

SYSLINUX 3.83-pre3

2009 Jul 30

2

I *think* I have found and fixed the Thinkpad MEMDISK problem. The problem with MS-DOS I understand... not so when it comes to an apparently unrelated FreeDOS problem, and as such I really don't know *why* the hack I did works, nor if it will *stay* fixed, but at least it seems to boot on my T61 (at least until it crashes due to another error...) -hpa -- H. Peter Anvin, Intel Open Source

[LLVMdev] Register Dependencies and Register Allocation

2008 Dec 23

0

On Dec 23, 2008, at 11:03 AM PST, Marc de Kruijf wrote:

> I'm writing a back-end for an architecture that supports multi-word loads. As a concrete example, "ldqw r0, [addr]" would load a quadword (4 words) into 4 registers starting with r0 (implicit writes to r1, r2, and r3).

ARM has this. It currently works by creating such instructions in a peephole pass following register allocation, which is not ideal. I think defining a quad-word register class containing 4 smaller registe...
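
One way to picture the suggested quad-word register class is as a set of super-registers, each covering four consecutive GPRs. The sketch below assumes 16 GPRs and allows any starting register. It is not LLVM's register-class machinery. It only shows the interference check an allocator needs once such a class exists. `quad_subregs`, `regs_alias`, and `alloc_quad` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_GPRS 16

/* Mask of the four consecutive GPRs covered by the quad register at base. */
static uint16_t quad_subregs(unsigned base)
{
    assert(base + 4 <= NUM_GPRS);
    return (uint16_t)(0xFu << base);
}

/* Two register operands interfere if they share any sub-register. */
static bool regs_alias(uint16_t a, uint16_t b)
{
    return (a & b) != 0;
}

/* Picks the lowest base whose four registers are all free, or -1. */
static int alloc_quad(uint16_t live)
{
    for (unsigned base = 0; base + 4 <= NUM_GPRS; base++)
        if (!regs_alias(quad_subregs(base), live))
            return (int)base;
    return -1;
}
```

With only `r1` live, `alloc_quad` skips bases 0 and 1 (both include `r1`) and returns 2.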

[PATCH] [memdisk] Additional EDD Device Parameter Table fields

2009 Jul 31

1

...
+ uint8_t dpilen; /* DPI length */
+ uint8_t res1; /* Reserved */
+ uint8_t res2; /* Reserved */
+ uint8_t bustype[4]; /* Host bus type */
+ uint8_t inttype[8]; /* Interface type */
+ uint64_t intpath; /* Interface path */
+ uint64_t devpath[2]; /* Device path (double QuadWord!) */
+ uint8_t res3; /* Reserved */
+ uint8_t chksum; /* DPI checksum */
 };

 struct patch_area {
-- 
1.5.6.3
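
The patch adds a `chksum` field. Below is a minimal sketch of a zero-sum checksum. It assumes the common BIOS-table convention that the checksum byte makes the covered bytes sum to zero modulo 256. The EDD specification defines which bytes of the Device Parameter Table are covered, and that is not visible in this excerpt. So `zero_sum_checksum` is a hypothetical helper that takes an arbitrary range whose last byte is the checksum slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the byte that, stored as the last byte of buf[0..len-1], makes
 * all len bytes sum to zero modulo 256. The final byte is the checksum
 * slot and is not included in the sum. */
static uint8_t zero_sum_checksum(const uint8_t *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return (uint8_t)(0x100 - sum); /* two's complement of the sum */
}
```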

[LLVMdev] Custom GEP lowering

2007 Aug 29

0

> ...w to natively perform byte-level addressing. For example, here's an indexed stack instruction to load register $3:
>
> ldq $3, 4($sp)
>
> In reality, the "4($sp)" doesn't mean what you think it means in the PPC and x86 worlds: that's 4 x 16 -- load quadword (ldq) appends four zero bits to the right of the offset. To get at the 4th byte requires loading from 0($sp) and some vector shuffling. (Dan: Think about older Cray hardware... you'll immediately understand!)

Isn't this just an ISel issue? You have to ISel unaligned load/ s...

[PATCH 0/7] Using %gs for per-cpu areas on x86

2007 Apr 18

1

OK, here it is. Benchmarks still coming. This is against Andi's 2.6.18-rc7-git3 tree, and replaces the patches between (and not including) i386-pda-asm-offsets and i386-early-fault. One patch is identical, one is mildly modified, the rest are re-implemented but inspired by Jeremy's PDA work. Thanks, Rusty. -- Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

2004 Oct 06

3

Sadly, the latest optimization broke everything completely. The asm code isn't gas-compliant, the libFLAC linker script has a typo, and disabling the asm optimization and/or AltiVec still doesn't give a correct build. Instant fixes for the asm stuff: run sed -i -e"s:;:\#:" on lpc_asm.s; to load an address, instead of addis+ori you could use lis and la, and PLEASE use the @l(register)
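
The relocation choice matters because `la` (an `addi`) sign-extends its 16-bit displacement, while `ori` zero-extends its immediate. So `lis` + `la` needs the "high adjusted" upper half (`@ha` in PowerPC ELF assembler syntax), whereas `addis` + `ori` uses the plain upper half (`@h`). The C sketch below models both sequences. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* @h: plain upper 16 bits, paired with ori (zero-extending). */
static uint16_t hi16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* @l: lower 16 bits. */
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFF); }

/* @ha: upper 16 bits adjusted so that adding the sign-extended low half
 * gives back the original address; paired with la/addi. */
static uint16_t ha16(uint32_t addr) { return (uint16_t)((addr + 0x8000u) >> 16); }

/* What "lis rD,ha ; la rD,lo(rD)" computes: la sign-extends lo. */
static uint32_t lis_la(uint16_t ha, uint16_t lo)
{
    uint32_t sext = (lo & 0x8000u) ? ((uint32_t)lo | 0xFFFF0000u) : lo;
    return ((uint32_t)ha << 16) + sext;
}

/* What "lis rD,hi ; ori rD,rD,lo" computes: ori zero-extends lo. */
static uint32_t lis_ori(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Both pairings must rebuild the address exactly. */
static void check_roundtrip(uint32_t addr)
{
    assert(lis_la(ha16(addr), lo16(addr)) == addr);
    assert(lis_ori(hi16(addr), lo16(addr)) == addr);
}
```

Pairing `la` with the unadjusted `@h` value is off by 0x10000 whenever bit 15 of the address is set. For example, `lis_la(0x1234, 0x8000)` yields `0x12338000` instead of `0x12348000`.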

question about src/test_seeking.c - seek_barrage()

2016 Jan 31

2

seek_barrage() has a variable n of type long int (which is usually 32-bit). Then we see something like n = (long int)total_samples; So why does n have type long int, and not FLAC__int64 or some other 64-bit type?
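
To see why the width matters, here is a short C sketch. `long` is 32 bits on ILP32 targets and on 64-bit Windows (LLP64), and a single day of 48 kHz audio already exceeds `INT32_MAX`. The sketch uses `<stdint.h>` types rather than FLAC's own `FLAC__int64` typedef. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of samples (per channel) in a stream of the given length. */
static uint64_t samples_for(uint64_t seconds, uint64_t sample_rate)
{
    assert(sample_rate > 0);
    return seconds * sample_rate;
}

/* True if a sample count can be held in a signed 32-bit integer, which is
 * what "long int" is on ILP32 targets and on LLP64 (64-bit Windows). */
static bool fits_in_int32(uint64_t total_samples)
{
    return total_samples <= (uint64_t)INT32_MAX;
}
```

A 24-hour stream at 48 kHz has 4,147,200,000 samples, almost twice `INT32_MAX` (2,147,483,647).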

[LLVMdev] Custom GEP lowering

2007 Aug 28

0

On Mon, Aug 27, 2007 at 07:26:55PM -0700, Scott Michel wrote:

> It looks like I need to be able to intercept GEP lowering (in SelectionDAGLowering::visitGetElementPtr) and insert something else other than the shifts and adds. The basic problem is that CellSPU loads and stores on 16-byte boundaries. Consequently, the SPU backend has to do the load or store differently

altivec lpc_restore_signal

2004 Sep 10

1

...ere may be some avoidable stalls,
; and there may be a somewhat more clever way to do the outer loop
; the branch mechanism may prevent dynamic loading; I still need to examine
; this issue, and there may be a more elegant method

    stmw r31,-4(r1)
    addi r9,r1,-28
    li r31,0xf
    andc r9,r9,r31 ; for quadword-aligned stack data
    slwi r6,r6,2 ; adjust for word size
    slwi r4,r4,2
    add r4,r4,r8 ; r4 = data+data_len
    mfspr r0,256 ; cache old vrsave
    addis r31,0,hi16(0xfffffc00)
    ori r31,r31,lo16(0xfffffc00)
    mtspr 256,r31 ; declare VRs in vrsave
    cmplw cr0,r8,r4 ; i<data_len
    bc 4,0,L1400 ; load coe...
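
The `addi`/`li`/`andc` sequence rounds `r1 - 28` down to a 16-byte boundary, so the AltiVec accesses that follow touch quadword-aligned stack data. Here is the same operation in C, as a sketch with a hypothetical `align_down` helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr down to a multiple of align (a power of two). With
 * align = 16 this matches "li r31,0xf ; andc r9,r9,r31", which clears
 * the low four bits of r9. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return addr & ~(align - 1);
}
```

For instance, with `r1 = 0x7FFF0000`, `align_down(r1 - 28, 16)` gives `0x7FFEFFE0`.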

[RFC] Improving compact x86-64 compact unwind descriptors

2018 Jan 26

4

...r and resulting requirements to be observed.

Note that a null frame function has no distinct prologue, body or epilogue. Every instruction can be viewed as simultaneously in all three parts, or in none of them.

This leads to the following proposal for the upper part of the extended compact unwind quadword for use in combination with MODE = 8 in the lower part.

| 63 48 | 47 32 |
|-------------------------------------------------------------------|
| RESERVED | 0 ... 0 |
|-------------------------------...
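
The excerpt gives the layout only as bit ranges, so a generic field extractor is enough to check a candidate encoding against it. This is a sketch with a hypothetical `bits` helper. The lower-half fields, including where `MODE` lives, are not shown in the excerpt and are not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts bits hi..lo (inclusive; bit 63 is most significant) of a
 * 64-bit word, right-aligned. */
static uint64_t bits(uint64_t word, unsigned hi, unsigned lo)
{
    assert(hi < 64 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint64_t mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (word >> lo) & mask;
}
```

Under the proposed upper-part layout, a conforming descriptor would have `bits(q, 47, 32) == 0`. `bits(q, 63, 48)` reads the RESERVED field.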

[LLVMdev] Custom GEP lowering

2007 Aug 28

2

It looks like I need to be able to intercept GEP lowering (in SelectionDAGLowering::visitGetElementPtr) and insert something else other than the shifts and adds. The basic problem is that CellSPU loads and stores on 16-byte boundaries. Consequently, the SPU backend has to do the load or store differently than most normal architectures that have byte-addressable operations.
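
Stores have the same constraint as loads here. If only aligned 16-byte quadwords can be written, a byte store becomes a read-modify-write of the enclosing quadword. The C sketch below uses a hypothetical helper name. The sequence is not atomic: if another writer updates the same quadword between the load and the store, that update is lost.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates a one-byte store on a machine whose loads and stores move only
 * aligned 16-byte quadwords: load the enclosing quadword, replace one
 * byte, and write the whole quadword back. */
static void store_byte_via_quadword(uint8_t *mem, uint32_t addr, uint8_t value)
{
    assert(mem != NULL);
    uint8_t qword[16];
    uint32_t aligned = addr & ~UINT32_C(15);     /* enclosing quadword */
    memcpy(qword, mem + aligned, sizeof qword);  /* aligned load */
    qword[addr & 15] = value;                    /* insert the byte */
    memcpy(mem + aligned, qword, sizeof qword);  /* aligned store */
}
```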

[RFC] Improving compact x86-64 compact unwind descriptors

2018 Jan 27

0

> ...ved.
>
> Note that a null frame function has no distinct prologue, body or epilogue. Every instruction can be viewed as simultaneously in all three parts, or in none of them.
>
> This leads to the following proposal for the upper part of the extended compact unwind quadword for use in combination with MODE = 8 in the lower part.
>
> | 63 48 | 47 32 |
> |-------------------------------------------------------------------|
> | RESERVED | 0 ... 0 |
> |------...

[RFC] Improving compact x86-64 compact unwind descriptors

2018 Jan 29

2

> > ...ull frame function has no distinct prologue, body or epilogue. Every instruction can be viewed as simultaneously in all three parts, or in none of them.
> >
> > This leads to the following proposal for the upper part of the extended compact unwind quadword for use in combination with MODE = 8 in the lower part.
> >
> > | 63 48 | 47 32 |
> > |-------------------------------------------------------------------|
> > | RESERVED | 0 ......

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 19

4

...zxdq 20(%rbx), %xmm0

4. We no longer emit a simpler 'vmovq' in the following case:

    vxorpd %xmm4, %xmm4, %xmm4
    vblendpd $2, %xmm4, %xmm2, %xmm4 # %xmm4 = %xmm2[0],%xmm4[1]

Before, we used to generate:

    vmovq %xmm2, %xmm4

Before, the vmovq implicitly zero-extended the quadword in %xmm2 to 128 bits. Now we always do this with a vxorpd+vblendps.

As I said, I will try to create smaller reproducers for each of the problems I found. I hope this helps. I will keep testing.

Thanks,
Andrea
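
Both sequences produce the same result: they keep the low quadword of `%xmm2` and zero the upper one. A small C model of a two-lane register makes that explicit. `Xmm`, `movq_zext`, and `blendpd` are illustrative names. The blend follows the AT&T operand order used above (`$imm, src2, src1, dst`): lane i comes from `src2` when bit i of the immediate is set.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes of an XMM register; q[0] is the low quadword. */
typedef struct { uint64_t q[2]; } Xmm;

/* vmovq xmm, xmm: copy the low quadword and zero the upper one. */
static Xmm movq_zext(Xmm src)
{
    Xmm dst = { { src.q[0], 0 } };
    return dst;
}

/* blendpd: lane i is taken from src2 if bit i of imm is set, otherwise
 * from src1 (AT&T: "vblendpd $imm, src2, src1, dst"). */
static Xmm blendpd(Xmm src1, Xmm src2, unsigned imm)
{
    assert(imm < 4); /* only two lanes, so only two immediate bits */
    Xmm dst;
    for (int i = 0; i < 2; i++)
        dst.q[i] = ((imm >> i) & 1) ? src2.q[i] : src1.q[i];
    return dst;
}
```

Here `blendpd(xmm2, zero, 2)` models `vxorpd` (producing `zero`) followed by `vblendpd $2, %xmm4, %xmm2, %xmm4`, and it matches `movq_zext(xmm2)`.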

[LLVMdev] First-class aggregate semantics

2010 Jan 08

0

...an all-important microsecond. :-)

I have had great success with my HLVM project by passing around large numbers of large structs by hand. LLVM has not only survived but actually generated decent code that beats most languages according to my benchmarks. In particular, HLVM uses "fat" quadword references (where word = sizeof(void*)) that are passed everywhere by value, except when a struct is returned: then HLVM gets the caller to alloca the space and passes it by pointer to the callee for it to fill in.

> > ...I believe right now, however, only structs up to a
> > certain siz...
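
Here is a minimal C sketch of the pattern described: a four-word value passed by value, and a result written through a pointer to storage the caller provides. In LLVM IR this hidden-pointer return is what the `sret` parameter attribute marks. The field names in `FatRef` are hypothetical and are not HLVM's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical "fat" reference of four pointer-sized words. */
typedef struct {
    void *data;
    uintptr_t length;
    uintptr_t tag;
    void *extra;
} FatRef;

/* Passed by value: the callee gets its own copy of all four words. */
static uintptr_t fat_length(FatRef r) { return r.length; }

/* Returned through caller-provided storage: the caller reserves the slot
 * (e.g. a local variable) and passes its address in. */
static void make_fat_ref(FatRef *out, void *data, uintptr_t length)
{
    assert(out != NULL);
    out->data = data;
    out->length = length;
    out->tag = 0;
    out->extra = NULL;
}
```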
