similar to: NVPTX - Reordering load instructions

Displaying 20 results from an estimated 210 matches similar to: "NVPTX - Reordering load instructions"

2018 Jun 21
2
NVPTX - Reordering load instructions
We already have a pass that vectorizes loads and stores in NVPTX and AMDGPU. I'm not at my laptop and forget the exact filename, but it's called the load-store vectorizer (LSV). I think the question is: why is LSV not vectorizing this code? I think the answer is that LLVM can't tell that the loads are aligned. Ptxas can, but only because it's (apparently) doing vectorization *after* it resolves the
2006 Jun 26
0
[klibc 34/43] sh support for klibc
The parts of klibc specific to the sh architecture. Signed-off-by: H. Peter Anvin <hpa at zytor.com> --- commit 94473ed85b00ec45ff8ee6cac62f60a368ff4534 tree 5c09fdd824962cba83c198ac1dd077045d2cb0b1 parent 418ab99cc64fed344e2d3e446208f48655cccb31 author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:44 -0700 committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006
2004 Aug 06
1
status report on the Windows CE (PocketPC) ACM codec
> > It was essentially my own mistake. After the Embedded C++ IDE crashed unexpectedly, > the .lib file was no longer present in the project settings (the project had not been saved) > so all the errors I got were caused by a missing .lib reference in the link stage! > I had to change my cross compiler target to "arm-wince-pe" because otherwise the > object/.lib files
2006 Jun 28
35
[klibc 00/31] klibc as a historyless patchset (updated and reorganized)
I have updated the klibc patchset based on feedback received. In particular, the patchset has been reorganized so as not to break git-bisect. Additionally, this updates the patch base to 2.6.17-git12 (d38b69689c349f35502b92e20dafb30c62d49d63) and klibc 1.4.8; the main difference on the klibc side is removal of obsolete code. This is also available as a git tree at:
2013 Mar 20
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
OK. That helps. It does flash a warning though [DEVICE-C++] nbody.kernel.cpp nbody.kernel.cpp:29:9: warning: '__constant__' macro redefined #define __constant__ __attribute__((address_space(2))) ^ /opt/cuda/include/host_defines.h:183:9: note: previous definition is here #define __constant__ \ ^ 1 warning generated. Another question is What about extern __shared__ ? I
2013 Mar 20
0
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
On Wed, Mar 20, 2013 at 11:29 AM, upit <uday_pitambare at yahoo.com> wrote: > OK. That helps. > It does flash a warning though > > [DEVICE-C++] nbody.kernel.cpp > nbody.kernel.cpp:29:9: warning: '__constant__' macro redefined > #define __constant__ __attribute__((address_space(2))) > ^ > /opt/cuda/include/host_defines.h:183:9: note: previous
2013 Mar 20
0
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
The problem you are seeing is because clang is putting the global variables in address space 0, which in NVPTX means the generic address space. PTX does not allow this, so the back-end *should* be printing an error for you. Are you using trunk or 3.2? Generally, clang won't be compatible with the CUDA Toolkit headers. If you want to use the __constant__ modifier from CUDA in Clang, define
2013 Mar 18
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
Please find the .ll attached below. Yes, I am using the cuda_runtime.h from the toolkit. nbody.kernel.ll <http://llvm.1065342.n5.nabble.com/file/n56048/nbody.kernel.ll> - Uday
2013 Apr 20
1
[LLVMdev] Adding custom pragma's for generating particular LLVM IR
Hi ML readers, I would first like to apologize if this was already addressed on this ML, but I didn't find information on this topic. I would like to add custom pragmas that generate particular LLVM IR. Here is an example: int a, b; /* ... */ #pragma mypragma shared(a, b) { a = b; b++; } would result in this (pseudo) LLVM IR: %1 alloca i32 %2 alloca i32 ... call void
2023 Mar 29
1
ChaCha20 Rekey Frequency
I was wondering if there was something specific to the internal chacha20 cipher as opposed to OpenSSL implementation. I can't just change the block size because it breaks compatibility. I can do something like as a hack (though it would probably be better to do it with the compat function): if (strstr(enc->name, "chacha")) *max_blocks = (u_int64_t)1 << (16*2);
2023 Mar 29
2
ChaCha20 Rekey Frequency
On Wed, 29 Mar 2023, Chris Rapier wrote: > I was wondering if there was something specific to the internal chacha20 > cipher as opposed to OpenSSL implementation. > > I can't just change the block size because it breaks compatibility. I can do > something like as a hack (though it would probably be better to do it with the > compat function): > > if
2023 Mar 29
1
[EXTERNAL] Re: ChaCha20 Rekey Frequency
I'm hardly an expert on this, but if I remember correctly, the rekey rate for good security is mostly dependent on the cipher block size. I left my reference books at home; so, I can't come up with a reference for you, but I would take Chris' "I'm deeply unsure of what impact that would have on the security of the cipher" comment seriously and switch to a cipher with a
2023 Mar 29
1
[EXTERNAL] Re: ChaCha20 Rekey Frequency
That's true for block ciphers, but ChaCha20+poly1305 is a stream cipher. On Wed, 29 Mar 2023, Robinson, Herbie wrote: > > I'm hardly an expert on this, but if I remember correctly, the rekey rate > for good security is mostly dependent on the cipher block size. I left my > reference books at home; so, I can't come up with a reference for you, but I > would take Chris'
2009 Dec 27
3
[PATCH 1/2] drm/nv50: align size of buffer object to the right boundaries.
- Depth and stencil buffers are supposed to be large enough in general. Signed-off-by: Maarten Maathuis <madman2003 at gmail.com> --- drivers/gpu/drm/nouveau/nouveau_bo.c | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index e342a41..9fc4bd6 100644 ---
2018 Jul 23
3
[hivex PATCH] Re-allocating unused blocks before assigning new blocks
Hello Richard As discussed in the IRC channel, when merging a moderately large reg file (~35MB) to a hiv file (~118 MB); hivex generates a huge hiv file (~580 MB). These changes address that by creating a list of unallocated blocks and reassigning unused blocks. I used https://github.com/msuhanov/regf/blob/master/Windows%20registry%20file%20format%20specification.md as a reference for the
2015 Oct 19
5
[PATCHv2 0/2] Introduce vfs_min_size API to get minimum filesystem size.
Tried to make it in accordance with your comments. Differences from v1: Added reply_with_error where necessary. Changed name get_min_size -> vfs_min_size. Maxim Perevedentsev (2): New API: vfs_min_size Include resize2fs_P into vfs_min_size. daemon/Makefile.am | 1 + daemon/daemon.h | 2 ++ daemon/ext2.c | 45 ++++++++++++++++++++++++++----- daemon/fs-min-size.c | 49
2023 Mar 24
1
ChaCha20 Rekey Frequency
I'm wondering why the ChaCha20 cipher rekeys so frequently. At speed I'm seeing rekeys every second or two. So I'm spending a large amount of time in the rekey process. From what I've read about ChaCha20 it shouldn't need to be rekeyed quite so frequently. Am I missing something obvious? Just curious more than anything else. Chris
2003 Mar 30
1
[RFC][patch] dynamic rolling block and sum sizes II
Mark II of the patch set. The first patch (dynsumlen2.patch) increments the protocol version to support per-file dynamic block checksum sizes. It is a prerequisite for varsumlen2.patch. varsumlen2.patch implements per-file dynamic block and checksum sizes. The current block size calculation only applies to files between 7MB and 160MB setting the block size to 1/10,000 of the file length for a
2004 Aug 02
4
reducing memmoves
Attached is a patch that makes window strides constant when files are walked with a constant block size. In these cases, it completely avoids all memmoves. In my simple local test of rsyncing 57MB of 10 local files, memmoved bytes went from 18MB to zero. I haven't tested this for a big variety of file cases. I think that this will always reduce the memmoves involved with walking a large
2017 Jul 05
3
[PATCH v2] virtio-blk: add DISCARD support to virtio-blk driver
Currently the virtio-blk driver does not provide a discard feature flag, so filesystems built on top of the block device will not send discard commands. This is okay for an HDD backend, but it will impact performance for an SSD backend. Add a feature flag VIRTIO_BLK_F_DISCARD and command VIRTIO_BLK_T_DISCARD to extend the existing virtio-blk protocol, define a 16-byte discard descriptor for each discard