search for: prefetchw

Displaying 20 results from an estimated 21 matches for "prefetchw".

2015 Jul 30
0
[LLVMdev] [x86] Prefetch intrinsics and prefetchw
Hi, I am looking at how the PREFETCHW instruction is matched to the IR prefetch intrinsic (and __builtin_prefetch). Consider this C program: char foo[100]; int bar(void) { __builtin_prefetch(foo, 0, 0); __builtin_prefetch(foo, 0, 1); __builtin_prefetch(foo, 0, 2); __builtin_prefetch(foo, 0, 3); __builtin_prefetch(...
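For reference, the second argument of __builtin_prefetch selects a read (0) or write (1) prefetch; below is a minimal sketch (not part of the quoted mail) of the write-hint form, which is the variant that can lower to PREFETCHW on targets that support it (e.g. when built with -mprfchw).

    char foo[100];

    int bar(void)
    {
        __builtin_prefetch(foo, 0, 3);  /* read hint: lowers to a plain prefetch */
        __builtin_prefetch(foo, 1, 3);  /* write hint: may lower to prefetchw    */
        return foo[0];
    }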
2015 Nov 19
1
[PATCH] virtio_ring: Shadow available ring flags & index
...; Thanks! >>> Venkatesh: >>> Is it that your patch only applies to CPUs w/ exclusive caches? >> No --- it applies when the inter-cache coherence flow is optimized by >> 'M' -> 'M' transfers and when producer reads might interfere w/ >> consumer prefetchw/reads. The AMD Optimization guides have specific >> language on this subject, but other platforms may benefit. >> (see Intel #'s below) For the core2core case (not an HT pair), after the consumer reads that M cache line for avail_idx, is that line still in the producer core's L1 data cache...
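The shadowing under discussion keeps a producer-private copy of the avail index, so the producer only ever writes (never re-reads) the shared cacheline that the consumer reads or prefetches for write. A hedged sketch of the idea follows; the names are illustrative, not the actual virtio_ring.c fields, and the write barrier the real code needs before publishing is omitted.

    #include <stdint.h>

    struct producer {
        uint16_t avail_idx_shadow;      /* private copy, stays in this core's cache */
        volatile uint16_t *avail_idx;   /* location shared with the consumer        */
    };

    static void publish_one(struct producer *p)
    {
        p->avail_idx_shadow++;               /* bump the local copy ...                 */
        *p->avail_idx = p->avail_idx_shadow; /* ... and only ever write the shared one  */
    }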
2023 Sep 11
0
[PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
On 9/10/23 04:28, guoren at kernel.org wrote: > From: Guo Ren <guoren at linux.alibaba.com> > > The target of xchg_tail is to write the tail to the lock value, so > adding prefetchw could help the next cmpxchg step, which may > decrease the cmpxchg retry loops of xchg_tail. Some processors may > utilize this feature to give a forward guarantee, e.g., RISC-V > XuanTie processors would block the snoop channel & irq for several > cycles when prefetch.w instruction...
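A user-space analogue of the pattern the patch describes (a sketch, not the kernel's xchg_tail): prefetch the lock word for write before the compare-and-swap loop, so the cacheline is already held locally in a writable state when the cmpxchg runs and fewer retries are needed.

    #include <stdatomic.h>
    #include <stdint.h>

    static uint32_t xchg_tail_sketch(_Atomic uint32_t *lock, uint32_t tail)
    {
        uint32_t old, new;

        __builtin_prefetch((const void *)lock, 1, 3);   /* prefetch for write */
        old = atomic_load_explicit(lock, memory_order_relaxed);
        do {
            new = (old & 0xffffu) | tail;   /* keep locked/pending bits, install tail */
        } while (!atomic_compare_exchange_weak_explicit(lock, &old, new,
                         memory_order_relaxed, memory_order_relaxed));
        return old;
    }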
2007 Dec 18
3
[PATCH] finish processor.h integration
...t i387_fxsave_struct { u16 cwd; @@ -295,7 +311,7 @@ union i387_union { struct i387_fxsave_struct fxsave; }; -# include "processor_64.h" +DECLARE_PER_CPU(struct orig_ist, orig_ist); #endif extern void print_cpu_info(struct cpuinfo_x86 *); @@ -778,6 +794,124 @@ static inline void prefetchw(const void *x) } #define spin_lock_prefetch(x) prefetchw(x) +#ifdef CONFIG_X86_32 +/* + * User space process size: 3GB (default). + */ +#define TASK_SIZE (PAGE_OFFSET) + +#define INIT_THREAD { \ + .sp0 = sizeof(init_stack) + (long)&init_stack, \ + .vm86_info = NULL, \ + .sysen...
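The context lines above show the kernel's prefetchw() helper and the spin_lock_prefetch() wrapper built on it. As a hedged sketch, the same effect can be expressed with the compiler builtin (the in-tree x86 version instead uses alternatives-patched inline asm):

    static inline void prefetchw(const void *x)
    {
        __builtin_prefetch(x, 1, 3);    /* second argument 1 = prefetch for write */
    }

    #define spin_lock_prefetch(x) prefetchw(x)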
2005 May 16
6
x86_64 build broken?
...lude/asm-x86_64/processor.h: At top level: /usr/include/asm-x86_64/processor.h:229: error: `CONFIG_X86_L1_CACHE_SHIFT' undeclared here (not in a function) /usr/include/asm-x86_64/processor.h:229: error: requested alignment is not a constant /usr/include/asm-x86_64/processor.h: In function `prefetchw': /usr/include/asm-x86_64/processor.h:396: error: called object is not a function make[4]: *** [xc_ptrace.opic] Error 1 make[4]: Leaving directory `/root/xen/xeno-unstable.bk/tools/libxc' make[3]: *** [build] Error 2 make[3]: Leaving directory `/root/xen/xeno-unstable.bk/tools/lib...
2012 Jul 10
6
[PATCH RFC] Btrfs: improve multi-thread buffer read
...struct extent_io_tree *tree, struct bio *bio = NULL; unsigned page_idx; unsigned long bio_flags = 0; + LIST_HEAD(page_pool); + struct pagelst *pagelst = NULL; for (page_idx = 0; page_idx < nr_pages; page_idx++) { struct page *page = list_entry(pages->prev, struct page, lru); prefetchw(&page->flags); list_del(&page->lru); + + if (!pagelst) + pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS); + + if (!pagelst) { + page_cache_release(page); + continue; + } if (!add_to_page_cache_lru(page, mapping, page->index, GFP_NOFS)) { - __extent_read_full_pa...
2007 Dec 18
2
[PATCH 1/2] remove __init modifier from header declaration
This patch removes the __init modifier from an extern function declaration in acpi.h. Besides not being strictly needed, it requires the inclusion of linux/init.h, which is usually not even included directly, increasing header mess by a lot. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- include/asm-x86/acpi.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
2012 Jul 12
3
[PATCH v2] Btrfs: improve multi-thread buffer read
..., struct bio *bio = NULL; unsigned page_idx; unsigned long bio_flags = 0; + LIST_HEAD(page_pool); + struct pagelst *pagelst = NULL; for (page_idx = 0; page_idx < nr_pages; page_idx++) { struct page *page = list_entry(pages->prev, struct page, lru); + bool delay_read = true; prefetchw(&page->flags); list_del(&page->lru); + + if (!pagelst) + pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS); + if (!pagelst) + delay_read = false; + if (!add_to_page_cache_lru(page, mapping, page->index, GFP_NOFS)) { - __extent_read_full_page(tree, page, get_extent,...
2015 Nov 18
2
[PATCH] virtio_ring: Shadow available ring flags & index
...>> of my own, thanks! > > Thanks! > > Venkatesh: > Is it that your patch only applies to CPUs w/ exclusive caches? No --- it applies when the inter-cache coherence flow is optimized by 'M' -> 'M' transfers and when producer reads might interfere w/ consumer prefetchw/reads. The AMD Optimization guides have specific language on this subject, but other platforms may benefit. (see Intel #'s below) > Do you have perf data on Intel CPUs? Good idea -- I ran some tests on a couple of Intel platforms: (these are perf data from sample runs; for each I ran many...
2007 Apr 18
0
[RFC/PATCH PV_OPS X86_64 08/17] paravirt_ops - memory managment
...agetable(unsigned long addres pmd_t *pmd; pte_t *pte; - asm("movq %%cr3,%0" : "=r" (pgd)); + pgd = (pgd_t *)read_cr3(); pgd = __va((unsigned long)pgd & PHYSICAL_PAGE_MASK); pgd += pgd_index(address); @@ -347,7 +347,7 @@ asmlinkage void __kprobes do_page_fault( prefetchw(&mm->mmap_sem); /* get the address */ - __asm__("movq %%cr2,%0":"=r" (address)); + address = read_cr2(); info.si_code = SEGV_MAPERR; Index: clean-start/arch/x86_64/mm/init.c =================================================================== --- clean-start.orig...
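The diff replaces open-coded movq-from-CR-register asm with read_cr2()/read_cr3() accessors, so a paravirtualized kernel can route those reads through its ops table. A hedged sketch of what the native accessors amount to (sketch only, not the actual header):

    static inline unsigned long native_read_cr2(void)
    {
        unsigned long val;
        asm volatile("movq %%cr2, %0" : "=r" (val));
        return val;
    }

    static inline unsigned long native_read_cr3(void)
    {
        unsigned long val;
        asm volatile("movq %%cr3, %0" : "=r" (val));
        return val;
    }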
2015 Nov 18
0
[PATCH] virtio_ring: Shadow available ring flags & index
...anks! > > > > > Venkatesh: > > Is it that your patch only applies to CPUs w/ exclusive caches? > > No --- it applies when the inter-cache coherence flow is optimized by > 'M' -> 'M' transfers and when producer reads might interfere w/ > consumer prefetchw/reads. The AMD Optimization guides have specific > language on this subject, but other platforms may benefit. > (see Intel #'s below) > > > Do you have perf data on Intel CPUs? > > Good idea -- I ran some tests on a couple of Intel platforms: > > (these are perf da...
2013 Nov 18
12
[Patch v3 0/4] Xen stack trace printing improvements
This series consists of improvements to Xen's ability to print traces of its own stack, and specifically for the stack overflow case to be able to use frame pointers in a debug build. I have dev tested the series in debug and non-debug cases, with and without memory guards, and I believe that all the stack traces look correct (given the available information Xen has), and that the
2015 Nov 13
2
[PATCH] virtio_ring: Shadow available ring flags & index
On Wed, Nov 11, 2015 at 02:34:33PM +0200, Michael S. Tsirkin wrote: > On Tue, Nov 10, 2015 at 04:21:07PM -0800, Venkatesh Srinivas wrote: > > Improves cacheline transfer flow of available ring header. > > > > Virtqueues are implemented as a pair of rings, one producer->consumer > > avail ring and one consumer->producer used ring; preceding the > > avail ring
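The quoted text describes the split virtqueue layout: an avail ring the driver produces into and a used ring the device produces into. A rough sketch of those structures follows (field names follow the virtio specification; padding, alignment and the event-index fields are omitted):

    #include <stdint.h>

    struct vring_avail {                  /* driver (producer) -> device */
        uint16_t flags;
        uint16_t idx;                     /* producer's free-running index   */
        uint16_t ring[];                  /* descriptor heads made available */
    };

    struct vring_used_elem {
        uint32_t id;                      /* head of the completed descriptor chain */
        uint32_t len;                     /* bytes the device wrote                 */
    };

    struct vring_used {                   /* device (consumer) -> driver */
        uint16_t flags;
        uint16_t idx;
        struct vring_used_elem ring[];
    };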
2013 Aug 09
14
[Patch 0/4] Xen stack trace printing improvements
This series consists of improvements to Xen's ability to print traces of its own stack, and specifically for the stack overflow case to be able to use frame pointers in a debug build. I have dev tested the series in debug and non-debug cases, with and without memory guards, and I believe that all the stack traces look correct. However, I would greatly appreciate a second opinion on the
2007 Apr 18
2
[PATCH] x86_64 paravirt_ops port
...agetable(unsigned long addres pmd_t *pmd; pte_t *pte; - asm("movq %%cr3,%0" : "=r" (pgd)); + pgd = (pgd_t *)read_cr3(); pgd = __va((unsigned long)pgd & PHYSICAL_PAGE_MASK); pgd += pgd_index(address); @@ -347,7 +347,7 @@ asmlinkage void __kprobes do_page_fault( prefetchw(&mm->mmap_sem); /* get the address */ - __asm__("movq %%cr2,%0":"=r" (address)); + address = read_cr2(); info.si_code = SEGV_MAPERR; Index: linux-2.6.19-quilt/arch/x86_64/mm/init.c =================================================================== --- linux-2.6...