search for: hackbench

Displaying 14 results from an estimated 30 matches for "hackbench".

2015 Mar 26
2
[PATCH 0/9] qspinlock stuff -v15
...IT as with the previous code. An astute observation, I had not considered that. > I presume when you did benchmarking this did not even register? Though > I wonder if it would if you ran the benchmark for a week or so. You presume I benchmarked :-) I managed to boot something virt and run hackbench in it. I wouldn't know a representative virt setup if I ran into it. The thing is, we want this qspinlock for real hardware because it's faster, and I really want to avoid having to carry two spinlock implementations -- although I suppose that if we really, really have to, we could.
2019 Apr 16
2
Interprocedural DSE for -ftrivial-auto-var-init
...we can only benefit from removing extra stores. Hot functions in existing benchmarks are probably optimized well enough already, but speeding up the long tail is also important. Also, at least the repro in https://bugs.llvm.org/show_bug.cgi?id=40527 has been extracted from a real kernel benchmark (hackbench), where this extra store cost us 0.45% > > This is on the LLVM codebase with -ftrivial-auto-var-init=pattern. > > > > As-is it's less than I expected, so I would like to find a good benchmark to decide if we should work to make production code from my experiment. > > > ...
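To make the store being discussed concrete, here is a hypothetical C sketch (names and shapes are mine, not the actual repro from bug 40527): with -ftrivial-auto-var-init=pattern the compiler pre-fills a local buffer, and when every path fully overwrites the buffer before its first read, that pattern store is dead in a way only a cross-block DSE can prove.

    #include <string.h>

    static void use(const char *p) { (void)p; }  /* stand-in consumer */

    void example(int n)
    {
        char buf[64];   /* -ftrivial-auto-var-init=pattern emits a
                           pattern-fill store for buf here */
        if (n > 0)
            memset(buf, 'a', sizeof buf);
        else
            memset(buf, 'b', sizeof buf);
        use(buf);       /* buf is fully overwritten on every path before
                           this first read, so the pattern-fill store is
                           dead -- but proving it needs cross-block DSE */
    }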
2019 Apr 16
2
Interprocedural DSE for -ftrivial-auto-var-init
...ng benchmarks are probably optimized well > enough already, but speeding up the long tail is also important. > Also, at least the repro in > https://bugs.llvm.org/show_bug.cgi?id=40527 has been extracted from a > real kernel benchmark (hackbench), where this extra store cost us > 0.45% > > > > This is on the LLVM codebase with -ftrivial-auto-var-init=pattern. > > > > > > As-is it's less than I expected, so I would like to find a good benchmark to decide if we should work to make production code from my expe...
2019 May 13
2
Interprocedural DSE for -ftrivial-auto-var-init
...e probably optimized well >> enough already, but speeding up the long tail is also important. >> Also, at least the repro in >> https://bugs.llvm.org/show_bug.cgi?id=40527 has been extracted from a >> real kernel benchmark (hackbench), where this extra store cost us >> 0.45% >> >> > > This is on the LLVM codebase with -ftrivial-auto-var-init=pattern. >> > > >> > > As-is it's less than I expected, so I would like to find a good benchmark to decide if we should work to make producti...
2015 Mar 25
0
[PATCH 0/9] qspinlock stuff -v15
...> convoluted and I've no real way to test that but it should be straightforward to > make work. > > I ran this using the virtme tool (thanks Andy) on my laptop with a 4x > overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and > it both booted and survived a hackbench run (perf bench sched messaging -g 20 > -l 5000). > > So while the paravirt code isn't the most optimal code ever conceived, it does work. > > Also, the paravirt patching includes replacing the call with "movb $0, %arg1" > for the native case, which should greatly r...
2015 Mar 27
0
[PATCH 0/9] qspinlock stuff -v15
...bservation, I had not considered that. Thank you. > > > I presume when you did benchmarking this did not even register? Though > > I wonder if it would if you ran the benchmark for a week or so. > > You presume I benchmarked :-) I managed to boot something virt and run > hackbench in it. I wouldn't know a representative virt setup if I ran > into it. > > The thing is, we want this qspinlock for real hardware because it's > faster and I really want to avoid having to carry two spinlock > implementations -- although I suppose that if we really really have to...
2017 Oct 12
3
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
...crease is mainly due to not having access to the 32-bit signed >> relocation that can be used with mcmodel=kernel. A small part is due to reduced >> optimization for PIE code. This bug [1] was opened with gcc to provide better >> code generation for kernel PIE. >> >> Hackbench (50% and 1600% on thread/process for pipe/sockets): >> - PIE disabled: no significant change (avg +0.1% on latest test). >> - PIE enabled: between -0.50% and +0.86% on average (default and Ubuntu config). >> >> slab_test (average of 10 runs): >> - PIE disabled: no...
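A rough, hypothetical illustration of where the size and performance delta discussed above comes from (the function is mine, not from the series): with -mcmodel=kernel the kernel sits in the top 2 GiB of the address space, so a symbol address fits a sign-extended 32-bit immediate (the 32-bit signed relocation mentioned above, R_X86_64_32S), while PIE code must form the address position-independently.

    extern int some_global;

    int *addr_of_global(void)
    {
        /* -mcmodel=kernel: the address can be a 32-bit signed
         *                  absolute immediate, e.g.
         *                  movq $some_global, %rax
         * -fPIE:           the address must be computed relative
         *                  to the instruction pointer, e.g.
         *                  leaq some_global(%rip), %rax */
        return &some_global;
    }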
2019 Apr 15
3
Interprocedural DSE for -ftrivial-auto-var-init
Hi JF, I've heard that you are interested in DSE improvements and maybe we need to be in sync. So far I have experimented with the following DSE improvements: * Cross-block DSE: it eliminates an additional 7% of stores compared to the existing DSE, but it's not visible on benchmarks. * Cross-block + interprocedural analysis to annotate each function argument with: - can read before write - will
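A hypothetical sketch of the interprocedural annotation described above (function names and the annotation itself are illustrative, not the actual implementation): if analysis proves that a callee always writes its pointer argument before reading it, an initializing store in the caller becomes dead.

    #include <string.h>

    /* Suppose analysis annotates `out` as "written before read"
     * for its full length. */
    static void produce(char *out, unsigned len)
    {
        memset(out, 0, len);   /* writes every byte before any read */
        out[0] = 1;
    }

    void caller(void)
    {
        char buf[32];
        memset(buf, 0xAA, sizeof buf);  /* dead: produce() overwrites
                                           buf before reading it, which
                                           only an interprocedural DSE
                                           can see */
        produce(buf, sizeof buf);
    }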
2015 Mar 16
19
[PATCH 0/9] qspinlock stuff -v15
...Xen code was a little more convoluted and I've no real way to test that but it should be straightforward to make work. I ran this using the virtme tool (thanks Andy) on my laptop with a 4x overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and it both booted and survived a hackbench run (perf bench sched messaging -g 20 -l 5000). So while the paravirt code isn't the most optimal code ever conceived, it does work. Also, the paravirt patching includes replacing the call with "movb $0, %arg1" for the native case, which should greatly reduce the cost of having CONFI...
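A hedged C sketch of the "movb $0, %arg1" patching mentioned above (the struct and names are illustrative, not the kernel's code): on native hardware, unlocking a queued spinlock reduces to clearing the locked byte, so the paravirt call site can be patched into that single byte store.

    #include <stdatomic.h>

    /* Illustrative layout: the low byte of the lock word is the
     * "locked" byte, as in the qspinlock design. */
    struct qspinlock_sketch {
        _Atomic unsigned char locked;
    };

    /* The whole native unlock is one releasing byte store -- the
     * moral equivalent of the patched-in "movb $0, %arg1". */
    static inline void sketch_native_unlock(struct qspinlock_sketch *l)
    {
        atomic_store_explicit(&l->locked, 0, memory_order_release);
    }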
2015 Mar 30
2
[PATCH 0/9] qspinlock stuff -v15
...'ve no real way to test that but it should be straightforward to >> make work. >> >> I ran this using the virtme tool (thanks Andy) on my laptop with a 4x >> overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and >> it both booted and survived a hackbench run (perf bench sched messaging -g 20 >> -l 5000). >> >> So while the paravirt code isn't the most optimal code ever conceived, it does work. >> >> Also, the paravirt patching includes replacing the call with "movb $0, %arg1" >> for the native case,...
2014 May 21
0
[RFC 08/07] qspinlock: integrate pending bit into queue
...> > As for now, I will focus on just having one pending bit. > > I'll throw some ideas at it. One of the ideas follows; it seems sound, but I haven't benchmarked it thoroughly. (I wasted a lot of time writing and playing with various tools and loads.) Dbench on an ext4 ramdisk, hackbench, and ebizzy have shown a small improvement in performance, but my main motivation was the weird design of the Pending Bit. Does your setup yield improvements too? (A minor code swap noted in the patch might help things.) It is meant to be applied on top of the first 7 patches, because the virt stuff would just g...
2017 Oct 11
0
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
...> The size increase is mainly due to not having access to the 32-bit signed > relocation that can be used with mcmodel=kernel. A small part is due to reduced > optimization for PIE code. This bug [1] was opened with gcc to provide better > code generation for kernel PIE. > > Hackbench (50% and 1600% on thread/process for pipe/sockets): > - PIE disabled: no significant change (avg +0.1% on latest test). > - PIE enabled: between -0.50% and +0.86% on average (default and Ubuntu config). > > slab_test (average of 10 runs): > - PIE disabled: no significant change...
2017 Oct 12
0
[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization
...to not having access to the 32-bit signed >>> relocation that can be used with mcmodel=kernel. A small part is due to reduced >>> optimization for PIE code. This bug [1] was opened with gcc to provide better >>> code generation for kernel PIE. >>> >>> Hackbench (50% and 1600% on thread/process for pipe/sockets): >>> - PIE disabled: no significant change (avg +0.1% on latest test). >>> - PIE enabled: between -0.50% and +0.86% on average (default and Ubuntu config). >>> >>> slab_test (average of 10 runs): >>...
2018 May 29
1
[PATCH v4 00/27] x86: PIE support and option to extend KASLR randomization
...d: same - PIE enabled: +0.001% The size increase is mainly due to not having access to the 32-bit signed relocation that can be used with mcmodel=kernel. A small part is due to reduced optimization for PIE code. This bug [1] was opened with gcc to provide better code generation for kernel PIE. Hackbench (50% and 1600% on thread/process for pipe/sockets): - PIE disabled: no significant change (avg ±0.5% on latest test). - PIE enabled: between -1% and +1% on average (default and Ubuntu config). Kernbench (average of 10 Half and Optimal runs): Elapsed Time: - PIE disabled: no significant chang...