similar to: [LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)"

2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Thu, Apr 17, 2014 at 6:10 PM, Yaron Keren <yaron.keren at gmail.com> wrote:
> If accuracy is not critical, incrementing the counters without any guards
> might be good enough.
>
No. Contention on the counters leads to 5x-10x slowdown. This is never good
enough. --kcc

> Hot areas will still be hot and cold areas will not be affected.
>
> Yaron
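The contention point deserves a concrete picture. Below is a minimal C++ sketch, not the actual -fprofile-instr-generate runtime, contrasting the contended pattern (every thread updating one shared counter) with the per-thread alternative discussed later in the thread; all names here (hot_loop_shared, hot_loop_local, merged_total) are illustrative.

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

// Contended: every thread hammers the same cache line.
static std::atomic<uint64_t> shared_counter{0};

// Uncontended: each thread bumps a private counter and merges once at exit.
static std::atomic<uint64_t> merged_total{0};
static thread_local uint64_t local_counter = 0;

void hot_loop_shared(uint64_t n) {
  for (uint64_t i = 0; i < n; ++i)
    shared_counter.fetch_add(1, std::memory_order_relaxed);
}

void hot_loop_local(uint64_t n) {
  for (uint64_t i = 0; i < n; ++i)
    ++local_counter;
  merged_total.fetch_add(local_counter, std::memory_order_relaxed);
}

int main() {
  const uint64_t kIters = 1 << 22;
  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t)
    threads.emplace_back(hot_loop_local, kIters);  // swap in hot_loop_shared to see the contention
  for (auto &th : threads) th.join();
  std::printf("total = %llu\n", (unsigned long long)merged_total.load());
  return 0;
}

Swapping hot_loop_shared in for hot_loop_local makes all eight threads serialize on one counter's cache line, which is the kind of ping-pong behind the 5x-10x figure quoted above.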
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Chandler Carruth <chandlerc at google.com> writes:
> Having thought a bit about the best strategy to solve this, I think we should
> use a tradeoff of memory to reduce contention. I don't really like any of the
> other options as much, if we can get that one to work. Here is my specific
> suggestion:
>
> On Thu, Apr 17, 2014 at 5:21 AM, Kostya Serebryany <kcc at
2014 Apr 18
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 17, 2014, at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Apr 17, 2014 at 1:27 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > if (thread-ID != main's thread-ID && shard_count < std::min(MAX, NUMBER_OF_CORES)) {
> >   shard_count = std::min(MAX,
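The quoted pseudocode is only a fragment, so here is one possible reading of the sharding proposal as a self-contained C++ sketch. It is not code from the thread: kMaxShards, the thread-id hashing, and the relaxed atomics are all assumptions, and the proposal itself leaves MAX and the shard-selection scheme open.

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Illustrative cap on shards; the thread debates what MAX should be.
static constexpr uint32_t kMaxShards = 8;

// Starts at 1 so a single-threaded program keeps exactly one counter shard.
static std::atomic<uint32_t> shard_count{1};

// Each source-level counter becomes an array of shards; a thread picks its
// shard by hashing its id, so concurrent threads usually hit different
// cache lines and only the profile dumper has to sum them.
struct ShardedCounter {
  std::atomic<uint64_t> shards[kMaxShards] = {};

  void increment() {
    uint32_t n = shard_count.load(std::memory_order_relaxed);
    size_t idx = std::hash<std::thread::id>{}(std::this_thread::get_id()) % n;
    shards[idx].fetch_add(1, std::memory_order_relaxed);
  }

  uint64_t total() const {
    uint64_t sum = 0;
    for (const auto &s : shards)
      sum += s.load(std::memory_order_relaxed);
    return sum;
  }
};

// Rough equivalent of the quoted pseudocode: on function entry, grow the
// shard count the first time a non-main thread shows up, capped by MAX and
// the number of cores.
void maybe_grow_shards(bool is_main_thread) {
  uint32_t cores = std::thread::hardware_concurrency();
  uint32_t limit = std::min(kMaxShards, cores ? cores : 1u);
  if (!is_main_thread &&
      shard_count.load(std::memory_order_relaxed) < limit)
    shard_count.store(limit, std::memory_order_relaxed);
}

Note that even with shard_count == 1 every increment still pays for the thread-id hash and modulo, which is exactly the overhead the next message questions.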
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Good thinking, but why do you think runtime selection of the shard count is better than compile-time selection? For single-threaded apps the shard count is always 1, so why pay the penalty of checking the thread ID each time a function is entered? For multi-threaded apps I would expect MAX to be smaller than NUM_OF_CORES to avoid excessive memory consumption, so you always end up with N == MAX. If MAX is
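For comparison, a sketch of the compile-time alternative being argued for here, assuming the shard count could be baked in as a build-time constant; the macro name PROFILE_COUNTER_SHARDS is made up for illustration. A single-threaded build then compiles to a plain increment with no thread-id check, and a multi-threaded build fixes N == MAX up front.

// Hypothetical build-time knob, e.g. -DPROFILE_COUNTER_SHARDS=8 for a
// server build; defaults to 1 for single-threaded programs.
#ifndef PROFILE_COUNTER_SHARDS
#define PROFILE_COUNTER_SHARDS 1
#endif

#include <atomic>
#include <cstdint>

struct Counter {
  std::atomic<uint64_t> shards[PROFILE_COUNTER_SHARDS] = {};

  void increment(unsigned shard_hint) {
#if PROFILE_COUNTER_SHARDS == 1
    (void)shard_hint;  // no thread-id check, no modulo: N is known to be 1
    shards[0].fetch_add(1, std::memory_order_relaxed);
#else
    // N == MAX is fixed at build time, so the only per-call work is the modulo.
    shards[shard_hint % PROFILE_COUNTER_SHARDS]
        .fetch_add(1, std::memory_order_relaxed);
#endif
  }
};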
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is a long thread, so I will combine several comments into a single email.
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> > The overflow will happen very quickly with an 8-bit counter.
>
> Yes, but it reduces contention by 256x (a thread
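A rough sketch of the 8-bit per-thread counter scheme this exchange is weighing; the array layout, kNumCounters, and the flush-on-wrap bookkeeping are assumptions for illustration, not the proposal's actual code.

#include <atomic>
#include <cstdint>

static constexpr int kNumCounters = 1024;            // illustrative size

// Central 64-bit counters, shared by all threads.
static std::atomic<uint64_t> central[kNumCounters];

// One byte per counter per thread; increments here are uncontended.
static thread_local uint8_t local[kNumCounters];

inline void count(int idx) {
  if (++local[idx] == 0) {
    // The 8-bit counter just wrapped: 256 increments have accumulated, so
    // push them to the shared counter in one shot. The contended add now
    // happens once per 256 events instead of on every event -- the
    // "reduces contention by 256x" point made above.
    central[idx].fetch_add(256, std::memory_order_relaxed);
  }
}

// Note: whatever is left in `local` (< 256 per counter) still has to be
// flushed when the thread exits or the profile is dumped; that bookkeeping
// is omitted here.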
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 11:44 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> On Fri, Apr 18, 2014 at 11:32 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>
>> On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>>> Hi,
>>>
>>> This is a long thread, so I will combine several comments into a single
2014 Apr 18
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 12:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is a long thread, so I will combine several comments into a single email.
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> > The overflow will happen very quickly with an 8-bit counter.
>
> Yes, but it reduces contention by 256x (a thread
2014 Apr 23
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 23, 2014, at 7:31 AM, Kostya Serebryany <kcc at google.com> wrote:
> I've run one proprietary benchmark that reflects a large portion of
> Google's server-side code.
> -fprofile-instr-generate leads to a 14x slowdown due to counter contention.
That's serious.
> Admittedly, there is a single hot function that accounts for half of that slowdown,
> but even if
2014 Apr 25
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 24, 2014, at 1:33 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>> I can see that the behavior of our current instrumentation is going to be a
>> problem for the kinds of applications that you're looking at. If you can
>> find a way to get the overhead down without losing accuracy
>
> What are your requirements for accuracy?
> Current
2018 Apr 20
7
Reconstructing files from shards
Hello, I have a volume on a gluster install (3.12.5) on which sharding was enabled at some point recently. (I don't know how it happened; it may have been an accidental run of an old script.) It has been happily sharding behind our backs when it shouldn't have. I'd like to turn sharding off and revert the files back to normal. Some of these are sparse files, so I need to account
2016 Mar 07
3
Profile-based inlining status
Hello, I'm learning how LLVM performs PGO (profile-guided optimization) by using the instrumentation-based profile build (-fprofile-instr-generate and -fprofile-instr-use). However, I found no difference in inlining behavior with and without PGO for a few SPEC benchmarks when checking the emitted optimization reports (-Rpass=inline -Rpass-missed=inline -Rpass-analysis=inline).
2018 Feb 27
1
On sharded tiered volume, only first shard of new file goes on hot tier.
Does anyone have any ideas about how to fix, or work around, the following issue? Thanks!

Bug 1549714 - On sharded tiered volume, only first shard of new file goes on hot tier.
https://bugzilla.redhat.com/show_bug.cgi?id=1549714

On a sharded tiered volume, only the first shard of a new file goes on the hot tier; the rest
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Thu, Apr 17, 2014 at 8:37 PM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
> How about per-thread if the counter is hot enough?
>
Err. How do you know if the counter is hot w/o first profiling the app?
2018 Apr 22
4
Reconstructing files from shards
On Sun, 22 Apr 2018 at 10:46, Alessandro Briosi <ab1 at metalit.com> wrote:
> Imho the easiest path would be to turn off sharding on the volume and
> simply do a copy of the files (to a different directory, or rename and
> then copy, i.e.)
>
> This should simply store the files without sharding.
>
If you turn off sharding on a sharded volume with data in it, all sharded
2017 Dec 08
2
Testing sharding on tiered volume
Hi, I'm looking to use sharding on a tiered volume. This is a very attractive feature that could help a tiered volume handle larger files without hitting the "out of (hot) space" problem. I set up a test configuration on GlusterFS 3.12.3 where the tiered volume has 2TB cold and 1GB hot segments. The shard size is set to 16MB. For testing, 100GB files are used. It seems writes
2016 Mar 11
3
RFC: Pass to prune redundant profiling instrumentation
On Thu, Mar 10, 2016 at 8:33 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I'd like to add a new pass to LLVM which removes redundant profile counter
>> updates. The goal is to speed up code coverage
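For context, a hand-written C++ illustration of the kind of redundancy such a pass can target; this is not the proposed pass or necessarily its exact analysis, just one classic case where a counter's updates can be dropped because its value is derivable from the remaining counters.

#include <cstdint>

// Hypothetical per-function counter slots, as coverage-style instrumentation
// might emit them. `before` and `after` are two versions of the same function,
// shown side by side.
uint64_t counters[3];

int before(int x) {
  ++counters[0];       // function entry
  if (x > 0) {
    ++counters[1];     // 'then' block
    return 1;
  }
  ++counters[2];       // 'else' block
  return 0;
}

int after(int x) {
  // counters[0] is redundant: entry count == counters[1] + counters[2],
  // so its update can be pruned and the value reconstructed when the
  // profile is read, leaving one fewer increment on every call.
  if (x > 0) {
    ++counters[1];
    return 1;
  }
  ++counters[2];
  return 0;
}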
2017 Dec 18
0
Testing sharding on tiered volume
----- Original Message -----
> From: "Viktor Nosov" <vnosov at stonefly.com>
> To: gluster-users at gluster.org
> Cc: vnosov at stonefly.com
> Sent: Friday, December 8, 2017 5:45:25 PM
> Subject: [Gluster-users] Testing sharding on tiered volume
>
> Hi,
>
> I'm looking to use sharding on a tiered volume. This is a very attractive
> feature that could
2018 Apr 27
0
Reconstructing files from shards
The short answer is: no, there currently exists no script that can piece the shards together into a single file.

Long answer: IMO the safest way to convert from sharded to a single file _is_ to copy the data out into a new volume at the moment. Picking up the files from the individual bricks directly and joining them, although fast, is a strict no-no for many reasons - for example, when you
2018 Apr 23
0
Reconstructing files from shards
From some old May 2017 email. I asked the following:

"From the docs, I see you can identify the shards by the GFID:
# getfattr -d -m. -e hex /path_to_file/
# ls /bricks/*/.shard -lh | grep /GFID

Is there a gluster tool/script that will recreate the file? Or can you just
sort them properly and then simply cat/copy them back together?
cat shardGFID.1 .. shardGFID.X > thefile
2015 Aug 08
3
RFC: PGO Late instrumentation for LLVM
Instrumentation-based Profile Guided Optimization (PGO) is a compiler technique that leverages important program runtime information, such as precise edge counts and frequent value information, to make frequently executed code run faster. It has proven to be one of the most effective ways to improve program performance. An important design point of PGO is to decide where to place the
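To make the design point concrete, here is roughly what instrumentation-based profiling adds to a function at the source level; the names are made up, and the RFC's question is precisely whether such updates are inserted by the front end on the AST or by a late IR-level pass, and on which blocks or edges.

#include <cstdint>

// Hypothetical per-function counter slots written during the instrumented
// run and consumed later by the profile-use build.
static uint64_t prof_counters[2];

int classify(int x) {
  ++prof_counters[0];            // function entry count
  if (x % 2 == 0) {
    ++prof_counters[1];          // count for the taken branch; the other
                                 // branch's count is entry minus this one
    return x / 2;
  }
  return 3 * x + 1;
}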