Jason -Zhong Sheng- Hu via llvm-dev
2020-Apr-12 22:49 UTC
[llvm-dev] Optimization generate super long function definition
Hi all,

Sorry to have sent the same question around. I am quite desperately looking for a solution to this problem, and I figured the mailing list is the best bet.

In my code, I generate the following function:

    define i32 @gl.qi([500 x i32] %x, i32 %i) {
    entry:
      %x. = alloca [500 x i32]
      %i. = alloca i32
      %0 = alloca [500 x i32]
      store [500 x i32] %x, [500 x i32]* %x.
      store i32 %i, i32* %i.
      %x.1 = load [500 x i32], [500 x i32]* %x.
      %i.2 = load i32, i32* %i.
      store [500 x i32] %x.1, [500 x i32]* %0
      %1 = icmp slt i32 %i.2, 500
      br i1 %1, label %in-bound, label %out-of-bound

    out-of-bound:                                     ; preds = %entry
      call void @gen.panic(i8* getelementptr inbounds ([22 x i8], [22 x i8]* @pool.str.2, i32 0, i32 0))
      unreachable

    in-bound:                                         ; preds = %entry
      %2 = getelementptr inbounds [500 x i32], [500 x i32]* %0, i32 0, i32 %i.2
      %idx = load i32, i32* %2
      ret i32 %idx
    }

The high-level functionality is to use %i to index %x; if %i is out of bounds, a panic function is called instead.

Consider the store line:

    store [500 x i32] %x, [500 x i32]* %x.

Once I pass this function to opt -O1 -S --verify --verify-each, it generates code like this:

    define i32 @gl.qi([500 x i32] %x, i32 %i) local_unnamed_addr {
    entry:
      %0 = alloca [500 x i32], align 4
      %x.fca.0.extract = extractvalue [500 x i32] %x, 0
      %x.fca.1.extract = extractvalue [500 x i32] %x, 1
      %x.fca.2.extract = extractvalue [500 x i32] %x, 2
      %x.fca.3.extract = extractvalue [500 x i32] %x, 3
      %x.fca.4.extract = extractvalue [500 x i32] %x, 4
      %x.fca.5.extract = extractvalue [500 x i32] %x, 5

and so on, until 500. I put the number to 50000 and it won’t stop.

This is puzzling. I am not sure why a store instruction must be expanded into a sequence of extractvalues followed by stores. Is there a way to turn off this particular optimization without turning off optimization as a whole?

Or am I going about this simple task the wrong way?

Thanks,
Jason Hu
https://hustmphrrr.github.io/
Neil Henning via llvm-dev
2020-Apr-13 13:10 UTC
[llvm-dev] Optimization generate super long function definition
Hey Jason Hu,

So I think this is the SROA pass breaking up the load/store of the aggregate (the [500 x i32] array) into individual loads and stores, so that it can see whether any are unused or can have their stores forwarded.

This is a bit of LLVM that I personally find pretty dumb, but if you look at how Clang generates code for your above pattern, it'll output memcpy intrinsics instead of doing load/stores, which gets around this issue.

Note: the compiler hasn't hung; it's just spinning and comparing a few thousand instructions generated from SROA against each other. You might need to wait until the end of the universe, but it should complete eventually ;)

Cheers,
-Neil.

--
Neil Henning
Senior Software Engineer Compiler
unity.com
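For readers following along, here is a minimal sketch of what the memcpy approach mentioned above might look like for this function. It is not taken from Clang's output. It assumes the caller passes a pointer to the array rather than the [500 x i32] value itself, it uses the typed-pointer syntax seen elsewhere in this thread, and the contents of @pool.str.2 are made up for illustration.

```llvm
; Hypothetical panic message; the thread never shows the real definition.
@pool.str.2 = private unnamed_addr constant [22 x i8] c"index out of bounds!\0A\00"

declare void @gen.panic(i8*)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

; Assumption: %x is now a pointer to the caller's array, not an aggregate value.
define i32 @gl.qi([500 x i32]* %x, i32 %i) {
entry:
  %copy = alloca [500 x i32], align 4
  %dst = bitcast [500 x i32]* %copy to i8*
  %src = bitcast [500 x i32]* %x to i8*
  ; A single call copies all 500 * 4 = 2000 bytes, so there is no
  ; first-class aggregate load/store to split element by element.
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src, i64 2000, i1 false)
  ; Same bounds check as in the original function.
  %ok = icmp slt i32 %i, 500
  br i1 %ok, label %in-bound, label %out-of-bound

out-of-bound:
  call void @gen.panic(i8* getelementptr inbounds ([22 x i8], [22 x i8]* @pool.str.2, i32 0, i32 0))
  unreachable

in-bound:
  %p = getelementptr inbounds [500 x i32], [500 x i32]* %copy, i32 0, i32 %i
  %idx = load i32, i32* %p
  ret i32 %idx
}
```

With this shape the size of the IR does not depend on the array length; only the byte count passed to the intrinsic changes.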
Jason -Zhong Sheng- Hu via llvm-dev
2020-Apr-13 22:43 UTC
[llvm-dev] Optimization generate super long function definition
Hey Neil,

Thank you for the reply and the information. I think it is dumb...

I have changed my implementation to pass a byval array pointer instead. In this case, the optimization doesn't trigger. I think the expansion performed by this optimization is quite useless, because a byval array pointer achieves the same thing (I suppose) while the generated code remains small. The expansion exposes unnecessarily complex execution detail.

Thanks,
Jason Hu
https://hustmphrrr.github.io/
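The byval variant described above might look roughly like the following. This is a sketch under assumptions, not Jason's actual generated code: the caller @caller and the contents of @pool.str.2 are hypothetical, and the typed-pointer syntax matches the rest of this thread.

```llvm
; Hypothetical panic message; the thread never shows the real definition.
@pool.str.2 = private unnamed_addr constant [22 x i8] c"index out of bounds!\0A\00"

declare void @gen.panic(i8*)

; byval asks for the pointee to be copied for the callee at the call,
; so the callee still sees its own private [500 x i32].
define i32 @gl.qi([500 x i32]* byval([500 x i32]) align 4 %x, i32 %i) {
entry:
  ; Same bounds check as in the original function.
  %ok = icmp slt i32 %i, 500
  br i1 %ok, label %in-bound, label %out-of-bound

out-of-bound:
  call void @gen.panic(i8* getelementptr inbounds ([22 x i8], [22 x i8]* @pool.str.2, i32 0, i32 0))
  unreachable

in-bound:
  ; Index the private copy directly; no aggregate load or store is needed.
  %p = getelementptr inbounds [500 x i32], [500 x i32]* %x, i32 0, i32 %i
  %idx = load i32, i32* %p
  ret i32 %idx
}

; Hypothetical caller, shown only to illustrate the call-site attribute.
define i32 @caller([500 x i32]* %arr) {
entry:
  %r = call i32 @gl.qi([500 x i32]* byval([500 x i32]) align 4 %arr, i32 7)
  ret i32 %r
}
```

Because the array is never handled as a first-class value here, there is no aggregate load or store for the optimizer to expand.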