Hi,

I am writing a function that clears the BSS section on a Cortex-M4 embedded system.

The LLVM (version 3.7.0rc3) code I wrote is:
;------------
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7em-none--eabi"

@__bss_start = external global i32
@__bss_end = external global i32

define void @clearBSS () nounwind {
entry:
  br label %bssLoopTest

bssLoopTest:
  %p = phi i32* [@__bss_start, %entry], [%p.next, %bssLoop]
  %completed = icmp eq i32* %p, @__bss_end
  br i1 %completed, label %clearCompleted, label %bssLoop

bssLoop:
  store i32 0, i32* %p, align 4
  %p.next = getelementptr inbounds i32, i32* %p, i32 1
  br label %bssLoopTest

clearCompleted:
  ret void
}
;------------

This code runs. But when I optimize it with:
  opt -disable-simplify-libcalls -Os -S source.ll -o optimized.ll

I get the following code for the @clearBSS function:
;------------
define void @clearBSS() nounwind {
entry:
  br label %bssLoop

bssLoop:                                          ; preds = %entry, %bssLoop
  %p1 = phi i32* [ @__bss_start, %entry ], [ %p.next, %bssLoop ]
  store i32 0, i32* %p1, align 4
  %p.next = getelementptr inbounds i32, i32* %p1, i32 1
  %completed = icmp eq i32* %p.next, @__bss_end
  br i1 %completed, label %clearCompleted, label %bssLoop

clearCompleted:                                   ; preds = %bssLoop
  ret void
}
;------------

The optimizer has transformed the while loop into a repeat-until loop.

I think it assumes that the two variables @__bss_start and @__bss_end are distinct. But they are resolved at link time, and they are the same if the BSS section is empty: in that case, the optimized function fails.

Is there a way to prevent the optimizer from assuming that the two variables are distinct? Or what is the proper way to deal with link-time values?

Thanks,

Pierre Molinaro
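To see why the rotated loop misbehaves on an empty section, the two loop shapes can be modeled in C and run on a host. This is only a sketch under stated assumptions: the function names and the `cap` parameter are invented for illustration. The cap keeps the demonstration inside a test buffer; the optimized IR has no such limit, so with `start == end` it would keep storing past the section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test-first loop, the shape of the original IR: the pointer is compared
 * with `end` before every store, so nothing is written when start == end.
 * `cap` (hypothetical, demo only) bounds the number of stores. */
size_t clear_test_first(uint32_t *start, const uint32_t *end, size_t cap)
{
    size_t stores = 0;
    uint32_t *p = start;
    while (p != end && stores < cap) {
        *p++ = 0;
        stores++;
    }
    return stores;
}

/* Test-last loop, the shape of the optimized IR: one store always happens
 * before the first comparison. The caller must provide at least `cap`
 * writable words starting at `start`. */
size_t clear_test_last(uint32_t *start, const uint32_t *end, size_t cap)
{
    assert(cap >= 1);
    size_t stores = 0;
    uint32_t *p = start;
    do {
        *p++ = 0;
        stores++;
    } while (p != end && stores < cap);
    return stores;
}
```

With a non-empty range both functions store the same number of words. With `start == end` the first stores nothing, while the second misses the end pointer and only stops at the cap.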
On 8/28/15 10:52 AM, devh8h via llvm-dev wrote:
> Hi,
>
> I am writing a function that clears the BSS section on a Cortex-M4 embedded system.

I assume that, for some reason, the operating system is not demand-paging in zeroed memory. Is that correct?

> Is there a way to prevent the optimizer from assuming that the two variables are distinct? Or what is the proper way to deal with link-time values?

Have you tried using the memset intrinsic? You could cast bss_start and bss_end to integers, subtract them to find the length, and then use memset to zero the memory. I would think memset should work if the length is zero.

Regards,

John Criswell

> Thanks,
>
> Pierre Molinaro
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
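A minimal C sketch of the memset idea, assuming a `memset` implementation is available on the target (on a bare-metal system that is not a given). `clear_range` is a hypothetical helper name; on the target it would be called with the addresses of the two linker-provided symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the bytes in [start, end) and return how many were cleared.
 * The length is computed on integers, as suggested above; a length of
 * zero is fine for memset as long as `start` is a valid pointer. */
size_t clear_range(void *start, void *end)
{
    uintptr_t first = (uintptr_t)start;
    uintptr_t last = (uintptr_t)end;
    assert(last >= first);            /* end must not precede start */
    size_t length = (size_t)(last - first);
    memset(start, 0, length);
    return length;
}
```

Because the length is computed at run time, no comparison between the two symbol addresses is left for the optimizer to fold away.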
On Fri, Aug 28, 2015 at 04:52:56PM +0200, devh8h via llvm-dev wrote:
> Is there a way to prevent the optimizer from assuming that the two variables
> are distinct? Or what is the proper way to deal with link-time values?

Make one of them weak.

Joerg
I had considered using the memset intrinsic; unfortunately, I did not succeed in cross-compiling compiler-rt on my Mac.

Regards,

Pierre Molinaro

> On 28 August 2015, at 17:22, John Criswell <jtcriswel at gmail.com> wrote:
>
> On 8/28/15 11:20 AM, devh8h wrote:
>> It is a very basic "blink-led" program, on a Teensy 3.1. There is no operating system. The BSS clear function is called at boot. I would like to write a general BSS clear function that behaves correctly even if the BSS section is empty.
>
> I thought it was something like that.
>
> Let me know if the memset intrinsic approach works.
>
> Regards,
>
> John Criswell
>
>> Thanks,
>>
>> Pierre Molinaro
On 8/28/15 11:31 AM, devh8h wrote:
> I had thought to use the memset intrinsic, unfortunately I did not
> succeed to cross compiling compiler-rt on my Mac.

I don't think you need compiler-rt to use the memset intrinsic. I think the code generator will generate efficient inline code for it (though I'm not certain).

In any event, Joerg's suggestion of making one external weak sounds a lot easier. :)

Regards,

John Criswell
Krzysztof Parzyszek via llvm-dev
2015-Aug-28 15:35 UTC
[llvm-dev] Clearing the BSS section
On 8/28/2015 9:52 AM, devh8h via llvm-dev wrote:
>
> Is there a way to prevent the optimizer to assume the two variables are distinct ? Or what is the proper way to deal with link time values ?

You can use this:

@__bss_start = extern_weak externally_initialized global i32
@__bss_end = extern_weak externally_initialized global i32

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
This declaration is the solution:
;------------
@__bss_start = extern_weak externally_initialized global i32
@__bss_end = extern_weak externally_initialized global i32
;------------

Now, the optimized code generated by opt is:
;------------
define void @clearBSS() nounwind {
entry:
  br i1 icmp eq (i32* @__bss_start, i32* @__bss_end), label %clearCompleted, label %bssLoop.preheader

bssLoop.preheader:                                ; preds = %entry
  br label %bssLoop

bssLoop:                                          ; preds = %bssLoop.preheader, %bssLoop
  %p1 = phi i32* [ %p.next, %bssLoop ], [ @__bss_start, %bssLoop.preheader ]
  store i32 0, i32* %p1, align 4
  %p.next = getelementptr inbounds i32, i32* %p1, i32 1
  %completed = icmp eq i32* %p.next, @__bss_end
  br i1 %completed, label %clearCompleted.loopexit, label %bssLoop

clearCompleted.loopexit:                          ; preds = %bssLoop
  br label %clearCompleted

clearCompleted:                                   ; preds = %clearCompleted.loopexit, %entry
  ret void
}
;------------

Thank you for the time you spent helping me solve my problem.

Pierre Molinaro
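Read as C, the optimized IR above has the following shape: an up-front equality test that guards a test-last loop. The sketch below is a hand-written model meant to be run on a host, not compiler output, and `clear_guarded` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the guarded loop: the entry block compares the two addresses
 * and skips the loop entirely when they are equal, so the test-last loop
 * only runs on a non-empty range. Returns the number of words stored. */
size_t clear_guarded(uint32_t *start, uint32_t *end)
{
    assert(start <= end);             /* end must be reachable from start */
    if (start == end) {
        return 0;                     /* entry -> clearCompleted */
    }
    size_t stores = 0;
    uint32_t *p = start;
    do {                              /* bssLoop */
        *p++ = 0;
        stores++;
    } while (p != end);
    return stores;
}
```

The guard survives here because the comparison of the two addresses is no longer folded to a constant, which is what the changed declarations achieve in the IR.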
On Fri, Aug 28, 2015 at 8:27 AM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Fri, Aug 28, 2015 at 04:52:56PM +0200, devh8h via llvm-dev wrote:
> > Is there a way to prevent the optimizer from assuming that the two variables
> > are distinct? Or what is the proper way to deal with link-time values?
>
> Make one of them weak.

It'd be better to make them both zero-sized, like an array of i8 with zero elements. This idiom has come up before, and that's our recommended solution. We've tweaked the optimizers to ensure that zero-sized objects are not assumed to be distinct.