Artem Dergachev via llvm-dev
2018-Mar-02 21:58 UTC
[llvm-dev] [cfe-dev] [GSOC 2018] Information gathering
Hey, welcome!

I'm curious about the unsequenced modification checker: is it something I should have seen but missed for whatever reason? It might be useful, and I think I see why compiler warnings don't cover all cases, i.e. why the analyzer's path sensitivity would help here. But I can't say more until I see it :) (e.g. on our Phabricator).

We currently have two confirmed mentors for the Analyzer (me and George), so we'd most likely be able to mentor one student each, for two projects, and those would likely be the two projects we proposed, unless someone proposes something really interesting. Two fairly motivated students have already shown up here on the mailing lists, but this shouldn't stop you from posting your own proposal here in cfe-dev (as far as I know, most of the analyzer contributors aren't actively scanning llvm-dev).

I don't know much about the binutils replacement project; someone else should reply on that one.

A couple of words about the use-after-free-like checker for values managed by temporary objects (mostly strings) that go out of scope. The internals of std::string and similar classes are too hard for the analyzer's generic use-after-free checker to understand (mostly because STL's internal invariants are hard to track, and not all of the code is necessarily present in the headers), so an API-specific checker seems necessary. The original plan we had in mind was to keep track of dangerous values like str.c_str() in the program state (similarly to how SimpleStreamChecker tracks file streams) and then check whether any of them are still present in memory at the end of the original value's lifetime (similarly to how the StackAddrEscape checker finds stack pointers at the end of a function's stack frame).
The unknowns here include how easy it would be to track scopes (for now we only track function scopes, but if the fairly old but recently reincarnated patches [1] and [2] land any time soon, we may get much better granularity); how easy it would be to track objects when they are moved or lifetime-extended by binding to references, which was a large problem for other C++ object checkers, though we may work our way around it to some extent (or do it properly, depending on my current work outlined in [3] and in follow-up mails in February); and also how helpful inlining would be (e.g. would we be able to automagically support string_view-like classes by inlining their methods?). So the checker would need an almost indefinite amount of incremental improvements once the initial prototype is done, some of which would be fairly curious and would certainly expose you to some of the analyzer's internals.

On 01/03/2018 11:43 AM, Paul Semel via cfe-dev wrote:
> Hey,
>
> On 02/20/2018 11:51 PM, Paul Semel wrote:
>> Hello,
>>
>> I'm Paul Semel, a French student in computer science. I am currently in my 4th year (1st year of graduate school) at EPITA and enrolled in the system and security laboratory of the school.
>>
>> I would be very interested in working on an LLVM project during this GSoC. Implementing a PoC for an unsequenced modification checker in CSA helped me discover LLVM. However, I would like to dive deeper into the project.
>>
>> I've seen some of the proposals, and I would like to ask a few questions about two of them.
>>
>> As you might have guessed, I have some interest in the checker for dangling string pointers:
>>
>> - Do you think it would help if I kept working on improving my unsequenced modification checker to get more familiar with the Clang Static Analyzer?
>>
>> I'm also interested in the command line replacements for GNU Binutils:
>>
>> - Which tools would you like to replace first?
>> - Does this subject involve adding options/features to some of the tools, or is it only about command line handling?
>>
>> Thank you very much,
>
> Adding cfe-dev..
>
> Regards,
Paul Semel via llvm-dev
2018-Mar-06 16:24 UTC
[llvm-dev] [cfe-dev] [GSOC 2018] Information gathering
Hi,

Thanks for replying!

On 03/02/2018 10:58 PM, Artem Dergachev wrote:
> Hey, welcome!
>
> I'm curious about the unsequenced modification checker: is it something I should have seen but missed for whatever reason? It might be useful, and I think I see why compiler warnings don't cover all cases, i.e. why the analyzer's path sensitivity would help here. But I can't say more until I see it :) (e.g. on our Phabricator).

So... I uploaded the checker to Phabricator! Please keep in mind that it was a proof of concept for me, and I didn't intend to propose this patch when I was developing it (and I haven't had the time to improve it yet, as I am currently working on a structure pretty-printing builtin: https://reviews.llvm.org/D44093).

For now, this checker is not able to detect all unsequenced modifications, but it can detect things like this:

```c
static int a = 0;

int foo(void)
{
    return a++;
}

int main(void)
{
    int res = a++ + foo(); /* foo() also modifies a */
    return res;
}
```

Here is the link on Phabricator: https://reviews.llvm.org/D44154

> We currently have two confirmed mentors for the Analyzer (me and George), so we'd most likely be able to mentor one student each, for two projects, and those would likely be the two projects we proposed, unless someone proposes something really interesting. Two fairly motivated students have already shown up here on the mailing lists, but this shouldn't stop you from posting your own proposal here in cfe-dev (as far as I know, most of the analyzer contributors aren't actively scanning llvm-dev).
>
> I don't know much about the binutils replacement project; someone else should reply on that one.

Sure, I would really like to get some more info on this one! Maybe you know someone I could add in CC on this thread? 🙂

> A couple of words about the use-after-free-like checker for values managed by temporary objects (mostly strings) that go out of scope. The internals of std::string and similar classes are too hard for the analyzer's generic use-after-free checker to understand (mostly because STL's internal invariants are hard to track, and not all of the code is necessarily present in the headers), so an API-specific checker seems necessary. The original plan we had in mind was to keep track of dangerous values like str.c_str() in the program state (similarly to how SimpleStreamChecker tracks file streams) and then check whether any of them are still present in memory at the end of the original value's lifetime (similarly to how the StackAddrEscape checker finds stack pointers at the end of a function's stack frame).

OK, I think I understand. So the idea is that this checker would be an API that makes it possible to track those invariants (and we would use this API to track str.c_str()). Am I right?

> The unknowns here include how easy it would be to track scopes (for now we only track function scopes, but if the fairly old but recently reincarnated patches [1] and [2] land any time soon, we may get much better granularity); how easy it would be to track objects when they are moved or lifetime-extended by binding to references, which was a large problem for other C++ object checkers, though we may work our way around it to some extent (or do it properly, depending on my current work outlined in [3] and in follow-up mails in February); and also how helpful inlining would be (e.g. would we be able to automagically support string_view-like classes by inlining their methods?). So the checker would need an almost indefinite amount of incremental improvements once the initial prototype is done, some of which would be fairly curious and would certainly expose you to some of the analyzer's internals.

Wow. This project sounds really cool; it's too bad there are already two students on it.

By the way, if you have some free time, I would really appreciate some advice on a better way to implement the unsequenced modification checker. 🙂

Thanks,
--
Paul Semel
Artem Dergachev via llvm-dev
2018-Mar-08 03:29 UTC
[llvm-dev] [cfe-dev] [GSOC 2018] Information gathering
On 06/03/2018 8:24 AM, Paul Semel wrote:
> Hi,
>
> Thanks for replying!
>
> On 03/02/2018 10:58 PM, Artem Dergachev wrote:
>> Hey, welcome!
>>
>> I'm curious about the unsequenced modification checker: is it something I should have seen but missed for whatever reason? It might be useful, and I think I see why compiler warnings don't cover all cases, i.e. why the analyzer's path sensitivity would help here. But I can't say more until I see it :) (e.g. on our Phabricator).
>
> So... I uploaded the checker to Phabricator!

Yay! I'll comment there with my thoughts, so that you can polish it when you have time. Note that this doesn't necessarily have anything to do with GSoC: we're accepting code in all seasons :)

> Please keep in mind that it was a proof of concept for me, and I didn't intend to propose this patch when I was developing it (and I haven't had the time to improve it yet, as I am currently working on a structure pretty-printing builtin: https://reviews.llvm.org/D44093).
>
> For now, this checker is not able to detect all unsequenced modifications, but it can detect things like this:
>
> ```c
> static int a = 0;
>
> int foo(void)
> {
>     return a++;
> }
>
> int main(void)
> {
>     int res = a++ + foo(); /* foo() also modifies a */
>     return res;
> }
> ```

This sounds like, for once, a bug that the analyzer might be really good at finding, and the check isn't going to be very noisy, which makes me quite excited about it.

> Here is the link on Phabricator: https://reviews.llvm.org/D44154
>
>> We currently have two confirmed mentors for the Analyzer (me and George), so we'd most likely be able to mentor one student each, for two projects, and those would likely be the two projects we proposed, unless someone proposes something really interesting. Two fairly motivated students have already shown up here on the mailing lists, but this shouldn't stop you from posting your own proposal here in cfe-dev (as far as I know, most of the analyzer contributors aren't actively scanning llvm-dev).
>>
>> I don't know much about the binutils replacement project; someone else should reply on that one.
>
> Sure, I would really like to get some more info on this one! Maybe you know someone I could add in CC on this thread? 🙂

Sorry, I'm completely out of the loop on that one. This project has two assigned mentors, as mentioned in http://llvm.org/OpenProjects.html#replace_binary_utilities; you might try to contact them directly in case they accidentally missed your mail.

>> A couple of words about the use-after-free-like checker for values managed by temporary objects (mostly strings) that go out of scope. The internals of std::string and similar classes are too hard for the analyzer's generic use-after-free checker to understand (mostly because STL's internal invariants are hard to track, and not all of the code is necessarily present in the headers), so an API-specific checker seems necessary. The original plan we had in mind was to keep track of dangerous values like str.c_str() in the program state (similarly to how SimpleStreamChecker tracks file streams) and then check whether any of them are still present in memory at the end of the original value's lifetime (similarly to how the StackAddrEscape checker finds stack pointers at the end of a function's stack frame).
>
> OK, I think I understand. So the idea is that this checker would be an API that makes it possible to track those invariants (and we would use this API to track str.c_str()). Am I right?

No-no, I mean that .c_str() is (part of) a certain API :) ...and we want to see if it's used correctly. But in order to do that, we don't want to understand how it works in a particular implementation of, say, the C++ standard library. Instead, we know how it is supposed to work, and we encode part of this knowledge about the API into the analyzer so that it can find misuses of it. E.g., we don't care what exact value is returned by .c_str() or how exactly it is allocated or deleted. The only thing we care about is that we shouldn't keep it around after the string is destroyed. In this sense, the checker is API-specific: it works by knowing about a particular API, not through generic knowledge of the language.

Similarly, SimpleStreamChecker doesn't want to know what it takes to open a file: it only knows that a file that was opened must also be closed. For this checker, it's more realistic to fully understand how the API works internally, but still hard. Just in case, I'm mentioning SimpleStreamChecker because it's essentially an example/hello-world checker described in great detail in https://youtu.be/kdxlsP5QVPw (totally recommended).