Hi, everyone.

I'm a senior at Swarthmore College and would love to work with LLVM this
summer. I'm interested in systems languages and security, and I'll start
a PhD on these topics this fall. I also do a good deal of open source
development and auditing with OpenBSD and a variety of other projects.

I spent last year's GSoC doing security auditing for Pidgin/libpurple.
GSoC seems like a great way to spend this summer as well. I'm
particularly interested in the SAFECode project and LLVM's general
security and auditing features. I haven't worked a ton with LLVM in the
past, but I have a mostly finished strnlen(3) optimization I've been
meaning to resubmit:

https://marc.info/?l=llvm-commits&m=145485679322322&w=2

I also worked with Martin Natano to port the integer overflow checker to
OpenBSD and build a working kernel and userland with it. We now have a
patch that integrates it into the full system build through libc, and
fixed a number of bugs in the process.

Because of my background in auditing, I like to think I have an
intuition for which compiler and scanner features developers will find
useful and usable. I also have a good understanding of the more
theoretical aspects of language and compiler design, and I'm very
familiar with the ANSI C and POSIX specs.

In regards to potential projects, I'd like to rewrite the SAFECode
static array bounds check pass and add check optimizations (both to
remove statically unnecessary checks and improve the generated code for
remaining ones). In the process, I'd refactor and simplify what already
exists, fixing bugs as I encounter them. New checks at the libc API
level could also be interesting, if they're within the scope of the
project.

This work would almost certainly lead to an OpenBSD port. John would
probably be helpful on both accounts, as he's integrated SAFECode with
FreeBSD. I'm also competent with the OpenBSD port system's bulk build
infrastructure, so I'm confident that I could test SAFECode on a handful
of important projects or even en masse.

I'm open to other project ideas as well. If anyone else is mentoring a
project that seems like a good fit for me, please share.

Thanks for your time,
Michael McConville
Dear Michael,

If you're interested in SAFECode, the first step is to get SAFECode
working with a newer version of LLVM. A Master's student did some
work on this last summer with LLVM 3.7 but didn't finish. It would
now need to be updated to LLVM 3.8 (though I suppose a completed LLVM
3.7 port would be fine with me).

After that, there are some interesting projects on which to work. One
would be static array bounds checking. That could be interesting, but
it doesn't really address my immediate research needs. Right now, I'm
more interested in getting the Baggy Bounds with Accurate Checking
(BBAC) feature enabled so that we can use it in research. For
example, we could try to get faster enforcement of memory safety on
operating system kernels, examine the use of combined safe/unsafe
languages for OS kernels (without letting C code violate the safety
provided by the safe language), and enforce dynamic security policies
on kernel modules (to thwart rootkits).

If you're interested in security projects on the kernel, you could
enhance the KCoFI prototype to use a more accurate control-flow graph
or to use code pointer integrity, or you could write optimizations for
the software-fault isolation instrumentation (which would improve both
KCoFI and Virtual Ghost, if you are familiar with those papers of
mine).

Does any of these projects sound interesting to you?

Regards,

John Criswell

On 3/21/16 10:07 PM, Michael McConville via llvm-dev wrote:
> Hi, everyone.
>
> I'm a senior at Swarthmore College and would love to work with LLVM this
> summer. I'm interested in systems languages and security, and I'll start
> a PhD on these topics this fall. I also do a good deal of open source
> development and auditing with OpenBSD and a variety of other projects.
>
> I spent last year's GSoC doing security auditing for Pidgin/libpurple.
> GSoC seems like a great way to spend this summer as well. I'm
> particularly interested in the SAFECode project and LLVM's general
> security and auditing features. I haven't worked a ton with LLVM in the
> past, but I have a mostly finished strnlen(3) optimization I've been
> meaning to resubmit:
>
> https://marc.info/?l=llvm-commits&m=145485679322322&w=2
>
> I also worked with Martin Natano to port the integer overflow checker to
> OpenBSD and build a working kernel and userland with it. We now have a
> patch that integrates it into the full system build through libc, and
> fixed a number of bugs in the process.
>
> Because of my background in auditing, I like to think I have an
> intuition for which compiler and scanner features developers will find
> useful and usable. I also have a good understanding of the more
> theoretical aspects of language and compiler design, and I'm very
> familiar with the ANSI C and POSIX specs.

Can you clarify what you mean by "theoretical aspects of language and
compiler design?" Does that mean that you understand Kam/Ullman (i.e.,
classical) data-flow analysis and SSA-based compiler analysis
algorithms?

>
> In regards to potential projects, I'd like to rewrite the SAFECode
> static array bounds check pass and add check optimizations (both to
> remove statically unnecessary checks and improve the generated code for
> remaining ones). In the process, I'd refactor and simplify what already
> exists, fixing bugs as I encounter them. New checks at the libc API
> level could also be interesting, if they're within the scope of the
> project.
>
> This work would almost certainly lead to an OpenBSD port. John would
> probably be helpful on both accounts, as he's integrated SAFECode with
> FreeBSD. I'm also competent with the OpenBSD port system's bulk build
> infrastructure, so I'm confident that I could test SAFECode on a handful
> of important projects or even en masse.
>
> I'm open to other project ideas as well. If anyone else is mentoring a
> project that seems like a good fit for me, please share.
>
> Thanks for your time,
> Michael McConville

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
John Criswell wrote:
> If you're interested in SAFECode, the first step is to get SAFECode
> working with a newer version of LLVM. A Master's student did some
> work on this last summer with LLVM 3.7 but didn't finish. It would
> now need to be updated to LLVM 3.8 (though I suppose a completed LLVM
> 3.7 port would be fine with me).
>
> After that, there are some interesting projects on which to work. One
> would be static array bounds checking. That could be interesting, but
> it doesn't really address my immediate research needs. Right now, I'm
> more interested in getting the Baggy Bounds with Accurate Checking
> (BBAC) feature enabled so that we can use it in research. For
> example, we could try to get faster enforcement of memory safety on
> operating system kernels, examine the use of combined safe/unsafe
> languages for OS kernels (without letting C code violate the safety
> provided by the safe language), and enforce dynamic security policies
> on kernel modules (to thwart rootkits).
>
> If you're interested in security projects on the kernel, you could
> enhance the KCoFI prototype to use a more accurate control-flow graph
> or to use code pointer integrity, or you could write optimizations for
> the software-fault isolation instrumentation (which would improve both
> KCoFI and Virtual Ghost, if you are familiar with those papers of
> mine).
>
> Does any of these projects sound interesting to you?

Yeah, definitely. Porting to LLVM 3.8 or finishing the 3.7 port would be
a good way to get more familiar with LLVM internals.

BBAC looks very interesting. I, like you (according to the BBAC paper's
intro), am a little frustrated by the fact that these sorts of checkers
still aren't used in standard software builds, so I find optimizing for
performance and simplicity particularly interesting. Also, this is an
anecdote, but have you considered writing pseudo-random data to the
padding area and using its checksum as a canary? Alternately, you could
even just use the first few bytes of the padding directly. We recently
added optional canaries to OpenBSD and it's been useful in finding bugs.

I'll have to read more about the kernel projects before I can comment.

Thanks,
Michael
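
For readers following the thread, the padding-canary suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not SAFECode's or OpenBSD's implementation: the `padded_obj` descriptor, the `padded_alloc`/`padded_check` functions, the 16-byte minimum slot, the fixed seed constant, and the choice of xorshift64 and FNV-1a are all hypothetical stand-ins. Each request is rounded up to a power-of-two slot, the unused tail is filled with pseudo-random bytes, and a checksum of that tail serves as the canary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for one padded allocation (not a SAFECode type). */
struct padded_obj {
    unsigned char *base; /* start of the slot */
    size_t size;         /* bytes the caller asked for */
    size_t slot;         /* power-of-two slot size, >= size */
    uint64_t sum;        /* checksum of the padding bytes [size, slot) */
};

/* Round n up to a power of two, with an assumed 16-byte minimum slot. */
static size_t slot_size(size_t n)
{
    size_t s = 16;
    while (s < n)
        s <<= 1;
    return s;
}

/* FNV-1a (64-bit) over the padding region [size, slot). */
static uint64_t padding_sum(const struct padded_obj *o)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = o->size; i < o->slot; i++) {
        h ^= o->base[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* xorshift64 step: a non-cryptographic stand-in for a real random source. */
static uint64_t next_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Allocate a slot for n bytes and arm the canary. Returns 0 on success. */
int padded_alloc(struct padded_obj *out, size_t n)
{
    assert(out != NULL);
    if (n > SIZE_MAX / 2) /* keeps slot_size() from overflowing */
        return -1;
    out->size = n;
    out->slot = slot_size(n);
    out->base = malloc(out->slot);
    if (out->base == NULL)
        return -1;

    /* Placeholder seed; a real system would use a per-process secret. */
    uint64_t state = 0x9E3779B97F4A7C15ULL ^ (uint64_t)(uintptr_t)out->base;
    if (state == 0)
        state = 1; /* xorshift must not start at zero */
    for (size_t i = n; i < out->slot; i++)
        out->base[i] = (unsigned char)next_rand(&state);
    out->sum = padding_sum(out);
    return 0;
}

/* Returns 1 if the padding is untouched, 0 if something wrote into it. */
int padded_check(const struct padded_obj *o)
{
    return padding_sum(o) == o->sum;
}
```

A caller would run `padded_check` at free time or at other checkpoints; a write one byte past the requested size changes the checksum and is reported. A request that is already an exact power of two has no padding in this sketch, so it gets no canary coverage. The "first few bytes" variant from the message would compare a short prefix of the padding against the regenerated pseudo-random bytes instead of checksumming the whole tail.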