Did anyone send an RFC for this?
First-class metadata would be exceptionally useful for sanitizers and other
dynamic tools. For example, we want to construct PC-keyed metadata tables in
the binary (without affecting the generated code), to inform program behavior
at runtime or to allow offline analysis. A prerequisite is to actually
propagate the metadata we need from the Clang frontend or LLVM middle-end
down to the assembly printer.
Our team has brainstormed many use cases:
- *GWP-TSan* <https://youtu.be/2KvaKEyMVEU>: storing PCs of accesses
  lowered from C++ atomics, to filter them out from race detection.
   * List<atomic access PC>
- *Stack trace compression*: storing a conservative call graph
  <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html>, for
  use in decompressing stack traces offline.
   * Map[callsite PC] -> List<callee PC>
- *no_sanitize attributes*: storing a map of functions that have the
  no_sanitize("...") attribute to the associated sanitizer, for filtering
  out from GWP-*San. Ideally we do not introduce new no_sanitize string
  literals, but simply rely on existing ones (e.g. a no_sanitize("thread")
  works for both TSan and GWP-TSan).
   * Map[Func] -> SanitizerKind
- *Fuzzing aid/CFG reconstruction*: marking coverage PCs as function
  entry/exit or # of outgoing edges from BB (allows finding gaps in the
  coverage frontier).
- *Type-aware malloc and heap profiling*: enable the allocator to get the
  type for a given new call, to optimize for expected usage of the
  allocation.
   * Map[new callsite PC] -> object type
- *Other*: potential use cases for future bug-finding tools (GWP-assert,
  GWP-MSan, GWP-DFSan, GWP-UBSan).
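
To make the PC-keyed table idea a bit more concrete, here is a minimal
sketch in plain C++ of how a runtime might query such a table once it has
been located in the binary. Every name in it (PCMetaKind, PCMetaEntry,
PCMetaTable, shouldReportRace) is hypothetical and made up for illustration;
none of this is an existing LLVM or sanitizer API, and the sorted array with
binary search is only one assumption about how the table could be laid out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical metadata kinds, loosely following the use cases listed above.
enum class PCMetaKind : std::uint8_t {
  AtomicAccess,  // access lowered from a C++ atomic (GWP-TSan filtering)
  FunctionEntry, // coverage PC marks a function entry
  FunctionExit,  // coverage PC marks a function exit
};

// One table entry. A real encoding would probably store offsets relative to
// a section or module base instead of absolute addresses (assumption).
struct PCMetaEntry {
  std::uintptr_t PC;
  PCMetaKind Kind;
};

// Read-only table keyed by PC. Entries are sorted once at construction so
// that each lookup is a binary search.
class PCMetaTable {
public:
  explicit PCMetaTable(std::vector<PCMetaEntry> Input)
      : Entries(std::move(Input)) {
    std::sort(Entries.begin(), Entries.end(),
              [](const PCMetaEntry &A, const PCMetaEntry &B) {
                return A.PC < B.PC;
              });
    // This sketch assumes at most one entry per PC.
    assert(std::adjacent_find(Entries.begin(), Entries.end(),
                              [](const PCMetaEntry &A, const PCMetaEntry &B) {
                                return A.PC == B.PC;
                              }) == Entries.end() &&
           "duplicate PC in metadata table");
  }

  // Returns the metadata kind recorded for PC, or std::nullopt if the PC has
  // no entry in the table.
  std::optional<PCMetaKind> lookup(std::uintptr_t PC) const {
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), PC,
        [](const PCMetaEntry &E, std::uintptr_t Key) { return E.PC < Key; });
    if (It == Entries.end() || It->PC != PC)
      return std::nullopt;
    return It->Kind;
  }

private:
  std::vector<PCMetaEntry> Entries;
};

// Example consumer: a race detector that skips accesses known to come from
// C++ atomics, as in the GWP-TSan use case.
inline bool shouldReportRace(const PCMetaTable &Table, std::uintptr_t PC) {
  std::optional<PCMetaKind> Kind = Table.lookup(PC);
  return !(Kind && *Kind == PCMetaKind::AtomicAccess);
}
```

The map-shaped use cases (callsite PC to callee PCs, new callsite PC to
object type) could reuse the same lookup with a richer payload per entry.
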
First-class metadata would open the door to some really cool things.
Thanks,
Matt Morehouse
On Wed, Jan 6, 2021 at 5:56 AM Lorenzo Casalino via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Dear Tuan,
>
> How are you doing? Did you manage to start the draft for the RFC?
>
>
> I take this opportunity to wish you all the best for this new year :)
>
> Best regards,
> Lorenzo Casalino
> On 10/11/20 at 09:27, Lorenzo Casalino wrote:
>
>
> On 09/11/20 at 00:30, Son Tuan VU wrote:
>
> Hi,
>
> Thank you all for keeping this going. Indeed I was not aware that the
> discussion was going on, I am really sorry for this late reply.
>
> Nice to hear you again! Thank you for starting this thread ;)
>
> I understand Chris' point about metadata design. Either the metadata
> becomes stale or removed (if we do not teach transformations to preserve
> it), or we end up modifying many (if not all) transformations to keep the
> data intact.
> Currently in the IR, I feel like the default behavior is to ignore/remove
> the metadata, and only a limited number of transformations know how to
> maintain and update it, which is a best-effort approach.
> That being said, my initial thought was to adopt this approach to the MIR,
> so that we can at least have a minimal mechanism to communicate additional
> information to various transformations, or even dump it to the asm/object
> file.
> In other words, it is the responsibility of the users who introduce/use
> the metadata in the MIR to teach the transformations they selected how to
> preserve their metadata. A common API to abstract this would definitely
> help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.
>
> Unfortunately, I have never worked with the LLVM-IR Metadata (I have
> mostly focused on the back-end and have only scratched the surface of
> LLVM's middle-end), but I see your point.
>
> Clearly, applying the needed modifications to all the back-end
> transformations/optimizations is infeasible and, probably, not worth it --
> different users may have different requirements/needs regarding a
> specific pass.
>
> I like the idea of a common API to handle the MIR metadata, and let the
> end user handle
> such data. Of course, if the community encounters common cases while
> handling the metadata, such
> cases may be integrated with the upstream project.
>
> Nonetheless, the main point of this thread is to preserve middle-end
> metadata down to the back-end, right after the Instruction Selection
> phase. Hence, regardless of the end user's needs, a "preserve-all" policy
> during the lowering stage is required, which will involve some changes,
> in particular in the DAGCombine pass.
>
>
> As for my use case, it is also security-related. However, I do not
> consider the metadata to be a compilation "correctness" criterion:
> metadata, by definition (from the LLVM IR), can be safely removed without
> affecting the program's correctness.
> If possible, I would like to have more details on Lorenzo's use case in
> order to see how metadata would interfere with program's correctness.
>
> I would really like to discuss here the details, but, unfortunately, I am
> working on a publication
> and, thus, I cannot disclose any detail here :(
>
> However, with "correctness" I do not refer to "I/O correctness", but the
> preservation of a security property expressed in the front-end (e.g.,
> specified in the source-code) or in the middle-end (e.g., specified in
> the LLVM-IR, for instance by a transformation pass).
>
> From a security point of view, removing or altering metadata does not
> interfere with the I/O functionality of the code (although it may affect
> performance), but may introduce vulnerabilities.
>
> As for the RFC, I can definitely try to write one, but this would be my
> first time doing so. But maybe it is better to start with Lorenzo's
> proposal, as you have already been working on this? Please tell me if you
> prefer me to start the RFC though.
>
> It is the first time for me too, do not worry!
>
> We could just use any other RFC as a template to get started :D
>
> I think that a structure like the following would be fine:
>
> 1. Background
> 1.1 Motivation
> 1.2 Use-cases
> 1.3 Other approaches
> 2. Goal(s)
> 3. Requirements
> 4. Drawbacks and main bottlenecks
> 5. Design sketch
> 6. Roadmap sketch
> 7. Potential future development
>
> It may be a bit overkill; you are warmly invited to cut/refine these
> points!
>
> And...no, I still have no sketch of the RFC; sorry, I have had quite a
> workload these days.
>
> Yes, you can start the write up of the RFC.
>
> Quoting David:
>
> "Since you first raised the topic [...] I want to give you right of
> first refusal."
>
>
> Have a nice day!
>
> -- Lorenzo
>
> Thank you again for keeping this going.
>
> Sincerely,
>
> - Son
>
> On Wed, Nov 4, 2020 at 6:30 PM Lorenzo Casalino <
> lorenzo.casalino93 at gmail.com> wrote:
>
>>
>> On 04/11/20 at 17:40, David Greene wrote:
>> > Sorry about the late reply.
>> >
>> > Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>> >
>> >>>>> - Should not impact compile time excessively (what is "excessive?")
>> >>>> Probably, such estimation should be performed on
>> >>> Did something get cut off here?
>> >> Oops. Yep, I removed a paragraph, but apparently I forgot the first
>> >> period. In any case, we should discuss how to quantitatively
>> >> determine an acceptable upper bound on the compilation-time overhead
>> >> and give a motivation for it. For instance, max n% overhead on the
>> >> compilation time must be guaranteed, because ** list of reasons **.
>> > I am not sure how we'd arrive at such a number or motivate/defend it.
>> > Do we have any sense of the impact of the existing metadata
>> > infrastructure? If not I'm not sure we can do it for something
>> > completely new. I think we can set a goal but we'd have to revise it as
>> > we gain experience.
>> I think it is the best approach to employ :)
>> >>> Since you initially raised the topic, do you want to take the lead in
>> >>> writing up a RFC? I can certainly do it too but I want to give you
>> >>> right of first refusal. :)
>> >>> -David
>> >> Uhm...actually, it wasn't me but Son Tuan, so the right of refusal
>> >> should be granted to him :) And I noticed now that he wasn't included
>> >> in CC of all our mails; I hope he was able to follow our discussion
>> >> anyways. I am adding him in this mail and let us wait if he has any
>> >> critical feature or point to discuss.
>> > Fair enough! I have recently taken on a lot more work so unfortunately
>> > I can't devote a lot of time to this at the moment. I've got to clear
>> > out my pipeline first. I'd be very happy to help review text, etc.
>> Do not worry, it is ok ;) While we wait for any feedback/input from
>> Son, I'll try to prepare a draft of the RFC and publish it here.
>>
>> Thank you David, and have a nice day :)
>>
>> -- Lorenzo
>>
>> > -David
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>