Thanks for the update, Lorenzo.
I have some free time to work on an RFC, but I'm unfamiliar with how the
implementation details would work.
If I dig through this thread and try to draft something, would you and/or
Son be willing to contribute?
Thanks,
Matt
On Wed, Jun 16, 2021 at 12:02 PM Lorenzo Casalino <
lorenzo.casalino93 at gmail.com> wrote:
> Hello Matt,
>
> I think that the RFC drafting went stale some months ago due to the heavy
> workload that all the participants were subject to.
>
> As of now, I do not know when the RFC will actually be drafted and sent.
>
> Cheers,
> Lorenzo
>
> On 16 Jun 2021, at 1:32 AM, Matt Morehouse <mascasa at google.com> wrote:
>
>
> Did anyone send an RFC for this?
>
> First-class metadata would be exceptionally useful for sanitizers and
> other dynamic tools. For example, we want to construct PC-keyed metadata
> tables in the binary (without affecting the generated code), to inform
> program behavior at runtime or to allow offline analysis. A prerequisite
> is to actually propagate the metadata we need from the Clang frontend or
> LLVM middle-end down to the assembly printer.
>
> Our team has brainstormed many use cases:
>
> - *GWP-TSan* <https://youtu.be/2KvaKEyMVEU>: storing PCs of accesses
> lowered from C++ atomics, to filter them out of race detection.
> * List<atomic access PC>
>
> - *Stack trace compression*: storing a conservative call graph
> <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html>, for
> use in decompressing stack traces offline.
> * Map[callsite PC] -> List<callee PC>
>
> - *no_sanitize attributes*: storing a map from functions that have the
> no_sanitize("...") attribute to the associated sanitizer, for filtering
> them out of GWP-*San. Ideally we do not introduce new no_sanitize string
> literals, but simply rely on existing ones (e.g. a no_sanitize("thread")
> works for both TSan and GWP-TSan).
> * Map[Func] -> SanitizerKind
>
> - *Fuzzing aid/CFG reconstruction*: marking coverage PCs as function
> entry/exit, or recording the number of outgoing edges from each BB
> (which helps find gaps in the coverage frontier).
>
> - *Type-aware malloc and heap profiling*: enabling the allocator to get
> the type for a given new call, to optimize for the expected usage of the
> allocation.
> * Map[new callsite PC] -> object type
>
> - *Other*: potential use cases for future bug-finding tools (GWP-assert,
> GWP-MSan, GWP-DFSan, GWP-UBSan).
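To make two of the PC-keyed tables above more concrete, here is a minimal C++17 sketch of how a runtime or offline tool might query them once they exist. All names (AtomicPCTable, CallGraphTable, CalleesOf) are hypothetical, and the tables are built in memory purely for illustration; emitting them from the assembly printer is the part this thread is actually about.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-memory view of two PC-keyed tables. In the proposal these
// would be emitted into the binary; here they are plain containers.
using PC = std::uintptr_t;

// List<atomic access PC>: kept sorted so a GWP-TSan-style detector can
// check in O(log n) whether an access was lowered from a C++ atomic.
class AtomicPCTable {
 public:
  explicit AtomicPCTable(std::vector<PC> pcs) : pcs_(std::move(pcs)) {
    std::sort(pcs_.begin(), pcs_.end());
  }

  bool IsAtomicAccess(PC pc) const {
    return std::binary_search(pcs_.begin(), pcs_.end(), pc);
  }

 private:
  std::vector<PC> pcs_;
};

// Map[callsite PC] -> List<callee PC>: a conservative call graph that an
// offline tool could consult when decompressing stack traces.
using CallGraphTable = std::map<PC, std::vector<PC>>;

// Returns the possible callees of a callsite, or an empty list if the
// callsite is not in the table.
std::vector<PC> CalleesOf(const CallGraphTable& cg, PC callsite) {
  auto it = cg.find(callsite);
  if (it == cg.end()) return {};
  return it->second;
}
```

A sorted array with binary search is chosen here only because it is the simplest compact lookup structure; the actual table layout would be up to the RFC.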
>
> First-class metadata would open the door to some really cool things.
>
> Thanks,
> Matt Morehouse
>
>
> On Wed, Jan 6, 2021 at 5:56 AM Lorenzo Casalino via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Dear Tuan,
>>
>> How are you doing? Did you manage to start the draft for the RFC?
>>
>>
>> I'll take this opportunity to wish you all the best for the new year :)
>>
>> Best regards,
>> Lorenzo Casalino
>> On 10/11/20 at 09:27, Lorenzo Casalino wrote:
>>
>>
>> On 09/11/20 at 00:30, Son Tuan VU wrote:
>>
>> Hi,
>>
>> Thank you all for keeping this going. Indeed, I was not aware that the
>> discussion was going on; I am really sorry for the late reply.
>>
>> Nice to hear from you again! Thank you for starting this thread ;)
>>
>> I understand Chris' point about metadata design. Either the metadata
>> becomes stale or is removed (if we do not teach transformations to
>> preserve it), or we end up modifying many (if not all) transformations to
>> keep the data intact.
>> Currently in the IR, I feel like the default behavior is to ignore/remove
>> the metadata, and only a limited number of transformations know how to
>> maintain and update it, which is a best-effort approach.
>> That being said, my initial thought was to adopt this approach for the
>> MIR, so that we can at least have a minimal mechanism to communicate
>> additional information to various transformations, or even dump it to the
>> asm/object file.
>> In other words, it is the responsibility of the users who introduce/use
>> the metadata in the MIR to teach the transformations they select how to
>> preserve their metadata. A common API to abstract this would definitely
>> help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.
>>
>> Unfortunately, I have never worked with LLVM IR metadata (I focused
>> almost entirely on the back-end and have only scratched the surface of
>> LLVM's middle-end), but I see your point.
>>
>> Clearly, applying the needed modifications to all the back-end
>> transformations/optimizations is infeasible and probably not worth it:
>> different users may have different requirements/needs regarding a
>> specific pass.
>>
>> I like the idea of a common API to handle the MIR metadata, and of
>> letting the end user handle such data. Of course, if the community
>> encounters common cases while handling the metadata, such cases may be
>> integrated into the upstream project.
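A minimal C++17 sketch of the kind of common API discussed above, under stated assumptions: metadata is modeled as a plain kind-to-string map, whoever introduces a kind registers a merge policy for it, and unregistered kinds are dropped by default. MetadataCombiner, RegisterKind, Combine and the "sec.tag" kind used in the test are hypothetical; this is only loosely inspired by the idea behind combineMetadata() and is not its actual interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model, not LLVM's API: metadata on a machine instruction is
// a map from a kind name to an opaque string payload.
using MetadataMap = std::map<std::string, std::string>;

// Per-kind merge policy: returns the payload to keep, or std::nullopt to
// drop the kind from the merged instruction.
using MergeFn = std::function<std::optional<std::string>(
    const std::string& kept, const std::string& replaced)>;

class MetadataCombiner {
 public:
  // Whoever introduces a metadata kind registers how it survives a merge.
  void RegisterKind(const std::string& kind, MergeFn fn) {
    policies_[kind] = std::move(fn);
  }

  // Combines the metadata of two instructions being merged into one
  // (e.g. by a CSE-like pass). Kinds with no registered policy, or present
  // on only one side, are dropped: the best-effort default.
  MetadataMap Combine(const MetadataMap& kept,
                      const MetadataMap& replaced) const {
    MetadataMap result;
    for (const auto& [kind, value] : kept) {
      auto policy = policies_.find(kind);
      if (policy == policies_.end()) continue;  // unknown kind: drop
      auto other = replaced.find(kind);
      if (other == replaced.end()) continue;    // one-sided: drop
      std::optional<std::string> merged = policy->second(value, other->second);
      if (merged) result[kind] = *merged;
    }
    return result;
  }

 private:
  std::map<std::string, MergeFn> policies_;
};
```

Keeping "drop" as the default means only users who care about a kind pay the cost of teaching passes how to preserve it.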
>>
>> Nonetheless, the main point of this thread is to preserve middle-end
>> metadata down to the back-end, right after the Instruction Selection
>> phase. Hence, regardless of the end user's needs, a "preserve-all" policy
>> is required during the lowering stage, which will involve some changes,
>> in particular to the DAGCombine pass.
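The "preserve-all" policy during lowering can be pictured with a toy model (not LLVM's SelectionDAG/SDNode API): whenever a combine folds several nodes into one, the replacement node inherits the union of their metadata. ToyNode and TransferMetadataOnFold are hypothetical names, and taking the union is an assumption; a real design would also need a rule for conflicting payloads.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy node carrying middle-end metadata; not the real SDNode.
struct ToyNode {
  std::string opcode;
  std::map<std::string, std::set<std::string>> metadata;  // kind -> payloads
};

// "Preserve-all": when a combine folds the nodes in `folded` into
// `replacement`, the replacement inherits every payload of every folded
// node, so nothing attached in the middle-end is silently lost.
void TransferMetadataOnFold(const std::vector<const ToyNode*>& folded,
                            ToyNode& replacement) {
  for (const ToyNode* node : folded)
    for (const auto& [kind, payloads] : node->metadata)
      replacement.metadata[kind].insert(payloads.begin(), payloads.end());
}
```

For instance, folding a mul and an add into a single fused node would leave the fused node with the metadata of both.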
>>
>>
>> As for my use case, it is also security-related. However, I do not
>> consider the metadata to be a compilation "correctness" criterion:
>> metadata, by definition (from the LLVM IR), can be safely removed without
>> affecting the program's correctness.
>> If possible, I would like to have more details on Lorenzo's use case in
>> order to see how metadata would interfere with the program's correctness.
>>
>> I would really like to discuss the details here, but, unfortunately, I
>> am working on a publication and, thus, cannot disclose any of them :(
>>
>> However, by "correctness" I do not mean "I/O correctness", but the
>> preservation of a security property expressed in the front-end (e.g.,
>> specified in the source code) or in the middle-end (e.g., specified in
>> the LLVM-IR, for instance by a transformation pass).
>>
>> From a security point of view, removing or altering metadata does not
>> interfere with the I/O functionality of the code (although it may impact
>> performance), but it may introduce vulnerabilities.
>>
>> As for the RFC, I can definitely try to write one, but this would be my
>> first time doing so. But maybe it is better to start with Lorenzo's
>> proposal, as you have already been working on this? Please tell me if you
>> would prefer me to start the RFC, though.
>>
>> It is the first time for me too, do not worry!
>>
>> We could just use any other RFC as a template to get started :D
>>
>> I think that a structure like the following would be fine:
>>
>> 1. Background
>> 1.1 Motivation
>> 1.2 Use-cases
>> 1.3 Other approaches
>> 2. Goal(s)
>> 3. Requirements
>> 4. Drawbacks and main bottlenecks
>> 5. Design sketch
>> 6. Roadmap sketch
>> 7. Potential future development
>>
>> It may be a bit overkill; you are warmly invited to cut/refine these
>> points!
>>
>> And...no, I still have no sketch of the RFC; sorry, I have had quite a
>> heavy workload these days.
>>
>> Yes, you can start the write up of the RFC.
>>
>> Quoting David:
>>
>> "Since you initially raised the topic [...] I want to give you right of
>> first refusal."
>>
>>
>> Have a nice day!
>>
>> -- Lorenzo
>>
>> Thank you again for keeping this going.
>>
>> Sincerely,
>>
>> - Son
>>
>> On Wed, Nov 4, 2020 at 6:30 PM Lorenzo Casalino <
>> lorenzo.casalino93 at gmail.com> wrote:
>>
>>>
>>> On 04/11/20 at 17:40, David Greene wrote:
>>> > Sorry about the late reply.
>>> >
>>> > Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>>> >
>>> >>>>> - Should not impact compile time excessively (what is "excessive?")
>>> >>>> Probably, such estimation should be performed on
>>> >>> Did something get cut off here?
>>> >> Oops. Yep, I removed a paragraph, but apparently I forgot to remove
>>> >> its first sentence. In any case, we should discuss how to
>>> >> quantitatively determine an acceptable upper bound on the
>>> >> compile-time overhead and give a motivation for it. For instance, a
>>> >> maximum overhead of n% on compilation time must be guaranteed,
>>> >> because ** list of reasons **.
>>> > I am not sure how we'd arrive at such a number or motivate/defend it.
>>> > Do we have any sense of the impact of the existing metadata
>>> > infrastructure? If not, I'm not sure we can do it for something
>>> > completely new. I think we can set a goal, but we'd have to revise it
>>> > as we gain experience.
>>> I think that is the best approach :)
>>> >>> Since you initially raised the topic, do you want to take the lead in
>>> >>> writing up an RFC? I can certainly do it too but I want to give you
>>> >>> right of first refusal. :)
>>> >>> -David
>>> >> Uhm...actually, it wasn't me but Son Tuan, so the right of first
>>> >> refusal should be granted to him :) And I have just noticed that he
>>> >> wasn't CC'd on all our emails; I hope he was able to follow our
>>> >> discussion anyway. I am adding him to this email; let us wait and see
>>> >> whether he has any critical feature or point to discuss.
>>> > Fair enough! I have recently taken on a lot more work so unfortunately
>>> > I can't devote a lot of time to this at the moment. I've got to clear
>>> > out my pipeline first. I'd be very happy to help review text, etc.
>>> Don't worry, it's OK ;) While we wait for feedback/input from Son,
>>> I'll try to prepare a draft of the RFC and publish it here.
>>>
>>> Thank you David, and have a nice day :)
>>>
>>> -- Lorenzo
>>>
>>> > -David
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>