Zixu Wang via llvm-dev
2021-Sep-01 18:18 UTC
[llvm-dev] [RFC] clang support for API information generation in JSON
Hi All!

I’m writing to propose clang-extract-api, a tool to collect and serialize API information from header files, for example function signatures, Objective-C interfaces and protocols, and inline documentation comments. We hope it could help future tools to understand clang-based language APIs without needing to dig into the AST themselves.

Background

Motivation

Library and SDK providers may find it useful to be able to create and inspect a “snapshot” of the APIs they expose, for example, to check for API/ABI-breaking changes between two versions, or to automate generating documentation for the APIs. Here is a list of examples of information we want to extract from APIs:
• the name (spelling/mangled) of the symbol;
• the unique identifier of the symbol, for example the Unified Symbol Resolution (USR);
• the source location of the API declaration (file, line, column);
• access control of the API (public/private/protected);
• availability (available/unavailable/deprecated);
• function signatures (return/parameters);
• documentation comments attached to a symbol;
• relations with other symbols (class methods, typedef relations, struct data fields, enum constants, etc.)

Since this API information is available in the header files, which declare and distribute the APIs, we can implement a tool that extracts it without compiling the whole project, giving tooling easy access to the information.

Existing solutions

While there are some existing solutions in clang to dump symbols or AST information, they either expose unnecessary low-level details or fail to provide enough information about APIs. For example, clang -ast-dump dumps low-level details for all declarations for debugging purposes, and the output is not machine-parsable. Doxygen also extracts documentation comments and other information from API declarations, but its output is rendered documentation in web formats, which is not flexible enough for other uses and tools.
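To make the "snapshot" use case from the Motivation section a bit more concrete, here is a minimal Python sketch of what a consumer might do with such records. It is not part of the proposal: the `ApiRecord` type, its field names, the `diff_snapshots` helper and the sample symbols are all hypothetical, chosen only to mirror the bullet list above, and a real API/ABI checker would need far more than this.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ApiRecord:
    """One exported symbol. Field names are illustrative, not a clang schema."""
    name: str                        # spelling of the symbol
    usr: str                         # unique identifier, e.g. a USR
    location: Tuple[str, int, int]   # (file, line, column)
    access: str = "public"           # public / private / protected
    availability: str = "available"  # available / unavailable / deprecated
    signature: Optional[str] = None  # return and parameter types, if any
    doc_comment: Optional[str] = None
    relations: Tuple[Tuple[str, str], ...] = ()  # (kind, target USR) pairs


def diff_snapshots(old: Iterable[ApiRecord],
                   new: Iterable[ApiRecord]) -> Dict[str, List[str]]:
    """Compare two snapshots keyed by USR; report added/removed/changed names."""
    old_by_usr = {r.usr: r for r in old}
    new_by_usr = {r.usr: r for r in new}
    report: Dict[str, List[str]] = {"added": [], "removed": [], "changed": []}
    for usr, rec in new_by_usr.items():
        if usr not in old_by_usr:
            report["added"].append(rec.name)
    for usr, rec in old_by_usr.items():
        if usr not in new_by_usr:
            report["removed"].append(rec.name)
            continue
        cur = new_by_usr[usr]
        # Locations and doc comments can move without affecting clients, so
        # only signature, access and availability count as a change here.
        if (rec.signature, rec.access, rec.availability) != (
                cur.signature, cur.access, cur.availability):
            report["changed"].append(rec.name)
    return report


# Hypothetical snapshots of two versions of the same header.
V1 = [
    ApiRecord("add", "c:@F@add", ("Test.h", 3, 5), signature="int (int, int)"),
    ApiRecord("reset", "c:@F@reset", ("Test.h", 4, 6), signature="void (void)"),
]
V2 = [
    ApiRecord("add", "c:@F@add", ("Test.h", 3, 5), signature="long (long, long)"),
    ApiRecord("clear", "c:@F@clear", ("Test.h", 5, 6), signature="void (void)"),
]
```

Calling `diff_snapshots(V1, V2)` groups the symbol names into "added", "removed" and "changed". Keying on the USR rather than the spelling is an assumption made here so that overloads or same-named symbols in different scopes would not collide.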
Proposal

We propose to implement this tool as a new frontend action invoked by `clang -extract-api`, as shown in the example below.

    clang -extract-api \
      header.h [more_header.h ...] or a filelist \
      -isysroot <SDK> \
      -target <TARGET> \
      -I <INCLUDE PATH> \
      -isystem <SYS INCLUDE PATH> \
      ...
      -o output.json

It takes the header file(s), or a filelist file containing paths to the header file(s), as input. The header files will be parsed by clang, and the extract-api action will visit the AST to extract the needed information and serialize it to JSON output. Please find an example input and output attached.

The example output is based on the symbol graph format that's already used by Swift for serializing symbol information and their relations. This format can represent the required API information and is flexible and extensible, as demonstrated in the example, so we think it's a good starting point.

We are excited about this idea and its potential uses, and we’d love to hear feedback and suggestions!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test.h
Type: application/octet-stream
Size: 535 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210901/205f82cf/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.json
Type: application/json
Size: 33537 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210901/205f82cf/attachment-0001.json>
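For readers who have not seen a symbol graph, the following Python sketch shows roughly how a downstream tool might walk such a file. It is an illustration under assumptions: the key names ("symbols", "relationships", "identifier.precise", "names.title", "kind.identifier") follow the Swift symbol graph layout as I understand it, while the sample document, the kind strings such as "c.struct", and the `summarize` helper are made up here and are not taken from the attached test.json.

```python
import json

# Hypothetical, heavily trimmed symbol-graph-style document for a C header.
SAMPLE_GRAPH = """
{
  "symbols": [
    {"identifier": {"precise": "c:@S@Point"},
     "kind": {"identifier": "c.struct"},
     "names": {"title": "Point"}},
    {"identifier": {"precise": "c:@S@Point@FI@x"},
     "kind": {"identifier": "c.property"},
     "names": {"title": "x"}},
    {"identifier": {"precise": "c:@F@add"},
     "kind": {"identifier": "c.func"},
     "names": {"title": "add"}}
  ],
  "relationships": [
    {"kind": "memberOf",
     "source": "c:@S@Point@FI@x",
     "target": "c:@S@Point"}
  ]
}
"""


def summarize(graph):
    """Return one line per symbol and one line per relationship."""
    symbols = graph.get("symbols", [])
    # Map each precise identifier (USR) to a readable title for the edges.
    titles = {s["identifier"]["precise"]: s["names"]["title"] for s in symbols}
    lines = [f'{s["kind"]["identifier"]} {s["names"]["title"]}' for s in symbols]
    for rel in graph.get("relationships", []):
        # Fall back to the raw identifier when the symbol is not in this graph.
        source = titles.get(rel["source"], rel["source"])
        target = titles.get(rel["target"], rel["target"])
        lines.append(f'{source} {rel["kind"]} {target}')
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE_GRAPH)):
        print(line)
```

To try this against real tool output, one would load the file passed to `-o` with `json.load` instead of the embedded sample, adjusting the key names if the emitted schema differs from what is assumed here.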
Tom Stellard via llvm-dev
2021-Sep-01 18:26 UTC
[llvm-dev] [cfe-dev] [RFC] clang support for API information generation in JSON
On 9/1/21 11:18 AM, Zixu Wang via cfe-dev wrote:
> Hi All!
>
> I’m writing to propose clang-extract-api, a tool to collect and serialize API information from header files, for example function signatures, Objective-C interfaces and protocols, and inline documentation comments. We hope it could help future tools to understand clang-based language APIs without needing to dig into the AST themselves.

Would this tool be able to provide the same functionality as tools like abi-compliance-checker[1] and libabigail[2], which extract ABI/API information from debuginfo?

-Tom

[1] https://github.com/lvc/abi-compliance-checker
[2] https://sourceware.org/libabigail/
Mats Larsen via llvm-dev
2021-Sep-01 20:07 UTC
[llvm-dev] [RFC] clang support for API information generation in JSON
Hi Zixu,

I just wanted to say that this is of interest to me! I work on a couple of FFI generation tools, and something like this would make it easier for us to generate code from headers. The clang AST is pretty scary so a tool like this would definitely be appreciated.

Best regards
Mats Larsen
Ben Boeckel via llvm-dev
2021-Sep-01 22:00 UTC
[llvm-dev] [cfe-dev] [RFC] clang support for API information generation in JSON
On Wed, Sep 01, 2021 at 11:18:55 -0700, Zixu Wang via cfe-dev wrote:
> We are excited about this idea and its potential uses, and we’d love
> to hear feedback and suggestions!

I'll point to this tool which already exists:

    https://github.com/CastXML/CastXML

It dumps XML instead of JSON, but it serves the goals at least. Note that one of the main problems that needs tackling is emulating other compilers (e.g., seeing the API as MSVC sees it).

--Ben
Petr Hosek via llvm-dev
2021-Sep-02 06:46 UTC
[llvm-dev] [cfe-dev] [RFC] clang support for API information generation in JSON
In Fuchsia, we have been using clang-doc (https://clang.llvm.org/extra/clang-doc.html) for this purpose (using the YAML output format). Would it be possible to use clang-doc for your purposes? You might need to extend the output format to include additional information but that should be quite straightforward.