Luke Drummond via llvm-dev
2020-Jan-14 12:59 UTC
[llvm-dev] [tablegen] table readability / performance
Hello I've been looking at the tables generated by `SequenceToOffsetTable::emit`, and notice that when the generated data are strings, the data is basically un-grep-able, and very tricky to read, as they are emitted as an array of comma-separated char-literal: extern const char HexagonInstrNameData[] = { /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0, /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0, /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0, /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0, /* 34 */ 'V', '6', '_', 'l', 'd', '0', 0, /* 41 */ 'V', '6', '_', 'z', 'l', 'd', '0', 0, [...] }; As far as I can see, this makes it more difficult than necessary to read for at least the following cases: Target AsmStrs Target InstrNameData Target RegStrings Target RegClassStrings I hacked together a fix for the above cases locally, and found that for at least for clang and gcc, the compile-time for generated tables is significantly reduced when emitting string literals, and the user can grep the name tables without huge effort. The above table is now: extern const char HexagonInstrNameData[] = { /* 0 */ "G_FLOG10\0" /* 9 */ "ENDLOOP0\0" /* 18 */ "V6_vdd0\0" /* 26 */ "PS_vdd0\0" /* 34 */ "V6_ld0\0" /* 41 */ "V6_zld0\0" [...] }; My question then is: Is there a specific technical reason that we should avoid emitting concatenated string literals rather array of comma-separated char literals for "string-like" data? If not, I can probably post a patch, which I feel will make it much easier to understand the output from tablegen, and helps compilation speed of generated tables. Any thoughts appreciated. All the Best Luke -- Codeplay Software Ltd. Company registered in England and Wales, number: 04567874 Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
Reid Kleckner via llvm-dev
2020-Jan-14 20:27 UTC
[llvm-dev] [tablegen] table readability / performance
I don't think there's any technical reason for the current structure. Your change sounds like an improvement to me. I'm curious if you have any compile time measurements from this change. On Tue, Jan 14, 2020 at 4:59 AM Luke Drummond via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello > > I've been looking at the tables generated by > `SequenceToOffsetTable::emit`, and notice that when the generated data > are strings, the data is basically un-grep-able, and very tricky to > read, as they are emitted as an array of comma-separated char-literal: > > extern const char HexagonInstrNameData[] = { > /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0, > /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0, > /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0, > /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0, > /* 34 */ 'V', '6', '_', 'l', 'd', '0', 0, > /* 41 */ 'V', '6', '_', 'z', 'l', 'd', '0', 0, > [...] > }; > > As far as I can see, this makes it more difficult than necessary to read > for at least the following cases: > > Target AsmStrs > Target InstrNameData > Target RegStrings > Target RegClassStrings > > > I hacked together a fix for the above cases locally, and found that for > at least for clang and gcc, the compile-time for generated tables is > significantly reduced when emitting string literals, and the user can > grep the name tables without huge effort. The above table is now: > > extern const char HexagonInstrNameData[] = { > /* 0 */ "G_FLOG10\0" > /* 9 */ "ENDLOOP0\0" > /* 18 */ "V6_vdd0\0" > /* 26 */ "PS_vdd0\0" > /* 34 */ "V6_ld0\0" > /* 41 */ "V6_zld0\0" > [...] > }; > > My question then is: Is there a specific technical reason that we should > avoid emitting concatenated string literals rather array of > comma-separated char literals for "string-like" data? > > If not, I can probably post a patch, which I feel will make it much > easier to understand the output from tablegen, and helps compilation > speed of generated tables. > > Any thoughts appreciated. > > All the Best > > Luke > > -- > Codeplay Software Ltd. > Company registered in England and Wales, number: 04567874 > Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200114/b7b560a1/attachment.html>
Luke Drummond via llvm-dev
2020-Jan-15 18:41 UTC
[llvm-dev] [tablegen] table readability / performance
Hi On Tue Jan 14, 2020 at 8:27 PM, Reid Kleckner wrote:> I don't think there's any technical reason for the current structure. > Your > change sounds like an improvement to me. I'm curious if you have any > compile time measurements from this change.I can't find a good permanent place to dump my test files (they're too large to paste here, but the gist is): cd $BUILD awk ' /InstrNameData\[\] = / {printing = 1} (printing) {print} /};$/ {printing = 0} ' `grep InstrInfo.inc: build.ninja | awk '{print $2}' | tr -d :` > ~/after.cpp for both with and without my proposed fix. ...and then: clang++ before.cpp -S 0.59s user 0.06s system 96% cpu 0.668 total clang++ before.cpp -S 0.62s user 0.02s system 99% cpu 0.643 total clang++ before.cpp -S 0.61s user 0.03s system 99% cpu 0.639 total clang++ after.cpp -S 0.03s user 0.01s system 99% cpu 0.043 total clang++ after.cpp -S 0.03s user 0.02s system 99% cpu 0.049 total clang++ after.cpp -S 0.03s user 0.02s system 99% cpu 0.046 total g++ before.cpp -S 0.70s user 0.05s system 99% cpu 0.747 total g++ before.cpp -S 0.72s user 0.04s system 99% cpu 0.762 total g++ before.cpp -S 0.71s user 0.04s system 99% cpu 0.745 total g++ after.cpp -S 0.01s user 0.01s system 99% cpu 0.022 total g++ after.cpp -S 0.02s user 0.01s system 99% cpu 0.025 total g++ after.cpp -S 0.02s user 0.00s system 99% cpu 0.026 total clang is (Debian) version 8.0.1-4 (tags/RELEASE_801/final) gcc is (Debian 9.2.1-23) 9.2.1 20200110 Host is x86_64 Linux i7 I'll post the patch to Phabricator as soon as I'm able. Thanks for your input. Luke -- Codeplay Software Ltd. Company registered in England and Wales, number: 04567874 Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
Luke Drummond via llvm-dev
2020-Jan-15 18:52 UTC
[llvm-dev] [tablegen] table readability / performance
On Tue Jan 14, 2020 at 8:27 PM, Reid Kleckner wrote:> I don't think there's any technical reason for the current structure.Apparently [this](https://docs.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019) is a thing. This results in the following delight happening on Visual Studio 2015:> fatal error C1091: compiler limit: string exceeds 65535 bytes in lengthSo maybe another time... All the Best Luke -- Codeplay Software Ltd. Company registered in England and Wales, number: 04567874 Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF