Anton Korobeynikov via llvm-dev
2021-Dec-10 12:33 UTC
[llvm-dev] [cfe-dev] [Openmp-dev] Bugzilla migration is stopped again
Thanks for the try!

From the quick scan:
1. There are no labels
2. Attachments are not real – they are just links to bugzilla and will
   be obsolete if bugzilla is e.g. down
3. Each attachment results in 2 comments, one of which is redundant
4. CC list is strange, e.g.
   https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
   "mail.sandbox.de"
5. All text is in verbatim boxes (e.g.
   https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
   it almost impossible to read due to horizontal scroll
6. There are no "depends on" / "blocks on" references (see
   https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)
7. There are no cross-references in case of duplicates (see
   https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)
...

It's pretty straightforward to come to the present state and there are
tools for this, we've been at this point in 2019 (see e.g.
https://github.com/asl/llvm-bugzilla/issues as it was outlined in LLVM
DevMtg 2019 roundtable discussion). The non-trivial part is to
work around various GitHub issues which are also different depending on
API used.

On Fri, Dec 10, 2021 at 3:00 PM Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
>
> On Sat, Dec 4, 2021 at 9:16 AM Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
>>
>> On Sat, Dec 4, 2021 at 5:46 AM Anton Korobeynikov via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>
>>> [...]
>>> Surely, if the community will re-decide that these are unimportant
>>> things we can push the existing code into a blank archive fairly
>>> quickly.
>>
>>
>> Please, test the above claim this week, on a blank repo. Let's actually
>> find out whether it works, instead of relying on "Surely...".
>>
>> At this point I'm offering my own technical assistance, just to get the
>> thing done and stop getting these emails every day.
>> Send me your Bugzilla export script; I'll test it out this week on a
>> blank repo, with the goal of mirroring a 100-bug subset of the LLVM
>> Bugzilla publicly visible in
>> https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.
>
>
> The promised EOW update: I have written Python scripts for the Export,
> Transform, and (dumbed-down, see below) Load stages of a
> bugzilla-to-github migration. You can find them at
> https://github.com/Quuxplusone/BugzillaToGithub#bugzilla-to-github
> and the resulting GitHub issues list (which is just partial, so far) lives at
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues
> This is merely the result of five evenings of work, so e.g. the
> formatting of message bodies still isn't perfect, and as of this morning
> I'm aware of at least one bug (that GitHub's import API doesn't like a
> comment to have empty string as its `body`). And of course the biggest
> issue is that I was noodling around without special access to GitHub
> staff, who are the only people able to forge issue/comment authorship; so
> my script just puts everything under the username of the person-or-bot
> that runs it. I guarantee GitHub SRE can help with that.
>
> Arthur

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Arthur O'Dwyer via llvm-dev
2021-Dec-10 16:42 UTC
[llvm-dev] [cfe-dev] [Openmp-dev] Bugzilla migration is stopped again
On Fri, Dec 10, 2021 at 7:33 AM Anton Korobeynikov <anton at korobeynikov.info> wrote:

> Thanks for the try!
>
> From the quick scan:
>
> 1. There are no labels
>

There are labels, but only according to the "keywords" field from Bugzilla.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues?q=is%3Aopen+is%3Aissue+label%3Aaccepts-invalid
I agree it would make sense to apply more labels in Step 3
<https://github.com/Quuxplusone/BugzillaToGithub#step-3-process-each-xml-bug-into-githubs-json-schema>
(e.g. according to the "Product" field). If you document the mapping
somewhere, it would be trivial to add to my script and I could have 10,000
issues regenerated in about 3 hours.
Also needed: the mapping from Bugzilla usernames to GitHub usernames.

> 2. Attachments are not real – they are just links to bugzilla and will
> be obsolete if bugzilla is e.g. down
>

Right. This is part of the "dumbed-down Load step", i.e. "take the actual
data and munge it into the closest possible thing that can be loaded using
the public API": GitHub's beta Issues Import API doesn't support adding
files to issues.
(Also, e.g.,
- forging authorship of comments is impossible using the public API
- for cross-referencing to other issues, I'm currently using links back
  into the old Bugzilla's show_bug.cgi; but really these links should go to
  something like https://reviews.llvm.org/PR1234, which would be under our
  control and could be HTTP-redirected to their corresponding GitHub issues
)

> 3. Each attachment results in 2 comments, one of which is redundant
>

Ack. I wrote code to fix this for the very simplest "Created attachment
1234" auto-comments, but had not noticed that sometimes the auto-comment is
more complicated. E.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729#issuecomment-990590574
This wouldn't be hard to fix.

> 4. CC list is strange, e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
> "mail.sandbox.de"
>

That's partly an artifact of my lack of mapping from Bugzilla usernames to
GitHub usernames (the relevant codepath
<https://github.com/Quuxplusone/BugzillaToGithub/blob/822dbac/xml-to-json.py#L186-L195>
is just a stub), but also something super weird...! The email addresses
from Bugzilla show up in the XML when viewed in Chrome, but not when
fetched in Python or curl.
https://stackoverflow.com/questions/70307092/fetching-xml-from-bugzilla-gives-different-results-with-curl-versus-browser

> 5. All text is in verbatim boxes (e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
> it almost impossible to read due to horizontal scroll
>

The monospace font is intentional on my part, and important even for
https://bugs.llvm.org/show_bug.cgi?id=12092 because a big part of the
initial comment is indented C++ code.
However, I should implement linebreaking: looks like Bugzilla's website
layout breaks around 84 characters, and 80 would be perfectly sensible.
Will fix.

> 6. There are no "depends on" / "blocks on" references (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)
>

Ack. (This is an artifact of my not knowing that the <dependson> element
exists. I should have thought to grep and get a list of all the tags that
exist in the XML (that is, in the 51567 "xml/*.xml" files produced
during Step 1 in the README
<https://github.com/Quuxplusone/BugzillaToGithub#step-1-export-your-bugzilla-bugs-to-xml>),
to make sure I understood each of them.) Will fix, at least for the
<dependson> tag.

> 7. There are no cross-references in case of duplicates (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)
>

Ack. I thought about mangling the duplicate-bug-number into the "Status"
line, like Bugzilla does, but decided not to worry about it in the interest
of being-done-by-my-self-imposed-EOW-deadline.
:) There's also a harder issue on bug 10729's final comment
<https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729#issuecomment-990590581>,
where it says

    Yes, apparently I did. Sorry. I'll attach the logs to that issue
    instead :)

    *** This bug has been marked as a duplicate of bug 9072
    <https://bugs.llvm.org/show_bug.cgi?id=9072> ***

where we want that to be both monospaced *and* hyperlinked, but Markdown
can't do hyperlinks inside triple-backticks. The obvious solution is for the
script to special-case Bugzilla's auto-comment and pull it outside of the
triple-backticked section.
I should grep for all the different Bugzilla auto-comments too. It looks
like there are only three possible auto-comments:

    $ grep -hor '[*][*][*] .* [*][*][*]' xml/ > out
    $ sed 's/[0-9][0-9]*/9/g' out | sort | uniq -c | sort -rn | eyeballing-by-arthur
    2563 *** Bug 9 has been marked as a duplicate of this bug. ***
    2504 *** This bug has been marked as a duplicate of bug 9 ***
      76 *** This bug has been marked as a duplicate of 9 ***

...
>
> It's pretty straightforward to come to the present state and there are
> tools for this, we've been at this point in 2019 (see e.g.
> https://github.com/asl/llvm-bugzilla/issues as it was outlined in LLVM
> DevMtg 2019 roundtable discussion). The non-trivial part is to
> work around various GitHub issues which are also different depending on
> API used.
>

Nice! Yeah, steps 1, 2, 3
<https://github.com/Quuxplusone/BugzillaToGithub#step-1-export-your-bugzilla-bugs-to-xml>
(Export and Transform) are possible for literally anyone to do, and also
relatively *simple*, in that I wrote those scripts in a single week of
evenings. :)
Step 4
<https://github.com/Quuxplusone/BugzillaToGithub#step-4-import-your-json-bugs-into-github>,
the Load step, is equally *simple* but requires special magic powers that
only a GitHub SRE would have, e.g., forging comment authorship.
If I were doing this migration for real, I'd ask what API they plan to use,
and ask them to test it out on a blank repo in exactly the same way that
you and I have now both done with
https://github.com/asl/llvm-bugzilla/issues
and
https://github.com/Quuxplusone/LLVMBugzillaTest/issues
That is, write the script that's going to be used, and then test it out,
repeatedly, until it works perfectly... and then test once more, just for
safety's sake, before doing it live. The mantras here are
- "With enough eyeballs, all bugs are shallow" (we've both identified
  deficiencies in each other's scripts, and can now fix them!)
- "Measure twice, cut once" (rehearse the entire deploy plan in blank repos
  until it's perfect, then do *only the perfect version* live)

(Also, ideally, someone involved with LLVM would just get hired at GitHub,
to cut down on round-trip time. But I'm not volunteering. ;))

–Arthur
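
The auto-comment special-casing described in this message (keep ordinary comment text monospaced, but pull Bugzilla's duplicate markers outside the triple-backticked section so they can be hyperlinked) could be sketched roughly as follows. This is a minimal illustration and not code from the BugzillaToGithub repository: the names `render_comment`, `AUTO_COMMENT`, and `BUG_URL` are hypothetical, the link target is assumed to be the old show_bug.cgi URL, and the regular expression covers only the three auto-comment shapes found by the grep above.

```python
import re

# Assumption: links point back at the old Bugzilla; the message suggests a
# redirector under LLVM's control would be preferable in a real migration.
BUG_URL = "https://bugs.llvm.org/show_bug.cgi?id={}"

# Built this way so the sketch contains no literal triple-backtick.
FENCE = "`" * 3

# The three auto-comment shapes reported by the grep in the message.
AUTO_COMMENT = re.compile(
    r"^\*\*\* (?:"
    r"Bug (?P<src>\d+) has been marked as a duplicate of this bug\."
    r"|This bug has been marked as a duplicate of (?:bug )?(?P<dst>\d+)"
    r") \*\*\*$"
)


def render_comment(text):
    """Fence ordinary lines; emit duplicate markers unfenced and hyperlinked."""
    out = []    # finished Markdown chunks
    block = []  # ordinary lines waiting to be fenced

    def flush():
        # Drop blank lines at the edges, then fence whatever remains.
        while block and not block[0].strip():
            block.pop(0)
        while block and not block[-1].strip():
            block.pop()
        if block:
            out.append(FENCE + "\n" + "\n".join(block) + "\n" + FENCE)
        block.clear()

    for line in text.splitlines():
        stripped = line.strip()
        m = AUTO_COMMENT.match(stripped)
        if m is None:
            block.append(line)
            continue
        flush()
        name = "src" if m.group("src") else "dst"
        num = m.group(name)
        link = "[{}]({})".format(num, BUG_URL.format(num))
        linked = stripped[:m.start(name)] + link + stripped[m.end(name):]
        # Escape the asterisks so Markdown does not treat them as emphasis.
        out.append(linked.replace("***", r"\*\*\*"))
    flush()
    return "\n\n".join(out)
```

Calling `render_comment` on the final comment of bug 10729 would leave the "Yes, apparently I did" text inside a fenced block and put the duplicate marker after it as ordinary Markdown with a link on the bug number. Auto-comments that wrap across lines, or that take a shape other than the three counted above, would pass through as ordinary fenced text.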