James Henderson via llvm-dev
2019-Aug-29 09:19 UTC
[llvm-dev] 404s within LLVM documentation
Patrick, how long does the crawl take? I suspect that if we fixed internal documentation links so that they point to local copies of the documentation when building locally, it would be quite quick (no actual idea, though). That in turn would probably make it feasible to add the check to the existing documentation build bots.

James

On Thu, 29 Aug 2019 at 03:47, Neil Nelson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Patrick, you have identified a good way to do this. Given that the links likely point to files in a directory structure on a single server, with the path given by the link text (as we see in your dead-link list), and that in most cases the file names (less the directory path) are probably unique, it would be a fairly direct procedure to associate links by their file name (less path) with file locations. The process would then update the links with the correct paths, list links without an existing file, and list dead links for which more than one existing file has the same name.
>
> How often to run it would depend on how often it turns up dead links.
>
> Regards, Neil Nelson
>
> On 8/28/19 7:52 PM, Patrick Nappa via llvm-dev wrote:
> > Hi all,
> >
> > I'm currently in the process of updating the Kaleidoscope tutorials (first and foremost, the ORC/BuildingAJIT ones), and I've noticed a fair few 404s lingering in the currently visible documentation. Some of these don't seem to have pointed to existing pages for a while.
> >
> > I was wondering whether there is a way to set up a check in the buildbot to ensure that documentation links don't break between builds. I'm happy to fix the current dead links I've found (see below), but thought it might be wise to set up a more automated approach for the future. Does anyone have tips on how I'd go about doing this, or on whether it should be set up at all?
> >
> > I ran a web crawler to find the dead links (this may not be exhaustive); they are as follows:
> > https://llvm.org/docs/TestSuiteMakefileGuide
> > https://llvm.org/docs/doxygen/structLICM.html
> > https://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
> > https://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
> > http://llvm.org/docs/lnt/modindex.html
> > https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl6.html#user-defined-unary-operators
> > https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl5.html#for-loop-expression
> > https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl7.html#user-defined-local-variables
> > https://llvm.org/docs/tutorial/LangRef.html#instruction-reference
> > https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl4.html#adding-a-jit-compiler
> > https://llvm.org/docs/tutorial/WritingAnLLVMPass.html
> > https://llvm.org/docs/tutorial/Passes.html
> > https://llvm.org/docs/tutorial/ProgrammersManual.html#viewing-graphs-while-debugging-code
> > https://llvm.org/docs/tutorial/SourceLevelDebugging.html
> > https://llvm.org/docs/tutorial/Frontend/PerformanceTips.html
> > https://llvm.org/docs/tutorial/GetElementPtr.html
> > https://llvm.org/docs/tutorial/GarbageCollection.html
> > https://llvm.org/docs/tutorial/ExceptionHandling.html
> > https://www.llvm.org/docs/doxygen/structLICM.html
> > http://llvm.org/docs/TestSuiteMakefileGuide
> > http://llvm.org/docs/doxygen/structLICM.html
> > https://www.llvm.org/docs/TestSuiteMakefileGuide
> > http://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
> > http://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
> >
> > Some of these are trivial mistakes (e.g. https://llvm.org/docs/tutorial/LangRef.html#instruction-reference -> https://llvm.org/docs/LangRef.html#instruction-reference), and some require a bit more inspection.
> >
> > Regards,
> > Patrick
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190829/fdf88c45/attachment-0001.html>
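Neil's procedure (index the existing files by base name, then sort each dead link into fixable, missing, or ambiguous) could be sketched in Python roughly as follows. This is a minimal sketch, not an existing LLVM tool: the function names and the sample file list are hypothetical, and it assumes the site's file paths are already available as a plain list, for example from a server-side file listing.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def build_index(file_paths):
    """Map each base file name (directory stripped) to every path that has it."""
    index = defaultdict(list)
    for path in file_paths:
        index[PurePosixPath(path).name].append(path)
    return index


def classify_dead_links(dead_links, file_paths):
    """Split dead links into (fixable, missing, ambiguous).

    fixable:   {link: new_path} where exactly one existing file has the base name
    missing:   [link, ...] where no existing file has the base name
    ambiguous: {link: [candidate, ...]} where several files share the base name
    """
    index = build_index(file_paths)
    fixable, missing, ambiguous = {}, [], {}
    for link in dead_links:
        parts = urlsplit(link)
        candidates = index.get(PurePosixPath(parts.path).name, [])
        if len(candidates) == 1:
            # Carry the #fragment over so section anchors are kept.
            suffix = f"#{parts.fragment}" if parts.fragment else ""
            fixable[link] = candidates[0] + suffix
        elif not candidates:
            missing.append(link)
        else:
            ambiguous[link] = candidates
    return fixable, missing, ambiguous


if __name__ == "__main__":
    # Hypothetical file list; a real one would come from the server.
    files = ["/docs/LangRef.html", "/docs/index.html", "/docs/tutorial/index.html"]
    dead = [
        "https://llvm.org/docs/tutorial/LangRef.html#instruction-reference",
        "http://llvm.org/docs/lnt/modindex.html",
        "https://llvm.org/docs/foo/index.html",  # hypothetical ambiguous link
    ]
    fixable, missing, ambiguous = classify_dead_links(dead, files)
    print("fixable:", fixable)
    print("missing:", missing)
    print("ambiguous:", ambiguous)
```

Because the index is keyed on full file names, extensionless links such as https://llvm.org/docs/TestSuiteMakefileGuide land in the missing bucket; also trying the name with ".html" appended would be one possible refinement.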
Patrick Nappa via llvm-dev
2019-Sep-01 10:33 UTC
[llvm-dev] 404s within LLVM documentation
> > It would be a fairly direct procedure to associate links by their file name (less path) with file locations. The process would then update the links for the correct paths, list links without an existing file, and list dead links having more than one existing file with the same name.

and

> Patrick, how long does the crawl take? I suspect if we fixed internal documentation links so that they point to local copies of documentation when building locally it would be quite quick (no actual idea though).

That crawl was actually done on the live site, using the linkchecker tool.

Doing it locally would indeed be much better, and it turns out Sphinx has a built-in builder for exactly this check (`cd llvm/docs && make -f Makefile.sphinx linkcheck`), although it also checks that external hyperlinks are reachable. The runtime could be reduced considerably if we changed all internal documentation links to be genuinely internal (e.g. linking to /docs/foo/bar rather than https://llvm.org/docs/foo/bar or llvm.org/docs/foo/bar, which is easily fixable), so that checking them needs no network access. I do believe we should still check external links, as documentation that links to nowhere is jarring; however, I don't think those crawls need to run as often.

Cheers,
Patrick
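Patrick's point about internal links can be illustrated with a small URL classifier: a link to llvm.org/docs in any of the forms seen in the crawl (http, https, www., or no scheme at all) maps to one site-relative /docs/ path, and everything else is treated as external. This is a minimal sketch with hypothetical helper names; it assumes only llvm.org and www.llvm.org serve /docs/, assumes links are absolute URLs, host-first forms, or site-relative paths, and does not edit the .rst sources themselves.

```python
from urllib.parse import urlsplit

# Assumption: these hosts serve the same /docs/ tree, so links to them are "internal".
INTERNAL_HOSTS = {"llvm.org", "www.llvm.org"}


def to_internal(url):
    """Return the site-relative /docs/ path for an llvm.org docs link, else None."""
    if url.startswith("/"):
        # Already site-relative; only /docs/ paths count as documentation.
        return url if url.startswith("/docs/") else None
    if "://" not in url:
        # Scheme-less, host-first form such as "llvm.org/docs/foo/bar".
        url = "https://" + url
    parts = urlsplit(url)
    # .hostname is lower-cased and has any port stripped.
    if parts.hostname not in INTERNAL_HOSTS or not parts.path.startswith("/docs/"):
        return None
    return parts.path + (f"#{parts.fragment}" if parts.fragment else "")


def partition_links(urls):
    """Split URLs into sorted, de-duplicated internal paths and external URLs."""
    internal, external = set(), set()
    for url in urls:
        path = to_internal(url)
        if path is None:
            external.add(url)
        else:
            internal.add(path)
    return sorted(internal), sorted(external)
```

Collapsing the variants also shrinks the crawl's list: the three TestSuiteMakefileGuide entries (http, https, and www.) reduce to a single /docs/TestSuiteMakefileGuide path that can be checked offline, leaving only the external URLs for a less frequent online crawl.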
A practical way to proceed may be for LLVM to provide a list of HTML files from their server, by going to the top-level https://llvm.org directory and running

find . -type f \( -name '*.htm' -o -name '*.html' \) > llvm.org_html_file_list

which lists every file with an .htm or .html extension together with its parent directories. There may be several top-level directories of interest, such as the one for https://clang.llvm.org/, and each could get its own file list, though this is secondary at the moment. Recording the name of each top-level directory, or its top-level web page, would help ensure that the changes get back to the proper directory. The list(s) could be tarred or zipped for easy download.

The LLVM HTML files could then be downloaded to a user's computer from the list using wget, the analysis done, and the changes made. The changes could then be uploaded to https://bugs.llvm.org as diff files (patches), or as LLVM directs.

Without the file lists from LLVM, the only option for this local procedure would be to remove the link tags of the dead links, which takes away an easy way to correct those links where that is possible. This could be done by downloading the LLVM site's HTML pages by following page links with wget. Since possibly useful information is lost this way, it is not likely to be the preferred option.

Because Patrick's dead-link list does not include the parent pages that contain the dead links, the first option would tend to require downloading most or all of the HTML files in the list in order to find the few of concern. Whether there are copyright or other issues with downloading large parts of the LLVM site may also need to be considered.

When mirroring a site, wget has an option (--convert-links) that rewrites links to point to the local copies, in the manner Patrick suggests, which may achieve that objective.
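For a local copy of the site, the file-list step Neil describes can also be done in Python. This sketch collects both .htm and .html files and can turn the relative paths into absolute URLs, one per line, of the kind `wget -i` reads. The function names and the base URL are illustrative assumptions; note that, unlike `find -name`, the suffix comparison here is case-insensitive.

```python
from pathlib import Path

# Extensions to collect; compared case-insensitively (find -name is case-sensitive).
HTML_SUFFIXES = {".htm", ".html"}


def list_html_files(root):
    """Return sorted '/'-separated paths, relative to root, of all .htm/.html files."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in HTML_SUFFIXES
    )


def to_urls(relative_paths, base_url):
    """Prefix each relative path with base_url, giving one absolute URL per file."""
    base = base_url.rstrip("/")
    return [f"{base}/{path}" for path in relative_paths]
```

Writing `"\n".join(to_urls(list_html_files(site_root), "https://llvm.org"))` to a file would give a URL list for `wget -i`, where `site_root` stands for the hypothetical local path of the site's top-level directory.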
Given the scale of that change, it would best be done on the LLVM server: make a modified copy with wget, then point a browser at the copy to review the result before going live. It may turn out that wget does not work, or that further link changes made by a program are required. It would be easy to redirect back to the prior LLVM site if critical problems were found later. Still, the scale of this change suggests it would need more detailed consideration at LLVM, as opposed to the relatively few dead links identified so far, which could be addressed with diff uploads.

Writing a program for the dead-link analysis and changes seems less likely to work out, since the programmer would need to write for an environment not immediately available to them, and a program would not give the incremental, clear visibility that diff uploads provide.

Regards, Neil Nelson
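For the diff-upload route Neil prefers, Python's standard difflib module can produce a unified diff for each corrected page. This is a minimal sketch, assuming each fix is a literal replacement of an href value; the helper names and the example file path are hypothetical, and the LangRef correction is the one from Patrick's original message.

```python
import difflib


def rewrite_links(html, replacements):
    """Replace each old href value with its new value (plain string replacement)."""
    for old, new in replacements.items():
        html = html.replace(f'href="{old}"', f'href="{new}"')
    return html


def make_patch(path, before, after):
    """Return a unified diff of one file, using a/ and b/ path prefixes."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    before = (
        '<p>See <a href="https://llvm.org/docs/tutorial/LangRef.html'
        '#instruction-reference">LangRef</a>.</p>\n'
    )
    after = rewrite_links(before, {
        "https://llvm.org/docs/tutorial/LangRef.html#instruction-reference":
            "https://llvm.org/docs/LangRef.html#instruction-reference",
    })
    # Hypothetical path of the page containing the dead link.
    print(make_patch("docs/tutorial/example.html", before, after), end="")
```

An unchanged file yields an empty patch, so only pages that actually changed need to be uploaded; patches with a/ and b/ prefixes can usually be applied with `patch -p1`.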