Ivan Krylov
2023-Aug-05 20:52 UTC
[Rd] HTML documentation check works best with Tidy >= 5.0.0
Hello R-devel,

Old versions of HTML Tidy report false positive NOTEs for the HTML
version of the manual where Tidy encounters HTML5 features it is not
ready for. Conveniently, both HTML5 support and release version numbers
officially appeared in HTML Tidy version 5.0.0 [*].

For example, the last version of the "old Tidy" I could find fails on
an R help page:

  cvs -z3 -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy \
    co -P tidy
  cd tidy
  make -C build/gmake
  bin/tidy -v
  # HTML Tidy for Linux released on 25 March 2009
  R-devel CMD Rdconv .../R-devel/src/library/stats/man/lm.Rd -t html | \
    bin/tidy >/dev/null
  # line 4 column 1 - Warning: <link> inserting "type" attribute
  # line 12 column 1 - Warning: <script> proprietary attribute "onload"
  # line 12 column 1 - Warning: <script> inserting "type" attribute
  # line 17 column 1 - Warning: <table> lacks "summary" attribute
  # line 44 column 1 - Warning: <table> lacks "summary" attribute
  # line 200 column 1 - Warning: <table> lacks "summary" attribute
  # <...>

On the other hand, the oldest released version of Tidy-HTML5 handles
Rd-produced HTML correctly:

  git clone https://github.com/htacg/tidy-html5
  cd tidy-html5
  git checkout 5.0.0
  mkdir b5.0.0
  cd b5.0.0
  cmake ..
  cmake --build .
  ./tidy -v
  # HTML Tidy for Linux version 5.0.0
  R-devel CMD Rdconv .../R-devel/src/library/stats/man/lm.Rd -t html | \
    ./tidy >/dev/null
  # Info: Document content looks like HTML5
  # No warnings or errors were found.
  # <...>

We can use this information to restrict the check to HTML Tidy versions
that support the idioms used by Rd2HTML:

--- check.R (revision 84834)
+++ check.R (working copy)
@@ -5040,7 +5040,7 @@
             t1 <- proc.time()
             if(i1) { ## validate
-                ## require HTML Tidy, and not macOS's ancient version.
+                ## require HTML5 Tidy, and not macOS's ancient version.
                 msg <- ""
                 Tidy <- Sys.getenv("R_TIDYCMD", "tidy")
                 OK <- nzchar(Sys.which(Tidy))
@@ -5048,10 +5048,8 @@
                     ver <- system2(Tidy, "--version", stdout = TRUE)
                     OK <- startsWith(ver, "HTML Tidy")
                     if(OK) {
-                        OK <- !grepl('Apple Inc. build 2649', ver)
-                        if(!OK) msg <- ": 'tidy' is Apple's too old build"
-                        ## Maybe we should also check version,
-                        ## but e.g. Ubuntu 16.04 does not show one.
+                        OK <- grepl('version 5.\\d+\\.\\d+', ver)
+                        if(!OK) msg <- ": 'tidy' does not appear to be version 5"
                     } else msg <- ": 'tidy' is not HTML Tidy"
                 } else msg <- ": no command 'tidy' found"
                 if(OK) {

(This is just one way to solve the problem. Instead, we could discard
versions that say "released on <date>", or try to parse the version
specification and only discard it if (a) we can't parse it or (b) it's
below 5.0.0.)

With the patch applied, I get:

  PATH=.../tidy/bin:"$PATH" _R_CHECK_RD_VALIDATE_RD2HTML_=TRUE \
    R-devel CMD check $package.tar.gz
  # * checking HTML version of manual ... NOTE
  # Skipping checking HTML validation: 'tidy' does not appear to be
  # version 5

  PATH=.../tidy-html5/b5.0.0/:"$PATH" _R_CHECK_RD_VALIDATE_RD2HTML_=TRUE \
    R-devel CMD check $package.tar.gz
  # * checking HTML version of manual ... OK

-- 
Best regards,
Ivan

[*] There are commits in the tidy-html5 repo containing versions marked
4.x.x, but they aren't tagged and weren't considered an official
release, as far as I know.
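
The alternative mentioned in the parenthetical above (parse the version
and discard Tidy only if the version cannot be parsed or is below 5.0.0)
could look roughly like the sketch below. The helper name
tidy_is_html5() and its 'minimum' argument are hypothetical and not part
of check.R. The sketch assumes that the version line has the form shown
in the two transcripts above, i.e. either "... version x.y.z" or
"... released on <date>".

```r
## Hypothetical helper, not part of R's check.R: decide whether the
## output of `tidy --version` describes HTML Tidy >= 'minimum'.
tidy_is_html5 <- function(ver, minimum = "5.0.0") {
    ## Only the first line of the output is inspected.
    ver <- ver[1L]
    if (is.na(ver) || !startsWith(ver, "HTML Tidy")) return(FALSE)
    ## Extract the first "version x.y.z" occurrence, if any.
    m <- regmatches(ver, regexpr("version [0-9]+\\.[0-9]+\\.[0-9]+", ver))
    ## (a) Unparseable, e.g. "released on 25 March 2009": reject.
    if (!length(m)) return(FALSE)
    ## (b) Parseable: accept only if not below the minimum.
    package_version(sub("^version ", "", m)) >= minimum
}

tidy_is_html5("HTML Tidy for Linux version 5.0.0")              # TRUE
tidy_is_html5("HTML Tidy for Linux released on 25 March 2009")  # FALSE
```

Unlike matching on the literal "version 5", this would also accept a
future major version, at the cost of slightly more code in the check.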