Hi All,

I noticed that it is quite common to find mentions in papers of "R libraries" developed for the algorithms/models/code/whatever the paper describes, so that third parties can use the method themselves. On further enquiry, these libraries turn out not to be available on CRAN; they have to be requested from the developers.

That in itself would not be a big issue, were it not for the fact that, most of the time I am in this situation, the code is very specific to the developer's environment and does not work on any machine I try to run it on (something that is painfully true for code calling C/C++/Fortran). I seem to have noticed a second pattern. These libraries are advertised for general use in a *published* paper, but when I point out that the library is not formally published and does not work the way a CRAN-published library would, I get a vague "the person who actually did the work left and nobody can maintain the code/fix stuff/finish the job".

As a referee I am trying to weed out what I see as malpractice: the promise that third parties outside the developers can actually use the code because it has been packaged as an R library, a claim that seems to boost the chances of publication.

Thus my question: when can I consider a library to be properly published and really publicly available? CRAN and Bioconductor are clearly gold standards. What about GitHub? I am currently using the rule "not on CRAN == outright rejection". If GitHub is as good as CRAN, I will include it on my list of "the code is available in a functional state as claimed".

Finally, please note the scope of my query. I am not looking at cases where a colleague gives me half-finished code that might be useful but that I need to sort out. I am looking at formal claims of the form "we have developed a method to do X and said method is available to the public as an R library". If that is the claim, I expect it to be true.

Best

F

--
Federico Calboli
LBEG - Laboratory of Biodiversity and Evolutionary Genomics
Charles Deberiotstraat 32 box 2439
3000 Leuven
+32 16 32 87 67
On Mon, Oct 2, 2017 at 7:47 AM, Federico Calboli <federico.calboli at kuleuven.be> wrote:

> Thus my question: when can I consider a library to be properly published and really publicly available? CRAN and BioConductor are clearly gold standards. What about Github? I am currently using the rule "not on CRAN == outright rejection". If Github is as good as CRAN I will include it on my list of "the code is available in a functional state as claimed".

CRAN has certain rules that are necessary for CRAN to function but may not be necessary for a package to be useful (e.g. size of data in a non-data package, licensing, run time of examples, etc.). I would ask two things of developers of a new package:

1. the package is available for download from somewhere public;
2. the package passes R CMD check without errors or warnings.

Possibly also an explanation of why they cannot upload the package to CRAN or Bioconductor, but I would not make acceptance by CRAN or Bioconductor a condition for publishing.

Just my humble opinion.

Peter
I tend to regard GitHub as a bit of a Wild West... anyone can upload anything there, working or not. CRAN packages at least have to compile, so being there provides some additional verification.

GitHub does have the advantage that you can easily download a package and run an example if the authors have set up such scaffolding... which is better than "it ran once on that laptop that died". However, getting researchers to write those examples or test cases on top of their mainline code requires a distinct extra level of sophistication, and nothing about GitHub requires such features to be present in uploaded code.
--
Sent from my phone. Please excuse my brevity.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
> On 2 Oct 2017, at 16:47, Federico Calboli <federico.calboli at kuleuven.be> wrote:
>
> .....
> As a referee I am trying to weed out what I see as malpractice: the promise that third parties outside the developers might actually use the code because it has been packaged as a R library, a claim that seems to boost publishing chances.
>
> Thus my question: when can I consider a library to be properly published and really publicly available? CRAN and BioConductor are clearly gold standards. What about Github? I am currently using the rule "not on CRAN == outright rejection". If Github is as good as CRAN I will include it on my list of "the code is available in a functional state as claimed".

As others have suggested, I would insist that the code be presented as a valid R package that the maker has at least checked with R CMD check with no errors (preferably with the --as-cran option). I would also insist that the package has been submitted to win-builder and has passed all checks without errors or warnings.

Berend Hasselman
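A minimal sketch of how a referee might apply the "no errors or warnings" bar to a check log. It assumes a working R installation, a source package in a hypothetical directory `mypkg/` at a hypothetical version 1.0, and that R CMD check ends its `00check.log` with a `Status:` summary line (e.g. `Status: OK` or `Status: 1 WARNING, 2 NOTEs`). The helper name `check_status_ok` is made up for illustration. NOTEs are tolerated here, which is a judgment call and not a CRAN rule.

```shell
# Hypothetical helper: decide whether an R CMD check log meets the
# "no errors or warnings" bar. Assumes the log contains a "Status:"
# summary line, as written by R CMD check at the end of 00check.log.
check_status_ok() {
  log="$1"
  # Take the last "Status:" line in the log (empty if none is found).
  status=$(grep '^Status:' "$log" | tail -n 1)
  if [ -z "$status" ]; then
    echo "no Status line found in $log" >&2
    return 2
  fi
  case "$status" in
    # Any ERROR(s) or WARNING(s) fails; NOTEs alone are accepted.
    *ERROR*|*WARNING*) echo "FAIL: $status"; return 1 ;;
    *) echo "PASS: $status"; return 0 ;;
  esac
}

# Typical use (hypothetical package "mypkg", version 1.0):
#   R CMD build mypkg
#   R CMD check --as-cran mypkg_1.0.tar.gz
#   check_status_ok mypkg.Rcheck/00check.log
```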
Here's my view on this:

CRAN = Comprehensive R Archive Network.

The "Archive" part is very important: it "promises" the research community that R packages that have ever been published on CRAN, and all versions of each package, will remain available in the future. It takes quite a lot for a package/code to disappear from CRAN, e.g. when a package contains code/data that is not allowed to be shared (due to licenses and copyrights). Not even the original developer/maintainer can remove a package that has already been released on CRAN. What we do see at times is that a package is "archived" on CRAN (i.e. no longer installable via install.packages()), but its old versions are still distributed. That CRAN protects us this way is extremely valuable to the research community, open science, and reproducible research. Bioconductor has a similar philosophy.

However convenient GitHub / GitLab / ... is for development etc., it certainly does not provide scientific archiving. In that sense it is no different from sharing packages on Dropbox, Google Drive, etc.

/Henrik
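To illustrate Henrik's point that archived versions remain retrievable, here is a small sketch that builds the download URL for an old package version. It assumes CRAN's usual layout, where superseded and archived source tarballs live under `src/contrib/Archive/<pkg>/` and the current release sits directly in `src/contrib/`. The helper name `cran_archive_url` and the package name `mypkg` are hypothetical.

```shell
# Hypothetical helper: print the CRAN archive URL for a package version.
# Assumes the layout src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz;
# an optional third argument overrides the mirror.
cran_archive_url() {
  pkg="$1"
  ver="$2"
  mirror="${3:-https://cran.r-project.org}"
  printf '%s/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$mirror" "$pkg" "$pkg" "$ver"
}

# Example use (hypothetical package/version; needs network access and R):
#   url=$(cran_archive_url mypkg 0.9.1)
#   curl -fLO "$url"
#   R CMD INSTALL mypkg_0.9.1.tar.gz
```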