Eric Lecoutre
2004-Nov-24 13:47 UTC
[Rd] Suggestions for packages / help / index (long mail)
Hi R-users and developers,

This month may have seen one of the biggest threads ever seen on R-related mailing lists, the one about "GPL software" and "hidden costs" (as of today, the thread is still open, and active!). Lots of mails in this thread are not really relevant to the original mail, sent by Philippe Grosjean. Nevertheless, most of the mails are of interest, and one of my conclusions was that there is a real need for "help/index-related" work.

I have spent some time thinking about it. Like everybody, I ended up with: "this is not an easy problem at all" and "what we have *is* still very good". Indeed!

What follows is a sketch of thoughts and proposals. I tend to think some of these proposals are "low-cost" and could improve the life of R beginners.

First, I have to say I will put myself in the position of a real beginner (say a first-year student). A user who has practiced for some years will find it easier to crawl through all the rich material available. His experience will help him easily find the package relevant to his problem and the function he needs; he has learned to use help.search(), and so on. And he will use R-help wisely, following the guidelines.

In contrast, a beginneR will have more and more difficulty entering the R world, as it is constantly growing (leading to the famous supposed "hidden costs"). Appropriating the poweR is not easy, especially if your daily task is specialized: you will have difficulty digging through all the material to find those nuggets that will help you (and thanks to the community, there are so many nuggets... it may be hard to choose between gold and platinum).

What we have for now is a document listing keywords. Advanced users will know those keywords are meant to be used by package maintainers, feeding the build chain of the help system. This keyword database is very pertinent. Its content, which has been inherited in part from S, has been carefully worked out.
And that works well (trying help.search("graphs") will give you very interesting results, provided you have some packages installed...). I think that this keyword list may have even more uses.

1. As the R community grows, it may be time to add some terms to this keyword list. Think about the SciViews bundle on which Philippe is working. Most packages in it are linked to GUI stuff. Wouldn't the keyword GUI be useful? It could be worth offering the community, for one month, the ability to suggest new entries (I am also thinking about econometrics). Then, the R core team would decide whether the candidates are eligible.

2. DESCRIPTION files for packages could have a new field, keywords, allowing authors to add keywords to their package (minimum one). Here is what we could end up with:

package     keyword(s)
---------------------------------------------
abind       Basics, manip, array
accuracy    Statistics
acepack     Statistics, regression
adapt       Mathematics
ade4        multivariate
...

3. Package keywords could be used to propose "automatic" bundles and/or lists of packages (for that purpose, consider keywords as categories). Thus, CRAN sites could have a listing of all packages, but also a listing of all packages related to Mathematics, to multivariate (statistics), and so on. And one could offer to install a whole bunch of packages at one time. Thus (and provided adequate keywords exist), the beginner interested in multivariate statistics would easily set up his R with an adequate set of starting packages. The same goes for econometrics, geostatistics, and any other field of application.

4. What would really be useful then (I think) is a sort of PACKAGES_INDEX that would come with R. Explanation: the index entry for a package would consist of its keywords (with a high weight) plus all its functions and their associated keywords (with lower weights). When downloading and installing the newest R, there would be a flat text file containing that (not so big). We could also add a function to refresh this file.
5. Then, we could update "help.search" so that it would begin by listing information on installed packages AND potentially suggest other packages available on CRAN.

6. The final point has already been discussed in the past. It is about misc packages and pieces of code. I propose the creation of 5 packages:
- miscGraphics (keywords: misc, Graphics)
- miscStatistics (keywords: misc, Statistics)
- miscMathematics (keywords: misc, Mathematics)
- miscBasics (keywords: misc, Basics)
- miscProgramming (keywords: misc, Programming)
With what I proposed above, they would be accessible as a bunch by selecting packages in the category "misc", and each would also be listed in its own category ("Graphics", ...). Each of those packages would have a maintainer, and a new mailing list (say R-misc) could be set up to discuss pieces of code that could enter one package or another. Yes, I volunteer to maintain one of those.

There is some work here for all 6 points, but not so much. What is great is that we already have most of the necessary pieces. And we only use the KEYWORDS file...

Please let me know what you think about these suggestions. If there is interest, I may ask for other volunteers to implement one or more of them.

Eric

Eric Lecoutre
UCL / Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre@stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward Tufte
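Points 4 and 5 of the mail above (a weighted package index, and a search that separates installed packages from others available on CRAN) can be sketched roughly as follows. This is only an illustration written in Python, not R code and not part of the original proposal. The numeric weights, the dictionary layout, the names build_index and search, and the placeholder function names fn_a to fn_d with their keywords are all assumptions; only the package names and package-level keywords come from the table in point 2.

```python
from collections import defaultdict

# Assumed weights: the mail only says package keywords get a "high weight"
# and function keywords "lower weights"; these numbers are made up.
PACKAGE_WEIGHT = 3.0
FUNCTION_WEIGHT = 1.0

# Package-level keywords are taken from the mail's example table.
# Function names (fn_a ...) and their keywords are hypothetical placeholders.
PACKAGES = {
    "abind": {
        "keywords": ["Basics", "manip", "array"],
        "functions": {"fn_a": ["array", "manip"]},
    },
    "acepack": {
        "keywords": ["Statistics", "regression"],
        "functions": {"fn_b": ["regression"], "fn_c": ["regression"]},
    },
    "ade4": {
        "keywords": ["multivariate"],
        "functions": {"fn_d": ["multivariate", "regression"]},
    },
}


def build_index(packages):
    """Map each lower-cased keyword to {package name: accumulated weight}."""
    index = defaultdict(lambda: defaultdict(float))
    for name, info in packages.items():
        for keyword in info["keywords"]:
            index[keyword.lower()][name] += PACKAGE_WEIGHT
        for function_keywords in info["functions"].values():
            for keyword in function_keywords:
                index[keyword.lower()][name] += FUNCTION_WEIGHT
    return index


def search(index, keyword, installed):
    """Return (installed hits, other hits), each sorted by weight, then name."""
    hits = index.get(keyword.lower(), {})
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    local = [(pkg, w) for pkg, w in ranked if pkg in installed]
    remote = [(pkg, w) for pkg, w in ranked if pkg not in installed]
    return local, remote


index = build_index(PACKAGES)
local, remote = search(index, "regression", installed={"abind", "ade4"})
print("installed:", local)
print("also available:", remote)
```

In this toy data, a package that declares "regression" as one of its own keywords ranks above one that only has a function tagged with it, which is the effect the high/low weighting in point 4 seems intended to produce. The flat text file mentioned in point 4 would then simply be a serialized form of such an index.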
Gabor Grothendieck
2004-Nov-24 15:06 UTC
[Rd] Suggestions for packages / help / index (long mail)
Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:

: 6. Final point has already been discussed in the past. It is about misc
: packages and pieces of code. I propose the creation of 5 packages:
: - miscGraphics (keywords: misc, Graphics)
: - miscStatistics (keywords: misc, Statistics)
: - miscMathematics (keywords: misc, Mathematics)
: - miscBasics (keywords: misc, Basics)
: - miscProgramming (keywords: misc, Programming)

Rather than presetting the categories, perhaps evolving them would be better: start out with a single Misc package and then decompose it into multiple packages as the categories become clear.