Dear all,

I am a beginner using R, and I have a question about it. Since R is written by so many authors, how do you know that its results are trustable? (I don't want to offend anyone; I do trust people.) But I think this is a question worth asking.

Thanks,
Ming
How do you know that any results from any software package are trustable? I'm not sure that the number of authors has anything to do with it.

If you are extremely paranoid, you can reprogram everything you do a few times in a large number of completely different languages written by different people, and top it off with hand calculations. Then you should do this across 4-5 operating systems with different core libraries.

I'm somewhat joking in the second paragraph, but very serious in the first. How and why do YOU trust software? What criteria fit? Perhaps a better question would be to ask what criteria people use to "trust" software, using R as an illustration.

best,
-tony

p.s. R does satisfy a good part of the second paragraph, at least for a critical subset of the language.

On Wed, 26 Jan 2005 23:09:51 -0600, msck9 at mizzou.edu <msck9 at mizzou.edu> wrote:
> Dear all,
> I am a beginner using R, and I have a question about it. Since R is
> written by so many authors, how do you know that its results are
> trustable? (I don't want to offend anyone; I do trust people.) But I
> think this is a question worth asking.
>
> Thanks,
> Ming
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
"Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes" (AJR, 4Jan05).

A.J. Rossini
blindglobe at gmail.com
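Tony's suggestion to "top it off with hand calculations" can be tried in R itself. The sketch below recomputes a one-sample t-test from the textbook formula, t = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom, and compares the result with what t.test() reports. The data and the null value mu0 are made up for illustration. Note that pt() is still R's own t distribution function, so this cross-checks how the statistic and p-value are assembled, not the distribution code itself.

```r
# Illustrative data and null hypothesis value (made up for this sketch)
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.4)
mu0 <- 5

# R's built-in one-sample t-test
fit <- t.test(x, mu = mu0)

# The same test "by hand" from the textbook formula
n      <- length(x)
xbar   <- sum(x) / n
s      <- sqrt(sum((x - xbar)^2) / (n - 1))   # sample standard deviation
t_hand <- (xbar - mu0) / (s / sqrt(n))
p_hand <- 2 * pt(-abs(t_hand), df = n - 1)    # two-sided p-value

# Compare, allowing for floating-point differences
stopifnot(isTRUE(all.equal(unname(fit$statistic), t_hand)),
          isTRUE(all.equal(unname(fit$parameter), n - 1)),
          isTRUE(all.equal(fit$p.value, p_hand)))
cat("t =", t_hand, " p =", p_hand, "\n")
```

A check against software written by someone else, or against a worked example in a textbook, follows the same pattern: compute the number independently, then compare with a tolerance rather than exact equality.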
What makes you trust any software?

There are some obvious points. First of all, the code is open, so if you know enough you can actually read the code and make sure it does what you want. Secondly, you can replicate a process using two pieces of software and compare the results. If you check the archives, you will find a number of posts that discuss the results produced by R and how they compare with other software, typically R versus Excel or R versus SPSS/SAS. Just be careful: different answers do not automatically mean that one is wrong, and they certainly don't mean that R is wrong.

Excel computes =ROUND(2.5,0) to be 3
R computes round(2.5) to be 2

As I understand it, both are right; they are just using different standards. I, however, have always used the latter, rounding to the even number when the figure to be rounded lies exactly at the halfway mark.

Hang around this list for a short time and it will become evident that if this software didn't work, the people using it would have stopped using it long ago.

Forget the commercial versus open software arguments that raise their head from time to time. The question is how well a piece of software is written, maintained and supported, not issues of payment or the greater good. There is some woeful freeware, just as there are some woeful commercial products.

The pedigree of the contributors to the base package is hard to beat. I don't know the pedigree of those who write the other stats programmes, but I assume that the R contributors are right in there with the best.

As to packages: they must vary in quality, and people do make mistakes. If you have something that is, in modern parlance, "mission critical", it wouldn't matter which product you had; you would test it to see that it met your requirements.

You have raised a question that is often ignored, or whose answer is simply assumed. But to really know the answer for yourself, you need to test it yourself or rely upon others that you trust. Whenever I start using a package, I make sure not only that it does what it states it can do, but also that it does what I want it to do.

Tom

> -----Original Message-----
> From: msck9 at mizzou.edu [mailto:msck9 at mizzou.edu]
> Sent: Thursday, 27 January 2005 1:10 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] A "rude" question
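The rounding difference Tom describes is easy to see in R. R's round() follows the IEC 60559 convention of sending exact halves to the even digit, while Excel's ROUND sends halves away from zero. The helper round_half_away() below is a hypothetical function written for this sketch, not part of base R; it reproduces the away-from-zero rule for digits = 0. For digits > 0 both rules are at the mercy of binary floating point (0.15, for instance, is not stored exactly), so apparent halfway cases can surprise either way.

```r
# R's round() sends exact halves to the even neighbour (IEC 60559 rule)
round(c(0.5, 1.5, 2.5, -2.5))   # 0  2  2 -2

# Hypothetical helper: round halves away from zero, as Excel's ROUND does
round_half_away <- function(x, digits = 0) {
  m <- 10^digits
  sign(x) * floor(abs(x) * m + 0.5) / m
}
round_half_away(c(0.5, 1.5, 2.5, -2.5))   # 1  2  3 -3
```

Neither rule is "wrong"; the point is to know which convention a program applies before concluding that two different answers mean a bug.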
Almost all software, and generally all "important" software, has numerous authors. Windows has hundreds, perhaps thousands, of coders. So too does Unix. The big difference between open source and closed source is not in the number of authors; rather, it is in the open availability of the code. Arguably, if there is sufficient interest in an open source project, its code is likely to be superior to that of a comparable closed source program, and studies have indicated as much. This is a probability, though, not a natural law.

If you are concerned about the trustworthiness of R, then perhaps the best gauge is that some of our favorite, if occasionally curmudgeonly, authors on this list are also experts in S and S-Plus (S-Plus being the proprietary, closed source implementation of the S language, of which R is also a dialect). They evidently know what they're doing and work comfortably in both domains.

If you compare statistical results from R and Excel, there is no question that R is superior, but the same will be true if you test Excel against S-Plus, SAS or NCSS (all proprietary programs), or against any number of other closed and open source programs designed to do statistical analyses. At the same time, just about any spreadsheet, open or closed source, will suffer in a similar comparison.

If you want more information about the safety of Excel, I would suggest this site:
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
Read the various links. Beyond this, there is a broad literature available on the risks and benefits of open and closed source programs. Read it.

JWDougherty
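One classic way such comparisons expose a genuinely wrong answer is numerical accuracy on awkward data. The sketch below computes a sample variance two ways for values with a large common offset: with R's var(), and with the one-pass "shortcut" formula (sum(x^2) - n * mean(x)^2) / (n - 1). naive_var() is a hypothetical helper written for this illustration; it is not a claim about how any particular version of Excel, or any other program, computes a variance. The shortcut subtracts two nearly equal numbers of about 4e18, which throws away exactly the digits that matter.

```r
# Hypothetical one-pass "shortcut" variance: algebraically correct,
# but it subtracts two huge, nearly equal quantities
naive_var <- function(x) {
  n <- length(x)
  (sum(x^2) - n * mean(x)^2) / (n - 1)
}

# Four values with a large common offset; deviations from the mean are
# -6, -3, 3, 6, so the exact sample variance is (36 + 9 + 9 + 36) / 3 = 30
x <- 1e9 + c(4, 7, 13, 16)

var(x)         # centres the data before squaring: 30
naive_var(x)   # catastrophic cancellation: not 30
```

This is the kind of test case (data whose exact answer is known by construction) that makes a comparison between programs informative rather than a matter of taste.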
When Haydn was asked about his 100+ symphonies, he is reputed to have replied "sunt mala bona mixta", which is kind of dog Latin for "There are good ones and bad ones all mixed together". It's certainly the same with R packages, so, to continue the Latin motif: "caveat emptor".

The R engine, on the other hand, is pretty well uniformly excellent code, but you have to take my word for that. Actually, you don't. The whole engine is open source so, if you wish, you can check every line of it. If people were out to push dodgy software, this is not the way they'd go about it.

Bill Venables.
Hi,

I don't know if you are asking the question for the same reasons I did, but recently we have been required to adopt an internationally recognised standard (a process that is still ongoing). Being in the bioinformatics field, where open-source software is the beating heart of cutting-edge research, we have obviously had to ask ourselves that exact question: "How can we be sure the software we use works?"

In science, this doesn't just apply to software, though. When someone publishes a paper, how can any of us be sure they did what they said they did? Or that their methods are the correct ones to use? Luckily, there is a two-word answer that we hope will satisfy our auditors, and that is "Peer Review".

In the context of R, I would say that you could put a confidence measure on any package based on the number of people who use it: the more people who use a package, the more likely its bugs are to be found and removed.

I won't get into the "open source" vs "commercial" argument, but, put simply, all software has bugs at some stage, no matter who has written it. Given that fact, I prefer the code to be open so I can see them, not closed so that I can't. Surely the fact that we can see all the code relating to R is the biggest quality measure of all?

Cheers
Mick
Ming,

> results are trustable? (I don't want to offend anyone; I do trust
> people.)

Years ago I read about a simplified formula for deciding whether I trust someone, and in turn, something:

Trustworthiness = Competence + Character

I think a bit of research, as the other R-help posters have so comprehensively covered in their replies to your original question, will convince you, or anyone else you need to convince, that the R-core team and the core product of R itself rate at the top of the scale on both character and competence. Packages, of course, will not be as consistently high on the trustworthiness continuum, but rest assured there are several that are high, which, again, you can verify yourself for your and/or your audience's needs.

Best Regards,
Bill

-------------------------------
Bill Pikounis, PhD
Nonclinical Statistics
Centocor, Inc.
200 Great Valley Parkway
MailStop C4-1
Malvern, PA 19355
610 240 8498
fax 610 651 6717
Ming,

You have received a number of excellent replies to your question and should really consider them. Here is another point, really an extension of a comment by Bill Venables: "If people were out to push dodgy software, this is not the way they'd go about it."

Definitely! Look at the requirements for submitting a package to R. While the mandated documentation and uniform approach do not automatically equate to V&V'ed (verified and validated) code, they are a strong indication of the commitment of the R core and contributing communities. The imposition of these standards by the core team, and the time committed to the project in development, on the help list, etc., speak volumes about the quality of R. Rest assured, such commitment is not the norm.

That being said, I do respectfully disagree with Dr. Rossini in one minor detail ;O). It is not 'extremely paranoid' to re-code in another language, and definitely not to do hand calculations! Murphy's Law is relentless in all matters! If you are like most of us (all of us?), you will find errors in your own coding and, rarely, perhaps an R bug.

BTW, since you are starting out in R: voraciously read the documentation, the help list, the newsletter, and other free and commercial material on R; work through the examples relevant to your area of endeavor; then read more, code more, read more, code more, read more, code more.... The facility with R that you gain as a result will reward you many times over down the road.

Best regards,
Michael Grant

P.S. Whenever you upgrade R, read the CHANGES and NEWS files, etc. R does evolve, even the core, although in a very controlled and managed way. (You will learn of bug fixes there too.)
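Michael's P.S. can be partly automated in later versions of R. utils::news() (added to R after this 2005 thread) returns R's NEWS as a searchable data frame with columns such as Version, Category and Text, and R.version.string and packageVersion() record which versions produced a result. A minimal sketch; the search pattern "round(" is only an example, and the number of matching entries depends on the R version installed.

```r
# Record the versions that produced a result, for later reproduction
cat(R.version.string, "\n")
print(packageVersion("stats"))

# Search R's own NEWS database for entries that mention round()
db <- utils::news()
round_news <- utils::news(grepl("round(", Text, fixed = TRUE), db = db)
cat(NROW(round_news), "NEWS entries mention round()\n")
```

sessionInfo() gives a fuller record, including attached packages and their versions, and is worth saving alongside any result that matters.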
I agree with Tom on all the points he made, but two in particular:

Open code: very useful, and it makes it much easier (than with other software) to verify the trustworthiness of a function/library. I often do go into the code to make sure it is what I want, and it is a good way to find out the "meaning" of certain parts of the output and to learn others' programming tricks. And that's the power of R.

Pedigree of the contributors: top-notch. I remember finding a "bug" (having to do with detecting heteroscedasticity) in SAS back in the early 90s and reporting it to a SAS technician. SAS was considered the industry standard back then, but it was written mostly by professional programmers. In comparison, R's libraries are contributed by statisticians who are at the forefront of statistical methods research.

Tim

---- Original message ----
>Date: Thu, 27 Jan 2005 14:15:31 +0800
>From: "Mulholland, Tom" <Tom.Mulholland at dpi.wa.gov.au>
>Subject: RE: [R] A "rude" question
>To: <msck9 at mizzou.edu>, <r-help at stat.math.ethz.ch>
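Tim's and Tom's point about reading the code applies at the R prompt as well: printing a function (without calling it) shows its R source, and getS3method() retrieves methods that are not exported under their own names. Many numerical routines hand off to compiled C or Fortran code, which lives in the R source distribution rather than at the prompt. A short sketch using only base functions:

```r
# Printing a function without calling it shows its R source
sd                          # essentially sqrt(var(x, na.rm = na.rm))

# S3 methods can be retrieved even when they are not exported
tt <- getS3method("t.test", "default")
head(deparse(tt), 5)        # first few lines of the method's source

# List the available methods for a generic
methods("t.test")

# var() delegates the arithmetic to compiled code; its body shows the .Call()
body(var)
```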
One point that did not get mentioned in this discussion, and that I believe deserves much more publicity, is the impact of package tests. The design of the package system allows package developers to put tests in packages, and these are checked regularly (see <http://cran.at.r-project.org/contrib/checkSummary.html>). These are intended to test the package's functionality, but they also give R what is perhaps the largest test suite of any statistical software (and certainly the most quickly growing). While no single package's tests will ever guarantee that the package works perfectly, the ensemble goes a long way toward ensuring that core R functionality behaves as intended. It seems unlikely to me that any commercial effort will ever be able to catch up.

There are several ways that tests can add to our confidence that calculations can be trusted. They can:

- check against theoretical results
- check against published results
- check against results from other software
- check that calculations done in different ways give the same result
- check that Monte Carlo experiments give distributions that are consistent with expected results

Some of these are relatively time-consuming to set up and check the first time, but after that they can be automatic.

If you have particular calculations with specific packages that you are especially concerned about, I encourage you to participate by devising good tests and sending them to the package developers. (But first check the tests they already run, in the package's tests directory.)

Paul Gilbert
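To make Paul's list concrete, here is a minimal sketch of what a file in a package's tests/ directory might contain. R CMD check runs each .R file in tests/ and reports a failure if the script signals an error, so plain stopifnot() calls are enough. The file name and data are hypothetical. The sketch covers three of the listed kinds of check (theoretical results, the same calculation done two ways, and a Monte Carlo experiment); checks against published results or other software would follow the same pattern with stored reference values.

```r
# Hypothetical tests/trust-checks.R: package-style checks in base R

# 1. Check against theoretical results
stopifnot(isTRUE(all.equal(pnorm(0), 0.5)))
stopifnot(isTRUE(all.equal(qnorm(0.975), 1.959964, tolerance = 1e-6)))

# 2. The same calculation done in different ways: least-squares coefficients
#    from lm() (QR decomposition) versus the normal equations X'X b = X'y
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)
X <- cbind(1, x)
beta_normal <- solve(crossprod(X), crossprod(X, y))
beta_lm     <- coef(lm(y ~ x))
stopifnot(isTRUE(all.equal(unname(beta_lm), as.vector(beta_normal))))

# 3. A Monte Carlo experiment consistent with the expected result:
#    the mean of n draws from N(0, 1) has standard error 1 / sqrt(n).
#    With the seed fixed the outcome is reproducible; without it, a false
#    alarm at 4 standard errors would occur with probability about 6e-5.
n <- 1e5
z <- rnorm(n)
stopifnot(abs(mean(z)) < 4 / sqrt(n))
```

Once written, checks like these run unattended on every R CMD check, which is the "automatic after the first time" property Paul describes.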
Paul Gilbert wrote:
> The design of the package system allows package developers to put
> tests in packages,
> and these are checked regularly (see ...

This link should have been <http://cran.at.r-project.org/src/contrib/checkSummary.html>.

Paul Gilbert