Hi All

I am really very interested in starting to use R within our company. I particularly like the open-source nature of the product. My company is a medical research company which is part of the University of London.

We conduct contract virology research for large pharma companies. My question is: how do we validate this software? I wonder if anyone else has had this problem and might be able to comment.

Thanks

Rob

Robert Lambkin BSc (Hons), MRPharmS, PhD
Director and General Manager
Retroscreen Virology Limited
The Medical Building, Queen Mary, University of London, 327 Mile End Road, London, E1 4NS
Tel: 020 7882 7624   Fax: 020 7882 6990
(Retroscreen Virology Ltd. Registered in England & Wales No: 2326557)

The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is strictly forbidden. Retroscreen Virology Limited will not be liable for any action taken in reliance on the data contained in this e-mail as it may not have been quality assessed or assured.
The National Institute of Standards and Technology offers reference data sets, with certified expected results, for various statistical procedures. From the web site: "The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software."

http://www.itl.nist.gov/div898/strd/

-------Original Message-------
From: "Pikounis, Bill" <v_bill_pikounis at merck.com>
Sent: 04/17/03 07:53 AM
To: 'Rob Lambkin' <r.lambkin at retroscreen.com>, r-help at stat.math.ethz.ch
Subject: RE: [R] Validation of R

[quoted copies of Bill Pikounis's reply and of Rob's original message trimmed]
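As a concrete illustration of the NIST StRD approach: R ships with the Longley data, which is one of the NIST linear-regression reference datasets. The sketch below compares lm() against the certified coefficients; the certified values are transcribed from the NIST StRD pages and should be re-verified there before being used as a validation record.

```r
# Sketch: check lm() against the NIST StRD "Longley" certified results.
# Certified coefficients transcribed from http://www.itl.nist.gov/div898/strd/
# -- re-verify against the NIST page before relying on this.
data(longley)

fit <- lm(Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
            Population + Year, data = longley)

certified <- c(-3482258.63459582,  15.0618722713733, -0.035819179292591,
               -2.02022980381683,  -1.03322686717359, -0.0511041056535807,
               1829.15146461355)

# Largest relative discrepancy from the certified values; Longley is
# deliberately ill-conditioned, so this is a meaningful accuracy check.
max(abs((coef(fit) - certified) / certified))
```

A small relative discrepancy here (many orders of magnitude below 1) is the kind of objective evidence the NIST project is designed to provide.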
Hi Rob,

> We conduct contract virology research for large pharma companies. My
> question is how do we validate this software? I wonder if anyone else
> has had the problem and might be able to comment.

Notwithstanding the disclaimer automatically appended below by my "Big Pharma" employer's IT department on all my outgoing email, I have not had this problem, or, perhaps more accurately, I have not allowed it to be a problem when work I have done in R has made it into drug application filings and responses to the FDA and European regulatory agencies.

"Validation" of software is an ill-defined concept, so I am afraid I cannot offer anything like a concrete how-to, nor would I be surprised if no one else can. What I would suggest is to (1) ask your vendor companies what specifically they are concerned about, and (2) benchmark against guidelines on how you or others have "validated" other software.

If you are looking for extensive documentation on the whats, hows, and whys of R, it already exists. If you are looking for R to compute the same values as "validated" software, within realistic numeric accuracy for your procedures, that is straightforward to do. And the ultimate key is that anyone can look at the source code, and has a high probability of getting it to run on any reasonably current system, and even many systems not so current.

On a visible, continuous (daily), *OPEN* basis, there is ongoing review and input from the R user community, and the R core team and other developers meet the highest standards of software engineering. R clearly stands up to rigorous scholarly scrutiny. In my very grateful view, this makes R at least as reliable as commercial vendor software that claims "validation" or "compliance", and probably more reliable.

Hope that helps.

Bill

----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900 USA
v_bill_pikounis at merck.com
Phone: 732 594 3913   Fax: 732 594 1565

> -----Original Message-----
> From: Rob Lambkin [mailto:r.lambkin at retroscreen.com]
> Sent: Thursday, April 17, 2003 4:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Validation of R

[quoted copy of Rob's original message trimmed]
I suspect that there is no easy answer to this.

The first step will be to write a user specification for what you want to use the software for. In most cases, I believe that you will want to use specific functions and scripts. Define those functions and scripts up front in the user specification.

Next, create the scripts you wish to use and document their creation (I'm thinking of a library here).

Once this is created, you would need to create one or more standard datasets (the more the better) for testing the functions defined in the user requirement specification. This gives a comparison with a known result, and is reused if the software is upgraded in the future. I would propose doing the analysis on the standard data set the first time with a known package such as SAS, and comparing that with R. Once the outputs have been documented to match, this becomes your standard setup. I would then use a program such as GNU diff to detect any changes in the printouts from the two applications.

Graphics are harder, but I believe that Paul Murrell and Kurt Hornik are working on this; see the paper "Quality Assurance for Graphics in R" from the DSC 2003 Working Papers.

Hope this helps.

-----Original Message-----
From: Rob Lambkin [mailto:r.lambkin at retroscreen.com]
Sent: Thursday, April 17, 2003 3:51 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Validation of R

[quoted copy of Rob's original message trimmed]
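The "standard dataset plus known result" procedure described above can be mechanized inside R itself. The sketch below is one hedged way to do it; the file name, the toy dataset, and the tolerance are invented for illustration.

```r
# Sketch of the "standard dataset" regression test described above.
# On the first (validated) run, the reference results are recorded;
# on every later run, e.g. after an upgrade, they are recomputed and
# compared. File name, data, and tolerance are illustrative choices.
ref_file <- "validation_reference.RData"

std_data <- data.frame(dose  = rep(c(0, 10, 50), each = 4),
                       titer = c(7.1, 6.9, 7.3, 7.0,
                                 5.8, 6.0, 5.7, 5.9,
                                 4.2, 4.4, 4.1, 4.3))

result <- coef(lm(titer ~ factor(dose), data = std_data))

if (!file.exists(ref_file)) {
  reference <- result
  save(reference, file = ref_file)   # first run: record the reference
} else {
  load(ref_file)                     # later runs: restores `reference`
  stopifnot(isTRUE(all.equal(result, reference, tolerance = 1e-12)))
}
```

The same pattern extends to whole printed outputs: sink() the results to a text file and diff it against the archived reference, which is essentially the GNU diff step proposed above.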
Remember also that there is an extensive suite of tests available when installing R from source, executed via "make check". Some time ago there was discussion of this topic on r-help (see the r-help archives).

R. Woodrow Setzer, Jr.             Phone: (919) 541-0128
Experimental Toxicology Division   Fax: (919) 541-4284
Pharmacokinetics Branch
NHEERL B143-05; US EPA; RTP, NC 27711

Shawn Way <sway at tanox.com>
Sent by: r-help-bounces at stat.math.ethz.ch
04/17/03 09:07 AM
To: 'Rob Lambkin' <r.lambkin at retroscreen.com>
cc: "'r-help at stat.math.ethz.ch'" <r-help at stat.math.ethz.ch>
Subject: RE: [R] Validation of R

[quoted copies of Shawn Way's reply and of Rob's original message trimmed]
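For completeness, the "make check" route mentioned above runs as part of a normal source build. A minimal sketch (the version number is illustrative; see the R Installation and Administration manual for the authoritative steps):

```shell
# Build R from source and run its own test suite.
# The tarball name/version is illustrative.
tar xzf R-1.7.0.tar.gz
cd R-1.7.0
./configure
make
make check   # runs the regression tests and package examples;
             # a failure here aborts with a nonzero exit status
```

Keeping the "make check" transcript with your installation records is one cheap, reproducible piece of installation-qualification evidence.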
However, the perception out there is that "SAS is the accepted software", especially for regulatory submission and especially in the US. Thus, I think validation usually means "Yeah, but did you use SAS to get the answer?", no matter how irrelevant the question is. For a non-statistician, or a person doing validation, certain software does not need validation (Microsoft Word, SAS, etc.); for certain other software, perhaps more so for open source, validation is essential.

I do recall a thread a year or so ago that talked about the use of open software for regulatory purposes.

"Pikounis, Bill" <v_bill_pikounis@merck.com>
Sent by: r-help-bounces@stat.math.ethz.ch
04/17/2003 08:53 AM
To: "'Rob Lambkin'" <r.lambkin@retroscreen.com>, r-help@stat.math.ethz.ch
cc: Shobana Balasingam, Seb Bossuyt, Katie Benjamin, Alex Mann
Subject: RE: [R] Validation of R

[quoted copies of Bill Pikounis's reply and of Rob's original message trimmed]
Is there any way to subset a time series without converting the result to a matrix or vector? I would like to replace some values in the time series to see the effect on forecasts.
Martin Maechler wrote:

> It seems you want to *replace* rather than *subset*
> (otherwise, try to be more specific), and
> there's no problem with that, e.g., with the first two lines from
> example(ts):
>
> > gnp <- ts(cumsum(1 + round(rnorm(100), 2)), start = c(1954, 7), frequency = 12)
> > plot(gnp)
> > gnp[20] <- 80
> > str(gnp)
>  Time-Series [1:100] from 1954 to 1963: 2.47 3.26 4.50 5.18 6.31 ...
> > lines(gnp, col = 2)
>
> ------------
>
> If you really want to subset, you can use window(.) on your
> time series, but only for those kinds of subsetting.
> General subsetting doesn't give regularly spaced series.
>
> {"thinning" (e.g. taking every 2nd value) would be one kind of
>  subsetting that could be made to work ...
>  --> proposals please to R-devel at lists.R-project.org :-) }
>
> Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
> Seminar fuer Statistik, ETH-Zentrum LEO C16, Leonhardstr. 27
> ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
> phone: x-41-1-632-3408  fax: ...-1228

There does seem to be a problem in 1.6.2, or am I missing something? First, x is a ts object. Once I use subsetting notation to replace one value, x does not seem to print correctly, and is.ts() reports FALSE:

> x <- ts(1:12, start = c(2003, 1), frequency = 12)
> x
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2003   1   2   3   4   5   6   7   8   9  10  11  12
> x[1]
[1] 1
> x[1] <- 9
> x
 [1]  9  2  3  4  5  6  7  8  9 10 11 12
attr(,"tsp")
[1] 2003.000 2003.917   12.000
attr(,"class")
[1] "ts"
> as.ts(x)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2003   9   2   3   4   5   6   7   8   9  10  11  12
> is.ts(x)
[1] FALSE

Is it supposed to work this way?

Rick B.
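Following Martin's suggestion, window() is the subsetting route that preserves the "ts" class, and in current versions of R the replacement form window<- preserves it too. A small sketch:

```r
# Subsetting a monthly series with window() keeps the "ts" class
# and the time axis, unlike "[", which can drop to a bare vector.
x <- ts(1:24, start = c(2003, 1), frequency = 12)

piece <- window(x, start = c(2003, 4), end = c(2003, 9))  # Apr-Sep 2003
is.ts(piece)    # TRUE: still a regular time series
start(piece)    # 2003 4

# Replacement via window<- (available in current R) also keeps the class,
# which suits the "replace some values, re-forecast" use case above.
window(x, start = c(2003, 4), end = c(2003, 6)) <- c(0, 0, 0)
is.ts(x)        # TRUE
```

Thinning (e.g. every 2nd value) remains the awkward case Martin mentions, since the result of general "[" indexing is not a regularly spaced series.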
This discussion reminds me of the hall-grousing about QA requirements that I hear (and participate in, at times!) in my laboratory. QA requirements generally address issues such as record-keeping, equipment maintenance and calibration, and standardizing methods that are used repeatedly. By no means does satisfying such QA requirements guarantee that conclusions drawn under such conditions are correct! They do help to eliminate some common sources of error, and the cost is generally not so great.

With respect to software validation, certainly ensuring that the package is installed properly and gives generally correct results for calculations helps to allay the concerns of non-specialists who have to rely on the results of the package. In some applications, the fact that the package used for a data analysis is completely open is a definite advantage (for example, in a government regulatory setting, such as here at the US EPA), since reviewers can have complete access to the computational machinery used in the analysis.

R. Woodrow Setzer, Jr.             Phone: (919) 541-0128
Experimental Toxicology Division   Fax: (919) 541-4284
Pharmacokinetics Branch
NHEERL B143-05; US EPA; RTP, NC 27711
I agree with your points and, if you notice, I share your philosophical view. I was commenting more on what you call "mind" share. It is still real.

However, also a minor point: there is mention in the regulations regarding COTS software (which I believe stands for Commercial Off-The-Shelf software).

Frank E Harrell Jr <fharrell@virginia.edu>
04/17/2003 12:10 PM
To: partha_bagchi@hgsi.com
cc: v_bill_pikounis@merck.com, r-help@stat.math.ethz.ch, r.lambkin@retroscreen.com, and others
Subject: Re: [R] Validation of R

On Thu, 17 Apr 2003 10:38:06 -0400, partha_bagchi@hgsi.com wrote:

> However, the perception out there is that "SAS is the accepted software",
> especially for regulatory submission and especially in the US. Thus, I
> think validation usually means "Yeah, but did you use SAS to get the
> answer?", no matter how irrelevant the question is. For a
> non-statistician, or a person doing validation, certain software does not
> need validation (Microsoft Word, SAS, etc.); for certain other software,
> perhaps more so for open source, validation is essential.

SAS is NOT the accepted software for FDA, because FDA does not accept ANY brand of software. This is really a "mind share" issue at pharma companies. SAS is not validated in every sense; there is a huge list of current SAS bugs.

Validation is best done on a per-project basis, as you can't anticipate all aspects of a particular dataset. The validation can be done by independent calculations of pivotal findings. For R there is an especially good opportunity: if you are using the base packages, you can run essentially the same code in S-Plus to get an independent validation of the underlying calculations (but not of your S code). The base code in R is independent of that in S-Plus (this is not true of most user-contributed add-on packages). There is no other "SAS" you can run.

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
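One cheap form of the "independent calculation of pivotal findings" approach is to recompute a headline result from first principles alongside the packaged routine. A sketch in R, with invented data, comparing t.test() against the textbook Welch formulas coded by hand:

```r
# Cross-check a pivotal result two ways: the packaged t.test()
# versus the Welch two-sample formulas coded by hand.
# (The data are invented for illustration.)
placebo <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
treated <- c(4.1, 3.9, 4.5, 4.0, 4.3, 3.8)

pkg <- t.test(placebo, treated)          # packaged routine (Welch by default)

# Hand calculation of the Welch statistic, df, and p-value
n1 <- length(placebo); n2 <- length(treated)
v1 <- var(placebo) / n1; v2 <- var(treated) / n2
tstat <- (mean(placebo) - mean(treated)) / sqrt(v1 + v2)
df    <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
pval  <- 2 * pt(-abs(tstat), df)

# The two routes should agree to near machine precision
c(statistic = pkg$statistic - tstat,
  df        = pkg$parameter - df,
  p.value   = pkg$p.value   - pval)
```

Agreement here validates the calculation, not the code path, which is exactly the distinction drawn above between validating underlying calculations and validating your own S code.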
At Battelle, the QA/QC folks have the philosophy that the FDA will likely hold us responsible for whatever internal standards we set for ourselves, assuming that such standards are "reasonable". For software, our internal standards basically say that:

(1) COTS (Commercial Off-The-Shelf) software developed by a company having both a long history of selling high-quality products and good QA doesn't need extensive from-scratch validation, only validation of simpler routines like the computation of means, variances, linear regression models, etc. (After all, how would anyone really validate what, say, PROC NLMIXED yields in a complex growth-curve application?)

(2) Anything free needs to be extensively validated by comparing it with something that fits into (1).

This leaves R completely out of our GLP studies, and favors SAS, since Insightful hasn't been around as long as the SAS Institute. Like it or not, the perception is that using SAS won't get you into trouble with the FDA or other regulatory agencies.

-David Paul

-----Original Message-----
From: partha_bagchi at hgsi.com [mailto:partha_bagchi at hgsi.com]
Sent: Thursday, April 17, 2003 3:32 PM
To: Frank E Harrell Jr
Subject: Re: [R] Validation of R

[quoted copies of earlier messages in this thread trimmed]
Perception, perception, perception... This reminds me of what I heard from Frank Harrell: "The different between S and SAS is five years". S has been around much longer than Insightful. Doesn't that count? R also comes with its own set suite ("make check", "make fullcheck", etc.). Doesn't that count? I guess one can try to validate NLMIXED by comparing the output with that of nlme()... Andy> -----Original Message----- > From: Paul, David A [mailto:paulda at BATTELLE.ORG] > Sent: Thursday, April 17, 2003 4:18 PM > To: r-help at stat.math.ethz.ch > Subject: RE: [R] Validation of R > > > At Battelle, the QA/QC folks have the philosophy that the > FDA will likely hold us responsible for whatever internal > standards we set for ourselves, assuming that such standards > are "reasonable". > > For software, our internal standards basically say that > > (1) COTS (Com'l Off The Shelf software) developed by a > company having both a long history of selling high-quality > products and good QA doesn't need extensive from-scratch > validation, only validation of simpler routines like the > computation of means, variances, linear regression models, > &etc. (After all, how would anyone really validate what, > say, PROC NLMIXED yields in a complex growth-curve application?) > > (2) Anything free needs to be extensively validated by > comparing it with something that fits into (1) > > This leaves R completely out of our GLP studies, and favors > SAS since Insightful hasn't been around as long as the SAS > Institute. Like it or not, the perception is that using SAS > won't get you into trouble with the FDA or other regulatory > agencies. 
> > > -David Paul > > > > -----Original Message----- > From: partha_bagchi at hgsi.com [mailto:partha_bagchi at hgsi.com] > Sent: Thursday, April 17, 2003 3:32 PM > To: Frank E Harrell Jr > Cc: k.benjamin at retroscreen.com; r-help at stat.math.ethz.ch; > a.mann at retroscreen.com; s.balasingam at retroscreen.com; > r.lambkin at retroscreen.com; v_bill_pikounis at merck.com; > s.bossuyt at retroscreen.com > Subject: Re: [R] Validation of R > > > I agree with your points and if you notice I share your philosophical > view. I was commenting more on what you call "mind" share. It > is still > real. > > However, also a minor point - there is mention in the regs > regarding COTS > software (which I believe stands for Commercial of the Shelf > software) .. > > > > > > > Frank E Harrell Jr <fharrell at virginia.edu> > 04/17/2003 12:10 PM > > > To: partha_bagchi at hgsi.com > cc: v_bill_pikounis at merck.com, > k.benjamin at retroscreen.com, > r-help at stat.math.ethz.ch, a.mann at retroscreen.com, > s.balasingam at retroscreen.com, r.lambkin at retroscreen.com, > s.bossuyt at retroscreen.com > Subject: Re: [R] Validation of R > > > On Thu, 17 Apr 2003 10:38:06 -0400 > partha_bagchi at hgsi.com wrote: > > > However, the perception out there is the "SAS is the accepted > > software" especially for regulatory submission and > especially in the > > US. Thus, I think validation usually means "Yeah, but did > you use SAS > > to get the answer" , no matter how irrelevant the question > is. For a > > non-statistician, or a person doing validation certain > software do not > > need validation (Microsoft Word, SAS etc.) certain other , perhaps > > more > so > > for open source, validation is essential. > > SAS is NOT the accepted software for FDA, because FDA does > not accept ANY > brand of software. This is really a "mind share" issue at pharma > companies. SAS is not validated in every sense; there is a > huge list of > current SAS bugs. 
> Validation is best done on a per-project basis as you can't anticipate
> all aspects of a particular dataset. The validation can be done by
> independent calculations of pivotal findings. For R there is an
> especially good opportunity because if you are using the base packages
> you can run essentially the same code in S-Plus to get an independent
> validation of the underlying calculations (but not of your S code). The
> base code in R is independent of that in S-Plus (this is not true of
> most add-on packages by users). There is no other "SAS" you can run.
>
> ---
> Frank E Harrell Jr
> Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem.
> Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine
> http://hesweb1.med.virginia.edu/biostat
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
This I totally agree with. My belief is that more education is needed to make open source software acceptable to regulators. Note I have been using R since version 0.6(!) and usually mention in documents that I am using R version 1.5.1 or greater and SAS version 8.0 or greater.

rossini@blindglobe.net (A.J. Rossini)
Sent by: r-help-bounces@stat.math.ethz.ch
04/17/2003 04:55 PM
Please respond to rossini
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Validation of R

>> From: Paul, David A [mailto:paulda@BATTELLE.ORG]
>>
>> For software, our internal standards basically say that
>>
>> (1) COTS (Commercial Off The Shelf software) developed by a
>> company having both a long history of selling high-quality
>> products and good QA doesn't need extensive from-scratch
>> validation, only validation of simpler routines like the
>> computation of means, variances, linear regression models,
>> etc. (After all, how would anyone really validate what,
>> say, PROC NLMIXED yields in a complex growth-curve application?)

Too bad that can't be edited just a bit:

(1) OTS (Off The Shelf software) developed by a group having both a long history of creating high-quality products and good QA, which doesn't need extensive from-scratch validation, only validation of simpler routines like the computation of means, variances, linear regression models, etc. (After all, how would anyone really validate what, say, nlme() yields in a complex growth-curve application, other than one of the originators of one of the families of NLME algorithms?)

best,
-tony

--
A.J. Rossini rossini@u.washington.edu http://software.biostat.washington.edu/
Biostatistics, U Washington and Fred Hutchinson Cancer Research Center
FHCRC: Tu: 206-667-7025 (fax=4812) | Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286) | Change last 4 digits of phone to FAX
On Fri, 18 Apr 2003 09:50:19 -0400 "Paul, David A" <paulda at battelle.org> wrote:
. . .
> Now, what would REALLY be nice is if some generous organization
> would pay a bunch of programmers to spend 2 - 3 years validating
> the algorithms in R, taking the FDA's CFR guidelines into account...
> That would truly be a service to everyone.
>
> Sincerely,
>
> David Paul

In re-reviewing FDA guidelines, there are no guidelines (either usage or validation guidelines) for statistical analysis software, only guidelines for database management software and software used in medical devices. And the only place where SAS is mentioned is related to data archiving using SAS Version 5 transport files. That (very problematic) format is currently preferred by FDA, but experts expect it will be replaced with XML in the not-too-distant future.

What is truly important is that data analysts check their work no matter what they are doing. By the way, I know of an important study in which incorrect final efficacy event rates were reported to FDA and in the scientific literature, due to (what I consider) a bug that has been in SAS for 30 years: a missing value is considered to be less than any valid numeric value, so "IF x < y THEN z=1" generates z=1 when x is missing and y is not.

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
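[Editor's note: the SAS comparison pitfall described above can be contrasted with R's treatment of missing values. A minimal sketch, using only base R:]

```r
# In SAS, a missing value sorts below any valid number, so
# "IF x < y THEN z = 1" silently sets z = 1 when x is missing.
# In R, comparisons involving NA propagate the missingness instead:
x <- NA
y <- 10

x < y                   # NA, not TRUE
z <- ifelse(x < y, 1, 0)
z                       # NA: the missingness is carried forward

# And a bare if() refuses to guess -- it stops with
# "missing value where TRUE/FALSE needed" rather than
# quietly choosing a branch:
# if (x < y) z <- 1     # error, not z = 1
```

So the failure mode Frank describes cannot occur silently in R; the analyst is forced to decide how missing values should be handled.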
I agree too. At the end of the day, no amount of validation will turn a nonsignificant p-value into a significant one, or turn a wrong analysis into a right one.

Frank E Harrell Jr <fharrell@virginia.edu>
Sent by: r-help-bounces@stat.math.ethz.ch
04/18/2003 12:56 PM
To: "Paul, David A" <paulda@battelle.org>
cc: rhelp <r-help@stat.math.ethz.ch>
Subject: Re: [R] Validation of R

> What is truly important is that data analysts check their work no matter
> what they are doing.
Like the original poster, I'm in a corporation that interacts with the FDA (submissions for product approval, and potential for auditing of QC procedures). I fully expect to be asked to validate R, in some sense, within the year, maybe two. I have two main comments.

First, I would be interested in participating in a small sub-project exploring this in very practical ways, such as:

1. Documenting resistance or regulatory needs R users are encountering in this environment, offline from the r-help list,

2. Sharing experiences (what works and what doesn't for assuaging managers' fears), and

3. If any further validation activities are deemed helpful (such as additional test cases and describing what the test cases are intended to test), making sure that these activities are fed back to the R project in a way that others can leverage in the future.

If you would also like to participate in this off-line discussion, I will be happy to collect names and e-mails. Or, if anyone has other ideas or feels motivated to drive something, feel free to step forward.

Second, just minutes ago I raised this question with our software tester over lunch. She tests SAS code used to generate reports of clinical trial results, and other software used to get clinical data into a database. In retrospect she is a biased sample (of size 1!) because the open-source software model de-emphasizes the role (and value) of the professional software tester; nonetheless I thought her comments offer a taste of the opposition some may encounter. I'll tell you what she said, and then I'll offer my impressions; please don't argue with her points, because I already did!

(A bit of background: we have chosen not to validate SAS procedures, and we say so in our test documentation. In practice, I think our clinical reporting rarely strays far from base SAS--99% of our reporting is just manipulating and tabulating data--and that may be a reason for the decision.)
In a nutshell, she thought SAS was more trustworthy than R (to the extent that she thought we should test R's functions), based on two points:

1. SAS has a team of professional software testers who spend their time coming up with test cases that are as esoteric and odd as they can think of (within the limits of their specifications). She was not convinced that a large community of users is sufficient to flush out obscure bugs. In her view (not surprisingly), software testers will look at software with a unique eye. (Which I think is true--but an army of users also does pretty well.)

2. SAS has a long history of quality, and their market niche requires them to pay close attention to quality. This distinguishes them from Microsoft, which has little financial incentive to pay close attention to quality, and does not have a history of quality despite a large group of professional software testers.

She and I agreed that if one must know for certain that a particular function works, one must test it or find documentation indicating precisely how someone else tested it. Fortunately, R packages come with test cases, but they're not usually test cases designed to check a large number of possible failure mechanisms.

My take on this is as follows:

1. There seem to be two varieties of validation involved here. The first provides clear assurance that a specific application does a specific thing. This is what software validation should really be, and no software, not even SAS, is above this. Then there is "warm and fuzzy" validation that offers limited assurance that the software is generally of good quality. This is subjective, a matter of reputation, and there is no testing or documentation that can definitively address this ill-defined criterion. A software package could be excellent, with only one bug, but if your application hits that bug, you have a problem.

2. I think this thread is mainly addressing the "warm and fuzzy" validation model.
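[Editor's note: "test it or find documentation indicating precisely how someone else tested it" can be made concrete with a tiny acceptance-test script. This is a hypothetical sketch against hand-computed reference values, not part of any official validation suite:]

```r
# Per-function acceptance tests against independently computed answers.
# stopifnot() makes the script fail loudly if any check is violated.
x <- c(1, 2, 3, 4, 5)

stopifnot(all.equal(mean(x), 3))    # (1+2+3+4+5)/5 = 3
stopifnot(all.equal(var(x), 2.5))   # sum((x-3)^2)/(5-1) = 10/4

# A regression with a known closed-form answer: y = 2x exactly,
# so the fitted intercept must be 0 and the slope 2.
fit <- lm(y ~ x, data = data.frame(x = 1:4, y = c(2, 4, 6, 8)))
stopifnot(all.equal(unname(coef(fit)), c(0, 2)))
```

A script like this, kept with the project and re-run on each R upgrade, is exactly the per-project, pivotal-findings style of validation Frank Harrell recommends earlier in this thread.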
R is going to encounter skepticism among people who haven't been exposed to it before, especially if they also have not been exposed to other open-source software (OSS). In my experience, people who have not been involved in any software development expect corporate support to lead to quality software ("they have resources!"). We all know this is a fallacy, but you can't argue it away, you just have to demonstrate the software. When they become familiar with it, they'll stop asking for the warm and fuzzy validation.

If my reading of the situation is correct, then the right response is to dazzle. The warm-and-fuzzy validation is really an opportunity for a software demo. Demonstrate the functions you're likely to use, especially (following Dr. Harrell's advice) using simulation. Then repeat the simulation but with outliers added, and use robust methods. Read in a CSV file from a network drive, create some beautiful plots, save the data in compressed format and document the file size (also document the original CSV's file size), read the data back into a concurrently-running R process and show it's the same. Install a particularly impressive and esoteric package that's remotely related to your problem and document what it does. Generate pseudorandom data using three different generators, from a given seed, and then reproduce the data. Calculate P(Z <= -20) for Z ~ N(0, 1), then calculate P(Z > 20) using lower.tail = FALSE.

You will provide only an iota of assurance that a particular future application will work, but you will have removed all doubt that R is a serious, rigorous, powerful package. And that addresses the concerns that may not be voiced, but are underlying.

-Jim Garrett
Becton Dickinson Diagnostic Systems
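[Editor's note: two items from Jim's demo list, sketched in base R -- seeded reproducibility and extreme tail probabilities:]

```r
# 1. Reproducible pseudorandom data from a fixed seed
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
stopifnot(identical(a, b))   # same seed, bit-identical draws

# 2. Extreme tail probabilities for Z ~ N(0, 1).
# The naive route 1 - pnorm(20) underflows: pnorm(20) is exactly 1
# in double precision, so the difference is 0.
# lower.tail = FALSE computes the upper tail directly and keeps
# full precision (a value on the order of 1e-89).
pnorm(-20)
pnorm(20, lower.tail = FALSE)
stopifnot(pnorm(-20) > 0)
stopifnot(1 - pnorm(20) == 0)   # the naive route loses everything
```

The point of the demo is precisely this contrast: a package that gets the hard numerics right, visibly, is worth more reassurance than any certificate.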
In a message dated 4/18/03 1:01:40 PM Eastern Daylight Time, fharrell@virginia.edu writes:

> What is truly important is that data analysts check their work no matter
> what they are doing.

This has been a fruitful discussion about doing "scientific" research. To the software issues one can add the general issue of reproducibility: data availability (even to reviewers), clear description, etc. The following link has a nice write-up on this issue. It is likely to be relevant to a wider audience using statistical methods.

http://www.econ.uiuc.edu/~roger/repro.html

Incentives in research/academia may also have something to do with this. For this, an interesting view (you have to make the connection):

http://www.econ.uiuc.edu/~roger/gaps.html

-anupam.
In a message dated 4/18/03 6:29:52 PM Eastern Daylight Time, rossini@blindglobe.net writes:

> my work on ESS and Noweb (literate statistical practice), as described
> in my DSC-2001 conference paper, as well as University of Washington
> Biostat TechRep #163 http://www.bepress.com/uwbiostat/paper194/ and
> related Chance article with Fritz Leisch (to appear in next issue).

I have recently started to use Sweave, with Noweb. Quite nice. The article in R News was a good intro. Thanks.
In a message dated 4/21/03 3:07:20 PM Eastern Daylight Time, pgilbert@bank-banque-canada.ca writes:

> There may be less work involved in doing (un-official) validation than
> there is in advertising how much is actually being done. Perhaps the
> simplest approach is for individuals to put together packages of tests
> with descriptions that explain the extent of the testing which is done,
> and then submit the packages to CRAN.

Another suggestion to address the issue of validation in the long term:

1) "Validation" of scientific research takes the form of peer review; perhaps it is possible to come up with a similar (not the same) process for publishing software, while maintaining the openness.

2) We can also think of contribution to open source software as academic contribution---it definitely furthers scientific inquiry by providing basic tools for it, quite broadly and across many disciplines. So it may be possible to think about a framework that provides academic/research credit, like peer-reviewed publications, for *publishing* software, maybe even using peer review in a way that allows continuous review, continuous feedback, continuous improvement. Perhaps the R community can take a lead in coming up with the framework, or the issues that need to be addressed.

3) It meets Popperian criteria for science (even Lakatos'): a result (especially a difficult numerical optimization) is falsifiable, and may be falsified when a bug is found or a better procedure is implemented.

---Anupam.