Hi All

I am really very interested in starting to use R within our company. I particularly like the open source nature of the product. My company is a medical research company which is part of the University of London.

We conduct contract virology research for large pharma companies. My question is how do we validate this software? I wonder if anyone else has had the problem and might be able to comment.

Thanks

Rob

Robert Lambkin BSc (Hon's), MRPharmS, PhD
Director and General Manager
Retroscreen Virology Limited
The Medical Building, Queen Mary, University of London, 327 Mile End Road, London, E1 4NS
Tel: 020 7882 7624  Fax: 020 7882 6990
(Retroscreen Virology Ltd. Registered in England & Wales No: 2326557)

The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is strictly forbidden. Retroscreen Virology Limited will not be liable for any action taken in reliance on the data contained in this e-mail as it may not have been quality assessed or assured.
The National Institute of Standards and Technology (NIST) offers reference datasets
and certified results for various statistical procedures applied to those datasets.
From the web site:
"The purpose of this project is to improve the accuracy of statistical
software by providing reference datasets with certified computational results
that enable the objective evaluation of statistical software."
http://www.itl.nist.gov/div898/strd/
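The comparison idea can be sketched in R itself. The dataset and "certified" values below are synthetic stand-ins, not an actual NIST StRD download, but the checking logic is the same: fit the reference model and require agreement with the certified estimates to a stated tolerance.

```r
# Sketch of an StRD-style accuracy check: fit a model to a reference
# dataset and compare estimates against certified values.
# NOTE: the data and "certified" values here are synthetic stand-ins;
# a real check would use a dataset and certificate downloaded from NIST.
x <- 1:20
y <- 2.5 + 0.75 * x                      # exact line, so true values are known
certified <- c(intercept = 2.5, slope = 0.75)

fit <- lm(y ~ x)
est <- coef(fit)

# Fail loudly if the fitted coefficients drift from the certified ones
stopifnot(all(abs(est - certified) < 1e-8))
```

A real validation script would loop this over each StRD dataset relevant to the procedures actually used in-house, and record the achieved number of agreeing digits.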
-------Original Message-------
From: "Pikounis, Bill" <v_bill_pikounis at merck.com>
Sent: 04/17/03 07:53 AM
To: 'Rob Lambkin' <r.lambkin at retroscreen.com>, r-help at stat.math.ethz.ch
Subject: RE: [R] Validation of R
>
> Hi Rob,
> We conduct contract virology research for large pharma companies. My
> question is how do we validate this software? I wonder if anyone else
> has had the problem and might be able to comment.

Notwithstanding the disclaimer automatically appended below by my "Big Pharma" employer's IT department on all my email sends, I have not had this problem, or perhaps more accurately, I have not allowed it to be a problem when work I have done in R has made it into drug application filings and responses to the FDA and European regulatory agencies.

"Validation" of software is an ill-defined concept, so I am afraid I cannot offer anything like a concrete "how-to", nor would I be surprised if no one else can. What I would suggest is to (1) ask your vendor companies what specifically they are concerned about, and (2) benchmark against guidelines on how you or others have "validated" other software.
If you are looking for extensive documentation on the whats/hows/whys of R, it already has it. If you are looking for it to compute the same values as "validated" software within realistic numeric accuracy for your procedures, that is straightforward to do. And the ultimate key is that anyone can look at the source code and have a high probability of getting it to run on any reasonably current system, and even many systems not so current.

On a visible, continuous (daily), *OPEN* basis, there is ongoing review and input from the R user community, as well as the highest standards of software engineering met by the R core team and other developers. R clearly stands up to rigorous scholarly scrutiny. In my very grateful view, this makes R at least as reliable as commercial vendor software that claims "validation" or "compliance", and probably more reliable.
Hope that helps.
Bill
----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA
v_bill_pikounis at merck.com
Phone: 732 594 3913
Fax: 732 594 1565
> -----Original Message-----
> From: Rob Lambkin [mailto:r.lambkin at retroscreen.com]
> Sent: Thursday, April 17, 2003 4:51 AM
> To: r-help at stat.math.ethz.ch
> Cc: Shobana Balasingam; Seb Bossuyt; Katie Benjamin; Alex Mann
> Subject: [R] Validation of R
>
>
> Hi All
>
> I am really very interested in starting to use R within our company. I
> particularly like the open source nature of the product. My
> company is a
> medical research company which is part of the University of London.
>
> We conduct contract virology research for large pharma companies. My
> question is how do we validate this software? I wonder if anyone else
> has had the problem and might be able to comment.
>
> Thanks
>
> Rob
>
>
> Robert Lambkin BSc (Hon's), MRPharmS, PhD
> Director and General Manager
> Retroscreen Limited
> Retroscreen Virology Limited
> The Medical Building, Queen Mary, University of London, 327 Mile End
> Road, London, E1 4NS
> Tel: 020 7882 7624 Fax: 020 7882 6990
> (Retroscreen Virology Ltd. Registered in England & Wales No:2326557)
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
I suspect that there is no easy answer to this.

The first step will be to write a user specification for what you want to use the software for. In most cases, I believe you will want to use specific functions and scripts. Define those functions and scripts up front in the user specification.

Next, create the scripts you wish to use and document their creation (I'm thinking of a library here).
Once this is created, you would need to create one or more standard datasets, the more the better, for testing the functions defined in the user requirement specification. These are for comparison against a known result, and can be reused if the software is upgraded in the future. I would propose doing the analysis (on the standard dataset) the first time with a known package such as SAS, and comparing that with R. Once the output has been documented to match, this becomes your standard setup. I would then use a program such as GNU diff to detect any changes between printouts from the two applications.
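The stored-reference step above can be sketched directly in R (the analysis and file name below are illustrative placeholders): capture the printed output of the analysis, save it once as the validated reference, and on later runs compare line by line, which is exactly what diff-ing two printouts does.

```r
# Regression-test sketch: compare the current printed output of an
# analysis against a stored reference run. The dataset, analysis, and
# file name are illustrative placeholders, not a prescribed protocol.
std_data <- data.frame(x = 1:10,
                       y = c(2, 4, 5, 4, 5, 7, 8, 9, 8, 10))

# The "validated" analysis: capture its printout as lines of text
run_analysis <- function(d) capture.output(coef(summary(lm(y ~ x, data = d))))

# First (validated) run: store the reference printout
ref_file <- file.path(tempdir(), "reference_output.txt")
writeLines(run_analysis(std_data), ref_file)

# Later run, e.g. after a software upgrade: any difference is a failure
current   <- run_analysis(std_data)
reference <- readLines(ref_file)
stopifnot(identical(current, reference))
```

In practice the reference file would live under version control alongside the script and standard dataset, so the whole check is repeatable after each upgrade.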
Graphics are harder, but I believe that Paul Murrell and Kurt Hornik are addressing this in their paper "Quality Assurance for Graphics in R", from the DSC 2003 Working Papers.
Hope this helps.
-----Original Message-----
From: Rob Lambkin [mailto:r.lambkin at retroscreen.com]
Sent: Thursday, April 17, 2003 3:51 AM
To: r-help at stat.math.ethz.ch
Cc: Shobana Balasingam; Seb Bossuyt; Katie Benjamin; Alex Mann
Subject: [R] Validation of R
Remember also that there is an extensive series of tests available when installing R from source, run by executing "make check". Some time ago there was a discussion of this topic on r-help (see the r-help archives).
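As an aside, in more recent versions of R the same installed-tests machinery is also exposed from within R via the tools package (this is an addition relative to the 2003 discussion, which refers only to "make check" on a source build):

```r
# R's own regression-test suite can be exercised against an installed
# copy of R via tools::testInstalledBasic (recent R versions).
# A source build runs the equivalent tests with "make check".
library(tools)

# Confirm the entry point is available before relying on it
stopifnot(is.function(testInstalledBasic))

# Running the basic suite can take a while and requires the tests to be
# installed, so it is shown here commented out:
# testInstalledBasic("basic")
```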
R. Woodrow Setzer, Jr. Phone: (919) 541-0128
Experimental Toxicology Division Fax: (919) 541-4284
Pharmacokinetics Branch
NHEERL B143-05; US EPA; RTP, NC 27711
However, the perception out there is that "SAS is the accepted software", especially for regulatory submission and especially in the US. Thus, I think validation usually means "Yeah, but did you use SAS to get the answer?", no matter how irrelevant the question is. For a non-statistician, or a person doing validation, certain software (Microsoft Word, SAS, etc.) is assumed not to need validation, while for other software, perhaps especially open source, validation is considered essential.

I do recall a thread a year or so ago that discussed the use of open-source software for regulatory purposes.
"Pikounis, Bill" <v_bill_pikounis@merck.com>
Sent by: r-help-bounces@stat.math.ethz.ch
04/17/2003 08:53 AM
To: "'Rob Lambkin'" <r.lambkin@retroscreen.com>, r-help@stat.math.ethz.ch
cc: Shobana Balasingam <s.balasingam@retroscreen.com>, Seb Bossuyt <s.bossuyt@retroscreen.com>, Katie Benjamin <k.benjamin@retroscreen.com>, Alex Mann <a.mann@retroscreen.com>
Subject: RE: [R] Validation of R
Is there any way to subset a time series without converting the result to a matrix or vector? I would like to replace some values in the time series to see the effect on forecasts.
Martin Maechler wrote:

> It seems you want to *replace* rather than *subset*
> (otherwise, try to be more specific), and
> there's no problem with that, e.g., with the first two lines from
> example(ts):
>
> > gnp <- ts(cumsum(1 + round(rnorm(100), 2)), start = c(1954, 7), frequency = 12)
> > plot(gnp)
> > gnp[20] <- 80
> > str(gnp)
>  Time-Series [1:100] from 1954 to 1963: 2.47 3.26 4.50 5.18 6.31 ...
> > lines(gnp, col = 2)
>
> ------------
>
> If you really want to subset, you can use window(.) on your
> time series, but only for those kinds of subsetting.
> General subsetting doesn't give regularly spaced series.
>
> {"thinning" (e.g. taking every 2nd value) would be one kind of
>  subsetting that could be made to work ...
>  --> proposals please to R-devel at lists.R-project.org :-) }
>
> Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
> Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
> ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
> phone: x-41-1-632-3408 fax: ...-1228 <><

There does seem to be a problem in 1.6.2 - or am I missing something? First, x is a ts object. Once I use subsetting notation to replace one value, x does not seem to print correctly, and is.ts reports FALSE:

> x <- ts(1:12, start = c(2003, 1), frequency = 12)
> x
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2003   1   2   3   4   5   6   7   8   9  10  11  12
> x[1]
[1] 1
> x[1] <- 9
> x
 [1]  9  2  3  4  5  6  7  8  9 10 11 12
attr(,"tsp")
[1] 2003.000 2003.917   12.000
attr(,"class")
[1] "ts"
> as.ts(x)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2003   9   2   3   4   5   6   7   8   9  10  11  12
> is.ts(x)
[1] FALSE

Is it supposed to work this way?

Rick B.
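For the subsetting case Martin mentions, a minimal sketch of window() in current versions of R, which extracts a regular sub-series while preserving the "ts" class and time attributes (plain "[" subsetting returns a bare vector):

```r
# window() keeps the "ts" class and time attributes when extracting a
# regular sub-series; "[" subsetting would return a plain vector.
x <- ts(1:12, start = c(2003, 1), frequency = 12)

w <- window(x, start = c(2003, 3), end = c(2003, 6))  # Mar..Jun

print(w)                 # still a monthly series: values 3, 4, 5, 6
stopifnot(is.ts(w), frequency(w) == 12)
```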
This discussion reminds me of the hall-grousing about QA requirements that I hear (and participate in at times!) in my laboratory. QA requirements generally address issues such as record-keeping, equipment maintenance and calibration, and standardizing methods that are used repeatedly. By no means does satisfying such QA requirements guarantee that conclusions drawn under such conditions are correct! They do help to eliminate some common sources of error, and the cost is generally not so great.

With respect to software validation, certainly ensuring that the package is installed properly and gives generally correct results for calculations helps to allay the concerns of non-specialists who have to rely on the results of the package. In some applications, the fact that the package used for a data analysis is completely open is a definite advantage (for example, in a government regulatory setting, such as here at the US EPA), since reviewers can have complete access to the computational machinery used in the analysis.

R. Woodrow Setzer, Jr.                    Phone: (919) 541-0128
Experimental Toxicology Division         Fax: (919) 541-4284
Pharmacokinetics Branch
NHEERL B143-05; US EPA; RTP, NC 27711
I agree with your points and, if you notice, I share your philosophical view. I was commenting more on what you call "mind share". It is still real.

However, also a minor point - there is mention in the regs regarding COTS software (which I believe stands for Commercial Off-The-Shelf software).
Frank E Harrell Jr <fharrell@virginia.edu>
04/17/2003 12:10 PM
To: partha_bagchi@hgsi.com
cc: v_bill_pikounis@merck.com, k.benjamin@retroscreen.com, r-help@stat.math.ethz.ch, a.mann@retroscreen.com, s.balasingam@retroscreen.com, r.lambkin@retroscreen.com, s.bossuyt@retroscreen.com
Subject: Re: [R] Validation of R
On Thu, 17 Apr 2003 10:38:06 -0400
partha_bagchi@hgsi.com wrote:
> However, the perception out there is the "SAS is the accepted software"
> especially for regulatory submission and especially in the US. Thus, I
> think validation usually means "Yeah, but did you use SAS to get the
> answer", no matter how irrelevant the question is. For a
> non-statistician, or a person doing validation certain software do not
> need validation (Microsoft Word, SAS etc.) certain other, perhaps more so
> for open source, validation is essential.
SAS is NOT the accepted software for FDA, because FDA does not accept ANY
brand of software. This is really a "mind share" issue at pharma
companies. SAS is not validated in every sense; there is a huge list of
current SAS bugs.
Validation is best done on a per-project basis as you can't anticipate all
aspects of a particular dataset. The validation can be done by
independent calculations of pivotal findings. For R there is an
especially good opportunity because if you are using the base packages you
can run essentially the same code in S-Plus to get an independent
validation of the underlying calculations (but not of your S code). The
base code in R is independent of that in S-Plus (this is not true of most
add-on packages by users). There is no other "SAS" you can run.
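As a toy illustration of independently re-deriving a pivotal finding (the data below are made up), one can recompute a test statistic from its textbook formula and compare it with the packaged routine:

```r
# Independent re-derivation of a pivotal result: compare the one-sample
# t statistic from t.test() with the textbook formula computed by hand.
# The data are illustrative placeholders.
x  <- c(5.1, 4.9, 6.2, 5.8, 5.5, 4.7, 6.0, 5.3)
mu <- 5

packaged <- t.test(x, mu = mu)$statistic                  # named numeric "t"
by_hand  <- (mean(x) - mu) / (sd(x) / sqrt(length(x)))    # t = (xbar - mu) / SE

stopifnot(isTRUE(all.equal(unname(packaged), by_hand)))
```

The same pattern applies to any pivotal quantity in a report: recompute it by an independent route (a different package, or first principles) and require numeric agreement.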
---
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
At Battelle, the QA/QC folks have the philosophy that the
FDA will likely hold us responsible for whatever internal
standards we set for ourselves, assuming that such standards
are "reasonable".
For software, our internal standards basically say that
(1) COTS (Com'l Off The Shelf software) developed by a
company having both a long history of selling high-quality
products and good QA doesn't need extensive from-scratch
validation, only validation of simpler routines like the
computation of means, variances, linear regression models,
&etc. (After all, how would anyone really validate what,
say, PROC NLMIXED yields in a complex growth-curve application?)
(2) Anything free needs to be extensively validated by
comparing it with something that fits into (1)
This leaves R completely out of our GLP studies, and favors
SAS since Insightful hasn't been around as long as the SAS
Institute. Like it or not, the perception is that using SAS
won't get you into trouble with the FDA or other regulatory
agencies.
-David Paul
Perception, perception, perception...

This reminds me of what I heard from Frank Harrell: "The difference between S and SAS is five years". S has been around much longer than Insightful. Doesn't that count?

R also comes with its own test suite ("make check", "make fullcheck", etc.). Doesn't that count?

I guess one can try to validate NLMIXED by comparing its output with that of nlme()...

Andy
> -----Original Message-----
> From: Paul, David A [mailto:paulda at BATTELLE.ORG]
> Sent: Thursday, April 17, 2003 4:18 PM
> To: r-help at stat.math.ethz.ch
> Subject: RE: [R] Validation of R
>
>
> At Battelle, the QA/QC folks have the philosophy that the
> FDA will likely hold us responsible for whatever internal
> standards we set for ourselves, assuming that such standards
> are "reasonable".
>
> For software, our internal standards basically say that
>
> (1) COTS (Com'l Off The Shelf software) developed by a
> company having both a long history of selling high-quality
> products and good QA doesn't need extensive from-scratch
> validation, only validation of simpler routines like the
> computation of means, variances, linear regression models,
> &etc. (After all, how would anyone really validate what,
> say, PROC NLMIXED yields in a complex growth-curve application?)
>
> (2) Anything free needs to be extensively validated by
> comparing it with something that fits into (1)
>
> This leaves R completely out of our GLP studies, and favors
> SAS since Insightful hasn't been around as long as the SAS
> Institute. Like it or not, the perception is that using SAS
> won't get you into trouble with the FDA or other regulatory
> agencies.
>
>
> -David Paul
>
>
>
> -----Original Message-----
> From: partha_bagchi at hgsi.com [mailto:partha_bagchi at hgsi.com]
> Sent: Thursday, April 17, 2003 3:32 PM
> To: Frank E Harrell Jr
> Cc: k.benjamin at retroscreen.com; r-help at stat.math.ethz.ch;
> a.mann at retroscreen.com; s.balasingam at retroscreen.com;
> r.lambkin at retroscreen.com; v_bill_pikounis at merck.com;
> s.bossuyt at retroscreen.com
> Subject: Re: [R] Validation of R
>
>
> I agree with your points and if you notice I share your philosophical
> view. I was commenting more on what you call "mind" share. It
> is still
> real.
>
> However, also a minor point - there is mention in the regs
> regarding COTS
> software (which I believe stands for Commercial of the Shelf
> software) ..
>
>
>
>
>
>
> Frank E Harrell Jr <fharrell at virginia.edu>
> 04/17/2003 12:10 PM
>
>
> To: partha_bagchi at hgsi.com
> cc: v_bill_pikounis at merck.com,
> k.benjamin at retroscreen.com,
> r-help at stat.math.ethz.ch, a.mann at retroscreen.com,
> s.balasingam at retroscreen.com, r.lambkin at retroscreen.com,
> s.bossuyt at retroscreen.com
> Subject: Re: [R] Validation of R
>
>
> On Thu, 17 Apr 2003 10:38:06 -0400
> partha_bagchi at hgsi.com wrote:
>
> > However, the perception out there is that "SAS is the accepted
> > software", especially for regulatory submission and especially in the
> > US. Thus, I think validation usually means "Yeah, but did you use SAS
> > to get the answer?", no matter how irrelevant the question is. For a
> > non-statistician, or a person doing validation, certain software does
> > not need validation (Microsoft Word, SAS, etc.); certain other
> > software, perhaps more so for open source, requires validation.
>
> SAS is NOT the accepted software for FDA, because FDA does not accept
> ANY brand of software. This is really a "mind share" issue at pharma
> companies. SAS is not validated in every sense; there is a huge list
> of current SAS bugs.
>
> Validation is best done on a per-project basis as you can't anticipate
> all aspects of a particular dataset. The validation can be done by
> independent calculations of pivotal findings. For R there is an
> especially good opportunity because if you are using the base packages
> you can run essentially the same code in S-Plus to get an independent
> validation of the underlying calculations (but not of your S code).
> The base code in R is independent of that in S-Plus (this is not true
> of most add-on packages by users). There is no other "SAS" you can
> run.
>
> ---
> Frank E Harrell Jr  Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem.  Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
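Harrell's suggestion of validating pivotal findings by independent calculation can be sketched in R; the simulated data, model, and tolerance below are purely illustrative, not part of the original discussion:

```r
## Independent validation of a pivotal result: re-derive lm() coefficients
## from the normal equations. If the two routes disagree, something is wrong.
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

fit  <- lm(y ~ x)                             # the "production" calculation
X    <- cbind(1, x)                           # design matrix built by hand
beta <- solve(crossprod(X), crossprod(X, y))  # independent calculation

stopifnot(all.equal(as.numeric(coef(fit)), as.numeric(beta),
                    tolerance = 1e-8))
```

The same two calculations can be run essentially unchanged in S-Plus, which is the cross-implementation check described above.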
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
>
>
> [[alternate HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
This I totally agree with. My belief is that more education is needed to
make open source software acceptable to regulators.
Note that I have been using R since version 0.6(!), and I usually mention
in documents that I am using R version 1.5.1 or greater and SAS version
8.0 or greater.
rossini@blindglobe.net (A.J. Rossini)
Sent by: r-help-bounces@stat.math.ethz.ch
04/17/2003 04:55 PM
Please respond to rossini
To: r-help@stat.math.ethz.ch
cc:
Subject: Re: [R] Validation of R
>> From: Paul, David A [mailto:paulda@BATTELLE.ORG]
>> For software, our internal standards basically say that
>>
>> (1) COTS (Com'l Off The Shelf software) developed by a
>> company having both a long history of selling high-quality
>> products and good QA doesn't need extensive from-scratch
>> validation, only validation of simpler routines like the
>> computation of means, variances, linear regression models,
>> &etc. (After all, how would anyone really validate what,
>> say, PROC NLMIXED yields in a complex growth-curve application?)
Too bad that can't be edited just a bit:
(1) OTS (Off the Shelf software) developed by a group having both a
long history of creating high-quality products and good QA which
doesn't need extensive from-scratch validation, only validation of
simpler routines like the computation of means, variances, linear
regression models, etc. (After all, how would anyone really validate
what, say, nlme(), yields in a complex growth-curve application, other
than one of the originators of one of the families of NLME
algorithms?)
best,
-tony
--
A.J. Rossini rossini@u.washington.edu http://software.biostat.washington.edu/
Biostatistics, U Washington and Fred Hutchinson Cancer Research Center
FHCRC:Tu: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW : Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
CONFIDENTIALITY NOTICE: This e-mail message and any attachments ... {{dropped}}
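The question of how one would validate a nonlinear mixed-effects fit is usually answered by simulating from a known truth and checking parameter recovery. A minimal sketch of that idea using base R's nls() with a self-starting logistic model (the parameter values, noise level, and tolerance are illustrative; validating nlme() would apply the same pattern to grouped data):

```r
## Simulation check of a nonlinear fit: simulate growth-curve data from
## known parameters, refit, and confirm the estimates recover the truth.
set.seed(1)
true <- c(Asym = 10, xmid = 5, scal = 1.5)      # the known "truth"
x <- seq(0, 10, length.out = 50)
y <- as.numeric(true["Asym"] / (1 + exp((true["xmid"] - x) / true["scal"])) +
                rnorm(length(x), sd = 0.05))

fit <- nls(y ~ SSlogis(x, Asym, xmid, scal))    # self-starting logistic
stopifnot(all(abs(coef(fit) - true) < 0.2))     # estimates near the truth
```

Repeating this over many simulated datasets, and over harder cases (sparse sampling, larger noise), gives an empirical validation that no vendor certificate can replace.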
I agree too. At the end of the day, when the cows come home, no amount of
validation will turn a nonsignificant p-value into a significant one or
turn a wrong analysis into a right one.
Frank E Harrell Jr <fharrell@virginia.edu>
Sent by: r-help-bounces@stat.math.ethz.ch
04/18/2003 12:56 PM
To: "Paul, David A" <paulda@battelle.org>
cc: rhelp <r-help@stat.math.ethz.ch>
Subject: Re: [R] Validation of R
On Fri, 18 Apr 2003 09:50:19 -0400
"Paul, David A" <paulda@battelle.org> wrote:
...
> Now, what would REALLY be nice is if some generous organization
> would pay a bunch of programmers to spend 2 - 3 years validating
> the algorithms in R, taking the FDA's CFR guidelines into account...
> That would truly be a service to everyone.
>
>
> Sincerely,
>
> David Paul
In re-reviewing the FDA guidelines, I find there are no guidelines (either
usage or validation guidelines) for statistical analysis software, only
guidelines for database management software and software used in medical
devices. And
the only place where SAS is mentioned is related to data archiving using
SAS Version 5 transport files. That (very problematic) format is
currently preferred by FDA but experts expect that will be replaced with
XML in the not-too-distant future.
What is truly important is that data analysts check their work no matter
what they are doing.
By the way, I know of an important study in which incorrect final efficacy
event rates were reported to FDA and in the scientific literature, due to
(what I consider) a bug that has been in SAS for 30 years: a missing value
is considered to be less than any valid numeric value, so "IF x < y THEN
z=1" generates z=1 when x is missing and y is not.
---
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
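The SAS behavior Harrell describes does not carry over to R, where a comparison involving a missing value yields NA rather than silently evaluating true; a quick illustration (variable names are arbitrary):

```r
## In SAS a missing value sorts below every number, so IF x < y THEN z=1
## fires when x is missing. In R the comparison itself is NA, and
## if (x < y) would stop with an error rather than silently fire.
x <- NA
y <- 3
x < y                                   # NA, not TRUE
z <- ifelse(!is.na(x) & x < y, 1, 0)    # explicit guard for missingness
stopifnot(is.na(x < y), z == 0)
```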
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Like the original poster, I'm in a corporation that interacts with the FDA
(submissions for product approval, and potential for auditing of QC
procedures). I fully expect to be asked to validate R, in some sense,
within the year, maybe two. I have two main comments.
First, I would be interested in participating in a small sub-project
interested in exploring this in very practical ways, such as
1. Documenting resistance or regulatory needs R users are encountering in
this environment, offline from the r-help list,
2. Sharing experiences (what works and what doesn't for assuaging
managers' fears), and
3. If any further validation activities are deemed helpful (such as
additional test cases and describing what the test cases are intended to
test), making sure that these activities are fed back to the R project in a
way that others can leverage them in the future.
If you would also like to participate in this off-line discussion, I will
be happy to collect names and e-mails. Or, if anyone has other ideas or
feels motivated to drive something, feel free to step forward.
Second, just minutes ago I raised this question with our software tester
over lunch. She tests SAS code used to generate reports of clinical trial
results, and other software used to get clinical data into a database. In
retrospect she is a biased sample (of size 1!) because the open-source
software model de-emphasizes the role (and value) of the professional
software tester; nonetheless I thought her comments offer a taste of the
opposition some may encounter. I'll tell you what she said, and then I'll
offer my impressions; please don't argue with her points, because I
already did!
(A bit of background: we have chosen not to validate SAS procedures, and
we say so in our test documentation. In practice, I think our clinical
reporting rarely strays far from base SAS--99% of our reporting is just
manipulating and tabulating data--and that may be a reason for the
decision.)
In a nutshell, she thought SAS was more trustworthy than R (to the extent
that she thought we should test R's functions) based on two points:
1. SAS has a team of professional software testers who spend their time
coming up with test cases that are as esoteric and odd as they can think
of (within the limits of their specifications). She was not convinced
that a large community of users is sufficient to flush out obscure bugs.
In her view (not surprisingly), software testers will look at software
with a unique eye. (Which I think is true--but an army of users also
does pretty well.)
2. SAS has a long history of quality, and their market niche requires
them to pay close attention to quality. This distinguishes them from
Microsoft, which has little financial incentive to pay close attention to
quality, and does not have a history of quality despite a large group of
professional software testers.
She and I agreed that if one must know for certain that a particular
function works, one must test it or find documentation indicating precisely
how someone else tested it. Fortunately R packages come with test cases,
but they're not usually test cases designed to check a large number of
possible failure mechanisms.
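A package-style test file of the kind mentioned above is just a script of assertions against hand-checked values, including the edge cases a professional tester would probe. A minimal sketch, with values chosen to be easy to verify by hand:

```r
## A package-style test file: bare assertions against hand-checked values,
## plus the edge cases (NAs, empty input) a professional tester would try.
x <- c(1, 2, 3, 4, 5)
stopifnot(all.equal(mean(x), 3))
stopifnot(all.equal(var(x), 2.5))             # sum((x - 3)^2) / 4 = 10 / 4
stopifnot(is.na(mean(c(x, NA))))              # NA propagates by default
stopifnot(all.equal(mean(c(x, NA), na.rm = TRUE), 3))
stopifnot(is.nan(mean(numeric(0))))           # empty input: NaN, not an error
```

Scripts like this can be run routinely, and submitting them back to the packages they exercise lets others leverage the work.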
My take on this is as follows:
1. There seem to be two varieties of validation involved here. The first
provides clear assurance that a specific application does a specific thing.
This is what software validation should really be, and no software, not
even SAS, is above this. Then there is "warm and fuzzy" validation that
offers limited assurance that the software is generally of good quality.
This is subjective, a matter of reputation, and there is no testing or
documentation that can definitively address this ill-defined criterion. A
software package could be excellent, with only one bug, but if your
application hits that bug, you have a problem.
2. I think this thread is mainly addressing the "warm and fuzzy"
validation model. R is going to encounter skepticism among people who
haven't been exposed to it before, especially if they also have not been
exposed to other open-source software (OSS). In my experience, people who
have not been involved in any software development expect corporate support
to lead to quality software ("they have resources!"). We all know this is a
fallacy, but you can't argue it away, you just have to demonstrate the
software. When they become familiar with it, they'll stop asking for the
warm and fuzzy validation.
If my reading of the situation is correct, then the right response is to
dazzle. The warm-and-fuzzy validation is really an opportunity for a
software demo. Demonstrate the functions you're likely to use, especially
(following Dr. Harrell's advice) using simulation. Then repeat the
simulation but with outliers added, and use robust methods. Read in a CSV
file from a network drive, create some beautiful plots, save the data in
compressed format and document file size (also document the original CSV's
file size), read the data back into a concurrently-running R process and
show it's the same. Install a particularly impressive and esoteric package
that's remotely related to your problem and document what it does.
Generate pseudorandom data using three different generators, from a given
seed, and then reproduce the data. Calculate P(Z <= -20) for Z ~ N(0, 1),
then calculate P(Z > 20) using lower.tail = F.
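The reproducibility and tail-probability pieces of this demo can be sketched as follows (the seeds, generator choices, and the round-trip file are illustrative):

```r
## Reproducible streams from three different generators:
for (kind in c("Mersenne-Twister", "Wichmann-Hill", "Marsaglia-Multicarry")) {
  set.seed(7, kind = kind)
  a <- runif(3)
  set.seed(7, kind = kind)
  stopifnot(identical(a, runif(3)))         # same seed, same stream
}

## Extreme tails, where the naive 1 - pnorm(20) underflows to exactly 0:
p_lower <- pnorm(-20)                       # P(Z <= -20), tiny but nonzero
p_upper <- pnorm(20, lower.tail = FALSE)    # P(Z > 20), computed directly
stopifnot(all.equal(p_lower, p_upper), p_lower > 0, 1 - pnorm(20) == 0)

## Round-trip a data set through compressed storage and verify identity:
d <- data.frame(u = rnorm(10), v = letters[1:10])
f <- tempfile(fileext = ".rds")
saveRDS(d, f, compress = TRUE)
stopifnot(identical(d, readRDS(f)))
```

Each step leaves an artifact (reproduced streams, nonzero tail probabilities, a byte-identical round trip) that can go straight into the demo document.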
You will provide only an iota of assurance that a particular future
application will work, but you will have removed all doubt that R is a
serious, rigorous, powerful package. And that addresses the concerns that
may not be voiced, but are underlying.
-Jim Garrett
Becton Dickinson Diagnostic Systems
**********************************************************************
This message is intended only for the designated recipient(s). ... {{dropped}}
In a message dated 4/18/03 1:01:40 PM Eastern Daylight Time,
fharrell@virginia.edu writes:
> What is truly important is that data analysts check their work no matter
> what they are doing.
This has been a fruitful discussion about doing "scientific" research. To
the software issues one can add the general issue of reproducibility: data
availability (even to reviewers), clear description, etc. The following
link has a nice write-up on this issue. It is likely to be relevant to a
wider audience using statistical methods.
http://www.econ.uiuc.edu/~roger/repro.html
Incentives in research/academia may also have something to do with this.
For this an interesting view (you have to make the connection):
http://www.econ.uiuc.edu/~roger/gaps.html
-anupam.
In a message dated 4/18/03 6:29:52 PM Eastern Daylight Time,
rossini@blindglobe.net writes:
> my work on ESS and Noweb (literate statistical practice), as described
> in my DSC-2001 conference paper, as well as University of Washington
> Biostat TechRep #163 http://www.bepress.com/uwbiostat/paper194/ and
> related Chance article with Fritz Leisch (to appear in next issue).
I have recently started to use Sweave, with Noweb. Quite nice. The
article in R-news was a good intro. Thanks.
In a message dated 4/21/03 3:07:20 PM Eastern Daylight Time,
pgilbert@bank-banque-canada.ca writes:
> There may be less work involved in doing (un-official) validation than
> there is in advertising how much is actually being done. Perhaps the
> simplest approach is for individuals to put together packages of tests
> with descriptions that explain the extent of the testing which is done,
> and then submit the packages to CRAN.
Another suggestion to address the issue of validation in the long term:
1) "Validation" of scientific research takes the form of peer review;
perhaps it is possible to come up with a similar (not identical) process
for publishing software, while maintaining the openness.
2) We can also think of contributions to open-source software as academic
contributions---they definitely further scientific inquiry by providing
basic tools for it, quite broadly and across many disciplines. So it may
be possible to think about a framework that provides academic/research
credit, like peer-reviewed publications, for *publishing* software, maybe
even using peer review in a way that allows continuous review, continuous
feedback, and continuous improvement. Perhaps the R community can take a
lead in coming up with the framework, or the issues that need to be
addressed.
3) This meets Popperian criteria for science (even Lakatos's): a result
(especially a difficult numerical optimization) is falsifiable, and may be
falsified when a bug is found or a better procedure is implemented.
---Anupam.