Hello All,

I am new to R, and I am writing to seek your advice on how best to use it to run R's various normality tests in an automated way.

In a nutshell, my situation is as follows. I work in an investment bank, and my team and I are concerned that the assumption we make in our models, that the returns of assets are normally distributed, may not be justified for certain asset classes. We are keen to check this statistically.

To this end, we have an Excel document which contains historical data on the returns of the asset classes we want to investigate, and we would like to run R's multiple normality tests on these data to check whether any asset classes are flagged up as being statistically non-normal.

I see from the R documentation that there are several R commands to test for this, but is it possible to programme a tool which can (i) convert the Excel data into a format which R can read, then (ii) run all the relevant tests from R, then (iii) compare the results (such as the p-values) with a user-defined benchmark, and (iv) output a file which shows, for each asset class, which tests reject the null hypothesis of normality?

My team and I would be very grateful for your advice on this.

Yours sincerely,

Alex.
Alexandre Christie wrote:
> [...]
> In a nutshell, my situation is as follows. I work in an investment bank, and my
> team and I are concerned that the assumption we make in our models that the
> returns of assets are normally distributed may not be justified for certain
> asset classes. We are keen to check this statistically.
> [...]

Alex,

It would be good to work closely with a local statistician. Models do not require the raw data to be normally distributed, only the residuals or the conditional distributions. And there are many models that don't require strong distributional assumptions (e.g. Cox regression, the proportional odds model, transform-both-sides generalized additive models).

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
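Frank's distinction between raw data and residuals can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (not asset returns), the linear model and the names `x`, `y`, `fit`, `p_raw` and `p_resid` are made up for the illustration, and `shapiro.test` is used simply because it ships with base R.

```r
# Simulated illustration (hypothetical data): the response y is clearly
# non-normal, yet the residuals of the correct model are normal.
set.seed(42)
n <- 500
x <- rexp(n)                  # skewed explanatory variable
y <- 1 + 2 * x + rnorm(n)     # linear trend plus normal errors

fit <- lm(y ~ x)              # model with normally distributed errors

p_raw   <- shapiro.test(y)$p.value               # test on the raw response
p_resid <- shapiro.test(residuals(fit))$p.value  # test on the residuals

cat("Shapiro-Wilk p-value, raw y:     ", format.pval(p_raw), "\n")
cat("Shapiro-Wilk p-value, residuals: ", format.pval(p_resid), "\n")
```

The raw response inherits the skewness of `x`, so a normality test on `y` alone says little about whether the model's distributional assumption holds; the residuals are the quantity to examine.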
Hi Alexandre,

Alexandre Christie wrote:
> [...]
> I see from the R documentation that there are several R commands to test for
> this, but is it possible to programme a tool which can (i) convert the Excel data
> into a format which R can read, then (ii) run all the relevant tests from R,
> then (iii) compare the results (such as the p-values) with a user-defined
> benchmark, and (iv) output a file which shows for each asset class, which tests
> reveal that the null hypothesis of normality is rejected?

The short answer is 'yes, this is perfectly possible', by putting all the pieces in an R script file and sourcing it or processing it in batch mode.

ad (i): There are several ways of accessing Excel files. Using RODBC is one of them. Section 8 of the R Data Import/Export manual gives an overview of all options.

ad (ii): This is a matter of conducting the tests and storing the test results in an appropriate data structure.

ad (iii): straightforward

ad (iv): you did not specify

> My team and I would be very grateful for your advice on this.
>
> Yours sincerely,
>
> Alex.
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
I'm sorry, I pressed the wrong button and sent an incomplete answer. The completed e-mail follows below.

> [...]
> I see from the R documentation that there are several R commands to test for
> this, but is it possible to programme a tool which can (i) convert the Excel data
> into a format which R can read, then (ii) run all the relevant tests from R,
> then (iii) compare the results (such as the p-values) with a user-defined
> benchmark, and (iv) output a file which shows for each asset class, which tests
> reveal that the null hypothesis of normality is rejected?

The short answer is 'yes, this is perfectly possible', by putting all the pieces in an R script file and sourcing it or processing it in batch mode.

ad (i): There are several ways of accessing Excel files. Using RODBC is one of them. Section 8 of the R Data Import/Export manual gives an overview of all options.

http://cran.r-project.org/doc/manuals/R-data.html#Reading-Excel-spreadsheets

Here's a simple example for RODBC:

    library(RODBC)
    z <- odbcConnectExcel("rexceltest.xls")  # open the workbook (Windows only)
    dd <- sqlFetch(z, "Sheet1")              # read the sheet into a data frame
    close(z)                                 # close the connection

ad (ii): This is a matter of conducting the tests and storing (what you would like to keep from) the test results in an appropriate data structure.

ad (iii): This should be straightforward as well.
ad (iv): You did not specify the output format, but R can write to, among others, a text file, an HTML file, a LaTeX file and, if needed, an Excel file. Relevant packages include xtable, R2HTML and rcom.

HTH,
Tobias

P.S. It is always a good idea to define small functions for each step in the process and then to use these in the definition of one big function, which would look something like

    checkAssetNormality(file = "myassets.xls", otherarg1, otherarg2,
                        outfile = "res_myassets.html", outdir = ".")

P.P.S. R has very neat and powerful graphical capabilities. It is quite easy to rapidly produce large grids of QQ-plots for all the assets concerned. This would give you additional information about the nature of the deviation from normality.
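Steps (ii) to (iv) and the QQ-plot grid could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation, and it rests on several assumptions: the returns are already in a data frame with one numeric column per asset class (as `sqlFetch` would return), simulated data stand in for the spreadsheet, and the names `run_normality_tests`, `plot_qq_grid`, `returns`, `alpha` and the output file names are all hypothetical. Only `shapiro.test` from base R is used by default; other tests that return a p-value (for example from the nortest or tseries packages) could be added to the `tests` list.

```r
# Steps (ii)/(iii), hypothetical helper: apply each test in `tests` to each
# numeric column of `returns` and compare its p-value with `alpha`.
run_normality_tests <- function(returns,
                                tests = list(shapiro = shapiro.test),
                                alpha = 0.05) {
  stopifnot(is.data.frame(returns), length(tests) > 0)
  assets <- names(returns)[vapply(returns, is.numeric, logical(1))]
  rows <- list()
  for (asset in assets) {
    x <- returns[[asset]]
    x <- x[!is.na(x)]                     # drop missing returns
    for (test_name in names(tests)) {
      # shapiro.test needs between 3 and 5000 observations
      p <- tests[[test_name]](x)$p.value
      rows[[length(rows) + 1]] <- data.frame(
        asset = asset, test = test_name, p_value = p,
        reject_normality = p < alpha, stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)                    # one row per asset/test pair
}

# Hypothetical helper for the P.P.S.: one QQ-plot panel per asset.
plot_qq_grid <- function(returns) {
  n <- ncol(returns)
  old <- par(mfrow = c(ceiling(n / 2), min(n, 2)))
  on.exit(par(old))
  for (asset in names(returns)) {
    qqnorm(returns[[asset]], main = asset)
    qqline(returns[[asset]])
  }
}

# Simulated stand-in for the Excel data (assumption: one column per asset).
set.seed(1)
returns <- data.frame(
  equities = rnorm(250, mean = 0, sd = 0.01),  # normal by construction
  credit   = rt(250, df = 2) * 0.01            # heavy-tailed by construction
)

results <- run_normality_tests(returns, alpha = 0.05)
print(results)

# Step (iv): a plain CSV file, which Excel can open directly.
out_file <- file.path(tempdir(), "normality_results.csv")
write.csv(results, out_file, row.names = FALSE)

# QQ-plot grid written to a PDF file.
qq_file <- file.path(tempdir(), "qq_grid.pdf")
pdf(qq_file)
plot_qq_grid(returns)
dev.off()
```

Keeping the tests in a named list means the user-defined benchmark (`alpha`) and the set of tests can both be changed without touching the loop. A non-rejection is not evidence of normality, so the QQ-plots remain worth inspecting alongside the table.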