Nathapong Samlamjiag
2000-Jun-30 11:55 UTC
[R] I have a dream of creating a program on statistical analyses.
To whom it may concern,

My name is Nathapong Samlamjiag. I am a student from Bangkok,
Thailand. I am studying Political Science, of which a few courses
concern statistics and research methodology. In Thailand, it is
believed that SPSS is the most popular program used to conduct
statistical analyses. Although I admit that SPSS is an excellent
program capable of performing numerous techniques, I personally
believe that SPSS is not researcher-friendly. For example, before
researchers conduct a regression analysis, they must first check
certain assumptions of regression. Some assumptions are very likely
to be violated but can usually be corrected by data transformations.
Thus, a regression analysis should be carried out in particular
steps: First, relevant assumptions are tested; then, data
transformations are conducted; and finally the regression analysis
is carried out. Because it is usually difficult and inconvenient to
test and correct assumptions in SPSS, many (Thai) researchers tend to
ignore assumptions of a statistical technique. Ignoring assumptions
that may be untrue leads to research conclusions that may be unsound.
As far as I know, at present there are no statistical programs that
make it easy not only to conduct statistical techniques but also to
check (and, if necessary, correct) the techniques' assumptions. With
this in mind, I have a dream of creating a statistical program
capable of helping political-science researchers conduct statistical
analysis that will yield valid conclusions.

Therefore, I have searched the Internet to find information about
computer programming. I have found that Visual Studio 6 can be used
to easily create a desktop application and that R language is a
programming language for statistical computation and graphics.
Accordingly, I have begun dreaming of using Visual Studio 6 and R
language to develop the desired program for statistical analyses.

However, I feel doubtful about some issues. Thus, I would like to ask
you a few questions and request your invaluable suggestions:

1. I intend to develop my statistical stand-alone program by first
using Visual Basic to create a user-friendly graphical user interface
and Visual C++ to create a database component and then using R
language to conduct statistical computation. Thus, I would like to
know whether R-language functions can be called by an application
developed in the BASIC and C++ languages?

2. I know that R language is available as Free Software under the
terms of the Free Software Foundation's GNU General Public License in
source code form. I have read GNU General Public License Version 2,
June 1991, but I still do not understand it completely. Thus, I would
like to ask you whether my statistical program which calls R-language
functions must also be freeware. Must my statistical program be
distributed freely? Can I sell my program? More specifically, I am
not sure about what I can do and what I cannot do without infringing
on the copyrights. Would you please clarify this concern of mine?

Thank you very much for your R language. I will look forward to your
suggestions.

Sincerely yours,
Nathapong Samlamjiag

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Bill Venables
2000-Jul-01 02:08 UTC
[R] I have a dream of creating a program on statistical analyses.
Nathapong Samlamjiag writes:

> My name is Nathapong Samlamjiag. I am a student from Bangkok,
> Thailand. I am studying Political Science, of which a few
> courses concern statistics and research methodology. In
> Thailand, it is believed that SPSS is the most popular program
> used to conduct statistical analyses. Although I admit that
> SPSS is an excellent program capable of performing
> numerous techniques, I personally believe that SPSS is not
> researcher-friendly.

In a sense you are right: many statistical packages attempt to be
user-friendly by making it dead easy for people to conduct standard
stock analyses, like regression analyses, at the expense of making it
tedious, difficult or even impossible to adapt the analysis to suit
the realities of the situation under study.

> For example, before researchers conduct a regression analysis,
> they must first check certain assumptions of regression. Some
> assumptions are very likely to be violated but can usually be
> corrected by data transformations. Thus, a regression analysis
> should be carried out in particular steps: First, relevant
> assumptions are tested; then, data transformations are
> conducted; and finally the regression analysis is carried out.

Actually, no, it's much more complicated than that. The important
distributional properties are of the residuals after the regression,
not of the marginal distribution of the response variable. This is
what makes checking the assumptions very tricky: you cannot know
beforehand whether an apparent non-normality of the response
variable, for example, is due to ignored covariates (at that stage)
or to a real violation of assumptions. It's a chicken-and-egg
situation requiring sometimes very subtle judgments by the analyst
using as much ancillary information about the situation as possible.
Transformation of the response (or predictors) is not a universal
panacea, either.
It might be more appropriate to shift to a different kind of
model, such as a generalized linear model, to handle some kinds of
distributional properties, particularly if ultimately you need an
analysis in the original scale.

> Because it is usually difficult and inconvenient to test and
> correct assumptions in SPSS, many (Thai) researchers tend to
> ignore assumptions of a statistical technique.

In that they are certainly not alone!

> Ignoring assumptions that may be untrue leads to research
> conclusions that may be unsound. As far as I know, at present
> there are no statistical programs that are easy to not only
> conduct statistical techniques but also check (and, if
> necessary, correct) the techniques' assumptions. With this in
> mind, I have a dream of creating a statistical program capable
> of helping political-science researchers conduct statistical
> analysis that will yield valid conclusions.

Much as I regret having to put a curb on such enthusiasm, I have to
suggest you think very carefully about this before sinking too much
energy into it. This amounts to a statistical expert system,
something that has been tried several times before, always with
disappointing results. The consensus seems to be that the
statistical contribution to a piece of research is a genuine
contribution requiring just as much judgment and creativity as any
other part of the work and not something that can be automated (at
least not yet). What you are suggesting sounds dangerously like just
a different kind of SPSS where instead of the preferences built into
that package the researcher gets your prejudices and preferences,
which may be a little more elaborate but are in the end just as
inflexible. The closest we have come to providing an optimal support
system for the data analyst seems in fact to be software environments
like R which provide coherent suites of tools that can be used as
they are as components or easily extended.
The researcher has a number of choices: do a proper course in
applied statistics (not just a quickie on methods) and become
familiar enough with the real data analysis issues to handle it with
the help of an environment like R (or even SPSS for that matter),
collaborate with a statistician, employ a consultant or just wing it
and hope for the best when it comes to referees.

> Therefore, I have searched the Internet to find information
> about computer programming. I have found that Visual Studio 6
> can be used to easily create a desktop application and that R
> language is a programming language for statistical computation
> and graphics.

As I point out above, it is that but much more as well. It is a
complete software environment for data analysis and graphics that
offers about as much support for the analyst as can reasonably be
offered without inhibiting the creativity or detracting from the
responsibility of the analysis process.

> Accordingly, I have begun dreaming of using Visual Studio 6 and
> R language to develop the desired program for statistical
> analyses. However, I feel doubtful about some issues. Thus, I
> would like to ask you a few questions and request your
> invaluable suggestions:

> 1. I intend to develop my statistical stand-alone program by
> first using Visual Basic to create a user-friendly graphical
> user interface and Visual C++ to create a database
> component and then using R language to conduct statistical
> computation. Thus, I would like to know whether R-language
> functions can be called by an application developed by the
> BASIC and C++ languages?

This is a matter of some interest quite apart from your project. It
is (tangentially) aligned with the work of the Omegahat project. You
might like to look at http://www.omegahat.org/

> 2. I know that R language is available as Free Software under
> the terms of the Free Software Foundation's GNU General Public
> License in source code form.
> I have read GNU General Public
> License Version 2, June 1991, but I still do not understand it
> completely. Thus, I would like to ask you whether my
> statistical program which calls R-language functions must also
> be freeware. Must my statistical program be distributed
> freely? Can I sell my program? More specifically, I am not
> sure about what I can do and what I cannot do without
> infringing on the copyrights. Would you please clarify this
> concern of mine? Thank you very much for your R language. I
> will look forward to your suggestions.

Sorry, I'm no lawyer either... One thing that is clear, though, is
if you do issue your code under the GPL as well and make it freely
available, you will not violate the conditions.

-- 
Bill Venables, Statistician, CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld, AUSTRALIA. 4163
Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au
Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/
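
The point made in the reply above, that the distributional
assumptions concern the residuals after the regression and not the
marginal distribution of the response, can be illustrated with a
small simulation. What follows is only a rough sketch in Python using
the standard library. The data are simulated, and the function names
(skewness, fit_line, simulate), the sample size, the coefficients and
the seed are all illustrative assumptions and come from nobody's
actual analysis. In R the same check would normally be done with
lm() and residuals().

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=2000, seed=1):
    """Simulate a skewed covariate with normal errors (assumed setup)."""
    rng = random.Random(seed)
    # The covariate is strongly right-skewed; the errors are normal,
    # so the regression assumptions actually hold in this setup.
    x = [rng.expovariate(1.0) for _ in range(n)]
    y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return skewness(y), skewness(residuals)


skew_y, skew_resid = simulate()
print(f"skewness of the raw response: {skew_y:.2f}")
print(f"skewness of the residuals:    {skew_resid:.2f}")
```

Under these assumptions the raw response looks clearly non-normal
(it inherits the skewness of the covariate), while the residuals
after the fit are close to symmetric. A check on the response made
before fitting would therefore have suggested a transformation that
is not needed.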