Hello,

I've been trying to familiarize myself with the computing tools of the trade (e.g. SAS, R, Perl, LaTeX) and I've been getting somewhere with the individual programs, but I'm trying to get a better sense of how to integrate these tools. I'd like to use scripts and create reports in a more organized way. Can anyone recommend books or, better yet, free online articles on this topic?

Maybe I should be a little more specific about what I do: I'm a research assistant in clinical epidemiology doing mainly data management and analysis. I do a number of repetitive tasks like updating a research database from the original clinic database and other sources, creating reports, and creating graphical output for individual patients, as well as working on individual research projects. Unfortunately I am not working closely with 'real' statisticians who have probably developed good work habits using these tools. Any advice on 'the big picture' would be greatly appreciated.

Thanks!

Tanya Murphy
On Fri, 28 Mar 2003 11:42:18 -0500 Tanya Murphy <tmurph6 at po-box.mcgill.ca> wrote:

> I'd like to use scripts and create reports in a more organized way. Can
> anyone recommend books or, better yet free online articles, on this topic?
> <snip>

Take a look at the following:

http://hesweb1.med.virginia.edu/biostat/teaching/statcomp/notes.pdf
http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf
http://hesweb1.med.virginia.edu/biostat/teaching/statcomp
http://hesweb1.med.virginia.edu/biostat/presentations/feh/clinreport/dmcreport.pdf

For statistical reports you have chosen well in considering integrating R and LaTeX. The Alzola-Harrell text also covers a bit about using make and Perl to run scripts (to get data from SAS to R, run R, etc.).

--
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
I have problems getting predict.smooth.Pspline (part of the pspline package) to work. Here's an example on artificial data:

########start#########
library (Pspline)
tt <- seq (0, 1, length=20)
xt <- tt^3
fit <- smooth.Pspline (tt, xt, norder=3, df=4, method=2)
fit <- smooth.Pspline (tt, xt, norder=3, spar=0.0001, method=1)
plot (tt, xt)
lines(fit)
# so far everything works fine - looks fine at least
predict.smooth.Pspline (fit, tt, nderiv=0)[,1]
####### end ##########

This produces only 20 NaN values instead of something similar to xt.

In the end I'm trying to get 2nd derivatives of quintic spline fits. Does anybody know how to get predict.smooth.Pspline to work, or is there another way to do this?

Any help would be greatly appreciated.

Cheers,
Martin
On Fri, Mar 28, 2003 at 12:43:25PM -0500, Frank E Harrell Jr wrote:

> <snip>
> For statistical reports you have chosen well in considering integrating
> R and LaTeX. The Alzola-Harrell text also covers a bit about using make
> and Perl to run scripts (to get data from SAS to R, run R, etc.).

I have been extremely impressed by the way Sweave (which combines LaTeX and R) and RODBC (in my case with MySQL) work together for data management, reporting, writing and even creating presentations. I use a LaTeX document class called "Prosper" that creates PDF presentations with many of the features and the appearance of MS PowerPoint presentations.

For Sweave, take a look at:

http://www.ci.tuwien.ac.at/~leisch/Sweave/Sweave-manual-20021007.pdf
http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf (pages 28-31)

For Prosper, take a look at:

http://sourceforge.net/projects/prosper/

and then google for:

latex prosper

and you will find many links to tutorials etc.

Dave

--
Dave Whiting
Dar es Salaam, Tanzania
Thanks to all who have replied to this. I find the advice very encouraging. I've been reading the recommended links on Sweave and I think it will answer a major part of my goals.

As for Perl vs. Python, I don't know which would be best. I've started out in Perl because someone got me started with a little Perl program, but I've looked at Python, too. I'm working in Windows (and that's not likely to change anytime soon--at the office, anyway) and I think WinEdt serves as a good enhanced editor for the main applications--LaTeX, R and Perl--as well as a way to organize the files for a project. The GUI for Python seems nice, too, though.

Saghir, why do you prefer Python?

Is there a fairly easy way to become SAS-free for data management and cleaning? I'm told R is really not ideal for data cleaning. Is this what RODBC is about?

Tanya

>===== Original Message From "Bashir Saghir (Aztek Global)" <Saghir.Bashir at UCB-Group.com> =====
>Dear Tanya,
>
>Have you considered using Python (www.python.org) instead of Perl? I use
>Python, LaTeX, and R for doing what you describe. My process is evolving and
>I cannot recommend it as being the best. Essentially I am moving towards a
>database approach, currently using dictionaries in Python. In the longer term
>I plan to switch to MySQL.
>
>In summary, I split the problem into bits that link into a relational
>database and use metadata to run my reports. So once the database is set
>up I only need to give the key information and my programs find all relevant
>information in the database, meaning that I never need to modify any programs
>to run a report with new data - just the database.
>
>I don't know of any references for this but if you get any to your original
>query I would be interested.
>
>Best regards,
>Saghir
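A minimal Python sketch of the dictionary-based, metadata-driven reporting that Saghir describes might look like the following. It is not his code: the table names, the report key, the fields and all values are hypothetical, and plain dictionaries stand in for the relational database he plans to move to. The point is only that a new report or new data means editing the data structures, not the function.

```python
# Hypothetical in-memory "database": every name and value below is invented
# for illustration; none of it comes from the original posts.
DATABASE = {
    "patients": {
        "P001": {"name": "Patient A", "clinic": "C1"},
        "P002": {"name": "Patient B", "clinic": "C2"},
    },
    "visits": [
        {"patient": "P001", "date": "2003-01-15", "weight_kg": 71.5},
        {"patient": "P001", "date": "2003-03-02", "weight_kg": 70.0},
        {"patient": "P002", "date": "2003-02-20", "weight_kg": 82.3},
    ],
}

# Metadata describing each report: which visit fields to show and their labels.
REPORTS = {
    "weight_history": {
        "title": "Weight history",
        "columns": [("date", "Visit date"), ("weight_kg", "Weight (kg)")],
    },
}


def build_report(report_key, patient_id, database=DATABASE, reports=REPORTS):
    """Return the report as a list of text lines, given only the key information."""
    meta = reports[report_key]
    patient = database["patients"][patient_id]
    lines = ["%s: %s (%s)" % (meta["title"], patient["name"], patient_id)]
    # Header row built from the labels stored in the metadata.
    lines.append(" | ".join(label for _, label in meta["columns"]))
    # One row per visit of this patient, with the fields named in the metadata.
    for visit in database["visits"]:
        if visit["patient"] == patient_id:
            lines.append(" | ".join(str(visit[field]) for field, _ in meta["columns"]))
    return lines


if __name__ == "__main__":
    print("\n".join(build_report("weight_history", "P001")))
```

Adding a second report under this assumption is a matter of adding another entry to REPORTS; build_report itself stays untouched.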
<snip>
>Saghir, why do you prefer Python?
<snip>

I was thinking about learning Perl many years ago and I asked my system admin for advice. His enthusiasm for Python steered me away from Perl and I've been hooked since. Basically it is easy to learn and program development is quick.

Saghir
Hi Tanya,

You really cannot lose with either Perl or Python. Either of them, along with other tools mentioned, will suffice for making your work SAS-free. But I would also not underestimate R for "data-cleaning"...

> Is there a fairly easy way to become SAS-free for data management and
> cleaning? I'm told R is really not ideal for data cleaning.

I must admit that I am always eager to debunk the myth that SAS is (so much) better than the S language for data management, because to me the myth mostly points out that many statisticians have never used anything else but SAS.

Best Regards,
Bill

----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA

v_bill_pikounis at merck.com

Phone: 732 594 3913
Fax: 732 594 1565
On Sat, Mar 29, 2003 at 07:44:47AM -0500, Frank E Harrell Jr wrote:

> David - I am a big fan of Sweave also. There is some writeup about it
> in one of the documents I referenced. Next I need to get into MySQL.
> If you stumble across any tutorials for that that are not on
> r-project.org please let me know. -Frank

Frank, do you mean for SQL in general or specifically related to R working with SQL/MySQL?

I think there are many tutorials on SQL and MySQL on the web, and the MySQL manual contains everything you ever wanted to know about MySQL (and more that you'll probably hope you never want to know about :). There are tutorials within the manual that I think are well-written and clear.

For me the only thing I had to think about in putting it together was installing unixODBC (on Linux), and even that was not too hard. I just had to create the .ini files because I had problems with ODBCConfig.

If you know a little SQL then the R Data Import/Export manual and the RODBC help page are probably all you really need.

Dave

--
Dave Whiting
Dar es Salaam, Tanzania
I can't even be called a novice in either Perl or Python, but...

I believe one of the big virtues of Python is code readability. IIRC that was one of the original design goals of Python. (There is a quote: "Python is beautiful, but Perl is more fun.")

Cheers,
Andy

> -----Original Message-----
> From: Bashir Saghir (Aztek Global)
> Sent: Monday, March 31, 2003 9:24 AM
> Subject: RE: [R] Statistical computing
>
> <snip>
> Basically it is easy to learn and program development is quick.
On Mon, 31 Mar 2003 09:04:25 -0500 Tanya Murphy <tmurph6 at po-box.mcgill.ca> wrote:

> <snip>
> Is there a fairly easy way to become SAS-free for data management and
> cleaning? I'm told R is really not ideal for data cleaning. Is this what RODBC
> is about?
>
> Tanya

The S language is actually better than SAS for data manipulation unless you have a massive database. The trouble is that you don't learn data manipulation by looking at the documentation of individual functions. Chapter 4 of Alzola and Harrell attempts to provide several data manipulation and variable recoding examples.

The main reason I'm confident in saying that S is better at what many people say SAS is best at is that many manipulation and recoding tasks benefit greatly from vector operations across multiple codes within a variable. Contrast this with the multiple IF statements required in many SAS applications.

One feature of SAS that is frequently used for data manipulation is BY with FIRST.variable and LAST.variable. As seen in the examples I mentioned above, you handle this in a completely different way in S (using lags, aggregation functions, or for loops).

--
Frank E Harrell Jr
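The two ideas in Frank's message, recoding a whole variable at once instead of writing a chain of IF statements, and picking out the first and last record per group, can be sketched roughly as follows. Note the assumptions: this is Python rather than S, it is only an analogy to the vectorised S idioms he refers to and not a reproduction of the Alzola-Harrell examples, and the codes, groupings and field names are invented.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical diagnosis codes and groupings, invented for illustration.
CODE_GROUPS = {"I10": "cardio", "I21": "cardio", "E11": "endocrine", "J45": "respiratory"}


def recode(codes, table=CODE_GROUPS, default="other"):
    """Recode a whole column at once through a lookup table instead of an IF chain."""
    return [table.get(code, default) for code in codes]


def first_and_last(rows, key):
    """For rows already sorted by `key`, return {key value: (first row, last row)}.

    This plays the role of BY-group processing with FIRST.variable / LAST.variable.
    """
    result = {}
    for value, group in groupby(rows, key=itemgetter(key)):
        group = list(group)
        result[value] = (group[0], group[-1])
    return result


if __name__ == "__main__":
    print(recode(["I10", "E11", "Z99"]))
    visits = [
        {"id": 1, "date": "2003-01-15"},
        {"id": 1, "date": "2003-03-02"},
        {"id": 2, "date": "2003-02-20"},
    ]
    for patient_id, (first, last) in first_and_last(visits, "id").items():
        print(patient_id, first["date"], last["date"])
```

As with BY-group processing in SAS, first_and_last assumes the rows are already sorted by the grouping key; unsorted input would split one group into several.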
Hi Tanya,

I would also like to second Bill's comment on not underestimating R for "data cleaning". I have had such success with simple R scripts and functions for parsing data that I've abandoned my use of Python - not that there is anything wrong with Python! It's just that I can do all that I need to do in selecting subsets, etc. with R, so I found no need for another, supplemental language - not to mention the extra learning curve.

FWIW, if you have some rather large files (GBs), get lots of memory!

HTH
steve

"Pikounis, Bill" wrote:

> Hi Tanya,
> You really cannot lose with either Perl or Python. Either of them, along
> with other tools mentioned, will suffice for making your work SAS-free. But
> I would also not underestimate R for "data-cleaning"...
> <snip>
"Bashir Saghir (Aztek Global)" <Saghir.Bashir at ucb-group.com> writes:> <snip> > >Saghir, why do you prefer Python? > <snip> > > I was thinking about learning Perl many years ago and I asked my system > admin for advice. His enthusiasm for Python steered me away from Perl and > I''ve been hooked since. Basically it is easy to learn and program > development is quick.Python ''feels'' very much like R to me (and a bit like Java too). Perl is great for the two-minute hack to accomplish an awkward transformation but the saying in the Python community is that "Hell is reading someone else''s perl code". It could also be said that "Purgatory is reading your own perl code from more than a few weeks ago". I like the IDE for Python. For me working in Python is similar to working in R in that I have a window open where I am writing the code and another interactive window where I can send snippets of the code for execution, say to test out exactly what the result of some expression is. Once you get beyond the initial shock of discovering that the indentation of code in python determines the lexical grouping you find that python is a clean, well-structured language. I don''t think the same can be said for perl. Tanya mentioned data management and data cleaning. I think the combination of a relational database management system, such as MySQL or PostgreSQL, and Python and R is very powerful for data cleaning. Python can be used for sequential processing and for loading the database. SQL can be used for examining the structure of the data and for detecting unusual cases. R can be used for model fitting and graphics on the entire data set, if it is not huge, or on a subset, if it is huge. Together I find this combination more powerful than SAS or SPSS and definitely faster. However, using this combination requires learning three different languages.
In a message dated 3/31/03 10:21:20 AM Eastern Standard Time, v_bill_pikounis@merck.com writes:

> I must admit that I am always eager to debunk the myth that SAS is (so much)
> better than the S language for data management, because to me the myth
> mostly points out that many statisticians have never used anything else but
> SAS.

Depends on the size of the data and what one is trying to do. I tried to subset a data.frame when memory.size() showed about 20M, with nothing running other than R, and it reached the 127M memory limit without doing the job. The object is about 10M. Any hints?
My two cents on Perl and Python (and stuff) :)....

Perl was designed to be "easy" to use but not necessarily the easiest thing to learn (it's not hard to get started, nonetheless; just perhaps a bit hard to master), while Python was designed to be "obvious" to learn and write and read. Perl is to scripting languages what C/C++ is to compiled languages, whereas Python is akin to Java (my feeling anyway). Let me elaborate...

Library Size: As far as I know, Perl has twice as large a library of modules (about 4000) as Python, though both are certainly continuing to grow. (Like, you're probably not going to write your own XML parser.) Don't expect anything in advanced math in either one (why would you? You have R). As for DB support, both are strong in that regard.
Bioinformatics: There is also a vast volume of bioinformatics code written in Perl and Python (freely) available on this planet, just so you know--save yourself some time.
Capability: If you must write some code on your own, as opposed to stealing it from someone else, Perl and Python can both do a good job, as far as "data cleaning" (be prepared to learn regular expressions, though) or task automation goes. But...
Flexibility/Readability: Perl is a ridiculously flexible language. Generally, that's a good thing for an individual programmer, since he can do things however he wants, but an issue for a team of programmers and a headache for a project manager, as it's rather hard to impose consistency in the way people code because there are so many ways to do it in Perl. So a side effect of Perl's flexibility is poorer readability. Python (like Java) tends to be more readable--maybe except for Python's "print" statement :)
Speed: Benchmarks generally place Perl faster than Python. But for small jobs, their speed difference doesn't matter much. Besides, speed probably depends more on how you write your code than what you write it in.
OOP: If you're going to program extensively, sooner or later you'll probably run into OOP. Python's implementation of OOP is pretty natural (especially if you come from Java or something) and it's easy to understand. Perl has its own unique implementation of OOP--if you're a Perl guy/gal, its implementation is brilliant; if not, it's absolutely, utterly queer--like, for one thing, you could have both procedural code and OOP code in the same Perl module (this flexibility doesn't mean you should do it--probably few people do, as it will inadvertently lead to confusion). (But then, S also has its own implementation of OOP.) Basically, Perl is a procedural language that can do OOP in its own weird way; Python is an OOP language that can pretend to be a procedural language.
GUI: If you need to write a GUI, you could do it in Perl or Python (or TCL as well, yet another scripting language), not that you should--it's slow and clumsy. VB or Java would be a better choice--they're still slow but not as clumsy. If you need speed, perhaps C++ is your only choice. A Web-based GUI would be an option, as long as you don't inadvertently expose your company's secrets to your competitors via the Web.
Survival: If you're concerned about whether Perl or Python would ever go out of business, my prediction is, they won't. Python is a continually rising star, luring users away from Perl and capturing many newcomers who prefer a cleaner language. But Perl is not going to go extinct, just as C is not going to die any time soon. Geeks tend to like Perl; most other normal human beings find Python just fine.
Support Group: Both Perl and Python have excellent support groups. If you're not too antisocial, you should have no problem finding help from total strangers over the Web for most of your daily problems (programming or otherwise).
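To give a flavour of the regular-expression "data cleaning" mentioned under Capability, here is a small sketch in Python using the standard re module. The messy patient identifiers and the target format PT-00123 are hypothetical, chosen only to show the idea; real data would need its own pattern.

```python
import re

# Hypothetical messy identifiers as they might arrive from different sources.
RAW_IDS = ["  pt-00123 ", "PT 123", "pt#0456", "unknown", "PT-78a"]

# Optional surrounding whitespace, a "pt" prefix in any case, an optional
# separator (space, '#' or '-'), optional leading zeros, then digits only.
ID_PATTERN = re.compile(r"^\s*pt[\s#-]*0*(\d+)\s*$", re.IGNORECASE)


def clean_id(raw):
    """Return the identifier in the assumed canonical form, or None if it does not parse."""
    m = ID_PATTERN.match(raw)
    if m is None:
        return None
    return f"PT-{int(m.group(1)):05d}"


cleaned = [clean_id(x) for x in RAW_IDS]
# Keep the unparseable values for manual review instead of silently dropping them.
rejected = [x for x in RAW_IDS if clean_id(x) is None]
print(cleaned, rejected)
```

Returning None for anything that does not match, rather than guessing, keeps the doubtful cases visible for a human to check.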
-------------

By the way, if by any chance you need something that can do matrix/numerical computation very, very fast, Octave is a good product too. It's free. It's as fast as Matlab, faster than R, but slower than C.

And if on some rare (or weird) occasions you need to do some computational group theory stuff, GAP (Groups, Algorithms and Programming) is for you. It's free too. (You can find it at http://www.gap-system.org)

Hope it helps.

--cy
On Mon, 31 Mar 2003 TyagiAnupam at aol.com wrote:

> In a message dated 3/31/03 10:21:20 AM Eastern Standard Time,
> v_bill_pikounis at merck.com writes:
>
> > I must admit that I am always eager to debunk the myth that SAS is (so much)
> > better than the S language for data management, because to me the myth
> > mostly points out that many statisticians have never used anything else but
> > SAS.
>
> Depends on the size of the data and what one is trying to do. I tried to
> subset a data.frame when memory.size() showed about 20M, nothing else was
> running other than R, and it reached the memory limit 127M without doing the
> job. The object is about 10M. Any hints?

Get more memory! 128Mb is very little for a Windows PC these days, and you can buy many Gb for the price of a SAS licence.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
What did you use for subsetting?

Spencer Graves

TyagiAnupam at aol.com wrote:
> In a message dated 3/31/03 10:21:20 AM Eastern Standard Time,
> v_bill_pikounis at merck.com writes:
>
>>I must admit that I am always eager to debunk the myth that SAS is (so much)
>>better than the S language for data management, because to me the myth
>>mostly points out that many statisticians have never used anything else but
>>SAS.
>
> Depends on the size of the data and what one is trying to do. I tried to
> subset a data.frame when memory.size() showed about 20M, nothing else was
> running other than R, and it reached the memory limit 127M without doing the
> job. The object is about 10M. Any hints?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help