thr3ads.net - R help - [R] Statistical computing [Mar 2003]

Tanya Murphy

2003-Mar-28 16:42 UTC

[R] Statistical computing

Hello,

I've been trying to familiarize myself with the computing tools of the trade
(e.g. SAS, R, Perl, LaTeX), and I've been getting somewhere with the individual
programs, but I'm trying to get a better sense of how to integrate these
tools. I'd like to use scripts and create reports in a more organized way. Can
anyone recommend books or, better yet, free online articles on this topic?

Maybe I should be a little more specific about what I do: I'm a research
assistant in clinical epidemiology doing mainly data management and analysis.
I do a number of repetitive tasks such as updating a research database from the
original clinic database and other sources, creating reports, and creating
graphical output for individual patients, as well as working on individual
research projects. Unfortunately I am not working closely with 'real'
statisticians who have probably developed good work habits using these tools.
Any advice on 'the big picture' would be greatly appreciated.

Thanks!

Tanya Murphy

Frank E Harrell Jr

2003-Mar-28 17:43 UTC

[R] Statistical computing

Take a look at the following:

http://hesweb1.med.virginia.edu/biostat/teaching/statcomp/notes.pdf
http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf
http://hesweb1.med.virginia.edu/biostat/teaching/statcomp
http://hesweb1.med.virginia.edu/biostat/presentations/feh/clinreport/dmcreport.pdf

For statistical reports you have chosen well in considering integrating R and
LaTeX.  The Alzola-Harrell text also covers a bit about using make and Perl to
run scripts (to get data from SAS to R, run R, etc.).
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

Martin Renner

2003-Mar-28 23:02 UTC

[R] trouble with predict.smooth.Pspline

I have problems getting predict.smooth.Pspline (part of the pspline
package) to work. Here's an example on artificial data:

########start#########
library (Pspline)
tt <- seq (0,1,length=20)
xt <- tt^3

fit <- smooth.Pspline (tt, xt, norder=3,df=4, method=2)

fit <- smooth.Pspline (tt, xt, norder=3,spar=0.0001, method=1)
plot (tt,xt)
lines(fit)       # so far everything works fine - looks fine at least

predict.smooth.Pspline (fit, tt, nderiv=0)[,1]

####### end ##########

This produces only 20 NaNs instead of something similar to xt.


In the end I'm trying to get 2nd derivatives of quintic spline fits.
Does anybody know how to get predict.smooth.Pspline to work, or is there
another way to do this? Any help would be greatly appreciated. Cheers,

	Martin

david.whiting@ncl.ac.uk

2003-Mar-29 10:08 UTC

[R] Statistical computing

I have been extremely impressed by the way Sweave (which combines LaTeX and
R) and RODBC (in my case with MySQL) work together for data
management, reporting, writing, and even creating presentations.
I use a LaTeX document class called 'Prosper' that creates PDF
presentations with many of the features and appearance of MS
PowerPoint presentations.

For Sweave, take a look at:
http://www.ci.tuwien.ac.at/~leisch/Sweave/Sweave-manual-20021007.pdf
http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf (pages 28-31)

For Prosper take a look at:
http://sourceforge.net/projects/prosper/
and then google for: latex prosper 
and you will find many links to tutorials etc. 


Dave


-- 
Dave Whiting
Dar es Salaam, Tanzania

Tanya Murphy

2003-Mar-31 14:04 UTC

[R] Statistical computing

Thanks to all who have replied to this. I find the advice very encouraging.
I've been reading the recommended links on Sweave and I think it will address a
major part of my goals.

As for Perl vs. Python, I don't know which would be best. I started out in
Perl because someone got me started with a little Perl program, but I've
looked at Python, too. I'm working in Windows (and that's not likely to change
anytime soon--at the office, anyway) and I think WinEdt serves as a good
enhanced editor for the main applications--LaTeX, R and Perl--as well as a way
to organize the files for a project. The GUI for Python seems nice, too,
though.

Saghir, why do you prefer Python?

Is there a fairly easy way to become SAS-free for data management and
cleaning? I'm told R is really not ideal for data cleaning. Is this what RODBC
is about?

Tanya

>===== Original Message From "Bashir Saghir (Aztek Global)" <Saghir.Bashir at UCB-Group.com> =====
>Dear Tanya,
>
>Have you considered using Python (www.python.org) instead of Perl? I use
>Python, LaTeX, and R for doing what you describe. My process is evolving and
>I cannot recommend it as being the best. Essentially I am moving towards a
>database approach, currently using dictionaries in Python. In the longer term
>I plan to switch to MySQL.
>
>In summary, I split the problem into bits that link into a relational
>database and use metadata to run my reports. So once the database is set
>up I only need to give the key information and my programs find all relevant
>information in the database, meaning that I never need to modify any programs
>to run a report with new data - just the database.
>
>I don't know of any references for this, but if you get any to your original
>query I would be interested.
>
>Best regards,
>Saghir
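
The metadata-driven approach Saghir describes could be sketched in R roughly
as follows. This is only an illustration under assumed names: the `reports`
table, the `clinic` data, and the `run_report()` function are all hypothetical,
and plain data frames stand in for the relational database.

```r
# Data frame standing in for a database table (hypothetical data).
clinic <- data.frame(
  patient = c("A", "A", "B", "B", "C"),
  weight  = c(70.1, 69.4, 82.0, 83.5, 58.2),
  sbp     = c(120, 124, 141, 139, 112),
  stringsAsFactors = FALSE
)

# Metadata: one row per report, describing what that report needs.
reports <- data.frame(
  key      = c("wt", "bp"),
  table    = c("clinic", "clinic"),
  variable = c("weight", "sbp"),
  title    = c("Mean weight (kg)", "Mean systolic BP (mmHg)"),
  stringsAsFactors = FALSE
)

# Generic report program: given only a key, it looks up everything else
# in the metadata. A new report needs a new metadata row, not new code.
run_report <- function(key, meta = reports, tables = list(clinic = clinic)) {
  m <- meta[meta$key == key, ]
  stopifnot(nrow(m) == 1)                 # the key must identify one report
  dat <- tables[[m$table]]                # pick the source table by name
  res <- tapply(dat[[m$variable]], dat$patient, mean)
  list(title = m$title, result = res)
}

bp <- run_report("bp")
wt <- run_report("wt")
cat(bp$title, "\n")
print(bp$result)
```

With a real database, the `tables` lookup would be replaced by a query, but
the idea of driving the report from a key plus stored metadata stays the same.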

Bashir Saghir (Aztek Global)

2003-Mar-31 14:23 UTC

[R] Statistical computing

<snip>
>Saghir, why do you prefer Python?
<snip>

I was thinking about learning Perl many years ago and I asked my system
admin for advice. His enthusiasm for Python steered me away from Perl and
I've been hooked since. Basically it is easy to learn and program
development is quick.
Saghir

Pikounis, Bill

2003-Mar-31 14:53 UTC

[R] Statistical computing

Hi Tanya,
You really cannot lose with either Perl or Python.  Either of them, along
with other tools mentioned, will suffice for making your work SAS-free. But
I would also not underestimate R for "data-cleaning"...

> Is there a fairly easy way to become SAS-free for data management and
> cleaning? I'm told R is really not ideal for data cleaning.

I must admit that I am always eager to debunk the myth that SAS is (so much)
better than the S language for data management, because to me the myth
mostly points out that many statisticians have never used anything else but
SAS.

Best Regards,
Bill

----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16  
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA

v_bill_pikounis at merck.com

Phone: 732 594 3913
Fax: 732 594 1565


david.whiting@ncl.ac.uk

2003-Mar-31 15:20 UTC

[R] Statistical computing

On Sat, Mar 29, 2003 at 07:44:47AM -0500, Frank E Harrell Jr wrote:
> David - I am a big fan of Sweave also.  There is some writeup about it
> in one of the documents I referenced.  Next I need to get into MySQL.
> If you stumble across any tutorials for that that are not on
> r-project.org please let me know.  -Frank

Frank, do you mean for SQL in general or specifically related to R
working with SQL/MySQL?

I think there are many tutorials on SQL and MySQL on the web, and the
MySQL manual contains everything you ever wanted to know about MySQL
(and more that you'll probably hope you never want to know about :).
There are tutorials within the manual that I think are well-written
and clear.

For me, the only part of putting it together that took some thought was
installing unixODBC (on Linux), and even that was not too hard.  I
just had to create the .ini files because I had problems with
ODBCConfig.

If you know a little SQL then the R Data Import/Export manual and the
RODBC help page are probably all you really need.

Dave

-- 
Dave Whiting
Dar es Salaam, Tanzania

Liaw, Andy

2003-Mar-31 15:21 UTC

[R] Statistical computing

I can't even be called a novice in either Perl or Python, but...

I believe one of the big virtues of Python is code readability.  IIRC that
was one of the original design goals of Python.  (There is a quote: "Python
is beautiful, but Perl is more fun.")

Cheers,
Andy

Frank E Harrell Jr

2003-Mar-31 15:39 UTC

[R] Statistical computing

The S language is actually better than SAS for data manipulation unless you have
a massive database.  The trouble is that you don't learn data manipulation by
looking at documentation of individual functions.  Chapter 4 of Alzola and
Harrell attempts to provide several data manipulation and variable recoding
examples.

The main reason I'm confident in saying that S is better at what many
people say SAS is best at is that many manipulation and recoding tasks benefit
greatly from vector operations across multiple codes within a variable.
Contrast this with the multiple IF statements required in many SAS applications.
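
A minimal sketch of that contrast in R: a single vectorized lookup recodes
every level of a variable at once, where a SAS DATA step would typically use a
chain of IF/ELSE statements. The variable names, codes, and labels below are
invented for illustration and are not taken from the Alzola-Harrell text.

```r
# Hypothetical coded variable; the codes and labels are made up.
smoke_code <- c(0, 2, 1, 9, 0, NA, 1)

# One lookup table replaces a chain of IF/ELSE statements:
# the names are the raw codes, the values are the recoded labels.
lookup <- c("0" = "never", "1" = "former", "2" = "current")

# Index the lookup by the codes (as character). Codes that are not in the
# table (here 9) and missing values both come back as NA.
smoke <- unname(lookup[as.character(smoke_code)])

# Vectorized recoding of a numeric variable into groups with cut();
# intervals are closed on the right: (0,40], (40,65], (65,Inf].
age <- c(34, 61, 47, 78, 19)
age_group <- cut(age, breaks = c(0, 40, 65, Inf),
                 labels = c("<=40", "41-65", ">65"))

print(smoke)
print(table(age_group))
```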

One feature of SAS that is frequently used for data manipulation is BY with
FIRST.variable and LAST.variable.  As seen in the examples I mentioned above,
you handle this in a completely different way in S (using lags, aggregation
functions, or for loops).
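
As a sketch of the S-style equivalents, the R code below flags the first and
last record per patient with `duplicated()`, builds a within-patient lag, and
computes a per-patient summary with `tapply()`. The data frame and its column
names (`id`, `visit`, `sbp`) are hypothetical, and the data are assumed to be
sorted by patient and visit, as a SAS BY statement would also require.

```r
# Hypothetical longitudinal data, sorted by patient id and visit number.
d <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  visit = c(1, 2, 3, 1, 2, 1),
  sbp   = c(120, 126, 131, 140, 138, 115)
)

# Analogue of FIRST.id: TRUE on the first row of each id.
d$first <- !duplicated(d$id)
# Analogue of LAST.id: TRUE on the last row of each id.
d$last <- !duplicated(d$id, fromLast = TRUE)

# Lag within patient: the previous sbp, set to NA on each patient's
# first record so values never carry over from another patient.
d$sbp_prev <- c(NA, d$sbp[-nrow(d)])
d$sbp_prev[d$first] <- NA

# Aggregation: one value per patient (here the mean sbp).
mean_sbp <- tapply(d$sbp, d$id, mean)

baseline <- d[d$first, c("id", "sbp")]   # first record per patient
final    <- d[d$last,  c("id", "sbp")]   # last record per patient

print(d)
print(mean_sbp)
```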
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

Stephen C. Upton

2003-Mar-31 15:54 UTC

[R] Statistical computing

Hi Tanya,

I would also like to second Bill's comment on not underestimating R for "data
cleaning". I have had such success with simple R scripts and functions for
parsing data that I've abandoned my use of Python - not that there is anything
wrong with Python! It's just that I can do all that I need to do in selecting
subsets, etc. with R, so I found no need for another, supplemental language -
not to mention the extra learning curve. FWIW, if you have some rather large
files (GBs), get lots of memory!

HTH
steve

Douglas Bates

2003-Mar-31 16:12 UTC

[R] Statistical computing

"Bashir Saghir (Aztek Global)" <Saghir.Bashir at ucb-group.com> writes:

> <snip>
> >Saghir, why do you prefer Python?
> <snip>
>
> I was thinking about learning Perl many years ago and I asked my system
> admin for advice. His enthusiasm for Python steered me away from Perl and
> I've been hooked since. Basically it is easy to learn and program
> development is quick.

Python ''feels'' very much like R to me (and a bit like Java
too).  Perl
is great for the two-minute hack to accomplish an awkward
transformation but the saying in the Python community is that "Hell is
reading someone else''s perl code".  It could also be said that
"Purgatory is reading your own perl code from more than a few weeks
ago".

I like the IDE for Python.  For me working in Python is similar to
working in R in that I have a window open where I am writing the code
and another interactive window where I can send snippets of the code
for execution, say to test out exactly what the result of some
expression is.  Once you get beyond the initial shock of discovering
that the indentation of code in Python determines the lexical grouping,
you find that Python is a clean, well-structured language.  I don't
think the same can be said for Perl.
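
As a small illustration of the point about indentation (a sketch only;
the function name and the data are made up for this example), the body
of the loop and of the `if` below are delimited purely by how far they
are indented:

```python
def count_positive(values):
    """Count the strictly positive numbers in a list."""
    count = 0
    for v in values:
        if v > 0:
            # Indented under the `if`: runs only for positive values.
            count += 1
    # Back at the function's indentation level: runs once, after the loop.
    return count


print(count_positive([3, -1, 0, 7]))
```

Moving the `return` line one level to the right would put it inside the
loop and change what the function does, which is the "lexical grouping"
at work.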

Tanya mentioned data management and data cleaning.  I think the
combination of a relational database management system, such as MySQL
or PostgreSQL, and Python and R is very powerful for data cleaning.
Python can be used for sequential processing and for loading the
database.  SQL can be used for examining the structure of the data and
for detecting unusual cases.  R can be used for model fitting and
graphics on the entire data set, if it is not huge, or on a subset, if
it is huge.
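
A minimal sketch of that division of labour, under stated assumptions:
SQLite (via Python's standard `sqlite3` module) stands in here for MySQL
or PostgreSQL so the example is self-contained, and the table name,
column names, sample rows and plausibility thresholds are all invented
for illustration.  Python reads the records sequentially and loads the
database; SQL then picks out the unusual cases.

```python
import csv
import io
import sqlite3

# Hypothetical clinic export; in practice this would be a file on disk.
RAW = """patient_id,visit_date,age,sbp
1,2003-01-15,54,132
2,2003-01-16,61,128
3,2003-01-16,-7,140
4,2003-01-17,47,410
"""


def load_visits(conn, text):
    """Sequentially read CSV rows and insert them into a `visits` table."""
    conn.execute(
        "CREATE TABLE visits "
        "(patient_id INTEGER, visit_date TEXT, age INTEGER, sbp INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (int(r["patient_id"]), r["visit_date"], int(r["age"]), int(r["sbp"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def unusual_cases(conn):
    """Use SQL to flag implausible values (thresholds are illustrative)."""
    cur = conn.execute(
        "SELECT patient_id FROM visits "
        "WHERE age NOT BETWEEN 0 AND 120 OR sbp NOT BETWEEN 50 AND 300 "
        "ORDER BY patient_id"
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
n_loaded = load_visits(conn, RAW)
flagged = unusual_cases(conn)
print(n_loaded, flagged)
```

Once the flagged records have been checked, the cleaned table (or a
subset of it selected with SQL) could be handed to R for model fitting
and graphics.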

Together I find this combination more powerful than SAS or SPSS and
definitely faster.  However, using this combination requires learning
three different languages.

TyagiAnupam@aol.com

2003-Mar-31 17:45 UTC

[R] Statistical computing

In a message dated 3/31/03 10:21:20 AM Eastern Standard Time, 
v_bill_pikounis@merck.com writes:
> I must admit that I am always eager to debunk the myth that SAS is (so much)
> better than the S language for data management, because to me the myth
> mostly points out that many statisticians have never used anything else but
> SAS.
> 
Depends on the size of the data and what one is trying to do. I tried to 
subset a data.frame when memory.size() showed about 20M, nothing else was 
running other than R, and it reached the memory limit 127M without doing the 
job. The object is about 10M. Any hints?

Chunlou Yung

2003-Mar-31 17:51 UTC

[R] Statistical computing... Perl, Python, Octave, GAP

My two cents on Perl and Python (and stuff) :)....

Perl was designed to be "easy" to use but not necessarily the easiest thing
to learn (it's not hard to get started, nonetheless; just perhaps a bit hard
to master), while Python was designed to be "obvious" to learn and write and
read. Perl is to scripting languages what C/C++ is to compiled languages,
whereas Python is more akin to Java (my feeling anyway). Let me elaborate...

Library Size: As far as I know, Perl has twice as large a library of modules
(about 4000) as Python, though both are certainly continuing to grow. (Like,
you're probably not going to write your own XML parser.) Don't expect
anything in advanced math in either one (why would you? You have R). As for
DB support, both are strong in that regard.

Bioinformatics: There's also a vast volume of code written in Perl and
Python for bioinformatics (freely) available on this planet, just so you
know--save yourself some time.

Capability: If you must write some code on your own, as opposed to stealing
it from someone else, Perl and Python can both do a good job as far as
"data cleaning" (be prepared to learn regular expressions, though) or task
automation goes. But...
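
For a flavour of the regular expressions involved, here is a minimal
sketch in Python. The field format (a weight that may or may not carry a
"kg" suffix) and the names `WEIGHT_RE` and `parse_weight` are
assumptions made up for illustration, not anything from a real data set:

```python
import re

# An optional-decimal number, optionally followed by a "kg" unit,
# with stray whitespace allowed anywhere around it.
WEIGHT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:kg)?\s*$", re.IGNORECASE)


def parse_weight(raw):
    """Return the weight in kg as a float, or None if the field is unusable."""
    match = WEIGHT_RE.match(raw)
    return float(match.group(1)) if match else None


fields = [" 72 kg", "68.5KG", "80", "n/a", ""]
cleaned = [parse_weight(f) for f in fields]
print(cleaned)
```

Anything that does not fit the pattern comes back as `None`, so the
messy entries can be listed and checked by hand rather than silently
dropped.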

Flexibility/Readability: Perl is a ridiculously flexible language.
Generally, that's a good thing for an individual programmer, since he can
do things however he wants, but it's an issue for a team of programmers and
a headache for a project manager, as it's rather hard to impose consistency
in the way people code when there are so many ways to do it in Perl.

So a side effect of Perl's flexibility is reduced readability. Python (like
Java) tends to be more readable--maybe except for Python's "print"
statement :)

Speed: Benchmarks generally place Perl faster than Python. But for small
jobs, their speed difference doesn't matter much. Besides, speed probably
depends more on how you write your code than what you write it in.

OOP: If you're going to program extensively, sooner or later you'll
probably run into OOP. Python's implementation of OOP is pretty natural
(especially if you come from Java or something) and it's easy to understand.
Perl has its own unique implementation of OOP--if you're a Perl guy/gal, its
implementation is brilliant; if not, it's absolutely, utterly queer--like,
for one thing, you could have both procedural code and OOP code in the same
Perl module (this flexibility doesn't mean you should do it--probably few
people do, as it will inevitably lead to confusion). (But then, S also
has its own implementation of OOP.)

Basically, Perl is a procedural language that can do OOP in its own weird
way; Python is an OOP language that can pretend to be a procedural language.
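
A minimal sketch of the Python half of that remark (the names `mean` and
`Sample` are made up for illustration): the same calculation written
first as a plain function and then as a method on a class.

```python
# Procedural style: a plain function operating on plain data.
def mean(values):
    return sum(values) / len(values)


# Object-oriented style: the same computation attached to a class.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


data = [2.0, 4.0, 6.0]
procedural_result = mean(data)
oop_result = Sample(data).mean()
print(procedural_result, oop_result)
```

Even the "procedural" version is working with objects underneath, since
a Python list is itself an object with methods.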

GUI: If you need to write a GUI, you could do it in Perl or Python (or TCL
as well, yet another scripting language), not that you should--it's slow and
clumsy. VB or Java would be a better choice--they're still slow but not as
clumsy. If you need speed, perhaps C++ is your only choice. A Web-based GUI
would be an option, as long as you don't inadvertently expose your company's
secrets to your competitors via the Web.

Survival: If you're concerned about whether Perl or Python would ever go out
of business, my prediction is, they won't. Python is a continually rising
star, continually luring users away from Perl and capturing many newcomers
who prefer a cleaner language. But Perl is not going to go extinct, just as
C is not going to die any time soon. Geeks tend to like Perl; most other
normal human beings find Python just fine.

Support Group: Both Perl and Python have excellent support groups. If you're
not too antisocial, you should have no problem finding help from total
strangers over the Web for most of your daily problems (programming or
otherwise).

-------------

By the way, if by any chance you need something that can do matrix/numerical
computation very, very fast, Octave is a good product too. It's free. It's
as fast as Matlab, faster than R, but slower than C.

And if on some rare (or weird) occasions you need to do some computational
group theory stuff, GAP (Groups, Algorithms and Programming) is for you.
It's free too. (You can find it at http://www.gap-system.org)

Hope it helps.


--cy

Prof. Brian Ripley

2003-Mar-31 18:13 UTC

[R] Statistical computing

On Mon, 31 Mar 2003 TyagiAnupam at aol.com wrote:
> In a message dated 3/31/03 10:21:20 AM Eastern Standard Time, 
> v_bill_pikounis at merck.com writes:
> 
> > I must admit that I am always eager to debunk the myth that SAS is (so much)
> > better than the S language for data management, because to me the myth
> > mostly points out that many statisticians have never used anything else but
> > SAS.
>
> Depends on the size of the data and what one is trying to do. I tried to
> subset a data.frame when memory.size() showed about 20M, nothing else was
> running other than R, and it reached the memory limit 127M without doing the
> job. The object is about 10M. Any hints?

Get more memory!  128 MB is very little for a Windows PC these days, and
you can buy many GB for the price of a SAS licence.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Spencer Graves

2003-Mar-31 18:33 UTC

[R] Statistical computing

What did you use for subsetting?

Spencer Graves

TyagiAnupam at aol.com wrote:
> In a message dated 3/31/03 10:21:20 AM Eastern Standard Time,
> v_bill_pikounis at merck.com writes:
>
>> I must admit that I am always eager to debunk the myth that SAS is (so much)
>> better than the S language for data management, because to me the myth
>> mostly points out that many statisticians have never used anything else but
>> SAS.
>
> Depends on the size of the data and what one is trying to do. I tried to
> subset a data.frame when memory.size() showed about 20M, nothing else was
> running other than R, and it reached the memory limit 127M without doing the
> job. The object is about 10M. Any hints?
