Readers of this list might be interested in the following comment about R.

In a recent report (http://www.ats.ucla.edu/stat/technicalreports/), Michael N. Mitchell says about R:

"Perhaps the most notable exception to this discussion is R, a language for statistical computing and graphics. R is free to download under the terms of the GNU General Public License (see http://www.r-project.org/). Our web site has resources on R and I have tried, sometimes in great earnest, to learn and understand R. I have learned and used a number of statistical packages (well over 10) and a number of programming languages (over 5), and I regret to say that I have had enormous difficulties learning and using R. I know that R has a great fan base composed of skilled and excellent statisticians, and that includes many people from the UCLA statistics department. However, I feel like R is not so much of a statistical package as much as it is a statistical programming environment that has many new and cutting edge features. For me learning R has been very difficult and I have had a very hard time finding answers to many questions about using it. Since the R community tends to be composed of experts deeply enmeshed in R, I often felt that I was missing half of the pieces of the puzzle when reading information about the use of R -- it often feels like there is an assumption that readers are also experts in R. I often found the documentation for R quite sparse and many essential terms or constructs were used but not defined or cross-referenced. While there are mailing lists regarding R where people can ask questions, there is no official "technical support". Because R is free and is based on the contributions of the R community, it is extremely extensible and programmable and I have been told that it has many cutting edge features, some not available anywhere else. Although R is free, it may be more costly in terms of your time to learn, use, and obtain support for it.

My feeling is that R is much more suited to the sort of statistician who is oriented towards working very deeply with it. I think R is the kind of package that you really need to become immersed in (like a foreign language) and then need to use on a regular basis. I think that it is much more difficult to use it casually as compared to SAS, Stata or SPSS. But devoting time and effort to it would give you access to a programming environment where you can write R programs and collaborate with others who are also using R. Those who are able to access its power, even at an applied level, would be able to access tools that may not be found in other packages, but this might come with a serious investment of time to sufficiently use R and maintain your skills with R."

Kjetil

[[alternative HTML version deleted]]
On 01/01/06 15:36, Kjetil Halvorsen wrote:
> Readers of this list might be interested in the following comment about R.
>
> In a recent report, by Michael N. Mitchell
> http://www.ats.ucla.edu/stat/technicalreports/
> says about R: ...

Just a warning to others. If you go to the site, it asks for comments, but if you then ask for the LaTeX style file that is required for sending comments, you get a message saying that the service does not deal with those outside of UCLA.

Of course I think this is wrong wrong wrong. It makes some assumptions about "statisticians" being the ones who use statistics programs. But there are some researchers who like to think of themselves as empirical scientists and who do not have the kinds of humongous grants required to hire people to do everything except write grant proposals. People in these fields often even do their own data analysis! Moreover, unlike statisticians (who consult with a great variety of researchers), they usually do the same few types of analysis over and over, so the learning time becomes relatively small, and the other advantages of R become more compelling.

But I will try eventually to say this as a comment on the paper itself.

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
>Kjetil Halvorsen wrote...
>
>Readers of this list might be interested in the following comment about R.
>
>In a recent report, by Michael N. Mitchell
>http://www.ats.ucla.edu/stat/technicalreports/
>says about R:
>"Perhaps the most notable exception to this discussion is R, a language for
>statistical computing and graphics.

-------8<-----------------------------------------

After reading this commentary a couple of times, I can't quite figure out if he is damning with faint praise, or praising with faint damnation.

(For example, after observing how many researchers around me approach statistical analysis, I'd say discouraging "casual" use is a _feature_.)

-Eric

This email message, including any attachments, is for the so...{{dropped}}
That's a good idea. I will try to give a lexicon on Stata vs R.

======= 2006-01-02 23:59:10 =======

>On 1/2/06, Philippe Grosjean <phgrosjean at sciviews.org> wrote:
>> Kort, Eric wrote:
>> >>Kjetil Halvorsen wrote...
>> >>
>> >>Readers of this list might be interested in the following comment about R.
>> >>
>> >>In a recent report, by Michael N. Mitchell
>> >>http://www.ats.ucla.edu/stat/technicalreports/
>> >>says about R:
>> >>"Perhaps the most notable exception to this discussion is R, a language for
>> >>statistical computing and graphics.
>> >
>> > -------8<-----------------------------------------
>> >
>> > After reading this commentary a couple of times, I can't quite figure
>> > out if he is damning with faint praise, or praising with faint damnation.
>> >
>> > (For example, after observing how many researchers around me approach
>> > statistical analysis, I'd say discouraging "casual" use is a _feature_.)
>>
>> There are numerous reasons why people tend to consider R as too
>> complicated for them (or even worse, say peremptorily to others that R is
>> too complicated for them!). But one must decrypt the real reasons behind
>> what they say. Mostly, it is because R forces us to think about the
>> analysis we are doing. As Eric says, it is a _feature_ (well, not
>> discouraging "casual" use, but forcing us to think about what we do, which
>> in turn forces us to learn R a little deeper to get results... which in
>> turn may discourage casual users, as an unwanted side-effect). According
>> to my own experience with teaching students and advanced
>> scientists in different environments (academic, industry, etc.), the
>> main basic reason why people are reluctant to use R is laziness. People
>> are lazy by nature. They like courses where they just sit and snooze.
>> Unfortunately, this is not the right way to learn R: you have to delve
>> into the abundant literature about R and experiment by yourself to become
>> a good R user. This is the kind of thing people do not like at all!
>> Someone named Dr Brian Ripley once wrote something like:
>> "`They' did write documentation that told you [...], but `they'
>> can't read it for you."
>>
>> For many years already I have been writing and using tools supposed to help
>> beginners master R: menu/dialog box approaches, electronic reference
>> cards, a graphical object explorer, code tips, completion lists, etc.
>> Every time I got the same result: either these tools are badly designed
>> because they hide the 'horrible code' those casual users don't want to
>> see, and they make them *happy bad R users*, or they still force them to
>> write code and think about what they exactly do (but just help them a bit),
>> and they make them *good R users, but unhappy, poor, tortured
>> beginners*! So, I tend to agree now: there is probably no way to instil
>> R into lazy and reluctant minds.
>>
>> That said, I think one should interpret Mitchell's paper in a different
>> way. Obviously, he is an unconditional and happy Stata user (he even
>> wrote a book about graph programming in Stata). His claim in favor of
>> Stata (versus SAS and SPSS, and also, indirectly, versus R) is to be
>> interpreted the same way as unconditional lovers of Macintoshes or PCs
>> would argue against the other clan. Both architectures are good and have
>> strengths and weaknesses. The real arguments are more sentimental, and could
>> be summed up as: "The more I use it, the more I like it,... and the aliens are
>> bad, ugly and stupid!"
>> Would this apply to Stata versus R? I don't know
>> Stata at all, but I imagine it could be the case from what I read in
>> Mitchell's paper...
>
>Probably what is needed is for someone familiar with both Stata and R
>to create a lexicon in the vein of the Octave to R lexicon
>
> http://cran.r-project.org/doc/contrib/R-and-octave-2.txt
>
>to make it easier for Stata users to understand R. Ditto for SAS and SPSS.

= = = = = = = = = = = = = = = = = = =
2006-01-03

------
Department of Sociology
Fudan University
My new mail address is ronggui.huang at gmail.com
Blog: http://sociology.yculblog.com
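[Editorial note: as a flavour of what a few entries in such a Stata-to-R lexicon might look like -- a small sketch with made-up data, not an existing document; the Stata commands appear only in the comments:]

  d <- data.frame(x = rnorm(20), y = rnorm(20), group = gl(2, 10))

  summary(d$x)                   # Stata: summarize x
  table(d$group)                 # Stata: tabulate group
  t.test(x ~ group, data = d)    # Stata: ttest x, by(group)
  summary(lm(y ~ x, data = d))   # Stata: regress y x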
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
> Sent: Monday, January 02, 2006 4:59 PM
> To: Philippe Grosjean
> Cc: Kort, Eric; Kjetil Halvorsen; R-help at stat.math.ethz.ch
> Subject: Re: [R] A comment about R:
>
> Probably what is needed is for someone familiar with both Stata and R
> to create a lexicon in the vein of the Octave to R lexicon
>
> http://cran.r-project.org/doc/contrib/R-and-octave-2.txt
>
> to make it easier for Stata users to understand R. Ditto for
> SAS and SPSS.

IMO this is a very good proposal, but I think that the main problem is not the "translation" of one function in SPSS/Stata/SAS to the equivalent in R. What I remember from my first contact with R, after using SPSS for some years (and having some experience with Stata and SAS), is that the mental framework is different. You think in "SPSS terms" (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you always have only one dataset available, ...). This is why "jumping" from SPSS to Stata is relatively easy. But to jump from any of the three to R is much more difficult.

This mental barrier is also the main obstacle for me now when I try to encourage the use of R among other people who have a similar background to mine. What can be done about it? I guess the only answer is investing time from the user, which implies that R will probably never become the language of choice for "casual users". But popularity is probably not the main goal of the R-Project (it would rather be a nice side-effect).

Just a few thoughts ...

Best,
Roland

+++++
This mail has been sent through the MPI for Demographic Rese...{{dropped}}
Roland,

Yes, indeed, you are perfectly right. The problem is that R's richness means R's complexity: many different data types, "sub-languages" like regexps or the formula interface, S3/S4 objects, classical versus lattice (versus RGL versus iplots) graphs, etc.

During the translation of R into French, I was thinking of a subset of one or two hundred functions that would be enough for beginners to start with, and of proposing a translation of that small subset of the online help into French. This is still on my todo list, but I must admit it is not an easy task to decide which functions should be kept in the subset and which should not! In fact, that idea could perhaps be generalized to the whole online help. It would be sufficient to add a flag somewhere (perhaps a keyword) saying that a page is fundamental, and to allow filtering the index and pages ("fundamental only" or "full help"). Even for advanced users, it would be nice to have such a filter to display only the two or three most important functions in a new package that proposes perhaps a hundred online help pages...

Using R Commander is also an interesting experiment. R Commander simplifies the use of R down to the manipulation of a single data frame (the so-called "active dataset") + optionally one or two model objects. Just look at all you can do with just one active data frame in R Commander, and you will realize that it is perfectly manageable to learn R that way.

Best,

Philippe Grosjean

Rau, Roland wrote:
> IMO this is a very good proposal but I think that the main problem is
> not the "translation" of one function in SPSS/Stata/SAS to the
> equivalent in R. [...]
>>> "Rau, Roland" <Rau at demogr.mpg.de> >>> wrote<<< IMO this is a very good proposal but I think that the main problem is not the "translation" of one function in SPSS/Stata/SAS to the equivalent in R. Remembering my first contact with R after using SPSS for some years (and having some experience with Stata and SAS) was that your mental framework is different. You think in "SPSS-terms" (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you have always only one dataset available, ...). This is why "jumping" from SPSS to Stata is relatively easy. But to jump from any of the three to R is much more difficult. This mental barrier is also the main obstacle for me now when I try to encourage the use of R to other people who have a similar background as I had. What can be done about it? I guess the only answer is investing time from the user which implies that R will probably never become the language of choice for "casual users". But popularity is probably not the main goal of the R-Project (it would be rather a nice side-effect).>>>>As someone who uses SAS qutie a bit and R somewhat less, I think Roland makes some excellent points. Going from SPSS to SAS (which I once did) is like going from Spansih to French. Going from SAS to R (which I am trying to do) is like going from English to Chinese. But it's more than that. Beyond the obvious differences in the languages is a difference in how they are written about; and how they are improved. SAS documentation is much lengthier than R's. Some people like the terseness of R's help. Some like the verboseness of SAS's. SOme of this difference is doubtless due to the fact that SAS is commercial, and pays people to write the documentation. I have tremednous appreciation for the unpaid effort that goes into R, and nothing I say here should be seen as detracting from that. As to how they are improved, the fact that R is extended (in part) by packages written by many many different people is good, becuase it means that the latest techniques can be written up, often by the people who invent the techniques (and, again, I appreciate this tremendously), but it does mean that a) It is hard to know what is out there at any given time; b) the styles of pacakages difer somewhat. In addition, I think the distinction between 'casual user' and serious user is something of a false dichotomy. It's really a continuum, or, probably, several continua, that make R harder or easier for people to learn. I like R. I like it a lot. I like that it's free. I like that it's cutting edge. I like that it can do amazing graphics. I like that the code is open. I like that I can write my own functions in the same language. And, again, I am amazed at the amount of time and effort people put into it. But I do think that the link in the original post made some good points, and the writer of that post is not the only one who has found R difficult to learn. Peter
I have had an email conversation with the author of the technical report from which the quote was taken. I am formulating a comment to the report that will be posted with the technical report.

I would be pleased if this thread continued, so I will know better what I want to say. Plus I should be able to reference this thread in the comment.

Regards,

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Rau, Roland wrote:
> IMO this is a very good proposal but I think that the main problem is
> not the "translation" of one function in SPSS/Stata/SAS to the
> equivalent in R. [...]
One implicit point in Kjetil's message is the difficulty of learning enough of R to make its use a natural and desired "first choice alternative," which I see as the point at which real progress and learning commence with any new language. I agree that the long learning curve is a serious problem, and in the past I have discussed, off list, with one of the very senior contributors to this list the possibility of splitting the list into sections for newcomers and for advanced users. He gave some very cogent reasons for not splitting, such as the possibility of newcomers' getting bad advice from others only slightly more advanced than themselves.

And yet I suspect that a newcomers' section would encourage the kind of mutually helpful collegiality among newcomers that now characterizes the exchanges of the more experienced users on this list. I know that I have occasionally been reluctant to post issues that seem too elementary or trivial to vex the others on the list with, and so have stumbled around for an hour or so seeking the solution to a simple problem. Had I the counsel of others similarly situated, progress might have been far faster.

Have other newcomers or occasional users had the same experience? Is it time to reconsider splitting this list into two sections? Certainly the volume of traffic could justify it.

Ben Fairbank

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kjetil Halvorsen
Sent: Sunday, January 01, 2006 8:37 AM
To: R-help at stat.math.ethz.ch
Subject: [R] A comment about R:

Readers of this list might be interested in the following comment about R. [...]
Berton Gunter writes....

> Ummmm....
>
> I cannot say how easy or hard R is to learn, but in response to the UCLA
> commentary:
>
> > However, I feel like R is not so much of a statistical package as much
> > as it is a statistical programming environment that has many new and
> > cutting edge features.
>
> Please note: the first sentence of the Preface of THE Green Book
> (PROGRAMMING WITH DATA: A GUIDE TO THE S LANGUAGE) by John Chambers, the
> inventor of the S Language, explicitly states:
>
> "S is a programming language and environment for all kinds of computing
> involving data."
>
> I think this says that R is **not** meant to be a statistical package in
> the conventional sense and should not be considered one. As computing
> involving data is a complex and frequently messy business on both technical
> (statistics), practical (messy data), and aesthetic (graphics, tables)
> levels, it is perhaps to be expected that "a programming language and
> environment for all kinds of computing involving data" is complex.
> Personally, I find (Chambers's next sentence) R's ability "To turn
> ideas into software, quickly and faithfully," to be a boon.

<snip>

Right. So in 2 months I will finish my MD program here in the U.S. I also have a master's degree in Epidemiology (in which we used SAS) -- but that hardly qualifies me as a statistics expert. Nonetheless, I have learned to use R out of necessity without undue difficulty. So have multiple of my colleagues around me with MDs, PhDs, and Master's degrees. We do mainly microarray analysis, so the availability of a rapidly developing and customizable toolset (BioC packages) is essential to our work.

And, in the same vein as others' comments, R's "nuts and bolts" characteristics make me think, learn, and improve. And the fear of getting Ripleyed on the mailing list also makes me think, read, and improve before submitting half-baked questions to the list.

So in sum, I use R because it encourages thoughtful analysis, it is flexible and extensible, and it is free. I feel that these are strengths of the environment, not weaknesses. So if an individual finds another tool better suited for their work that is obviously just fine, but I hardly think these characteristics of R are grounds for criticism, excellent proposals for evolution of documentation and mailing lists notwithstanding.

-Eric

This email message, including any attachments, is for the so...{{dropped}}
A candidate for the fortunes package? (Perhaps the highest honor one can receive: being "verbified" :-) )

> And the fear of getting Ripleyed on the mailing list also makes me think,
> read, and improve before submitting half baked questions to the list.
>
> -- Eric Kort

Cheers,
Bert
Hello,

Unlike most posts on the R mailing list, I feel qualified to comment on this one. For about 3 months I have been trying to learn to use R, after having used various versions of SPSS for about 10 years.

I think it is far too simplistic to ascribe non-use of R to laziness. This may well be the case for some; however, I have read 5-6 books on R, waded through on-line resources, read the documentation and asked multiple questions via e-mail -- and still find even some of the basics very difficult. There are several reasons for this:

1. For some tasks R is extremely user-unfriendly. Some comparative examples:

(a) In running a chi-square analysis in SPSS the following syntax is included

  /STATISTIC=CHISQ
  /CELLS= COUNT EXPECTED ROW COLUMN TOTAL RESID .

This produces expected and observed counts, row & column percentages, residuals, chi-square & Fisher's exact test + other output. In R, it is a herculean task to produce similar output. It certainly can't be produced in 2 lines as far as I can tell.

(b) In SPSS, if I want to compare multiple variables by a single dependent variable this is readily performed:

  CROSSTABS
  /TABLES=baserdis baserenh basersoc baseradd socbest disbest entbest addbest worsdis worsphy by group

I used the chi-square example again, but the same applies for a t-test. I started looking into how to do something similar in R with the t.test command but gave up. R does force the user to take a more considered approach to analysis.

(c) To obtain a correlation matrix in R with the correlation & p-value is no simple task. In SPSS this is obtained via:

  GET FILE='D:\a study\data\dat\key data\master data.sav'.
  NONPAR CORR
  /VARIABLES= goodnum badnum good5 bad5 avfreq avdayamt
  /PRINT=KENDALL TWOTAIL
  /MISSING=PAIRWISE .

In R something like this is required:

  > by(mydat, mydat$group, function(x) {
  +   nm <- names(x)
  +   rho <- matrix(, 6, 2)
  +   rho.nm <- matrix(, 6, 2)
  +   k <- 1
  +   for(i in 2:4) {
  +     for(j in (i + 1):5) {
  +       x.i <- x[, i]
  +       x.j <- x[, j]
  +       ct <- cor.test(x.i, x.j, method = "kendall", alternative = "two.sided")
  +       rho[k, 1] <- ct$estimate
  +       rho[k, 2] <- round(ct$p.value, 3)
  +       rho.nm[k, ] <- c(nm[i], nm[j])
  +       k <- k + 1
  +     }
  +   }
  +   rho <- cbind(as.data.frame(rho.nm), as.data.frame(rho))
  +   names(rho) <- c("freq.i", "freq.j", "cor", "p-value")
  +   rho
  + })

2) It is not always clear what the output produced by R is. The Mann-Whitney U-test is a good example. In R, it seems a standardised value is obtained. I was advised that it is easy enough to check this as R is open-source, but at least for me, I don't believe I would understand this code anyway. It is confusing when comparable programs such as R and SPSS produce dissimilar results. For the user it is important to be able to fairly easily reconcile such differences, to engender confidence in results.

3) I find the help files in R quite difficult to understand. For example, see help(t.test). It is almost assumed by the examples that you know what to do. Personally, I would find some form of simple decision tree easier -- e.g. if you want to perform a t-test with the dependent variable in one column and the grouping variable in another, use t.test(AVFREQ~GROUP); if you want to perform a t-test with the dependent variable in separate columns (each column representing a different group), use t.test(AVFREQ1, AVFREQ2).

4) My initial approach to using R was to run commands I had used commonly in SPSS and compare the results. I have only got as far as basic ANOVA.
This has been time-consuming and at times it has been difficult to obtain advice. Some people on the R list have been extremely generous with their time and knowledge, and I have much appreciated this assistance. At other times I see responses met with something like arrogance. With the sophistication of R, there is also an elitism. This is a barrier to R being more widely accepted and used.

5) Differences in terminology -- this is just part of the learning process, but I still found it took quite some time to work out simple commands and what different analyses were called.

6) System administrators may be wary of freeware.

No doubt for the sophisticated user my comments may seem trite and easily resolved, however I believe my comments have some relevance as to why R is not more readily used or accepted.

Bob Green
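[Editorial note: on point (a) above, much of that SPSS output is already held in the object returned by R's built-in chisq.test(); a small sketch, using made-up data and the variable names from the post:]

  # hypothetical data standing in for the poster's SPSS file
  mydat <- data.frame(baserdis = sample(c("yes", "no"), 100, replace = TRUE),
                      group    = sample(c("ctrl", "case"), 100, replace = TRUE))

  tab <- table(mydat$baserdis, mydat$group)
  ct  <- chisq.test(tab)      # Pearson chi-square test
  ct$observed                 # observed counts
  ct$expected                 # expected counts under independence
  ct$residuals                # Pearson residuals
  prop.table(tab, 1)          # row proportions
  prop.table(tab, 2)          # column proportions
  fisher.test(tab)            # Fisher's exact test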
Hello,

One additional example of how easy simple calculations are in R. Calculate the mean of the data htinches, multiply it by 2.54 and round the result.

In R:

  round( 2.54 * mean( htinches ) )

In SAS this could be done in 2 data steps and 2 proc steps:

  DATA new;
    SET old;
    htcm = htinches * 2.54;
  PROC means;
    VAR htcm;
    output out=new2 mean=htcm;
  DATA new2;
    set new2;
    htcm=round(htcm);
  PROC fsview;
  run;

(You can also do this in one data step, but the code would be longer and more(!) cryptic (or say horrible). And, of course, you can do this with the help of SAS's SQL approach, but note that the syntax is different (!) (commas, ...) from the "normal" syntax in a data step.)

--> useR!

Matthias

> Patrick Burns <pburns at pburns.seanet.com> writes:
>
> > I have had an email conversation with the author of the technical
> > report from which the quote was taken. I am formulating a comment to
> > the report that will be posted with the technical report.
> >
> > I would be pleased if this thread continued, so I will know better
> > what I want to say. Plus I should be able to reference this thread in
> > the comment.
>
> One thing that is often overlooked, and hasn't yet been
> mentioned in the thread, is how much *simpler* R can be for
> certain completely basic tasks of practical or pedagogical
> relevance: calculate a simple derived statistic, confidence
> intervals from estimate and SE, percentage points of the
> binomial distribution - using dbinom or from the formula,
> take the sum of each of 10 random samples from a set of
> numbers, etc. This is where other packages get stuck in the
> procedure+dataset mindset.
>
> For much the same reason, those packages make you tend to
> treat practical data analysis as something distinct from
> theoretical understanding of the methods: You just don't use
> SAS or SPSS or Stata to illustrate the concept of a random
> sample by setting up a small simulation study as the first
> thing you do in a statistics class, whereas you could quite
> conceivably do it in R. (What *is* the equivalent of
> rnorm(25) in those languages, actually?)
>
> Even when using SAS in teaching, I sometimes fire up R just
> to calculate simple things like
>
>   pbar <- (p1+p2)/2
>   sqrt(pbar*(1-pbar))
>
> which you need to cheat SAS Analyst's sample size calculator
> to work with proportions rather than means. SAS leaves you no
> way to do this short of setting up a new data set. The
> Windows calculator will do it, of course, but the students
> can't see what you are doing then.
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
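[Editorial note: a concrete instance of the one-line classroom simulation Peter Dalgaard alludes to -- my own small example, not taken from the quoted post:]

  # simulate and plot the sampling distribution of the mean of 25 standard normal draws
  hist(replicate(1000, mean(rnorm(25))),
       main = "Means of 1000 samples of size 25", xlab = "sample mean")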
Original message from Patrick Burns (Tuesday, 3 January 2006 19:28):

> Wensui Liu wrote:
> > Another big difference between R and other computing languages such as
> > SPSS/SAS/STATA: you can easily get a job using SPSS/SAS/STATA, but it is
> > extremely difficult to find a job using R. ^_^
>
> Actually in finance it is getting easier all the time for
> knowledge of R to be a significant benefit.

That is also true in fisheries assessment and modelling (at least in Europe).

--
Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
Av. Brasilia, 1449-006 Lisboa, Portugal
Phone: +351 213027120 | Fax: +351 213015948
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [SMTP:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Forrest
> Sent: Tuesday, January 03, 2006 6:16 PM
> To: Gabor Grothendieck
> Cc: Thomas Lumley; R-help at stat.math.ethz.ch; Patrick Burns; Peter Dalgaard
> Subject: Re: [R] A comment about R:
>
> On Tue, 3 Jan 2006, Gabor Grothendieck wrote:
> ...
> > In fact there are some things that are very easy
> > to do in Stata and can be done in R but only with more difficulty.
> > For example, consider this introductory session in Stata:
> >
> >   http://www.stata.com/capabilities/session.html
> >
> > Looking at the first few queries,
> > see how easy it is to take the top few in Stata, whereas in R one would
> > have a complex use of order. It's not hard in R to write a function
> > that would make it just as easy, but it's not available off the top
> > of one's head, though RSiteSearch("sort.data.frame") will find one
> > if one knew what to search for.
>
> This sort of thing points to an opportunity for documentation. Building a
> tutorial session in R on how one would do a similar analysis would provide
> another method of learning R. "An Introduction to R" is a good bottom-up
> introduction, which, if you work through it, does teach you how to do
> several things. Adapting other tutorials or extended problems, like the
> Stata session, to R would give additional entry points. A few end-to-end
> tutorials on some interesting analyses would be helpful.
>
> Any volunteers?
>
> Dave
> --
> Dr. David Forrest
> drf at vims.edu          (804)684-7900w
> drf5n at maplepark.com   (804)642-0662h
> http://maplepark.com/~drf5n/

--------

I am not volunteering, but I would like to point out that Paulo Ribeiro's illustrative session on the package geoR and Ole Christensen's similar document for geoRglm are, IMO, excellent examples of how to make things easier for beginners.

Ruben
I'm someone who from time to time comes to R to do applied stats for social science research. I think the R language is excellent -- much better than Stata for writing complex statistical programs. I am thrilled that I can do complex stats readily in R -- sem, maximum likelihood, bootstrapping, some Bayesian analysis. I wish I could make R my main statistical package, but find that a few stats that are important to my work are difficult to find or produce in R.

Before I list some examples, I recognize that people view R not as a statistical package but rather as a statistical programming environment. That said, however, it seems, from my admittedly limited perspective, that it would be fairly easy to make a few adjustments to R that would make it a lot more practical and friendly for a broader range of people -- including people like me who from time to time want to do statistical programming but more often need to run canned procedures. I'm not a statistician, so I don't want to have to learn everything there is to know about common procedures I use, including how to write them from scratch. I want to be able to focus my efforts on more novel problems w/o reinventing the wheel. I would also prefer not to have to work through a couple of books on R or S+ to learn how to meet common needs in R. If R were extended a bit in the direction of helping people like me, I wonder whether it would not acquire a much broader audience. Then again, these may just be the rantings of someone not sufficiently familiar w/ R or the community of stat package users -- so take my comments w/ a grain of salt.

Some examples of statistics I typically use that are difficult to find and/or produce, or to produce in a usefully formatted way, in R:

Ex. 1) Wald tests of linear hypotheses after maximum likelihood or even after a regression. "Wald" does not even appear in my standard R package on a search. There's no comment in the lm help or optim help about what function to use for hypothesis tests. I know that statisticians prefer likelihood ratio tests, but Wald tests are still useful and indeed crucial for first-pass analysis. After searching with Google for some time, I found several Wald functions in various contributed R packages I did not have installed. One confusion was which one would be relevant to my needs. This took some time to resolve. I concluded, perhaps on insufficient evidence, that package car's Wald test would be most helpful. To use it, however, one has to put together a matrix for the hypotheses, which can be arduous for a many-term regression or a complex hypothesis. In comparison, in Stata one simply states the hypothesis in symbolic terms. I also don't know for certain that this function in car will work, or work properly, w/ various kinds of output, say from lm or from optim. To be sure, I'd need to run time-consuming tests comparing it with Stata output or examine the function's code. In Stata the test is easy to find, and there's no uncertainty about where it can be run or its accuracy. Simply having a comment or "see also" in the lm, mle or optim help pointing the user to the right Wald function would be of enormous help.

Ex. 2) Getting neat output of a regression with a Huberized variance matrix. I frequently have to run regressions w/ robust variances. In Stata, one simply adds the word "robust" to the end of the command, or "cluster(cluster.variable)" for a cluster-robust error. In R, there are two functions, robcov and hccm.
I had to run tests to figure out what the relationship is between them, and between them and Stata (robcov w/o cluster gives hccm's hc0; hccm's hc1 is equivalent to Stata's 'robust' w/o cluster; etc.). A single sentence in hccm's help saying something to the effect that statisticians prefer hc3 for most types of data might save me from having to scramble through the statistical literature to try to figure out which of these I should be using. A few sentences on what the differences are between these methods would be even better.

Then, there's the problem of output. Given that hc1 or hc3 are preferred for non-clustered data, I'd need to be able to get regression output of the form summary(lm) out of hccm for any practical use. Getting this, however, would require programming my own function. Huberized t-stats for regressions are a commonplace need; an R oriented a little more toward everyday needs would not require programming them. Also, I'm not sure yet how well any of the existing functions handle missing data.

Ex. 3) I need to do bootstrapping w/ clustered data, again a common statistical need. I wasted a good deal of time reading the help contents of boot and bootstrap, only to conclude that I'd need to write my own, probably inefficient, function to bootstrap clustered data if I were to use boot. It's odd that boot can't handle this more directly. After more digging, I learned that bootcov in package Design would handle the cluster bootstrap and save the parameters. I wouldn't have found this if I had not needed bootcov for another purpose. Again, maybe a few words in the boot help saying that 'for clustered data, you could use bootcov or program a function in boot' would be very helpful. I still don't know whether I can feed the results of bootcov back into functions in the boot package for further analysis.

My 2 bits for what they're worth,

Peter
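[Editorial note: for readers hitting the same three points, a rough sketch -- not the original poster's code. It assumes the contributed packages car and lmtest are installed (function names as in recent versions of those packages), and uses the built-in mtcars data, with cyl standing in for a cluster variable:]

  library(car)     # provides hccm() and linearHypothesis()
  library(lmtest)  # provides coeftest()

  fit <- lm(mpg ~ wt + hp, data = mtcars)

  # Ex. 2: coefficient table with HC3 ("Huberized") standard errors
  coeftest(fit, vcov. = hccm(fit, type = "hc3"))

  # Ex. 1: a Wald test of a linear hypothesis stated symbolically
  linearHypothesis(fit, "wt = hp")

  # Ex. 3: a hand-rolled cluster bootstrap of the 'wt' coefficient,
  # resampling whole clusters (levels of cyl) with replacement
  set.seed(1)
  b <- replicate(200, {
    cl  <- sample(unique(mtcars$cyl), replace = TRUE)
    dat <- do.call(rbind, lapply(cl, function(g) mtcars[mtcars$cyl == g, ]))
    coef(lm(mpg ~ wt + hp, data = dat))["wt"]
  })
  sd(b)   # bootstrap standard error of the wt coefficient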
I am just beginning to use R. And I am just a clinical endocrinologist, not a statistician. R is definitely not for the casual user. The learning curve is very steep and previous experience in programming is essential. Therefore, some kind of menu system is extremely useful. I use a combination of R-Commander and SciViews which is good, but some more functionality would be nice. On the other hand, functions returning an object are great, as is the simultaneous presence of multiple data sets.

Introductory documentation is excellent, both in electronic and paper form (books by Verzani, Dalgaard, Venables et al, Maindonald etc -- not to forget Zoonekynd and The R Graph Gallery). However, package documentation is consistently cryptic (written for experts?) -- examples with explanations would be nice. I believe that a database of packages and methods would help to find the appropriate package.

This list is impressive. People are knowledgeable, opinionated, ready to help and to flame you for asking an elementary question or asking how to use type III SSQ. So, speak softly and carry a beagle. Seriously, sometimes it would be quicker just to give an answer than to flame a poor soul.

Milos Zarkovic

******************************************************
Milos Zarkovic MD, Ph.D.
Associate Professor of Internal Medicine
Institute of Endocrinology
Dr Subotica 13
11000 Beograd
Serbia
Tel +381-63-202-925
Fax +381-11-685-357
Email mzarkov at eunet.yu
******************************************************

----- Original Message -----
From: "Kjetil Halvorsen" <kjetilbrinchmannhalvorsen at gmail.com>
To: <R-help at stat.math.ethz.ch>
Sent: Sunday, January 01, 2006 3:36 PM
Subject: [R] A comment about R:

> Readers of this list might be interested in the following comment about R.
>
> In a recent report, by Michael N. Mitchell
> http://www.ats.ucla.edu/stat/technicalreports/
> says about R: [...]
>From: Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>Subject: Re: [R] A comment about R:
>
>One thing that is often overlooked, and hasn't yet been mentioned in
>the thread, is how much *simpler* R can be for certain completely
>basic tasks of practical or pedagogical relevance: Calculate a simple
>derived statistic, confidence intervals from estimate and SE,
>percentage points of the binomial distribution - using dbinom or from
>the formula, take the sum of each of 10 random samples from a set of
>numbers, etc. This is where other packages get stuck in the
>procedure+dataset mindset.

Colleagues,

I really agree with Peter's comment. Matlab is much the same, in this sense.

I've had a lot of trouble getting people at my lab to take on learning R. It is interesting to me that the one person who has taken the plunge was educated in Singapore. I read recently that on measures of science performance, Singapore schools do very well. Perhaps students now expect science to be easier.

I have found also that it makes it much clearer for people to decide if they need to learn R to be clear that it is like learning a language. You start with a few packaged functions (e.g. mean), and move on from there. You have to take a structured approach to learning it, and need to use it very frequently. You have to learn the syntax, and just like learning Spanish, don't expect to read Vargas Llosa before you can say buenas tardes correctly. Once they understand that, they can decide if they need R, before they decide whether to invest in learning it.

Suerte (Good luck!)

Sam
----
Sam McClatchie, Oceanography subprogram
South Australian Aquatic Sciences Centre
PO Box 120, Henley Beach 5022
Adelaide, South Australia
email <mcclatchie.sam at saugov.sa.gov.au>
Cellular: 0431 304 497
Telephone: (61-8) 8207 5448
FAX: (61-8) 8207 5481
Research home page <http://www.members.iinet.net.au/~s.mcclatchie/>
      /\
...>><xX(°>
   //// \\\\
  <°)Xx><<
 ///// \\\\\\
><(((°>   >><(((°>
...>><xX(°>O<°)Xx><<
A few thoughts about R vs SAS:

I started learning SAS 8 years ago at IBM, I believe it was version 6.10. I started with R 7 months ago.

Learning curve: I think I can do everything in R after 7 months that I could do in SAS after about 4 years.

Bugs: I suffered through several SAS version changes, 7.0, 7.1, 7.2, 8.0, 9.0 (I may have misquoted some version numbers). Every version change gave me headaches, as every version release (of an expensive commercially produced software set) had bugs which upset or crashed previously working code. I had code which ran fine under Windows 2000 and terribly under Windows XP. Most bugs I found were noted by SAS, but never fixed. With R I have encountered very few bugs, except for an occasional crash of R, which I usually ascribe to some bug in Windows XP.

Help: SAS help was OK. As others have mentioned, there is too much. I even had the set of printed manuals on my desk (stretching 4 feet or so), which were quite impenetrable. I had almost no support from colleagues: even within IBM the number of advanced SAS users was small. With R this mailing list has been of great help: almost every issue I copy some program and save it as an "R hint xxxx" file.

--> A REQUEST: I would appreciate a few more program examples with the help pages for some functions. For instance, "?Control" tells me about "if(cond) cons.expr else alt.expr", however an example of

  if(i==1) {
    print("one")
  } else if(i==2) {
    print("two")
  } else if(i>2) {
    print("bigger than two")
  }

at the end of that help section would have been very helpful for me a few months ago.

Functions: Writing my own functions in SAS was by use of macros, and usually depended heavily on macro substitution. Learning SAS's macro language, especially macro substitution, was very difficult and it took me years to be able to write complicated functions. Quite a different situation in R. Some functions I have written by dint of copying code from other people's packages, which has been very helpful.

I wanted to generate arbitrary k-values (the k-multiplier of sigma for a given alpha, beta, and N to establish confidence limits around a mean for small populations). I had a table from a years-old microfiche book giving values but wanted to generate my own. I had to find the correct integrals to approximate the k-values and then write two SAS macros which iterated to the desired level of tolerance to generate values. I would guess that there is either an R base function or a package which will do this for me (when I need to start generating AQL tables). Given the utility of these numbers, I was disappointed with SAS.

Data manipulation: All SAS data is in 2-dimensional datasets, which was very frustrating after having used variables, arrays, and matrices in BASIC, APL, FORTRAN, C, Pascal, and LabVIEW. SAS allows you to access only 1 row of a dataset at a time, which was terribly horribly incomprehensibly frustrating. There were so many many problems I had to solve where I had to work around this SAS paradigm. In R, I can access all the elements of a matrix/dataframe at once, and I can use >2 dimensional matrices. In fact, the limitations of SAS I had ingrained from 7.5 years have sometimes made me forget how easily I can do something in R, like knowing when a value in a column of a dataframe changes:

  DF$marker <- c(DF[1:(nrow(DF)-1), icol] != DF[2:nrow(DF), icol], FALSE)  # padded with FALSE to match the column length

This was hard to do in SAS... and even after years it was sometimes buggy, keeping variable values from previous iterations of a SAS program.
One very nice advantage with SAS is that after data is saved in libraries, there is a GUI showing all the libraries and the datasets inside the libraries, with sizes and dates. While we can save Rdata objects in an external file, the base package doesn't seem to have the same capabilities as SAS.

Graphics: SAS graphics were quite mediocre, and generating customized labels was cumbersome. Porting code from one Windows platform to another produced unpredictable and sometimes unworkable results. It has been easier in R: I anticipate that I will be able to port R Windows code to *NIX and generate the same graphics.

Batch commands: I am working on porting some of my R code to our *NIX server to generate reports and graphs on a scheduled basis. Although a few at IBM did this with SAS, I would have found doing this fairly daunting.

-Leif

-----------------------------
Leif Kirschenbaum, Ph.D.
Senior Yield Engineer
Reflectivity
leif at reflectivity.com
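[Editorial note: a rough base-R equivalent of browsing saved datasets can be pieced together from save(), file.info() and load() -- a small sketch, not from the post; the file names are made up:]

  # save two objects into .RData files in the working directory
  save(mtcars, file = "mtcars.RData")
  save(iris,   file = "iris.RData")

  # list the saved data files with their sizes and modification dates
  files <- list.files(pattern = "\\.RData$")
  file.info(files)[, c("size", "mtime")]

  # load one back and see which objects the workspace now contains
  load("mtcars.RData")
  ls()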
~~~~~~~~~~~~~~~
... blame me for not having sent the below message initially in plain text format. Sorry!
~~~~~~~~~~~~~~~

I just got into R over most of the Xmas vacation and was about to ask for helpful pointers on how to get a hold of R when I came across this thread. I've read through most of it and would like to comment from a novice user's point of view. I have a strong programming background but limited statistical experience and no knowledge of competing packages. I'm working as a senior engineer in electronics.

Yes, the learning curve is steep. Most of the docu is extremely terse. Learning is mostly from examples (a wiki was proposed in another mail...), and the documentation uses no graphical elements at all. So, when it comes to things like xyplot in lattice: where would I get the concepts behind panels, superpanels, and the like? OK, this is steep and terse, but after a while I'll get over it... That's life. The general concept is great, things can be expressed very densely: the potential is here.... I quickly had 200 lines of my own code together, doing what it should - or so I believed.

Next I did:

  matrix <- matrix(1:100, 10, 10)
  image(matrix)
  locator()

Great: I can interactively work with my graphs... But then:

  filled.contour(matrix)
  locator()

Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't realize that filled.contour() has a color bar to the right and scales x wrongly...

Here is what really shocked me:

  > str(bar)
  `data.frame': 206858 obs. of 12 variables: ...
  > str(mean(bar[,6:12]))
  Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
  > str(sd(bar[,6:12]))
  Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
  > prcomp(bar[,6:12]) -> foo
  > str(foo$x)
  num [1:206858, 1:7] -0.4187 -0.4015 0.0218 -0.4438 -0.3650 ...
  > str(mean(foo$x))
  num -1.07e-13
  > str(sd(foo$x))
  Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...

So, sd returns a vector regardless of whether the argument is a matrix or a data.frame, but mean reacts differently and returns a vector only for a data.frame? The problem here is not that this is difficult to learn - the problem is the complete absence of a concept. Is a data.frame an 'extended' matrix with columns of different types, or something different? Since the numeric mean (I expected a vector) is recycled nicely when used in a vector context, this makes debugging code close to impossible. Since sd returns a vector, things like mean + 4*sd vary sufficiently across the data elements that I assume working code... I don't get any warning signal that something is wrong here. The point in case is the behavior of locator() on a filled.contour() object: things apparently have been programmed and debugged from examples rather than from a concept.

Now, in another posting I read that all this is a feature to discourage inexperienced users from statistics and force you to think before you do things. Whilst I support this concept of thinking: did I miss something in statistics? I was under the belief that mean and sd were relatively close to each other conceptually... (here, they are even in different packages...)

I will continue using R for the time being. But whether I can recommend it to my work colleagues remains to be seen: how could I ever trust results returned? I'm still impressed by some of the efficiency, but my trust is deeply shaken...

-----------------------------------------------------------------------
Stefan Eichenberger
mailto:Stefan.Eichenberger at se-kleve.com
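[Editorial note: for what it's worth, column-wise statistics can be requested explicitly, and then behave the same way for matrices and data frames -- a small sketch, not part of the original post:]

  m  <- matrix(rnorm(40), 10, 4)   # a numeric matrix
  df <- as.data.frame(m)           # the same data as a data frame

  colMeans(m)                      # column means of the matrix
  colMeans(df)                     # column means of the data frame
  apply(m, 2, sd)                  # column standard deviations of the matrix
  sapply(df, sd)                   # column standard deviations of the data frame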