Graham Wideman
2007-May-01 04:29 UTC
[R] Concepts question: environment, frame, search path
Folks:

I'd appreciate it if someone could straighten me out on a few concepts which are described a bit ambiguously in the docs.

1. data.frame:
----------------
Refman p84: 'A data frame is a list of variables of the same length with unique row names, given class "data.frame".'

I probably don't need to point out how opaque that is!

Anyhow, key question: Some places in the docs seem pretty firm that a data.frame is basically a 2-D array with:
a) named rows and
b) columns whose items within a column are of uniform data type.

Elsewhere, it seems like a data.frame can be a collection of arbitrary variables.

2. environment
---------------
Refman p122: "Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment."

Is the "or" here explaining parenthetically that a frame is a collection of named objects, or is it separating two alternative structures for an environment?

If the former, does this imply that a frame can contain arbitrary variables?

And "pointer"? Is that a type of thing in R?

3. R search path; attach()
----------------------------
The R search path appears to hold the list of "collections of data" (my term) that can be accessed by a user's commands. Refman p27 says that the search path can hold items that are data.frame, list, environment or R data file (on disk). Yet R-intro p28 describes attach() as taking a "directory name" argument. What is the concept "directory" in this context?

----------------------
Thanks,
Graham
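The following small R sketch was not part of the original thread. It may make the quoted Refman definition less opaque. A data frame is literally a list whose elements (the variables) all have the same length, carrying a "data.frame" class attribute and row names. That shared length is what makes the 2-D view work. The object `df` and its columns are hypothetical, made up for illustration.

```r
# Hypothetical example data; any equal-length vectors would do.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)        # TRUE: underneath, a data frame is a list
class(df)          # "data.frame"
length(df)         # 2: one list element per variable (column)
rownames(df)       # "1" "2" "3": the automatic, unique row names

# Each variable is a single vector, so the type is uniform within a column
# but may differ between columns.
sapply(df, class)  # x: "integer", y: "character"

# Because all variables share one length, df[i, j] is well defined:
# the i'th entry of the j'th variable, i.e. row i, column j.
identical(df[2, "y"], df$y[2])   # TRUE, both are "b"

# Variables whose lengths cannot be recycled to a common length are rejected.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")       # TRUE
```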
Duncan Murdoch
2007-May-01 11:16 UTC
On 01/05/2007 12:29 AM, Graham Wideman wrote:
> Folks:
>
> I'd appreciate it if someone could straighten me out on a few concepts which
> are described a bit ambiguously in the docs.
>
> 1. data.frame:
> ----------------
> Refman p84: 'A data frame is a list of variables of the same length with
> unique row names, given class "data.frame".'
>
> I probably don't need to point out how opaque that is!

Which manual are you looking at? The "reference index" (refman.pdf)? It doesn't usually include statements like that; they are usually found in the Introduction to R (R-intro.pdf) or the R Language Definition (R-lang.pdf). But since the refman is just a collection of man pages, it might be in there somewhere. And since the manuals do get updated, that statement may not be present in the current release. (I did a quick search of the source, and couldn't spot it, but my search might have failed because of line breaks, strange formatting, or looking in the wrong place.)

By the way, it's generally best to cite the section name where you found a quote, because the pagination varies from system to system. Even better would be to give a URL to the online HTML version at http://cran.r-project.org/manuals.html. For future reference, if you are suggesting a change, it's best to cite the line number in the source at https://svn.r-project.org/R/trunk/doc/manual in the *.texi files or https://svn.r-project.org/R/trunk/src/library/*/man/*.Rd for man pages, and send such suggestions to the R-devel list.

> Anyhow, key question: Some places in the docs seem pretty firm that a
> data.frame is basically a 2-D array with:
> a) named rows and
> b) columns whose items within a column are of uniform data type.
>
> Elsewhere, it seems like a data.frame can be a collection of arbitrary
> variables.

The former interpretation is correct.
Since the variables all have the same length, things like df[i, j] make sense: they choose the i'th entry from the j'th variable (according to the "refman" definition), or the i'th row, j'th column (according to the 2-D array interpretation).

> 2. environment
> ---------------
> Refman p122: "Environments consist of a frame, or collection of named
> objects, and a pointer to an enclosing environment."
>
> Is the "or" here explaining parenthetically that a frame is a collection of
> named objects, or is it separating two alternative structures for an
> environment?

The former.

> If the former, does this imply that a frame can contain arbitrary variables?

Yes, but a frame isn't an R object; it's a concept that appears in descriptions, e.g. part of an environment, or the local variables created during function evaluation, etc.

> And "pointer"? Is that a type of thing in R?

No, there are no pointers in R. There are a couple of tricks to fake them (e.g. environment objects aren't copied when assigned; you just get a new reference to the same environment, so you can construct something like a pointer by wrapping an object in an environment), but I don't recommend using these routinely.

> 3. R search path; attach()
> ----------------------------
> The R search path appears to hold the list of "collections of data" (my
> term) that can be accessed by a user's commands. Refman p27 says that the
> search path can hold items that are data.frame, list, environment or R data
> file (on disk). Yet R-intro p28 describes attach() as taking a "directory
> name" argument. What is the concept "directory" in this context?

I haven't read the preceding pages carefully, but that looks like an error. The usual argument to attach is a package name, and what gets attached is an environment holding the exports from the package. Packages are stored in directories in the file system, so maybe that's what the author of that line had in mind.

Duncan Murdoch
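The R sketch below illustrates two points from this reply. First, the environment trick Duncan mentions: environments are shared on assignment, not copied. Second, attach() applied to a data frame, one of the item types on the Refman page Graham cites. This is an illustration added here, not code from the thread. The names `e1`, `e2`, `l1`, `l2`, `increment` and `df` are hypothetical. As Duncan says, reference-style code like this is best kept for special cases.

```r
# Environments are not copied on assignment: e1 and e2 refer to one object.
e1 <- new.env()
e1$count <- 0
e2 <- e1
e2$count <- e2$count + 1
e1$count                      # 1: the change is visible through e1 too

# An ordinary list behaves as if copied on assignment.
l1 <- list(count = 0)
l2 <- l1
l2$count <- l2$count + 1
l1$count                      # 0: l1 is unchanged

# Hypothetical helper: wrapping a value in an environment lets a function
# modify it in place, which is the "fake pointer" idea.
increment <- function(env) env$count <- env$count + 1
increment(e1)
e1$count                      # 2

# attach() puts a copy of a data frame's variables on the search path,
# by default at position 2, right after the global environment.
df <- data.frame(height = c(160, 180), weight = c(60, 80))
attach(df)
attached_as <- search()[2]    # "df"
avg_height <- mean(height)    # 170: `height` is found via the search path
detach("df")
```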
graham wideman
2007-May-02 02:12 UTC
Duncan:

Thanks for taking a stab at my questions. In following up, I discovered the root of my difficulties: I had not noticed the document R-lang.pdf ("R Language Definition"). This clarifies a great deal.

FWIW, it seems to me that a number of things I was hung up on (and which you discussed) revolved around:

1. Confusion between "frame" and "data.frame". R-lang.pdf has several sections that touch on each of these, from which it's clearer (though not explicit) that these are not the same things. (Problematic: frame is mentioned first and is a more fundamental concept, yet has no entry in the Table of Contents, while data.frame does have an entry. And the converse is true of the index!)

2. Ambiguity in the docs regarding environment and frame (and also regarding the closely related concepts closure and enclosure).

Anyhow, I'm now in a much happier state :-).

Regarding your questions:

>> 1. data.frame:
>> Ref[m]an p84: 'A data frame is a list of variables of the same length with
>> unique row names, given class "data.frame".'

> Which manual are you looking at? The "reference index" (refman.pdf)?
> [...] that statement may not be present in the current release

Yes, the doc titled "R: A Language and Environment for Statistical Computing Reference Index". This is in section I "The base package", subsection "data.frame", which was on page 84 of refman.pdf (which I downloaded yesterday, but now don't know where from) or on page 86 of fullrefman.pdf (downloaded today, i.e. the current release).

(And point understood on the suggestions about reporting doc issues, though tracking them down to line numbers in the SVN is a bit optimistic, not to mention a moving target :-)

-----------
Anyhow, thanks again for the response.

Graham
---------------------------------------------------
Graham Wideman
Resources for programmable diagramming at: http://www.diagramantics.com
Brain-related resources: http://wideman-one.com
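The R sketch below illustrates Graham's distinction between a frame and a data.frame, along with the related ideas of closure and enclosure. It was added for illustration and does not come from the thread. The names `show_frame`, `make_counter`, `counter` and `env` are hypothetical. It uses only base functions (`environment`, `parent.env`, `ls`, `get`).

```r
# The frame of a function call: the named objects (arguments and local
# variables) held in the environment created for that call.
show_frame <- function(a, b) {
  local_var <- a + b
  ls(environment())          # list the names in this call's frame
}
show_frame(1, 2)             # "a" "b" "local_var"

# A closure: the inner function keeps a reference to its enclosing
# environment, i.e. the environment created by the make_counter() call.
make_counter <- function() {
  count <- 0                 # stored in the frame of make_counter()'s environment
  function() {
    # <<- searches the enclosing environments and assigns to the first
    # `count` it finds, here the one in make_counter()'s call environment.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()                    # 1
counter()                    # 2

env <- environment(counter)  # the enclosing environment of the closure
ls(env)                      # "count": the contents of its frame
get("count", envir = env)    # 2
# That environment's own enclosure is where make_counter was defined.
identical(parent.env(env), environment(make_counter))  # TRUE

# A frame is not a data frame: an environment is its own kind of object.
is.environment(env)          # TRUE
is.data.frame(env)           # FALSE
```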