thr3ads.net - R help - [R] Data.frame Vs Matrix Vs Array: Definitions Please [Oct 2010]


Matt Curcio

2010-Oct-27 00:37 UTC

[R] Data.frame Vs Matrix Vs Array: Definitions Please

Hi All,
I am learning R and having a little trouble with the usage and proper
definitions of data.frames vs. matrix vs vectors. I have read many R
tutorials, and looked over umpteen 'cheat' sheets and have found that
no one has articulated a really good definition of the differences
between 'data.frames', 'matrix', and 'arrays' and even 'factors'.  I
realize that I might have missed someone's R tutorial, and actually
would like to receive 'your' most concise or most useful tutorial.
Any help would be appreciated.

My particular favorite explanation and helpful hint is from the
'R-Inferno'.  Don't get me wrong...  I think this pdf is great and
some tables are excellent. Overall it is a very good primer but this
one section leaves me puzzled.  This quote shows the lack of hard and
fast rules for when to use 'data.frames', 'matrix', and
'arrays'.  It discusses ways in which to simplify your work.

Here are a few possibilities for simplifying:
• Don't use a list when an atomic vector will do.
• Don't use a data frame when a matrix will do.
• Don't try to use an atomic vector when a list is needed.
• Don't try to use a matrix when a data frame is needed.

Cheers,
Matt C

Gabor Grothendieck

2010-Oct-27 00:49 UTC


Look at their internal representations and it will become clearer.  v,
a vector, has length 6.  m, a matrix, is actually the same as the
vector v except it has dimensions too. Since m is just a vector with
dimensions, m has length 6 as well.  L is a list and has length 2
because it's a vector each of whose components is itself a vector.  DF
is a data frame and is the same as L except its 2 components must each
have the same length and it must have row and column names.  If you
don't assign the row and column names they are automatically generated
as we can see.  Note that row.names = c(NA, -3L) is a short form for
row names of 1:3 and .Names internally refers to column names.
> v <- 1:6 # vector
> dput(v)
1:6
> m <- v; dim(m) <- 2:3 # m is a matrix since we added dimensions
> dput(m)
structure(1:6, .Dim = 2:3)
> L <- list(1:3, 4:6)
> dput(L)
list(1:3, 4:6)
> DF <- data.frame(1:3, 4:6)
> dput(DF)
structure(list(X1.3 = 1:3, X4.6 = 4:6), .Names = c("X1.3", "X4.6"
), row.names = c(NA, -3L), class = "data.frame")
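
The same relationships can be checked without reading dput() output. The following is a minimal sketch using only base R; it rebuilds the four objects from the session above and inspects them with attributes(), length() and is.list(). The variable names simply mirror the ones used above.

```r
v  <- 1:6                      # plain integer vector
m  <- v; dim(m) <- 2:3         # same data plus a dim attribute
L  <- list(1:3, 4:6)           # list of two vectors
DF <- data.frame(1:3, 4:6)     # list of two equal-length columns

# A plain vector carries no attributes; the matrix carries only "dim".
is.null(attributes(v))         # TRUE
names(attributes(m))           # "dim"

# Stripping the attributes from the matrix gives back the original vector.
identical(as.vector(m), v)     # TRUE

# length() counts elements for v and m, but components for L and DF.
lens <- c(v = length(v), m = length(m), L = length(L), DF = length(DF))
lens

# A data frame is a list underneath; a matrix is not.
is.list(DF)                    # TRUE
is.list(m)                     # FALSE
```

So a matrix differs from a vector only by an attribute, while a data frame differs from a list by its class and the equal-length constraint.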

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Dennis Murphy

2010-Oct-27 12:39 UTC


Hi:

I'm going to take a different tack from Gabor and Ivan and be strictly
qualitative on the distinctions among vectors, matrices, arrays, data frames
and lists.

As Ivan mentioned, a vector has a single (atomic) mode - i.e., all elements
of a vector must be of the same type. A numeric vector consists strictly of
numbers, a character vector is composed of character entries, and every
element of a logical vector is TRUE or FALSE. Mixtures of types can produce
surprises for the unwary - for example, a single character element in an
otherwise numeric vector coerces the whole vector to character. Vectors are
one-dimensional by definition. Example:
> x1 <- 1:5
> x2 <- c(x1, 'a')
> x1
[1] 1 2 3 4 5
> x2
[1] "1" "2" "3" "4" "5" "a"
> class(x1)
[1] "integer"
> class(x2)
[1] "character"

A matrix is a vector with a two-dimensional dim attribute, as Gabor noted and
showed by example. Thus, the elements of a matrix must also be of the same
type (numeric, character, logical, etc.).
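
A short sketch of what the single-type rule means in practice, using only base R (the object names num and mixed are made up for illustration): binding a numeric column to a character column does not give a mixed matrix, it coerces everything to character.

```r
num   <- matrix(1:4, nrow = 2)        # integer matrix
mixed <- cbind(1:2, c("a", "b"))      # the numbers are coerced to character

typeof(num)      # "integer"
typeof(mixed)    # "character"
mixed[1, 1]      # "1", now a string rather than a number
dim(mixed)       # 2 2
```

This is the same coercion as in the c(x1, 'a') example above, just with a dim attribute attached.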

A data frame is a rectangular object whose columns consist of vectors of the
same length. Columns can be (and usually are) of different types, but all of
the elements within a column are of the same type. The restriction on data
frames is that all columns must have the same length, but this is common in
data sets where each row represents an observation and each column
represents a different datum. Example:
> d <- data.frame(a = LETTERS[1:3], x = 1:3, y = rnorm(3))
> d
  a x          y
1 A 1 -1.3417463
2 B 2 -0.7032052
3 C 3 -0.7099726
> str(d)
'data.frame':   3 obs. of  3 variables:
 $ a: Factor w/ 3 levels "A","B","C": 1 2 3
 $ x: int  1 2 3
 $ y: num  -1.342 -0.703 -0.71
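
As a rough illustration of "one type per column", the sketch below builds a similar data frame and asks each column for its class. Fixed values replace rnorm() so the result is reproducible; that substitution is an assumption made for the example, not part of the session above.

```r
d <- data.frame(a = LETTERS[1:3], x = 1:3, y = c(0.5, 1.5, 2.5))

# One class per column; the class of 'a' depends on the R version
# (factor before R 4.0.0, character from 4.0.0 on).
col_classes <- sapply(d, class)
col_classes

nrow(d)        # 3 observations
length(d)      # 3: length() of a data frame counts columns
d$x            # a column extracted with $ is an ordinary vector
d[["y"]]       # same idea with [[ ]]
```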

Lists are the most general type of data object. Each list contains one or
more components, but each component may contain subcomponents, which in turn
may contain sub-subcomponents, etc. Each (sub)component can have a different
type, like data frames, but they can also have different lengths, so in this
sense they generalize data frames. The capacity to nest lists within lists
is a further generalization of data frames. For example, a modeling function
(e.g., lm(), glm()) returns a list,
providing an instructive example to learn how lists work and behave. Lists
are difficult to 'get' at first, but it gets easier with experience.
Example: extend the data frame above to hold four random normal deviates.
> dd <- data.frame(a = LETTERS[1:3], x = 1:3, y = rnorm(4))
Error in data.frame(a = LETTERS[1:3], x = 1:3, y = rnorm(4)) :
  arguments imply differing number of rows: 3, 4
> dd <- list(a = LETTERS[1:3], x = 1:3, y = rnorm(4) )
> dd
$a
[1] "A" "B" "C"

$x
[1] 1 2 3

$y
[1] -0.02635882  0.50764973  2.02707087  0.01845697
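
To see the point about modeling functions, here is a minimal sketch that fits a linear model and pokes at the result as a list. It assumes the cars data set from the datasets package that ships with R; the choice of model is arbitrary.

```r
fit <- lm(dist ~ speed, data = cars)   # cars: 50 rows, columns speed and dist

is.list(fit)      # TRUE: an "lm" object is a list with a class attribute
class(fit)        # "lm"
names(fit)        # the components stored in the fit

# Components are free to differ in length and type.
n_coef  <- length(fit$coefficients)    # 2: intercept and slope
n_resid <- length(fit$residuals)       # 50: one per observation
c(n_coef, n_resid)
```

str(fit) prints the whole nested structure at once, which is usually the quickest way to find the component you need.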

Data frames are special cases of lists where each column represents a list
component and each component is an atomic vector of the same length.

Matrices are generalizations of vectors (vectors with dimensional
attributes), but they can also be thought of as a special case of a data
frame in the sense that each column is of the same type. However, matrices
are not list objects, so the analogy is limited. The function
as.data.frame(matrix) converts a matrix to a data frame.
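
A small sketch of the round trip between the two, in base R only (the object names are illustrative): as.data.frame() turns each matrix column into a list component, and as.matrix() goes back, coercing to a common type if the columns have since diverged.

```r
m  <- matrix(1:6, nrow = 2)
df <- as.data.frame(m)      # unnamed columns get default names V1, V2, V3

names(df)                   # "V1" "V2" "V3"
is.list(df)                 # TRUE
is.list(m)                  # FALSE

# Once a column changes type, converting back coerces the whole matrix.
df$V1 <- c("a", "b")
back  <- as.matrix(df)
typeof(back)                # "character"
dim(back)                   # 2 3
```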

Arrays are also vectors with dimensional attributes. A one-dimensional array
behaves much like a vector (although it still carries a dim attribute) and a
two-dimensional array is a matrix, but arrays can have
more than two dimensions, as Gabor pointed out. The length of the dim
vector determines the number of dimensions of an array. Since an array is a
generalization of a vector, all elements of an array of any dimension must
have the same type.
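
For instance, a three-dimensional array can be sketched as follows (base R only; the dimensions 2 x 3 x 4 are chosen arbitrarily):

```r
a <- array(1:24, dim = c(2, 3, 4))

length(dim(a))        # 3: the number of dimensions
length(a)             # 24: still just a vector of 24 elements underneath
a[2, 3, 4]            # 24: the last element

# Fixing the third index leaves a 2 x 3 slice, which is a matrix.
slice <- a[, , 1]
is.matrix(slice)      # TRUE
dim(slice)            # 2 3

# A matrix is itself an array with two dimensions.
is.array(matrix(1:6, nrow = 2))   # TRUE
```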

I'm glad that Ivan described factors for you - these objects are likely to
give you more headaches than any other. Be particularly careful when reading
in data from a file - make sure you know what is being input and what you
want for output, and code accordingly.  Example: the first call reads the
character variable a as a factor (the default behavior), the second
overrides the default.
> d <- data.frame(a = LETTERS[1:3], x = 1:3, y = rnorm(3))
> str(d)
'data.frame':   3 obs. of  3 variables:
 $ a    : Factor w/ 3 levels "A","B","C": 1 2 3      #   <<====
 $ x    : int  1 2 3
 $ y    : num  0.926 -1.103 0.554
> d <- data.frame(a = LETTERS[1:3], x = 1:3, y = rnorm(3),
+                 stringsAsFactors = FALSE)
> str(d)
'data.frame':   3 obs. of  3 variables:
 $ a: chr  "A" "B" "C"                               # <<====
 $ x: int  1 2 3
 $ y: num  0.495 0.956 0.628
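
One classic factor headache, sketched in base R: numbers that were read in as strings and turned into a factor. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so the sketch passes stringsAsFactors = TRUE explicitly to reproduce the older behavior shown above. The column name and values are made up for illustration.

```r
d <- data.frame(a = c("10", "20", "10"), stringsAsFactors = TRUE)

is.factor(d$a)        # TRUE
levels(d$a)           # "10" "20"

# as.integer() on a factor returns the internal level codes, not the values.
codes <- as.integer(d$a)
codes                 # 1 2 1

# Go through character to recover the numbers that were intended.
vals <- as.numeric(as.character(d$a))
vals                  # 10 20 10
```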

I'd suggest learning to use the function str() routinely to elucidate the
contents of a particular (class of) object (and its elements). It is
certainly one of the most useful functions in R and a great way to improve
your understanding of the various types of objects you'll encounter in the
language.

This is a general description of the types of objects you wanted to know
about, but special cases arise where an object of one type turns into
another silently. You need to learn these exceptions, sometimes the hard
way. Gabor's list -> vector example is one; another is that subsetting a
matrix or array down to a single row or column silently converts the result
into a vector unless you explicitly ask to keep the dimensions with
drop = FALSE. Here's a small example to illustrate (notice the differences
in how the objects are printed - it provides a clue):
> m <- matrix(1:9, nrow = 3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> class(m)
[1] "matrix"
> m2 <- m[1, ]
> m2
[1] 1 4 7
> class(m2)
[1] "integer"
> is.matrix(m2)
[1] FALSE
> is.vector(m2)
[1] TRUE
# How to create a (row) vector but keep matrix class:
> m3 <- m[1, , drop = FALSE]
> m3
     [,1] [,2] [,3]
[1,]    1    4    7
> class(m3)
[1] "matrix"
# Pay attention to the dimensions
> str(m)
 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> str(m2)
 int [1:3] 1 4 7
> str(m3)
 int [1, 1:3] 1 4 7
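
The same silent dropping happens with columns and with data frames. A minimal sketch in base R (the data frame df and its columns are made up for illustration):

```r
m <- matrix(1:9, nrow = 3)

dim(m[, 2])                  # NULL: a single column drops to a plain vector
dim(m[, 2, drop = FALSE])    # 3 1: still a one-column matrix

df <- data.frame(x = 1:3, y = 4:6)

is.data.frame(df[, "x"])                 # FALSE: matrix-style indexing drops to a vector
is.data.frame(df[, "x", drop = FALSE])   # TRUE
is.data.frame(df["x"])                   # TRUE: list-style indexing keeps the data frame
```

Code that must work for any number of selected rows or columns is usually safer with drop = FALSE, since downstream calls such as nrow() or ncol() return NULL on a plain vector.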

HTH,
Dennis


