thr3ads.net - R help - [R] matrix manipulations [Jan 2011]


Monica Pisica

2011-Jan-17 19:59 UTC

[R] matrix manipulations

Hi,

I am having some difficulty with matrix operations. It is a little hard to
explain, so please bear with me. I have a very large data set, large enough
that it needs to be split into parts to be processed. I can work on these
"parts" individually, but the problem lies in adding the parts together for
the final answer.

That being said, let's say that I split the data into 2 parts, 1 and 2. Each
part has data belonging to 6 different categories, and each category has 2
different classes, which are the same for every category. The classes are
called "land" and "water", and the categories are labeled "cat1" to "cat6".
I am using the table() function to tabulate each class for each category, but
since I split the data into 2 parts, one part has only some of the 6
categories and the other part has some others (not necessarily mutually
exclusive).

So let's build some results as they would look after I used the table function.

m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, ncol = 3, byrow = TRUE,
             dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
> m1
      cat2 cat5 cat6
land    32   35   36
water   12   15   16

m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, ncol = 4, byrow = TRUE,
             dimnames = list(c("land", "water"),
                             c("cat1", "cat2", "cat3", "cat4")))
> m2
      cat1 cat2 cat3 cat4
land    45   46   47   48
water   21   22   23   24

So my desired end result is a matrix (or a data frame) with 6 columns named
cat1 to cat6 and 2 rows labeled land and water; for any category that appears
in both m1 and m2, the result should be the sum.

The result would be m3:

      cat1 cat2 cat3 cat4 cat5 cat6
land    45   78   47   48   35   36
water   21   34   23   24   15   16

To do this, I thought of making an empty matrix for each of m1 and m2 (called
m01 and m02 respectively) with 6 columns and 2 rows, and then writing a long
if/else statement that matches the name of the first column in m1 against the
name of the first column in m01: if they match, take the data from m1; if not,
leave it at 0, and so on. The same goes for m2 and m02. This is long and
extremely clunky, but afterwards I can add m01 to m02 and get my desired
result m3. Is there a more elegant way to do this? My real data is split into
4 parts, but the problem is the same.

Thanks for all your input, and sorry for the long email, but I didn't know
how else to explain what I wanted to do.
 
Monica
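
The padding idea described above can be written without the if/else chain by
indexing the zero matrix with the row and column names of each part. This is
only a minimal sketch: pad_to() is a hypothetical helper name, not something
from the thread or from base R, and it assumes every part carries proper
dimnames.

```r
# The two partial tables from the post above
m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"),
                             c("cat1", "cat2", "cat3", "cat4")))

# pad_to() is a hypothetical helper: it copies m into a zero matrix that
# has every requested row and column name, matching cells by name
pad_to <- function(m, rows, cols) {
  out <- matrix(0, nrow = length(rows), ncol = length(cols),
                dimnames = list(rows, cols))
  out[rownames(m), colnames(m)] <- m
  out
}

rows <- c("land", "water")
cols <- paste0("cat", 1:6)

# Once both parts have the same shape, ordinary matrix addition works
m3 <- pad_to(m1, rows, cols) + pad_to(m2, rows, cols)
m3
```

With 4 parts, the same helper could be applied to each part in a list and the
padded matrices summed with Reduce(`+`, ...).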

Phil Spector

2011-Jan-17 20:13 UTC

[R] matrix manipulations

Monica -
    Perhaps this small example can demonstrate how factors can
solve your problem:
> d1 = data.frame(cat = sample(c('cat2','cat5','cat6'), 100, replace = TRUE),
+                 group = sample(c('land','water'), 100, replace = TRUE))
> d2 = data.frame(cat = sample(c('cat1','cat3','cat4'), 100, replace = TRUE),
+                 group = sample(c('land','water'), 100, replace = TRUE))
> d1$cat = factor(d1$cat, levels = c('cat1','cat2','cat3','cat4','cat5','cat6'))
> d2$cat = factor(d2$cat, levels = c('cat1','cat2','cat3','cat4','cat5','cat6'))
> table(d1$group,d1$cat) + table(d2$group,d2$cat)
         cat1 cat2 cat3 cat4 cat5 cat6
   land    14   17   18   22   19   23
   water   19   15   16   11   10   16

This works because when you include all possible levels in a factor, R will 
automatically put zeroes in the right places when you use table():
> table(d1$group,d1$cat)
         cat1 cat2 cat3 cat4 cat5 cat6
   land     0   17    0    0   19   23
   water    0   15    0    0   10   16
> table(d2$group,d2$cat)
         cat1 cat2 cat3 cat4 cat5 cat6
   land    14    0   18   22    0    0
   water   19    0   16   11    0    0

Hope this helps.
 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
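
The factor-level idea can also be tried on a small deterministic data set, so
that the counts are easy to check by hand. This is a sketch under assumptions:
the data frames part1 and part2 and the helper tabulate_part() are made up for
illustration, and fixing the levels of group as well is an extra precaution
(in case one part happens to contain only one class), not something the reply
above does.

```r
all_cats   <- paste0("cat", 1:6)
all_groups <- c("land", "water")

# Two made-up parts of a larger data set (hypothetical values)
part1 <- data.frame(cat   = c("cat2", "cat2", "cat5", "cat6"),
                    group = c("land", "water", "land", "land"))
part2 <- data.frame(cat   = c("cat1", "cat2", "cat4"),
                    group = c("water", "land", "land"))

# Tabulate one part with the full set of levels, so that absent
# categories (or classes) show up as zero counts
tabulate_part <- function(d) {
  table(factor(d$group, levels = all_groups),
        factor(d$cat,   levels = all_cats))
}

# Both tables are 2 x 6, so they can be added directly
total <- tabulate_part(part1) + tabulate_part(part2)
total
```

Because every part is tabulated against the same levels, the per-part tables
always have the same shape, however the data were split.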




Henrique Dallazuanna

2011-Jan-17 20:16 UTC

[R] matrix manipulations

Try this:

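library(reshape)
# melt() turns each matrix into long form (X1 = row name, X2 = column name, value);
# merge(..., all = TRUE) keeps the categories that occur in only one of the parts
xtabs(rowSums(cbind(value.x, value.y), na.rm = TRUE) ~ X1 + X2,
      merge(melt(m1), melt(m2), by = c('X1', 'X2'), all = TRUE),
      exclude = FALSE)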
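
A similar long-format approach can be sketched in base R without the reshape
package, and it extends to any number of parts. This is an assumed alternative,
not the code suggested above: each matrix is converted to long form with
as.data.frame(as.table(...)), whose default column names are Var1, Var2 and
Freq, and the stacked rows are summed with xtabs().

```r
m1 <- matrix(c(32, 35, 36, 12, 15, 16), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"), c("cat2", "cat5", "cat6")))
m2 <- matrix(c(45, 46, 47, 48, 21, 22, 23, 24), nrow = 2, byrow = TRUE,
             dimnames = list(c("land", "water"),
                             c("cat1", "cat2", "cat3", "cat4")))

# Put all parts in a list; with 4 parts this would be list(m1, m2, m3, m4)
parts <- list(m1, m2)

# One row per (class, category, count); names are kept as character so
# that xtabs() builds the factors itself, with levels in sorted order
long <- do.call(rbind, lapply(parts, function(m) {
  as.data.frame(as.table(m), stringsAsFactors = FALSE)
}))

# xtabs() sums Freq over rows that share the same class and category
m3 <- xtabs(Freq ~ Var1 + Var2, data = long)
m3
```

The columns come out in alphabetical order, which matches cat1 to cat6 here;
with names such as cat10 the levels would need to be set explicitly.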




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

