thr3ads.net - R help - [R] Compact Indicator Matrices [May 2008]

amarkos

2008-May-10 10:27 UTC

[R] Compact Indicator Matrices

An indicator matrix is a binary matrix with orthogonal columns whose
rows sum to 1. A row of this matrix could be [0 1 0 0]. My problem is
to group identical rows (profiles) so as to create a compact form
of the matrix.

Is there an R function that deals with this problem or do I have to
write it from scratch?

Thanks,
Angelos Markos
Dr. Applied Informatics,
University of Macedonia, Greece

Douglas Bates

2008-May-11 13:47 UTC

On Sat, May 10, 2008 at 5:27 AM, amarkos <amarkos at gmail.com> wrote:
> An indicator matrix is a binary matrix with orthogonal columns whose
> rows sum to 1. A row of this matrix could be [0 1 0 0]. My problem is
> to group identical rows (profiles) so as to create a compact form
> of the matrix.

I'm not sure exactly what you mean by a compact form of this matrix.
Do you mean that you want to collapse similar rows into a single row
and perhaps a count of the number of times that this row occurs?

In R, indicator matrices are typically generated from a factor, and
essentially you are asking for the tabulation of that factor, such as
the one provided by the functions table and xtabs.
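
As a minimal sketch of that remark (the factor `f` below is a hypothetical example, not data from this thread), tabulating a factor gives the same counts as the column sums of the indicator matrix built from it:

```r
# Hypothetical factor standing in for a single categorical variable
f <- factor(c("a", "c", "a", "b", "a", "b", "a", "b", "c", "a"))

# Indicator matrix: one column per level, every row sums to 1
ind <- model.matrix(~ 0 + f)

# Tabulating the factor counts how often each distinct row occurs
counts   <- table(f)
counts_x <- xtabs(~ f)

# The column sums of the indicator matrix agree with the tabulation
col_totals <- colSums(ind)
```

Both `table` and `xtabs` work directly on the factor, so the indicator matrix never has to be formed just to count profiles.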
> Is there an R function that deals with this problem or do I have to
> write it from scratch?
>
> Thanks,
> Angelos Markos
> Dr. Applied Informatics,
> University of Macedonia, Greece
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

amarkos

2008-May-11 14:49 UTC

On May 11, 4:47 pm, "Douglas Bates" <ba... at stat.wisc.edu> wrote:
> Do you mean that you want to collapse similar rows into a single row
> and perhaps a count of the number of times that this row occurs?
Let me rephrase the problem by providing an example.

Input:

> A
      [,1] [,2]
 [1,]    1    1
 [2,]    1    3
 [3,]    2    1
 [4,]    1    2
 [5,]    2    1
 [6,]    1    2
 [7,]    1    1
 [8,]    1    2
 [9,]    1    3
[10,]    2    1

# Indicator matrix: convert each column of A to a factor
obj <- data.frame(lapply(data.frame(A), as.factor))

nocases <- dim(obj)[1]
novars  <- dim(obj)[2]

# variable levels
levels.n <- sapply(obj, nlevels)
n        <- cumsum(levels.n)

# Indicator matrix calculations
Z        <- matrix(0, nrow = nocases, ncol = n[length(n)])
newdat   <- lapply(obj, as.numeric)
offset   <- (c(0, n[-length(n)]))
for (i in 1:novars)
  Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1

#######

Output:

> Z
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    0    0
 [2,]    1    0    0    0    1
 [3,]    0    1    1    0    0
 [4,]    1    0    0    1    0
 [5,]    0    1    1    0    0
 [6,]    1    0    0    1    0
 [7,]    1    0    1    0    0
 [8,]    1    0    0    1    0
 [9,]    1    0    0    0    1
[10,]    0    1    1    0    0


Z is an indicator matrix in the Multiple Correspondence Analysis
framework. My problem is to collapse identical rows (e.g. 2 and 9)
into a single row and store the row ids.
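
One way to sketch the requested operation directly on the indicator matrix, assuming the example data above (the names `key`, `Z_compact` and `row_ids` are illustrative, and `Z` is rebuilt here with `outer` only to keep the example self-contained):

```r
# Rebuild the example indicator matrix Z from the two input columns
A <- cbind(c(1, 1, 2, 1, 2, 1, 1, 1, 1, 2),
           c(1, 3, 1, 2, 1, 2, 1, 2, 3, 1))
Z <- cbind(outer(A[, 1], 1:2, "==") * 1,
           outer(A[, 2], 1:3, "==") * 1)

# One text key per row; identical rows share the same key
key <- apply(Z, 1, paste, collapse = "")

# Compact form: the distinct rows, in order of first appearance
Z_compact <- Z[!duplicated(key), , drop = FALSE]

# Row ids of the original matrix that collapse into each compact row
row_ids <- unname(split(seq_len(nrow(Z)), match(key, unique(key))))
```

Here `row_ids[[k]]` lists the original rows represented by row `k` of `Z_compact`, so rows 2 and 9 end up in the same group.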

Douglas Bates

2008-May-12 16:55 UTC

On Mon, May 12, 2008 at 11:27 AM, amarkos <amarkos at gmail.com> wrote:
> Thanks, it works!
> Could you please provide the direct method you mentioned for the
> multivariate case?

I'm not sure what you mean.  I looked at what I wrote and I don't see
anything that would fit that description.

May I suggest that you continue to cc: the R-help list on the
discussion?  I can't always respond rapidly to requests, and there are
many readers of the list who can.

> On May 12, 4:30 pm, "Douglas Bates" <ba... at stat.wisc.edu> wrote:
>> On Sun, May 11, 2008 at 9:49 AM, amarkos <amar... at gmail.com> wrote:
>> > On May 11, 4:47 pm, "Douglas Bates" <ba... at stat.wisc.edu> wrote:
>>
>> >> Do you mean that you want to collapse similar rows into a single row
>> >> and perhaps a count of the number of times that this row occurs?
>>
>> > Let me rephrase the problem by providing an example.
>>
>> > Input:
>>
>> > > A
>> >       [,1] [,2]
>> >  [1,]    1    1
>> >  [2,]    1    3
>> >  [3,]    2    1
>> >  [4,]    1    2
>> >  [5,]    2    1
>> >  [6,]    1    2
>> >  [7,]    1    1
>> >  [8,]    1    2
>> >  [9,]    1    3
>> > [10,]    2    1
>>
>> An important question here is do you start with two or more variables
>> like the columns of your matrix A?  If so, there is a more direct
>> method of getting the answers that you want.  The natural way to store
>> such variables in R is as factors.  I prefer to use letters instead of
>> numbers to represent the levels of a factor (that way I don't confuse
>> a factor with a numeric variable when I look at rows) so I would
>> create a data frame with two factors instead of a matrix.
>>
>> > V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
>> > V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
>> > df <- data.frame(f1 = V1, f2 = V2)
>> > df
>>
>>    f1 f2
>> 1   A  a
>> 2   A  c
>> 3   B  a
>> 4   A  b
>> 5   B  a
>> 6   A  b
>> 7   A  a
>> 8   A  b
>> 9   A  c
>> 10  B  a
>>
>> You could produce the indicator matrix and check for unique rows, etc.
>> - I will show that below - but all you need is the interaction of the
>> two factors
>>
>> > df$f12 <- with(df, f1:f2)[drop = TRUE]
>> > df
>>
>>    f1 f2 f12
>> 1   A  a A:a
>> 2   A  c A:c
>> 3   B  a B:a
>> 4   A  b A:b
>> 5   B  a B:a
>> 6   A  b A:b
>> 7   A  a A:a
>> 8   A  b A:b
>> 9   A  c A:c
>> 10  B  a B:a
>>
>> > str(df)
>>
>> 'data.frame':   10 obs. of  3 variables:
>>  $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
>>  $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
>>  $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
>>
>> > table(df$f12)
>>
>> A:a A:b A:c B:a
>>   2   3   2   3
>>
>> > as.numeric(df$f12)
>>
>>  [1] 1 3 4 2 4 2 1 2 3 4
>>
>> Notice that this shows you that there are four distinct combinations
>> that occur 2, 3, 2 and 3 times respectively; the first combination
>> occurs in rows 1 and 7, it consists of the first level of f1 and the
>> first level of f2, etc.
>>
>> If you really do want the indicator matrix you could generate it as
>>
>> > (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
>>
>>    f1A f1B f2a f2b f2c
>> 1    1   0   1   0   0
>> 2    1   0   0   0   1
>> 3    0   1   1   0   0
>> 4    1   0   0   1   0
>> 5    0   1   1   0   0
>> 6    1   0   0   1   0
>> 7    1   0   1   0   0
>> 8    1   0   0   1   0
>> 9    1   0   0   0   1
>> 10   0   1   1   0   0
>>
>> > unique(ind)
>>
>>   f1A f1B f2a f2b f2c
>> 1   1   0   1   0   0
>> 2   1   0   0   0   1
>> 3   0   1   1   0   0
>> 4   1   0   0   1   0
>>
>> but working with the factors is generally much simpler than working
>> with the indicators.
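
The factor-based route can also recover the row ids that the original question asked for. The following is a sketch under the assumption that the data frame is built as in the quoted message; it uses `interaction()` in place of the `f1:f2` idiom, and `row_ids`, `counts` and `compact` are illustrative names:

```r
# The two factors from the quoted example
df <- data.frame(
  f1 = factor(c(1, 1, 2, 1, 2, 1, 1, 1, 1, 2), labels = LETTERS[1:2]),
  f2 = factor(c(1, 3, 1, 2, 1, 2, 1, 2, 3, 1), labels = letters[1:3])
)

# Interaction of the factors, keeping only combinations that occur
f12 <- interaction(df$f1, df$f2, sep = ":", drop = TRUE, lex.order = TRUE)

# Row ids belonging to each distinct profile
row_ids <- split(seq_len(nrow(df)), f12)

# How often each profile occurs
counts <- table(f12)

# Compact form: one row per distinct profile
compact <- unique(df)
```

With more than two variables, further factors can be passed to `interaction()` in the same call, so no indicator matrix is needed to group the profiles.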
