thr3ads.net - R help - [R] Pointer to covariates? [Feb 2002]

Gabor Grothendieck

2002-Feb-20 21:06 UTC

[R] Pointer to covariates?

In the first line, use the dist function, found in library mva,
to get the distance between each pair of rows.   From this
calculate an incidence matrix for which element i,j is true if 
row i in dat equals row j in dat (and false elsewhere).

In the second line, for each row calculate the indices of 
the matching rows and take the minimum of those as the key.

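# TRUE where rows i and j of the covariates are identical (maximum distance 0)
incid <- as.matrix(dist(dat[,-1],method="max"))==0
# smallest matching row index per row; always returns a plain vector
keys <- apply(incid,1,function(r) min(which(r)))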
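
As a quick check, the two steps can be run on the three-row example from the question. This is only a sketch: the data frame is retyped by hand, method is spelled out as "maximum", and the second step uses a single apply so that the result is a plain vector even when every row has the same number of matches. In current versions of R, dist is in the stats package, which is loaded by default.

```r
# Example data retyped from the question (response y, covariates x1 and x2).
dat <- data.frame(y = 1:3, x1 = c(1, 0, 1), x2 = c(0, 1, 0))

# Pairwise maximum distance between covariate rows; 0 means identical rows.
incid <- as.matrix(dist(dat[, -1], method = "maximum")) == 0

# For each row, the index of the first row with the same covariates.
keys <- apply(incid, 1, function(r) min(which(r)))
print(keys)
```

Note that incid has nrow(dat) rows and nrow(dat) columns, so its size grows with the square of the number of observations.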

--- Göran Broström <gb@stat.umu.se> wrote:
>I have a dataframe 'dat' with one response and some covariates. Many
>observations (rows), but only a few unique combinations of
>the covariates. Let's say that the response is in column 1, and
>the covariates in columns 2:k.
>
>I want to do 
>
>> covar <- unique.data.frame(dat[, 2:k])
>> y <- dat[, 1]
>> keys <- ??????
>
>where 'keys' should be a vector of length length(y) and contain the
>row numbers in 'covar', where the response will find its covariates.
>
>Example:
>
>> dat
>  y x1 x2
>1 1  1  0
>2 2  0  1
>3 3  1  0
>
>> unique.data.frame(dat[, 2:3])
>  x1 x2
>1  1  0
>2  0  1
>
>> keys
>1  1
>2  2
>3  1
>
>But how do I get 'keys'?
>-- 
> Göran Broström                      tel: +46 90 786 5223
> professor                           fax: +46 90 786 6614
> Department of Statistics            stat.umu.se/egna/gb
> Umeå University
> SE-90187 Umeå, Sweden             e-mail: gb@stat.umu.se
>
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Göran Broström

2002-Feb-21 08:37 UTC

[R] Pointer to covariates?

Thank you very much! This is very fast, much faster than my attempts
so far, but it has two drawbacks:

1. It gives pointers to first occurrences in the _original_ data frame,
not the 'unique' version.

2. The first step results in a _huge_ matrix 'incid', too huge for my 
applications.

However, this is a promising first attempt, and I will try to refine
the idea. Again, thanks!

Göran
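
One possible way around the first drawback, sketched under the assumption that unique() keeps the first occurrence of each row in its original order: look up each first-occurrence index among the rows that are not duplicated. The names x and first and the five-row data frame are illustrative only, and the sketch does nothing about the size of incid.

```r
# Made-up data in which the first occurrences are rows 1 and 3.
dat <- data.frame(y = 1:5, x1 = c(1, 1, 0, 1, 0), x2 = c(0, 0, 1, 0, 1))
x <- dat[, -1]

# First-occurrence index in the original data frame, as in the dist approach.
incid <- as.matrix(dist(x, method = "maximum")) == 0
first <- apply(incid, 1, function(r) min(which(r)))

# Rows kept by unique(x) are exactly those with !duplicated(x), in order,
# so the position of 'first' among them is the row number in unique(x).
keys <- match(first, which(!duplicated(x)))
print(keys)
```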

Gabor Grothendieck

2002-Feb-21 16:44 UTC

[R] Pointer to covariates?

Here's another idea.  It assumes that dat[,-1] contains only zeros
and ones, since this is true in your example.  Some comments on
lifting this restriction are at the end.

dat0 <- 2*matrix(unlist(dat[,-1]),nrow=nrow(dat))-1
u0 <- 2*matrix(unlist(unique(dat[,-1])),ncol=ncol(dat[,-1]))-1
keys <- apply(dat0 %*% t(u0) == ncol(dat0),1,which)

The first line creates a matrix of 1's and -1's from the x variables
such that 1 is mapped to 1 and 0 is mapped to -1.  The second line
extracts the unique rows and performs the same transformation.  The
last line does a matrix multiplication, creating an incidence matrix
using the fact that the inner product of a row of dat0
and a row of u0 equals the number of columns of dat0 iff they
are equal.  We then apply the which function to get the indices.
We don't have to use the minimum like we did last time since u0 has
unique rows.
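
To make the inner-product argument concrete, here is a small sketch on the three-row example from the question. The data frame is retyped by hand, as.matrix is used in place of matrix(unlist(...)) for brevity, and the name ip is illustrative.

```r
# Example data retyped from the question; covariates are 0/1 only.
dat <- data.frame(y = 1:3, x1 = c(1, 0, 1), x2 = c(0, 1, 0))

# Map 1 -> 1 and 0 -> -1 for all rows and for the unique rows.
dat0 <- 2 * as.matrix(dat[, -1]) - 1
u0 <- 2 * as.matrix(unique(dat[, -1])) - 1

# Inner products of every row of dat0 with every row of u0.
# An entry equals ncol(dat0) only when the two rows agree in every column.
ip <- dat0 %*% t(u0)
print(ip)

keys <- apply(ip == ncol(dat0), 1, which)
print(keys)
```

With two covariates, matching rows give an inner product of 2 and rows that differ in both columns give -2, so only the matches survive the comparison with ncol(dat0).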

To generalize this to x matrices which have more than just zeros
and ones we would have to define a generalized matrix multiplication
which uses "and" instead of plus and == instead of times but is
otherwise the same.  The inner product ∧.= (and-dot-equals) in the
APL language did this.  Creating such an operator would even have a
benefit in the 0/1 case since it would make the mapping to +1/-1
unnecessary.
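
A minimal sketch of such a generalized product in R, assuming the covariates can be compared column by column with ==. The function name and_dot_eq is hypothetical and not part of any package. The result is still an nrow(dat) by nrow(unique rows) logical matrix, so it is smaller than incid but not free.

```r
# Hypothetical helper: "and-dot-equals" inner product.
# Element [i, j] is TRUE iff row i of a equals row j of b in every column.
and_dot_eq <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  stopifnot(ncol(a) == ncol(b))
  out <- matrix(TRUE, nrow = nrow(a), ncol = nrow(b))
  for (k in seq_len(ncol(a))) {
    # outer() compares column k of a with column k of b for all row pairs.
    out <- out & outer(a[, k], b[, k], "==")
  }
  out
}

# Covariates that are not restricted to 0/1 (made-up values).
dat <- data.frame(y = 1:4, x1 = c(3, 7, 3, 7), x2 = c(5, 2, 5, 9))
u <- unique(dat[, -1])
keys <- apply(and_dot_eq(dat[, -1], u), 1, which)
print(keys)
```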



Gabor Grothendieck

2002-Feb-22 14:24 UTC

[R] Pointer to covariates?

Another way to do this is to realize that what is wanted is essentially
a join (in the relational database sense) of dat and the unique rows
of dat.  We use merge to perform the join in the following.  The first
few lines set up the data for merge and the last one unscrambles it
since merge does not preserve ordering.  Note that apply is nowhere
used, suggesting that this solution may have adequate speed.

x <- dat[,-1]    # covariates only; the response is in column 1
u <- unique(x)
dat0 <- cbind( x, seq(nrow(x)) )
u0 <- cbind( u, seq(nrow(u)) )
by.arg <- c( rep(TRUE,ncol(x)), FALSE )
dat.mrg <- merge( dat0,u0, by.x=by.arg, by.y=by.arg, sort=FALSE )
keys <- dat.mrg[,ncol(dat.mrg)][order(dat.mrg[,ncol(dat.mrg)-1])]

First, x holds the covariate columns of dat and u becomes the unique
rows of x.
The next two lines append a column of sequence numbers to x and to u.
The following two lines merge dat0 and u0 on all columns but the
sequence numbers.
At this point the last two columns of dat.mrg contain the sequence
number of the original data frame, dat, and the corresponding sequence
number of u.
However, the rows may be scrambled relative to the original ordering
in dat since merge does not preserve order, so the last line re-sorts
by the original sequence number to get keys.
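
The merge idea can also be wrapped in a small function and checked on made-up data. This is a sketch, not a tuned implementation: the function name covariate_keys and the helper column names .row and .key are hypothetical, and it merges by column name rather than by a logical vector.

```r
# Hypothetical wrapper around the merge-based approach.
# x: data frame of covariates only (no response column).
covariate_keys <- function(x) {
  u <- unique(x)
  x0 <- cbind(x, .row = seq_len(nrow(x)))   # position in the original data
  u0 <- cbind(u, .key = seq_len(nrow(u)))   # row number in unique(x)
  m <- merge(x0, u0, by = names(x), sort = FALSE)
  # merge may reorder rows, so put the keys back in the original order.
  m$.key[order(m$.row)]
}

dat <- data.frame(y = 1:5, x1 = c(1, 1, 0, 1, 0), x2 = c(0, 0, 1, 0, 1))
x <- dat[, -1]
covar <- unique(x)
keys <- covariate_keys(x)
print(keys)

# Each response should find its own covariates through keys.
stopifnot(all(as.matrix(covar[keys, ]) == as.matrix(x)))
```

Unlike the dist approach, nothing here has nrow(dat) squared elements; the intermediate data frames have one row per observation.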

