thr3ads.net - R help - [R] hairy indexing problem [Jun 2002]

Russell Senior

2002-Jun-05 01:00 UTC

[R] hairy indexing problem

I've got a data frame that looks like this:

   subject   foo   bar
      2      1.7   3.2
      2      2.3   4.1
      3      7.6   2.3
      3      7.1   3.3
      3      7.3   2.3
      3      7.4   1.3
      5      6.2   6.1
      5      3.4   6.9
     ...

That is, I've got multiple rows per subject.  I need to compute
summaries within categories where the subject has the same number of
rows.  For example, subject 2 and 5 both have two rows.  I need to
compute mean for those four values of foo.  This looks like a good
candidate for index vectors, but I need some help.  I've tried
something like:

  table(data) -> tmp
 
and:

  tmp[tmp == 2]

and even:

  as.numeric(attr(tmp[tmp == 2],"names"))

to get a vector of subject numbers that have two rows in the original
data frame.  But I am getting stuck there.  I want some kind of
"is.member" function to use in a subsequent index vector expression,
like:

  i <- as.numeric(attr(tmp[tmp == 2],"names"))
  data[is.member($subject,i)]$foo

but there isn't an is.member() function.  Can someone please give me a
pointer on the canonical way to do this?

Thanks!

-- 
Russell Senior         ``The two chiefs turned to each other.        
seniorr at aracnet.com      Bellison uncorked a flood of horrible       
                         profanity, which, translated meant, `This is
                         extremely unusual.' ''
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Ian.Saunders@csiro.au

2002-Jun-05 04:31 UTC

[R] hairy indexing problem

There's probably a better way, but ...

	apply(outer(subject,subject,FUN="=="),1,sum)

will give you, for each row, the number of rows that share its subject
value, so it would be
	2 2 4 4 4 4 2 2 ...
in your example.

You could add this as a column of the data frame and use gsummary to get the
summary statistics.

Ian.
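
A minimal sketch of the outer() suggestion above, using the eight rows from the original post. The data frame name d and the column name n.rows are made up for illustration, and aggregate() is used here only as a base-R stand-in for gsummary, which is an assumption rather than what the post proposes. Note that outer() builds an n-by-n matrix, so this is only practical for modest n.

```r
# Example data copied from the original question (name `d` is illustrative)
d <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# outer() compares every subject value with every other one;
# each row sum is the number of rows sharing that row's subject
d$n.rows <- apply(outer(d$subject, d$subject, FUN = "=="), 1, sum)

# Mean of foo within each "rows per subject" class
# (aggregate() stands in for gsummary here)
res <- aggregate(foo ~ n.rows, data = d, FUN = mean)
res
```

With this data, n.rows comes out as 2 2 4 4 4 4 2 2, matching the vector quoted above.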

jasont@indigoindustrial.co.nz

2002-Jun-05 05:00 UTC

[R] hairy indexing problem

> Can someone please give me a
> pointer on the canonical way to do this?
Canonical?  Would you settle for "it works for me"?  ;)

I suspect one of the gurus has a tidy, elegant way of doing this, but here's
how I'd do it instead (not being a guru). Run-length encoding works pretty
well at things like this.

> d1 <- data.frame(subject=c(2,2,3,3,3,3,5,5),
+                  foo=c(1.7,2.3,7.6,7.1,7.3,7.4,6.2,3.4))
> d1
  subject foo
1       2 1.7
2       2 2.3
3       3 7.6
4       3 7.1
5       3 7.3
6       3 7.4
7       5 6.2
8       5 3.4
> d1.subj.rle <- rle(d1$subject[order(d1$subject)])
## make a vector of unique numbers of subjects
> n.subj <- unique(d1.subj.rle$lengths)
## now take means based on number of subjects.
> sapply(n.subj,function(x,...) {
+ mean(d1$foo[d1$subject %in% d1.subj.rle$values[d1.subj.rle$lengths == x]])})
[1] 3.40 7.35
## check the numbers
> mean(d1$foo[d1$subject == 2 | d1$subject == 5])
[1] 3.4
> mean(d1$foo[d1$subject == 3])
[1] 7.35

That could be a *lot* clearer inside the sapply function; maybe in v2.0 of my 
attempt at this ;)

Cheers

Jason
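
One way the sapply call above might be unpacked into named steps is sketched below. This is not the poster's "v2.0"; the names runs, mean.for.size, sizes and means are invented for illustration. It also shows that %in% plays the role of the is.member() function asked for in the original question.

```r
d1 <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                 foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# Run-length encode the sorted subject ids:
# runs$values are the ids, runs$lengths the number of rows for each id
runs <- rle(sort(d1$subject))

# Hypothetical helper: mean of foo over all subjects that have exactly n rows
mean.for.size <- function(n) {
  ids <- runs$values[runs$lengths == n]   # subjects with n rows
  mean(d1$foo[d1$subject %in% ids])       # %in% is the membership test
}

sizes <- sort(unique(runs$lengths))
means <- sapply(sizes, mean.for.size)
names(means) <- sizes                      # label each mean by its row count
means
```

Labelling the result with the row counts makes it clear which mean belongs to which class, something the bare output of the one-liner leaves implicit.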


Bill.Venables@cmis.csiro.au

2002-Jun-05 05:03 UTC

[R] hairy indexing problem

>  -----Original Message-----
> From: Ian.Saunders at csiro.au [mailto:Ian.Saunders at csiro.au]
> Sent: Wednesday, June 05, 2002 2:32 PM
> To: seniorr at aracnet.com; R-help at stat.math.ethz.ch
> Subject: RE: [R] hairy indexing problem
>
> There's probably a better way, but ...
>
>     apply(outer(subject,subject,FUN="=="),1,sum)
>
> will give you a vector of the counts for each value of subject, so would
> be
>     2 2 4 4 4 4 2 2 ...
> in your example.

[WNV]  I think there could be a better way.

    Make sure subject is a factor:

        subject <- as.factor(data$subject)

    and then your "replications class" factor is

        reps <- factor(table(subject)[subject])

    The next step could be

        fooMeans <- tapply(data$foo, reps, mean)

> You could add this as a column of the data frame and use gsummary to get
> the summary statistics.

[WNV]  Yep, that too.  gsummary is part of the nlme package which
has to be loaded.

> Ian.

[WNV]  Bill.
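
For reference, the three steps in this reply can be run end to end on the rows shown in the original post. The sketch below assumes the data frame is called data, as in the question, and fills it with only the eight rows that were posted.

```r
# The eight rows from the original question
data <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                   foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4),
                   bar     = c(3.2, 4.1, 2.3, 3.3, 2.3, 1.3, 6.1, 6.9))

# Step 1: subject as a factor
subject <- as.factor(data$subject)

# Step 2: indexing the table by the factor uses its integer codes,
# so every row picks up the count for its own subject level
reps <- factor(table(subject)[subject])

# Step 3: mean of foo within each "number of rows" class
fooMeans <- tapply(data$foo, reps, mean)
fooMeans
```

Because table() orders its counts by factor level, the lookup in step 2 does not depend on the rows being sorted by subject.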

vito muggeo

2002-Jun-05 06:31 UTC

[R] hairy indexing problem

See ?tapply of course.
E.g.
tapply(foo, subject, mean, na.rm=TRUE) #mean of foo per subj

best,
vito
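
The tapply call above gives one mean per subject. To get one mean per "number of rows" class, as the question asks, the grouping variable has to be the row count instead. A possible sketch follows; the use of ave() to obtain the per-row counts is an assumption of this example, not part of the reply, and the vectors are typed in directly rather than taken from an attached data frame.

```r
subject <- c(2, 2, 3, 3, 3, 3, 5, 5)
foo     <- c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4)

# Mean of foo per subject, as in the reply
per.subject <- tapply(foo, subject, mean, na.rm = TRUE)

# ave() with FUN = length returns, for every row,
# the number of rows in that row's subject group
n.rows <- ave(foo, subject, FUN = length)

# Mean of foo per row-count class
per.size <- tapply(foo, n.rows, mean, na.rm = TRUE)

per.subject
per.size
```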


Agustin Lobo

2002-Jun-05 08:19 UTC

[R] hairy indexing problem

I would use the function w.conti, which calculates a weighted
contingency matrix. That is, given two vectors of categorical
variables (e.g., species and soil type) and a third vector of a
quantitative variable (e.g., biomass), it calculates the sum of the
quantitative variable for each pair (e.g., the total biomass for
each species in each soil type).
With your data, as you have just one categorical
variable, set the second one to a constant to
calculate the sum of foo for each subject:
> matriz<-cbind(sub,foo,bar)
> matriz
     sub foo bar
[1,]   2 1.7 3.2
[2,]   2 2.3 4.1
[3,]   3 7.6 2.3
[4,]   3 7.1 3.3
[5,]   3 7.3 2.3
[6,]   3 7.4 1.3
[7,]   5 6.2 6.1
[8,]   5 3.4 6.9
> a <- w.conti(matriz[,1],rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2  4.0
  3 29.4
  5  9.6

Then, using the result of table you can calculate the mean from
the sum:

> a/as.vector(table(matriz[,1]))
   v2
v1     1
  2 2.00
  3 7.35
  5 4.80

From your question I understand that you want new subjects according
to their number of rows, so that subject 2 and 5 would become a new
subject:

> new.sub <- as.vector(table(matriz[,1]))
> new.sub
[1] 2 4 2
> new.sub <- rep(new.sub,new.sub)
> new.sub
[1] 2 2 4 4 4 4 2 2
> a <- w.conti(new.sub,rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2 13.6
  4 29.4
> a/as.vector(table(new.sub))
   v2
v1     1
  2 3.40
  4 7.35

w.conti is simply:

w.conti <- function (v1,v2,z)
{
        xtabs(z~v1+v2)
}

(I could use xtabs() directly, but I never remember that expression,
while w.conti is easier to remember)

Of course, if you always need the mean, just add 
the second step to w.conti.

Agus


Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es
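
A rough sketch of what "adding the second step to w.conti" might look like is given below. The function name w.conti.mean and the variable n.rows are hypothetical and do not come from the post. The row counts are looked up by name rather than built with rep(), so the sketch does not rely on the rows being sorted by subject, which the rep(new.sub,new.sub) step above does.

```r
subject <- c(2, 2, 3, 3, 3, 3, 5, 5)
foo     <- c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4)

# Hypothetical variant of w.conti: cell sums divided by cell counts
w.conti.mean <- function(v1, v2, z) {
  sums   <- xtabs(z ~ v1 + v2)   # sum of z in each (v1, v2) cell
  counts <- table(v1, v2)        # number of observations in each cell
  sums / counts                  # cell means (NaN for empty cells)
}

# Number of rows for each row's subject, looked up by name
n.rows <- as.vector(table(subject)[as.character(subject)])

# One categorical variable only, so the second one is a constant
res <- w.conti.mean(n.rows, rep(1, length(foo)), foo)
res
```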


