thr3ads.net - R help - [R] Sweep out control [Dec 2012]


Thaler,Thorn,LAUSANNE,Applied Mathematics

2012-Dec-10 15:29 UTC

[R] Sweep out control

Dear all,

Assume that I have the following data structure:

d <- expand.grid(subj=1:5, time=1:3, treatment=LETTERS[1:3])
d$value <- 10 ^ (as.numeric(d$treatment) + 1) + 10 * d$subj + d$time
d$value2 <- 100000 + d$value

where d$treatment == "C" stands for my control group. What I want to
achieve now is to subtract, for each subj/time combination, the control value
(d$treatment == "C") from the corresponding values of all treatments in order
to get the differences between the treatments and the control. If I do that by
hand, it will look like:

va <- rep(d$value[d$treatment == "C"], 3)  # rep() is not strictly needed; R would recycle anyway
d$value - va
va2 <- rep(d$value2[d$treatment == "C"], 3)
d$value2 - va2

This works only because the data frame is sorted in the right way and all cases
are present. Furthermore, it becomes laborious if you want to do that for more
than a couple of columns, and it is neither robust nor scalable (what if
somebody changes the order of the data frame beforehand, or somebody assumes
that the data frame is in a certain order afterwards? If I want to add some
columns later, I have to add new lines. What if some cases are missing?) Thus,
this approach is clearly not a good one, especially since I don't like
solutions which depend on a certain order.
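To make the order dependence concrete, here is a small sketch that rebuilds the example data. Because value is 10^(k+1) + 10 * subj + time, with k = 1, 2, 3 for A, B, C, subj and time cancel, so every treatment-minus-control difference should be constant: -9900 for A, -9000 for B, 0 for C. The positional code gets this right in the original order, but not after the rows are reordered by subj, which is only an illustrative assumption standing in for some downstream re-sorting.

```r
# Rebuild the example data from the post
d <- expand.grid(subj = 1:5, time = 1:3, treatment = LETTERS[1:3])
d$value <- 10 ^ (as.numeric(d$treatment) + 1) + 10 * d$subj + d$time

# Positional subtraction in the original order: the 15 control values are
# recycled over the 45 rows and happen to line up with the right subj/time
diff_orig <- d$value - d$value[d$treatment == "C"]
tapply(diff_orig, d$treatment, unique)  # A = -9900, B = -9000, C = 0

# Same code after reordering the rows by subject (illustrative assumption)
s <- d[order(d$subj), ]
diff_sorted <- s$value - s$value[s$treatment == "C"]
diff_sorted[1:6]  # -9900 -9900 -9900 -9010 -9010 -9010
```

The last three values are wrong: the B rows of subject 1 are now subtracted from the control rows of subject 2, and R gives no warning because 45 is still a multiple of 15.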

So my questions:
1. Is there a ready-made solution for that?
2. If not (as I assume), what would be an elegant way of solving this? Is
sorting the data the only way? Not that I have any problem with sorting, but I
would appreciate any solution which works without sorting, because I don't want
to run the risk of issues downstream with people who assume a certain order
in the data (which is of course a no-go anyway, but I assume that the time to
find a solution without altering the order is shorter than the time it takes to
educate these guys [not in the long run, though, but that battle has to be
fought later] ;)
3. This solution should be easily extendable to an arbitrary set of columns and
should work with missing cases for the treatments, e.g. d <- d[-c(2, 21), ]
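One possible order-independent approach, sketched under assumptions: sweep_control() below is a hypothetical helper, not an existing R function, and it assumes that subj and time together identify the matching control row. It builds a character key per row, uses match() to find each row's control row, and subtracts column by column, so it works for any set of columns, leaves the row order untouched, and yields NA where a control case is missing.

```r
# Example data from the post, with cases 2 and 21 removed
d <- expand.grid(subj = 1:5, time = 1:3, treatment = LETTERS[1:3])
d$value <- 10 ^ (as.numeric(d$treatment) + 1) + 10 * d$subj + d$time
d$value2 <- 100000 + d$value
d <- d[-c(2, 21), ]

# Hypothetical helper (not part of base R): subtract the control from every
# row by looking up the control row with the same key values
sweep_control <- function(d, cols, keys = c("subj", "time"),
                          group = "treatment", control = "C") {
  is_ctrl <- d[[group]] == control
  # one character key per row, e.g. "2_1" for subj 2, time 1
  key <- do.call(paste, c(d[keys], sep = "_"))
  # position (among the control rows) of each row's control; NA if absent
  idx <- match(key, key[is_ctrl])
  for (col in cols) {
    ctrl_values <- d[[col]][is_ctrl]
    d[[paste0(col, ".diff")]] <- d[[col]] - ctrl_values[idx]
  }
  d  # same rows, same order, plus one *.diff column per entry in cols
}

res <- sweep_control(d, cols = c("value", "value2"))
head(res)
```

Since value is built as 10^(k+1) + 10 * subj + time, every A row should come out at -9900, every B row at -9000 and every C row at 0, for both columns. Calling it on a reordered copy such as d[order(d$subj), ] should give the same difference for each row, because the lookup goes through the keys rather than through positions.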

Thanks for your input; I am looking forward to your suggestions.


Kind Regards,

Thorn Thaler
Mathematician

Applied Mathematics 
Nestec Ltd,
Nestlé Research Center
PO Box 44 
CH-1000 Lausanne 26
Phone: +41 21 785 8220
Fax: +41 21 785 9486

arun

2012-Dec-10 19:03 UTC


[R] Sweep out control

Hi,

Not sure if this helps you:
res <- do.call(rbind, lapply(split(d, d$treatment), function(x) {
  # columns 4, 5 = value, value2; control values are matched by position
  x$diff1 <- x[, 4] - d[, 4][d$treatment == "C"]
  x$diff2 <- x[, 5] - d[, 5][d$treatment == "C"]
  x
}))
A.K.
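If the row order of the result does not matter, merge() offers another base-R route that relies on keys rather than on positions. A minimal sketch, assuming subj and time identify the control row; note that merge() sorts the result by the key columns by default, so the original row order is not kept.

```r
# Example data from the original post
d <- expand.grid(subj = 1:5, time = 1:3, treatment = LETTERS[1:3])
d$value <- 10 ^ (as.numeric(d$treatment) + 1) + 10 * d$subj + d$time
d$value2 <- 100000 + d$value

cols <- c("value", "value2")  # columns to sweep; extend as needed
keys <- c("subj", "time")     # columns identifying the matching control row

# Control rows, with the measurement columns renamed so they do not clash
ctrl <- d[d$treatment == "C", c(keys, cols)]
names(ctrl) <- c(keys, paste0(cols, ".ctrl"))

# all.x = TRUE keeps rows without a matching control (their difference is NA)
m <- merge(d, ctrl, by = keys, all.x = TRUE)
m[paste0(cols, ".diff")] <- m[cols] - m[paste0(cols, ".ctrl")]
m[paste0(cols, ".ctrl")] <- NULL  # drop the helper columns again
head(m)
```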



______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
