On Sat, Feb 05, 2011 at 11:01:33AM +0100, Sascha Vieweg wrote:
> I have got data with one column indicating the area where the data
> was recorded:
>
> R: n <- 43
> R: df <- data.frame("area"=sample(1:7, n, repl=T),
>                     "dat"=rnorm(n))
>
> In each of the 7 different areas I want to implement one of 7
> specific strategies. The assignment should be random. Therefore, I
> pair 7 areas with 7 strategies randomly by
>
> R: ass <- as.data.frame(cbind("area"=sample(1:7, 7),
> "strategy"=sample(1:7, 7)))
>
> Now I want to create a new variable indicating, which case in the
> original data should be assigned to which strategy. I thought
> about
>
> R: x <- numeric(n)
> R: for(i in 1:7){
>   x[df[, "area"]==i] <- ass[ ass[, "area"]==i , "strategy"]
> }
>
> and then binding the new variable to the data frame
>
> R: str(df2 <- as.data.frame(cbind(df, "strategy"=x)))
>
> which works fine. My question is whether there is a more elegant
> way?
Hello.

If the table "ass" is sorted according to "area", then its second
column may be used as a function mapping "area" to "strategy", because
row i of the sorted table then belongs to area i. This leads to the
following

  ass2 <- ass[order(ass[, "area"]), "strategy"]
  y <- ass2[df[, "area"]]
  identical(x, y + 0) # y is integer, x is double; "+ 0" converts y
  [1] TRUE
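This indexing relies on the area codes being exactly 1 to 7. If they
were arbitrary labels, a lookup with match() would be one possible
alternative. A minimal sketch, assuming the same column names as above
(the seed and the name z are only illustrative and not part of the
original question):

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
n <- 43
df <- data.frame(area = sample(1:7, n, replace = TRUE), dat = rnorm(n))
ass <- data.frame(area = sample(1:7, 7), strategy = sample(1:7, 7))

# match() returns, for each case, the row of "ass" holding its area
z <- ass$strategy[match(df$area, ass$area)]

# compare with sorting "ass" by area and indexing by the area code
ass2 <- ass[order(ass$area), "strategy"]
identical(z, ass2[df$area])
```

The last line should give TRUE, since both expressions pick the
strategy paired with each case's area.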
This also suggests that the same distribution of random assignments
is obtained if "area" is created already sorted and only the second
column of "ass" is random

  ass <- as.data.frame(cbind("area"=1:7,
                             "strategy"=sample(1:7, 7)))

Whether creating only this table is sufficient depends on the application.
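If the sorted table is sufficient, the assignment reduces to indexing
a single random permutation by the area code. A sketch under that
assumption (the seed and the names perm and tab are hypothetical, chosen
only for illustration):

```r
set.seed(2)  # arbitrary seed, only to make the sketch reproducible
n <- 43
df <- data.frame(area = sample(1:7, n, replace = TRUE), dat = rnorm(n))

perm <- sample(1:7, 7)  # perm[k] is the strategy assigned to area k
df2 <- cbind(df, strategy = perm[df$area])

# check that every observed area is paired with exactly one strategy
tab <- table(df2$area, df2$strategy)
all(rowSums(tab > 0) == 1)
```

Here cbind() on a data frame already returns a data frame, so no
further conversion is needed.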
Hope this helps.
Petr Savicky.