On Fri, 2007-09-28 at 17:48 -0400, Brian Perron wrote:
> Hello all,
>
> An elementary question that I am sure can be easily cracked by an R
> enthusiast. Let's say I have multiple scores (y) on subjects (x.sample).
> Some subjects have a few more scores than others. Can somebody suggest some
> code that will select the first score for each subject?
>
> For example, the following code generates scores for 5 subjects:
>
> > x <- c(1:5)
> > x.sample <- sample(x, 20, replace = TRUE)
> > x.sample <- sort(x.sample)
> > y <- rnorm(20)
> > z <- cbind(x.sample, y)
> > z
>
> x.sample y
> [1,] 1 -1.2006469
> [2,] 1 0.7615261
> [3,] 1 -0.1287516
> [4,] 1 -1.1796474
> [5,] 1 -1.2902519
> [6,] 2 -0.1614918
> [7,] 2 -0.1464773
> [8,] 2 -0.8875417
> [9,] 2 0.3062891
> [10,] 2 0.4398530
> [11,] 3 -0.5717729
> [12,] 3 -0.2938118
> [13,] 4 -0.2398887
> [14,] 4 0.8425419
> [15,] 4 2.5269801
> [16,] 4 -0.3643613
> [17,] 5 1.1690564
> [18,] 5 -0.7644521
> [19,] 5 1.4178982
> [20,] 5 -0.8198921
>
> I am only interested in extracting the first score (y) for each unique
> subject (x.sample). So, I would like to generate the following output.
>
> x.sample y
> [1,] 1 -1.2006469
> [2,] 2 -0.1614918
> [3,] 3 -0.5717729
> [4,] 4 -0.2398887
> [5,] 5 1.1690564
>
> Any assistance would be greatly appreciated.
>
> Regards,
> Brian
See ?split, ?sapply and ?unique.
Then try this:
> cbind(unique(z[, 1]), sapply(split(z[, 2], z[, 1]), "[", 1))
[,1] [,2]
1 1 -1.2006469
2 2 -0.1614918
3 3 -0.5717729
4 4 -0.2398887
5 5 1.1690564
The key part of that is:
> split(z[, 2], z[, 1])
$`1`
[1] -1.2006469 0.7615261 -0.1287516 -1.1796474 -1.2902519
$`2`
[1] -0.1614918 -0.1464773 -0.8875417 0.3062891 0.4398530
$`3`
[1] -0.5717729 -0.2938118
$`4`
[1] -0.2398887 0.8425419 2.5269801 -0.3643613
$`5`
[1] 1.1690564 -0.7644521 1.4178982 -0.8198921
which splits the second column of 'z' (the scores) by the values in the
first column (the subjects).
Then we use sapply() to go through the list and extract the first element
of each vector:
> sapply(split(z[, 2], z[, 1]), "[", 1)
1 2 3 4 5
-1.2006469 -0.1614918 -0.5717729 -0.2398887 1.1690564
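In case passing "[" to sapply() looks odd: "[" is an ordinary R function, so sapply() can call it on each list element with the extra argument 1. A minimal sketch, using a made-up list ('scores' is hypothetical and stands in for the output of split(), it is not the 'z' above):

```r
# Hypothetical toy list, standing in for the result of split()
scores <- list(a = c(10, 20, 30), b = c(40, 50))

# "[" is called as `[`(v, 1) on each element v, i.e. v[1]
first_by_name <- sapply(scores, "[", 1)

# The same thing spelled out with an anonymous function
first_explicit <- sapply(scores, function(v) v[1])

print(first_by_name)
print(identical(first_by_name, first_explicit))
```

Both calls should give the same named vector; the "[" form is just shorter.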
Then we cbind() that result to the unique values in the first column.
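One caveat: split() returns the groups in sorted order of the subject ids, while unique() returns them in order of first appearance. The two agree in the example because 'z' was sorted by x.sample. A rough sketch of two ways to stay aligned when the rows are not sorted, using a small made-up matrix ('z2' and its values are assumptions for illustration, not the posted data):

```r
# Small matrix shaped like 'z'; subject ids deliberately unsorted
z2 <- cbind(x.sample = c(2, 2, 1, 1, 3),
            y        = c(-0.16, -0.15, -1.20, 0.76, -0.57))

# First score per subject, as before
first_y <- sapply(split(z2[, "y"], z2[, "x.sample"]), "[", 1)

# Take the ids from the names of the split() result, so ids and scores match
res <- cbind(x.sample = as.numeric(names(first_y)), y = unname(first_y))

# Alternative: keep each row whose subject id has not been seen earlier
res_dup <- z2[!duplicated(z2[, "x.sample"]), , drop = FALSE]

print(res)
print(res_dup)
```

The duplicated() version keeps the rows in their original order, whereas the split() version returns them sorted by subject id; which one is preferable depends on what you want downstream.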
HTH,
Marc Schwartz