On Jul 9, 2009, at 10:40 AM, Juliet Hannah wrote:
> Hi,
>
> #make example data
> dat <- data.frame(matrix(rnorm(15),ncol=5))
> colnames(dat) <- c("ab","cd","ef","gh","ij")
>
> If I want to get a subset of the data for the middle 3 columns, and I
> know the names of the start column and the end column, I can do this:
>
> mysub <- subset(dat,select=c(cd:gh))
>
> If I wanted to do this just on the column names, without subsetting
> the data, how could I do this?
>
> mynames <- colnames(dat);
>
> #mynames
> #[1] "ab" "cd" "ef" "gh" "ij"
>
> Is there an easy way to create the vector c("cd","ef","gh") as I did
> above using something similar to cd:gh?
>
> Thanks,
>
> Juliet
Using the same presumption that the desired values are consecutive in
the vector:
# Use which() to get the indices for the start and end of the subset
> mynames[which(mynames == "cd"):which(mynames == "gh")]
[1] "cd" "ef" "gh"
You can encapsulate that in a function:
subset.vector <- function(x, start, end)
{
  x[which(x == start):which(x == end)]
}
> subset.vector(mynames, "cd", "gh")
[1] "cd" "ef" "gh"
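
If you want the which()-based idea to fail with a clear message when one
of the endpoints is not in the vector, something along these lines might
do. This is only a sketch: slice_between is a made-up name, and using
match() (which returns the first position or NA) is my substitution for
which(), not anything subset() does itself.

```r
# Hypothetical helper (name and checks are assumptions, for illustration):
# same idea as the which() version, with explicit checks on the endpoints.
slice_between <- function(x, start, end) {
  # match() gives the position of the first match, or NA if there is none
  i <- match(start, x)
  j <- match(end, x)
  if (is.na(i) || is.na(j)) {
    stop("both 'start' and 'end' must be present in 'x'")
  }
  x[i:j]
}

mynames <- c("ab", "cd", "ef", "gh", "ij")
slice_between(mynames, "cd", "gh")
```

Note that if 'end' comes before 'start' in the vector, i:j counts
downward and the result comes back in reverse order.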
Note that you can also do this:
> names(subset(dat, select = cd:gh))
[1] "cd" "ef" "gh"
but that actually goes through the process of subsetting the data
frame first, which potentially introduces a lot of overhead and memory
use if the data frame is large. It also presumes that the desired
vector is a subset of the column names of the initial data frame.
To use the same sequence-based approach that subset.data.frame() uses,
you can borrow what that function does internally:
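subset.vector <- function(x, select)
{
  # Map each name to its position, so that cd:gh evaluates to 2:4
  nl <- as.list(seq_along(x))
  names(nl) <- x
  # Evaluate the unevaluated 'select' expression using those positions
  vars <- eval(substitute(select), nl)
  x[vars]
}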
> subset.vector(mynames, select = cd:gh)
[1] "cd" "ef" "gh"
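
Because 'select' is evaluated with each name standing in for its
position, other index expressions should work as well, not just a
start:end range. A small sketch, using a copy of the function under a
made-up name (select_names) so it can be run on its own:

```r
# Illustrative copy of the select-based function, under a hypothetical name
select_names <- function(x, select) {
  nl <- as.list(seq_along(x))   # positions 1..length(x)
  names(nl) <- x                # each name now stands for its position
  vars <- eval(substitute(select), nl)
  x[vars]
}

mynames <- c("ab", "cd", "ef", "gh", "ij")
select_names(mynames, cd:gh)       # "cd" "ef" "gh"
select_names(mynames, c(ab, gh))   # "ab" "gh"
select_names(mynames, -(cd:gh))    # "ab" "ij"
```

This relies on the elements of the vector being usable as R names in the
expression; anything that is not a syntactically valid name would need
to be quoted with backticks.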
BTW, well done on recognizing that you can use the sequence of column
names for the 'select' argument. A lot of folks, even experienced
useRs, don't realize that you can do that... :-)
HTH,
Marc Schwartz