Hi all,

This is a process question. How do folks efficiently identify column numbers in a dataframe without manually counting them? For example, if I want to choose columns from the iris dataframe, I know of two options. I can do this:

    > str(iris)
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

or this:

    > names(iris)
    [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

Neither option explicitly identifies the column number so that I can do something like this:

    iris[,c(2,4)]

I feel like there must be a better way to do this, so I wanted to ask the collective wisdom here what people do to accomplish this. Obviously this is a trivial example, but the issue really becomes problematic when you have a large dataframe.

Thanks in advance!

Sam

[[alternative HTML version deleted]]
Here are a few ideas.

    data.frame( seq_along(iris), colnames(iris) )
    which(colnames(iris) %in% c("Sepal.Width", "Petal.Width"))
    grep("\\.Width$", colnames(iris))

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-08-25 17:17 GMT+02:00 Sam Albers <tonightsthenight at gmail.com>:
> This is a process question. How do folks efficiently identify column
> numbers in a dataframe without manually counting them.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
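For anyone following along, here is a minimal sketch of what the three one-liners in Thierry's reply return on the built-in iris data. The column labels `position` and `name` and the variable names are illustrative choices, not part of the original reply.

```r
# Lookup table pairing each column position with its name
# (the labels "position" and "name" are illustrative, not from the reply)
lookup <- data.frame(position = seq_along(iris), name = colnames(iris))
print(lookup)

# Positions of columns selected by exact name
by_name <- which(colnames(iris) %in% c("Sepal.Width", "Petal.Width"))

# Positions of columns whose name ends in ".Width"
by_pattern <- grep("\\.Width$", colnames(iris))

by_name     # 2 4
by_pattern  # 2 4

# Either result can be used directly as a column index
head(iris[, by_name])
```

The lookup table is the option that answers the "which number is which column" question at a glance; the other two go straight from names to positions.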
> On Aug 25, 2015, at 10:17 AM, Sam Albers <tonightsthenight at gmail.com> wrote:
>
> Neither option explicitly identifies the column number so that I can
> do something like this:
>
> iris[,c(2,4)]

Just use ?subset:

    NewDF <- subset(iris, select = c(Sepal.Width, Petal.Width))

which is the same as:

    NewDF <- iris[, c(2, 4)]

You can also define sequential columns using ":", thus:

    NewDF <- subset(iris, select = c(Sepal.Width:Petal.Width))

is the same as:

    NewDF <- iris[, 2:4]

and use combinations of the two approaches as well. You can also negate the selection by using:

    select = -c(...)

That avoids having to worry about using integer indices.

Regards,

Marc Schwartz
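A short sketch that checks the equivalences described above on iris. The variable names are illustrative. The last part is an addition rather than something from Marc's message: `subset()` evaluates `select` non-standardly, so inside functions a character vector of names is often the safer way to avoid integer indices.

```r
# Selecting by unquoted name versus by position
by_name  <- subset(iris, select = c(Sepal.Width, Petal.Width))
by_index <- iris[, c(2, 4)]
all.equal(by_name, by_index)        # TRUE

# A run of adjacent columns using ":"
range_name  <- subset(iris, select = Sepal.Width:Petal.Width)
range_index <- iris[, 2:4]
all.equal(range_name, range_index)  # TRUE

# Negating the selection drops the named columns
dropped <- subset(iris, select = -c(Species))
names(dropped)

# Not from the original message: character names also avoid integer
# indices and do not rely on non-standard evaluation
wanted  <- c("Sepal.Width", "Petal.Width")
by_char <- iris[, wanted]
all.equal(by_char, by_index)        # TRUE
```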
On Aug 25, 2015, at 8:17 AM, Sam Albers wrote:

> Neither option explicitly identifies the column number so that I can
> do something like this:
>
> iris[,c(2,4)]

The request to "identify column numbers" seems a bit vague at the moment because it lacks any criterion for such "identification". If your goal is to construct a vector that identifies (by number) the columns whose names contain the text "Width", it would be:

    grep("Width", names(iris) )

You do need some rule ... which you never articulated.

> Obviously this is a trivial example, but the issue really becomes
> problematic when you have a large dataframe.
>
> [[alternative HTML version deleted]]

Still posting in HTML? Having trouble finding the Posting Guide? Can't find the mechanism in gmail to send plain text? What is the problem?

--
David Winsemius
Alameda, CA, USA
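As a small illustration of the rule-based selection David describes, the sketch below shows the `grep()` call from his message next to a few close relatives from base R. The variable names are illustrative.

```r
# Positions of the columns whose name contains "Width"
width_pos <- grep("Width", names(iris))
width_pos    # 2 4

# value = TRUE returns the matching names instead of their positions
width_names <- grep("Width", names(iris), value = TRUE)
width_names  # "Sepal.Width" "Petal.Width"

# grepl() returns one TRUE/FALSE per column, handy for logical indexing
is_width <- grepl("Width", names(iris))

# fixed = TRUE treats the pattern as a literal string, so "." is not a wildcard
dot_pos <- grep(".Width", names(iris), fixed = TRUE)
dot_pos      # 2 4
```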
?grep

I think this will do what you want.

    # something like
    a <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10))

    toMatch <- c("a", "d")

    grep(paste(toMatch,collapse="|"), colnames(a))

    # to subset
    a[,grep(paste(toMatch,collapse="|"), colnames(a))]

On Tue, Aug 25, 2015 at 10:17 AM, Sam Albers <tonightsthenight at gmail.com> wrote:
> This is a process question. How do folks efficiently identify column
> numbers in a dataframe without manually counting them.
--
Stephen Sefick
**************************************************
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**************************************************
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025
**************************************************

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."

-Robert Gentleman
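One caveat worth sketching for the `grep(paste(toMatch, collapse="|"), ...)` approach above: the pattern is an unanchored regular expression, so it matches any column name that merely contains one of the strings. The data frame `df` below is hypothetical and chosen to make the difference visible; anchoring the pattern or using `match()` are two possible ways to get exact-name matching.

```r
# Hypothetical data frame whose column names overlap as substrings
df <- data.frame(a = 1:3, ab = 4:6, d = 7:9, id = 10:12)
toMatch <- c("a", "d")

# Unanchored alternation: matches every name containing "a" or "d"
loose <- grep(paste(toMatch, collapse = "|"), colnames(df))
loose   # 1 2 3 4

# Anchored alternation: matches whole names only
pattern <- paste0("^(", paste(toMatch, collapse = "|"), ")$")
strict <- grep(pattern, colnames(df))
strict  # 1 3

# match() compares whole strings and needs no regular expression
exact <- match(toMatch, colnames(df))
exact   # 1 3
```

With the single-letter names in Stephen's example the two behave the same; the difference only shows up once names share substrings, which is common in wide data frames.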
Hi!

25.08.2015, 18:17, Sam Albers wrote:
> Neither option explicitly identifies the column number so that I can
> do something like this:
>
> iris[,c(2,4)]

Maybe with 'which'?

    > which(colnames(iris)=="Sepal.Length")
    [1] 1

Or did I somehow misunderstand what you are looking for?

HTH,
Kimmo
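The `which(... == ...)` form above works well for one name. As a hedged extension for several names: `==` compares element by element and recycles the shorter vector, so it is only dependable for a single name, while `%in%` and `match()` handle a vector of names. The variable names and the non-existent column `Sepal.Depth` are illustrative.

```r
# One name: which() with == gives its position
one <- which(colnames(iris) == "Sepal.Length")
one          # 1

wanted <- c("Petal.Width", "Sepal.Width")

# Several names, positions returned in column order
pos_sorted <- which(colnames(iris) %in% wanted)
pos_sorted   # 2 4

# Several names, positions returned in the order requested
pos_asked <- match(wanted, colnames(iris))
pos_asked    # 4 2

# match() returns NA for a name that is not a column
missing_pos <- match("Sepal.Depth", colnames(iris))
missing_pos  # NA
```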
Thierry's answer of:

    data.frame( seq_along(iris), colnames(iris) )

is exactly what I was looking for. Apologies for vagueness and HTML. It was unintended.

Sam

On Tue, Aug 25, 2015 at 8:32 AM, stephen sefick <ssefick at gmail.com> wrote:
> ?grep
>
> I think this will do what you want.
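A possible variation on the accepted answer, sketched under the assumption that the goal is to look positions up by name later on: a named integer vector holds the same information as the two-column data frame and can be indexed by name. The name `col_index` is illustrative.

```r
# Named integer vector: names are column names, values are positions
col_index <- setNames(seq_along(iris), names(iris))
col_index

# Look up positions by name
picked <- col_index[c("Sepal.Width", "Petal.Width")]
unname(picked)  # 2 4

# Use the looked-up positions as a column index
head(iris[, picked])
```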