thr3ads.net - R help - [R] Choosing columns by number [Aug 2015]

Sam Albers

2015-Aug-25 15:17 UTC

[R] Choosing columns by number

Hi all,

This is a process question. How do folks efficiently identify column
numbers in a dataframe without manually counting them? For example, if I
want to choose columns from the iris dataframe, I know of two options. I can
do this:

> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

or this:

> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

Neither option explicitly identifies the column number so that I can
do something like this:

iris[,c(2,4)]

I feel like there must be a better way to do this so I wanted to ask
the collective wisdom here what people do to accomplish this.
Obviously this is a trivial example, but the issue really becomes
problematic when you have a large dataframe.

Thanks in advance!

Sam

	[[alternative HTML version deleted]]

Thierry Onkelinx

2015-Aug-25 15:28 UTC

Here are a few ideas.

data.frame(
  seq_along(iris),
  colnames(iris)
)
which(colnames(iris) %in% c("Sepal.Width", "Petal.Width"))
grep("\\.Width$", colnames(iris))
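
The three ideas above can be tried together as a minimal sketch. The column labels position and name, and the object names col_index, width_cols and width_cols_rx, are illustrative assumptions and not part of the original suggestion.

```r
# Lookup table pairing each column position with its name
# (the labels "position" and "name" are illustrative)
col_index <- data.frame(
  position = seq_along(iris),
  name = colnames(iris)
)
print(col_index)

# Positions of columns matched exactly by name
width_cols <- which(colnames(iris) %in% c("Sepal.Width", "Petal.Width"))

# The same positions found with a regular expression on the names
width_cols_rx <- grep("\\.Width$", colnames(iris))

# Either integer vector can be used directly as a column index
head(iris[, width_cols])
```

The lookup table is mainly useful for eyeballing a wide data frame, while which() and grep() give positions that can be reused in code.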

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey


Marc Schwartz

2015-Aug-25 15:29 UTC

Just use ?subset:

  NewDF <- subset(iris, select = c(Sepal.Width, Petal.Width))

which is the same as:

  NewDF <- iris[, c(2, 4)]

You can also define sequential columns using ":", thus:

  NewDF <- subset(iris, select = c(Sepal.Width:Petal.Width))

is the same as:

  NewDF <- iris[, 2:4]

and use combinations of the two approaches as well.

You can also negate the selection by using:

  select = -c(...)

That avoids having to worry about using integer indices.
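
A quick way to check these equivalences, as a sketch only: the object names below are made up for illustration, and comparing the results with all.equal() is an addition here, not something suggested in the thread.

```r
# Select two columns by name and by position
by_name  <- subset(iris, select = c(Sepal.Width, Petal.Width))
by_index <- iris[, c(2, 4)]
all.equal(by_name, by_index)

# Select a run of adjacent columns by name and by position
range_by_name  <- subset(iris, select = Sepal.Width:Petal.Width)
range_by_index <- iris[, 2:4]
all.equal(range_by_name, range_by_index)

# Negated selection: keep everything except Species
no_species <- subset(iris, select = -c(Species))
names(no_species)
```

Note that subset() uses non-standard evaluation for select, so it is most convenient interactively; inside functions, indexing by a character vector of names is often the more predictable choice.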

Regards,

Marc Schwartz

David Winsemius

2015-Aug-25 15:29 UTC

On Aug 25, 2015, at 8:17 AM, Sam Albers wrote:
> Neither option explicitly identifies the column number so that I can
> do something like this:
> 
> iris[,c(2,4)]
The request to "identify column numbers" seems a bit vague at the
moment because it misses any criterion for such "identification". If
your goal is to construct a vector that "identified" (by number) the
names of the columns that contained the text "Width" it would be:

grep("Width",   names(iris) )

You do need some rule ... which you never articulated.
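
As a small sketch of how such a rule might be applied: grep() returns positions by default, value = TRUE returns the matching names instead, and grepl() returns a logical vector. The object names are illustrative assumptions.

```r
# Integer positions of columns whose names contain "Width"
width_idx <- grep("Width", names(iris))

# The matching names themselves rather than their positions
width_names <- grep("Width", names(iris), value = TRUE)

# A logical vector, one element per column
is_width <- grepl("Width", names(iris))

# Any of the three can be used to index the columns
iris_width <- iris[, width_idx]
head(iris_width)
```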
> 	[[alternative HTML version deleted]]
Still posting in HTML? Having trouble finding the Posting Guide? Can't find
the mechanism in gmail to send plain text? What is the problem?

-- 


David Winsemius
Alameda, CA, USA

stephen sefick

2015-Aug-25 15:32 UTC

?grep

I think this will do what you want.

#something like
a <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10), d=rnorm(10))

toMatch <- c("a", "d")

grep(paste(toMatch,collapse="|"), colnames(a))

#to subset
a[,grep(paste(toMatch,collapse="|"), colnames(a))]
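
One caveat worth illustrating: the pattern built by paste() is a regular expression, so it matches any column name that contains "a" or "d", not only names equal to them. The sketch below assumes a hypothetical data frame df with an extra column ab to show the difference; anchoring the pattern or using match() are two possible ways to get exact matches.

```r
# Hypothetical data frame with a column name ("ab") that contains "a"
set.seed(1)
df <- data.frame(a = rnorm(10), ab = rnorm(10), c = rnorm(10), d = rnorm(10))

toMatch <- c("a", "d")

# Unanchored pattern "a|d": also picks up "ab"
loose <- grep(paste(toMatch, collapse = "|"), colnames(df))

# Anchored pattern "^(a|d)$": only exact names match
exact_pattern <- paste0("^(", paste(toMatch, collapse = "|"), ")$")
exact <- grep(exact_pattern, colnames(df))

# match() avoids regular expressions altogether
by_match <- match(toMatch, colnames(df))

colnames(df)[loose]
colnames(df)[exact]
```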


-- 
Stephen Sefick
**************************************************
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**************************************************
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025
**************************************************

Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods.  We are mammals, and have not exhausted the annoying little
problems of being mammals.

                                -K. Mullis

"A big computer, a complex algorithm and a long time does not equal
science."

                              -Robert Gentleman

	[[alternative HTML version deleted]]

K. Elo

2015-Aug-25 15:32 UTC

Hi!

Maybe with 'which'?

 > which(colnames(iris)=="Sepal.Length")
[1] 1

Or did I somehow misunderstand what you are looking for?
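
A possible variation on the same idea, offered as a sketch: a named integer vector holds every column's position at once, so positions can be looked up by name. The name col_pos is an assumption made for illustration.

```r
# Named integer vector: names are column names, values are positions
col_pos <- setNames(seq_along(iris), colnames(iris))
col_pos

# Look up the positions of specific columns by name
wanted <- col_pos[c("Sepal.Width", "Petal.Width")]
wanted

# Use the positions to subset; indexing by name directly selects the same columns
head(iris[, wanted])
head(iris[, c("Sepal.Width", "Petal.Width")])
```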

HTH,
Kimmo

Sam Albers

2015-Aug-25 15:44 UTC

Thierry's answer of:

data.frame(
  seq_along(iris),
  colnames(iris)
)

is exactly what I was looking for. Apologies for vagueness and HTML.
It was unintended.

Sam
