thr3ads.net - R help - [R] Referencing variable names rather than column numbers [Dec 2009]

John-Paul Ferguson

2009-Dec-05 16:22 UTC

[R] Referencing variable names rather than column numbers

I apologize for how basic a question this is. I am a Stata user who
has begun using R, and the syntax differences still trip me up. The
most basic questions, involving as they do general terms, can be the
hardest to find solutions for through search.

Assume for the moment that I have a dataset that contains seven
variables: Pollution, Temp, Industry, Population, Wind, Rain and
Wet.days. (This actual dataset is taken from Michael Crawley's
"Statistics: An Introduction Using R" and is available as
"pollute.txt" in
http://www.bio.ic.ac.uk/research/crawley/statistics/data/zipped.zip.)
Assume I have attached pollute. Then

cor(pollute)

will give me the correlation table for these seven variables. If I
would prefer only to see the correlations between, say, Pollution,
Temp and Industry, I can get that with

cor(pollute[,1:3])

or with

cor(pollute[1:3])

Similarly, I can see the correlations between Temp, Population and Rain with

cor(pollute[,c(2,4,6)])

or with

cor(pollute[c(2,4,6)])

This is fine for a seven-variable dataset. When I have 250 variables,
though, I start to pale at looking up column indexes over and over. I
know from reading the list archives that I can extract the column
index of Industry, for example, by typing

which("Industry"==names(pollute))

but doing that before each command seems dire. Accustomed to Stata
as I am, I am inclined to check the correlation of the first three or
the second, fourth and sixth columns by substituting the column names
for the column indexes--something like the following:

cor(pollute[Pollution:Industry])
cor(pollute[c(Temp,Population,Rain)])

These, however, throw errors.

I know that many commands in R are perfectly happy to take variable
names--the regression models, for example--but that some do not. And
so I ask you two general questions:

1. Is there a syntax for referring to variable names rather than
column indexes in situations like these?
2. Is there something that I should look for in a command's help file
that often indicates whether it can take column names rather than
indexes?

Again, apologies for asking something that has likely been asked
before. I would appreciate any suggestions that you have.

Best,
John-Paul Ferguson
Assistant Professor of Organizational Behavior
Stanford University Graduate School of Business
518 Memorial Way, K313
Stanford, CA 94305

baptiste auguie

2009-Dec-05 16:30 UTC

[R] Referencing variable names rather than column numbers

Hi,

Try this,

cor(pollute[ ,c("Pollution","Temp","Industry")])

and see ?"[", in particular:
"Character vectors will be matched to the names of the object"

HTH,

baptiste
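
A minimal, self-contained sketch of the character-indexing approach suggested above. The data frame is a made-up stand-in for pollute.txt: only the column names come from the thread, and the values are random numbers invented for illustration.

```r
# Hypothetical stand-in for pollute.txt: invented values, real column names.
set.seed(1)
pollute <- data.frame(
  Pollution  = rnorm(20),
  Temp       = rnorm(20),
  Industry   = rnorm(20),
  Population = rnorm(20),
  Wind       = rnorm(20),
  Rain       = rnorm(20),
  Wet.days   = rnorm(20)
)

# Select columns by name with a character vector instead of numeric positions.
by_name  <- cor(pollute[, c("Temp", "Population", "Rain")])
by_index <- cor(pollute[, c(2, 4, 6)])

# Both forms pick the same columns, so the two correlation matrices agree.
identical(by_name, by_index)

# The names can be stored once and reused across commands.
vars <- c("Pollution", "Temp", "Industry")
cor(pollute[vars])
```

Keeping the names in a vector such as `vars` (a name chosen here for illustration) avoids looking up positions again if columns are later added or reordered.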

Ista Zahn

2009-Dec-05 16:43 UTC

[R] Referencing variable names rather than column numbers

As baptiste noted, you can do

cor(pollute[ ,c("Pollution","Temp","Industry")])

But

cor(pollute[,"Pollution":"Industry"])

will not work. For that you can do

cor(pollute[, which(names(pollute)=="Pollution"):which(names(pollute)=="Industry")])
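
The name-to-position lookup above can be wrapped up so it does not have to be retyped each time. The sketch below is one possible way to do that, not something proposed in the thread: `cols_between` is a hypothetical helper name, and the data frame is an invented stand-in for pollute.txt that shares only its column names. Base R's subset() is shown as an alternative, since its `select` argument accepts an unquoted range of column names.

```r
# Hypothetical stand-in for pollute.txt: invented values, real column names.
set.seed(1)
pollute <- as.data.frame(matrix(rnorm(7 * 20), ncol = 7))
names(pollute) <- c("Pollution", "Temp", "Industry", "Population",
                    "Wind", "Rain", "Wet.days")

# Hypothetical helper: return the columns from `from` to `to`, inclusive.
cols_between <- function(df, from, to) {
  idx <- match(c(from, to), names(df))   # positions of the two names
  if (anyNA(idx)) stop("column name not found")
  df[, idx[1]:idx[2], drop = FALSE]
}

r1 <- cor(cols_between(pollute, "Pollution", "Industry"))

# subset() evaluates `select` with column names standing for their positions,
# so a name range works here even though it fails inside `[`.
r2 <- cor(subset(pollute, select = Pollution:Industry))
```

Note that the help page for subset() describes it as a convenience function intended for interactive use, so the explicit match() version may be the safer choice inside functions and scripts.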

-Ista


-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Jorge Ivan Velez

2009-Dec-05 16:54 UTC

[R] Referencing variable names rather than column numbers

Dear John-Paul,

Take a look at https://stat.ethz.ch/pipermail/r-help/2009-July/204027.html.
It contains different ways to do (in part) what you want.

HTH,
Jorge
