thr3ads.net - R help - [R] subset using noncontiguous variables by name (not index) [Aug 2007]

Muenchen, Robert A (Bob)

2007-Aug-26 20:37 UTC

[R] subset using noncontiguous variables by name (not index)

Hi All,

I'm using the subset function to select a list of variables, some of
which are contiguous in the data frame, and others of which are not. It
works fine when I use the form:

subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in
a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
work. That use of the c function seems to violate R rules, so I'm not
sure how it works at all. A small simulation of the problem is below. 

If the variable names & orders were really this simple, I could use
indices like 

summary( mydata[ ,c(1,3:5,7) ] ) 

but alas, they are not. 

How does the c function work this way in the first place, and how can I
make this substitution?

Thanks,
Bob

mydata <- data.frame(
  x1=c(1,2,3,4,5),
  x2=c(1,2,3,4,5),
  x3=c(1,2,3,4,5),
  x4=c(1,2,3,4,5),
  x5=c(1,2,3,4,5),
  x6=c(1,2,3,4,5),
  x7=c(1,2,3,4,5)
)
mydata

# This does what I want.
summary( 
  subset(mydata,select=c(x1,x3:x5,x7) ) 
)

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1,x3:x5,x7)

# Not looking good!
myVars1

# This doesn't do the right thing.
summary( 
  subset(mydata,select=myVars1 ) 
)

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2

# This doesn't work either.
summary( 
  subset(mydata,select=myVars2 )
)



========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc, 
News: http://listserv.utk.edu/archives/statnews.html
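On Bob's first question, how c(x1, x3:x5, x7) can work at all: as far as I can tell from the R source of subset.data.frame, the select argument is captured unevaluated with substitute() and then evaluated in a list that binds each column name to its column number, so the call effectively becomes c(1, 3:5, 7). The sketch below imitates that step with a hypothetical helper, selectIndex(), which is not part of R. It also shows why myVars1 looked wrong after attach(): there x1 and x7 are the data columns themselves, so c() concatenates their values instead of naming columns.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Hypothetical helper (not part of R): evaluate an unevaluated
# selection expression with each column name bound to its position,
# roughly the way subset() treats its 'select' argument.
selectIndex <- function(df, expr) {
  nl <- as.list(seq_along(df))  # list(1, 2, ..., ncol(df))
  names(nl) <- names(df)        # x1 = 1, x2 = 2, ...
  eval(expr, nl, parent.frame())
}

idx <- selectIndex(mydata, quote(c(x1, x3:x5, x7)))
idx                   # column positions 1 3 4 5 7
summary(mydata[idx])  # the same columns subset(..., select = ...) picks

# Evaluated against the data itself (as after attach(mydata)),
# x1 and x7 are whole columns, so c() concatenates their values.
length(with(mydata, c(x1, x7)))  # 10 values, not 2 column references
```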

Gabor Grothendieck

2007-Aug-26 21:09 UTC

Using the built-in data frame anscombe, try this. First we set up a data frame,
anscombe.seq, which has one row containing 1, 2, 3, ..., one number per column.
Then we select from that data frame and unlist the result to get the desired
index vector.
> anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> anscombe[idx]
   x1 x3 x4   y2
1  10 10  8 9.14
2   8  8  8 8.14
3  13 13  8 8.74
4   9  9  8 8.77
5  11 11  8 9.26
6  14 14  8 8.10
7   6  6  8 6.13
8   4  4 19 3.10
9  12 12  8 9.13
10  7  7  8 7.26
11  5  5  8 4.74
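Applied to Bob's mydata, the same trick yields a plain vector of column positions that can be stored in a variable and reused, which is what the original question asked for. A sketch, assuming the mydata from the first message (rebuilt here with 1:5 columns for brevity); mydata.seq and myVars are only illustrative names.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# A one-row copy of mydata whose cells hold 1, 2, ..., 7,
# i.e. each column's own position.
mydata.seq <- replace(mydata[1, ], TRUE, seq_along(mydata))

# subset() selects by name; unlist() turns the single row into a
# named vector of positions that can be kept in a variable.
myVars <- unlist(subset(mydata.seq, select = c(x1, x3:x5, x7)))
myVars

# Reuse the stored positions wherever they are needed.
summary(mydata[myVars])
```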

François Pinard

2007-Aug-26 21:30 UTC

[Muenchen, Robert A (Bob)]
># This does what I want.
>summary(subset(mydata, select=c(x1, x3:x5, x7)))
Maybe:

  variables <- expression(c(x1, x3:x5, x7))

and later:

  summary(subset(mydata, select=eval(variables)))

However, I do not know how one computes the expression piecemeal, that 
is, better than by building a string and parsing the result.
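One way to build such an expression piecemeal without pasting and parsing strings is to keep each piece as an unevaluated call made with quote(), then assemble the pieces into a single c(...) call with as.call(). This is a sketch, not an idiom established in this thread; pieces is an illustrative name, and the last line reuses the eval() suggestion from this message.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Pieces of the selection, kept unevaluated with quote().
pieces <- list(quote(x1), quote(x3:x5))
pieces <- c(pieces, list(quote(x7)))   # add more pieces as needed

# Assemble the call c(x1, x3:x5, x7) without pasting strings.
variables <- as.call(c(list(as.name("c")), pieces))
variables

# eval() runs inside select, where column names stand for positions.
summary(subset(mydata, select = eval(variables)))
```

Because subset() evaluates select with the column names bound to their positions, the eval(variables) call there sees x1, x3 and so on as column numbers, just as the expression() version does.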

-- 
François Pinard   http://pinard.progiciels-bpi.ca

Muenchen, Robert A (Bob)

2007-Aug-27 18:28 UTC

Thanks for helping me see why R doesn't have the "obvious"! -Bob
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Monday, August 27, 2007 2:12 PM
> To: Muenchen, Robert A (Bob)
> Subject: RE: [R] subset using noncontiguous variables by name (not
> index)
> 
> On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
> 
> > Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
> > it clear which one, but you would then want just x1::y1 to have
> > unambiguous meaning on its own, which is impossible.
> >
> > As for x1:xN, it's unambiguous on its own.
> 
> 
> It actually isn't. We already have a meaning. Consider
>    x1<-4
>    xN<-6
>    x1:xN
> It also breaks R's argument passing rules by treating x1 as string
> rather than a name.
> 
> What would be unambiguous at the moment is "x1":"x4", provided there
> was a sufficiently precise set of rules on what was allowed. Consider
>   "x1":"x-1"    (negative?)
>   "x1":"x3.14"  (non-integer?)
>   "x3.12":"x3.14" (is the prefix x or x3.?)
>   "x1":"X4"     (the prefix changes)
>   "01":"14"     (is the prefix empty or 0?)
>   "x09":"xA2"   (is this illegal decimal or legal hexadecimal?)
>   "IL23R1":"IL23R4" (what is the prefix?)
>   "x1a":"x4a"    (infix numbering?)
> 
> 
> 
>       -thomas
> 
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
>
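Thomas's examples all involve guessing a range from the spelling of the names. A different reading of the anscombe[x1::y1] idea avoids parsing altogether: take every column whose position lies between two named columns, which is what x1:y1 already means inside subset(). A sketch with a hypothetical helper, colRange() (not an existing R function), that returns ordinary character names, so the result can be stored and combined freely.

```r
# Hypothetical helper (not an existing R function): names of all
# columns from 'from' through 'to', inclusive, by position, so the
# names themselves never have to be parsed.
colRange <- function(df, from, to) {
  pos <- match(c(from, to), names(df))
  if (anyNA(pos)) {
    stop("unknown column name(s): ",
         paste(c(from, to)[is.na(pos)], collapse = ", "))
  }
  names(df)[pos[1]:pos[2]]
}

colRange(anscombe, "x1", "y1")   # "x1" "x2" "x3" "x4" "y1"

# A stored, noncontiguous selection: single names plus a range.
myVars <- c("x1", colRange(anscombe, "x3", "x4"), "y2")
summary(anscombe[myVars])
```

Because colRange() only compares complete names, cases like "IL23R1":"IL23R4" or "x1":"X4" never need a rule; the trade-off is that the range depends on column order, exactly as x1:y1 does inside subset().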
