Muenchen, Robert A (Bob)
2007-Aug-26 20:37 UTC
[R] subset using noncontiguous variables by name (not index)
Hi All,

I'm using the subset function to select a list of variables, some of
which are contiguous in the data frame, and others of which are not. It
works fine when I use the form:

subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in
a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
work. That use of the c function seems to violate R rules, so I'm not
sure how it works at all. A small simulation of the problem is below.

If the variable names & orders were really this simple, I could use
indices like

summary( mydata[ ,c(1,3:5,7) ] )

but alas, they are not.

How does the c function work this way in the first place, and how can I
make this substitution?

Thanks,
Bob

mydata <- data.frame(
  x1=c(1,2,3,4,5),
  x2=c(1,2,3,4,5),
  x3=c(1,2,3,4,5),
  x4=c(1,2,3,4,5),
  x5=c(1,2,3,4,5),
  x6=c(1,2,3,4,5),
  x7=c(1,2,3,4,5)
)
mydata

# This does what I want.
summary(
  subset(mydata,select=c(x1,x3:x5,x7) )
)

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1,x3:x5,x7)

# Not looking good!
myVars1

# This doesn't do the right thing.
summary(
  subset(mydata,select=myVars1 )
)

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2

# This doesn't work either.
summary(
  subset(mydata,select=myVars2 )
)

========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html
Gabor Grothendieck
2007-Aug-26 21:09 UTC
[R] subset using noncontiguous variables by name (not index)
Using the built-in data frame anscombe, try this. First we set up a data
frame, anscombe.seq, which has one row containing 1, 2, 3, ... . Then we
select from that data frame and unlist the result to get the desired
index vector.

> anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> anscombe[idx]
   x1 x3 x4   y2
1  10 10  8 9.14
2   8  8  8 8.14
3  13 13  8 8.74
4   9  9  8 8.77
5  11 11  8 9.26
6  14 14  8 8.10
7   6  6  8 6.13
8   4  4 19 3.10
9  12 12  8 9.13
10  7  7  8 7.26
11  5  5  8 4.74
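For readers wondering why c(x1, x3:x4, y2) is legal inside subset() at all, here is a minimal sketch. It assumes the behaviour of subset() for data frames, where the select expression is evaluated against a list that maps each column name to its position. The names nl and pick are illustrative only and are not part of base R.

```r
# Sketch of how subset() treats its select argument.
# nl and pick are hypothetical names used for illustration.
nl <- as.list(seq_along(anscombe))   # positions 1, 2, ..., 8
names(nl) <- names(anscombe)         # x1 = 1, x2 = 2, ..., y4 = 8

# Evaluated against this list, x3:x4 is simply 3:4,
# so c() returns ordinary column positions.
pick <- eval(quote(c(x1, x3:x4, y2)), nl)
pick

# The positions then index the data frame like any integer vector.
head(anscombe[pick])
```

This is also why storing c(x1, x3:x5, x7) in a variable beforehand fails: outside subset() the names refer to whatever x1, x3, ... happen to be in the workspace, not to column positions.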
François Pinard
2007-Aug-26 21:30 UTC
[R] subset using noncontiguous variables by name (not index)
[Muenchen, Robert A (Bob)]

>I'm using the subset function to select a list of variables, some of
>which are contiguous in the data frame, and others of which are not. It
>works fine when I use the form:

>subset(mydata,select=c(x1,x3:x5,x7))

>In reality, my list is far more complex. So I would like to store it in
>a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
>work. That use of the c function seems to violate R rules, so I'm not
>sure how it works at all. A small simulation of the problem is below.

>mydata <- data.frame(
>  x1=c(1,2,3,4,5),
>  x2=c(1,2,3,4,5),
>  x3=c(1,2,3,4,5),
>  x4=c(1,2,3,4,5),
>  x5=c(1,2,3,4,5),
>  x6=c(1,2,3,4,5),
>  x7=c(1,2,3,4,5)
>)
>mydata

># This does what I want.
>summary(subset(mydata, select=c(x1, x3:x5, x7)))

Maybe:

   variables <- expression(c(x1, x3:x5, x7))

and later:

   summary(subset(mydata, select=eval(variables)))

However, I do not know how one computes the expression piecemeal, that
is, better than by building a string and parsing the result.

--
François Pinard   http://pinard.progiciels-bpi.ca
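The string-and-parse route mentioned above can be sketched as follows, assuming the variable list arrives as a character string like Bob's myVars2. The helper selectByString and the names nl and sel are hypothetical and not part of base R. The string is parsed into an unevaluated call, which is then evaluated against a name-to-position list in the same way subset() handles select.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Hypothetical helper: select columns from a string such as "x1,x3:x5,x7".
selectByString <- function(data, spec) {
  # Name-to-position lookup: x1 = 1, x2 = 2, ...
  nl <- as.list(seq_along(data))
  names(nl) <- names(data)
  # Wrap the pieces in c(...) and parse them into an unevaluated call.
  sel <- parse(text = paste("c(", spec, ")"))[[1]]
  # Evaluate the call with column names standing for positions.
  data[eval(sel, nl)]
}

myVars2 <- "x1,x3:x5,x7"
result <- selectByString(mydata, myVars2)
names(result)
```

If the pieces are collected one at a time in a character vector, paste(pieces, collapse = ",") would produce the string to pass in. Since the string is evaluated as code, this approach is only sensible for input you control.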
Muenchen, Robert A (Bob)
2007-Aug-27 18:28 UTC
[R] subset using noncontiguous variables by name (not index)
Thanks for helping me see why R doesn't have the "obvious"!
-Bob

> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Monday, August 27, 2007 2:12 PM
> To: Muenchen, Robert A (Bob)
> Subject: RE: [R] subset using noncontiguous variables by name (not index)
>
> On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
>
> > Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
> > it clear which one, but you would then want just x1::y1 to have
> > unambiguous meaning on its own, which is impossible.
> >
> > As for x1:xN, it's unambiguous on its own.
>
> It actually isn't. We already have a meaning. Consider
>   x1<-4
>   xN<-6
>   x1:xN
> It also breaks R's argument passing rules by treating x1 as string
> rather than a name.
>
> What would be unambiguous at the moment is "x1":"x4", provided there
> was a sufficiently precise set of rules on what was allowed. Consider
>   "x1":"x-1" (negative?)
>   "x1":"x3.14" (non-integer?)
>   "x3.12":"x3.14" (is the prefix x or x3.?)
>   "x1":"X4" (the prefix changes)
>   "01":"14" (is the prefix empty or 0?)
>   "x09":"xA2" (is this illegal decimal or legal hexadecimal?)
>   "IL23R1":"IL23R4" (what is the prefix?)
>   "x1a":"x4a" (infix numbering?)
>
>     -thomas
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu     University of Washington, Seattle
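The existing meaning of x1:xN that Thomas points to can be seen directly. A minimal sketch, reusing his values for x1 and xN; the data frame df and the names top_level and inside are assumptions made for illustration and do not come from the thread. The same text x1:xN is a numeric sequence at top level but a range of column positions inside subset().

```r
x1 <- 4
xN <- 6
top_level <- x1:xN   # ordinary sequence operator: 4 5 6
top_level

# Illustrative data frame whose columns happen to share those names.
df <- data.frame(x1 = 1:3, x2 = 4:6, xN = 7:9)

# Inside subset(), names stand for column positions, so x1:xN is 1:3
# and all three columns are selected.
inside <- names(subset(df, select = x1:xN))
inside
```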