Martin Maechler
2012-Aug-18 15:03 UTC
[Rd] Quiz: How to get a "named column" from a data frame
Today, I was looking for an elegant (and efficient) way to get a named (atomic) vector by selecting one column of a data frame. Of course, the vector names must be the rownames of the data frame.

Ok, here is the quiz. I know one quite "cute"/"slick" answer, but was wondering if there are obviously better ones, and also if this should not become more idiomatic (hence "R-devel"):

Consider this toy example, where the data frame already has only one column:

> nv <- c(a=1, d=17, e=101); nv
  a   d   e
  1  17 101

> df <- as.data.frame(cbind(VAR = nv)); df
  VAR
a   1
d  17
e 101

Now, how can I get 'nv' back from 'df'? I.e., how to get

> identical(nv, .......)
[1] TRUE

where '.......' only uses 'df' (and no non-standard R packages)?

As said, I know a simple solution (*), but I'm sure it is not obvious to most R users and probably not even to the majority of R-devel readers... OTOH, people like Bill Dunlap will not take long to provide it or a better one.

(*) In my solution, the above '.......' consists of 17 letters. I'll post it later today (CEST time) ... or confirm that someone else has done so.

Martin
Joshua Ulrich
2012-Aug-18 15:16 UTC
[Rd] Quiz: How to get a "named column" from a data frame
I don't know if this is better, but it's the most obvious/shortest I could come up with. Transpose the data.frame column to a 'row' vector and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE

Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
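The two steps in this answer can be inspected separately. The following is a minimal sketch using only base R and the quiz's `nv` and `df`; the intermediate name `m` is introduced here purely for illustration.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# t() on a data frame goes through as.matrix(), so the result is a
# 1 x 3 numeric matrix whose column names are the old row names.
m <- t(df)
dim(m)
dimnames(m)

# drop() removes the extent-one dimension; the surviving dimnames
# become the names of the resulting vector.
identical(nv, drop(m))
```

Because `as.matrix()` coerces all columns to one common type, on a data frame with several columns it is probably safer to select the column first, for example `drop(t(df["VAR"]))`.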
Martin Maechler
2012-Aug-18 16:20 UTC
[Rd] Quiz: How to get a "named column" from a data frame
On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler .... wrote:
> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
>>
>> Now how, can I get 'nv' back from 'df' ? I.e., how to get
>>
>> > identical(nv, .......)
>> [1] TRUE
>>
>> where ...... only uses 'df' (and no non-standard R packages)?
>
> > identical(nv, df[,1])
> [1] TRUE
>
>> In my solution, the above '.......' consists of 17 letters.
>
> I count 6 in mine

But it is not a solution in a current version of R! Though it's still interesting that df[,1] worked in some incantation of R. What's your sessionInfo()?

Martin

> /Christian
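The point about `df[,1]` can be checked directly. A short sketch, assuming a reasonably recent R and the quiz's `nv` and `df`; the name `x` is used here only for illustration. Single-column extraction with `[` returns the bare column, while the row names stay behind as an attribute of the data frame.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

x <- df[, 1]               # drop = TRUE by default for a single column
names(x)                   # the row names are not carried over
identical(nv, x)           # so this comparison fails
identical(unname(nv), x)   # the values agree; only the names differ
row.names(df)              # the names are stored here instead
```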
Winston Chang
2012-Aug-18 18:54 UTC
[Rd] Quiz: How to get a "named column" from a data frame
This isn't super-concise, but has the virtue of being clear:

nv <- c(a=1, d=17, e=101)
df <- as.data.frame(cbind(VAR = nv))
identical(nv, setNames(df$VAR, rownames(df)))
# TRUE

It seems to be more efficient than the other methods as well:

library(microbenchmark)  # non-base package that provides microbenchmark()
f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1,]
f3 <- function() as.matrix(df)[,1]
r <- microbenchmark(f1(), f2(), f3(), times=1000)
r
# Unit: microseconds
#   expr    min      lq median      uq      max
# 1 f1() 14.589 17.0315 18.608 19.3220   89.388
# 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
# 3 f3() 58.153 61.2600 62.521 65.0380  238.483

-Winston
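Before comparing timings, it is worth confirming that the three candidates really return the same object. The sketch below does that and adds a rough timing loop in base R for readers without the microbenchmark package. The helper name `time_it` and the repetition count are assumptions made for illustration, and `system.time()` is far coarser than `microbenchmark()`, so the numbers are only indicative.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1, ]
f3 <- function() as.matrix(df)[, 1]

# Check that every candidate reproduces 'nv' exactly.
results <- list(f1 = f1(), f2 = f2(), f3 = f3())
all_ok <- vapply(results, identical, logical(1), y = nv)
all_ok

# Hypothetical helper: elapsed seconds for n repeated calls of f.
time_it <- function(f, n = 2000L) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}
timings <- vapply(list(f1 = f1, f2 = f2, f3 = f3), time_it, numeric(1))
timings
```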
Hadley Wickham
2012-Aug-18 19:11 UTC
[Rd] Quiz: How to get a "named column" from a data frame
On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> Today, I was looking for an elegant (and efficient) way
> to get a named (atomic) vector by selecting one column of a data frame.
> Of course, the vector names must be the rownames of the data frame.

But aren't you making life difficult for yourself by not using I()?

df <- data.frame(VAR = I(nv))
str(df[[1]])

(which isn't quite identical because it now has the AsIs class)

Hadley

--
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/
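A sketch of what the I() route looks like in practice, assuming base R only. The names `df2`, `col` and `plain` are introduced here for illustration, so that the quiz's `df` is not overwritten. Wrapping the vector in I() lets the column keep its names, and unclass() is one possible way to strip the extra "AsIs" class again.

```r
nv <- c(a = 1, d = 17, e = 101)

df2 <- data.frame(VAR = I(nv))
col <- df2[[1]]

class(col)   # carries the "AsIs" class added by I()
names(col)   # the names are kept inside the column

# Removing the class attribute should leave a plain named vector.
plain <- unclass(col)
identical(nv, plain)
```

The trade-off is that the data frame has to be built with I() in the first place, so this does not help when the data frame comes from elsewhere.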
J. R. M. Hosking
2012-Aug-18 19:28 UTC
[Rd] Quiz: How to get a "named column" from a data frame
On 2012-08-18 11:03, Martin Maechler wrote:
> Today, I was looking for an elegant (and efficient) way
> to get a named (atomic) vector by selecting one column of a data frame.
> Of course, the vector names must be the rownames of the data frame.

For this purpose my private function library has a function withnames():

withnames(): Extract from data frame as a named vector

Description:

     Extracts data from a data frame; if the result is a vector (i.e. we extracted a single column and did not specify 'drop=FALSE') it is assigned names derived from the row names of the data frame.

Usage:

     withnames(expr)

Arguments:

    expr: R expression.

Details:

     'expr' is evaluated in an environment in which the extractor functions '$.data.frame', '[.data.frame', and '[[.data.frame' are replaced by versions that attach the data frame's row names to an extracted vector.

Value:

     'expr', evaluated as described above.

## Code

withnames<-function(expr) {
  eval(substitute(expr),
       list(
         `[.data.frame` = function(x,i,...) {
           out<-x[i,...]
           if (is.null(dim(out))) names(out)<-row.names(x)[i]
           return(out)},
         `[[.data.frame` = function(x,...) {
           out<-x[[...]]
           if (is.null(dim(out))) names(out)<-row.names(x)
           return(out)},
         `$.data.frame` = function(x,name) {
           out<-x[[name, exact=FALSE]]
           if (is.null(dim(out))) names(out)<-row.names(x)
           return(out)}
       ),
       enclos=parent.frame())
}

## Examples

dd <- data.frame(aa=1:6, bb=letters[c(1,3,2,3,3,1)], row.names=LETTERS[1:6])
dd
dd$aa                        # Unnamed vector
withnames(dd$aa)             # Named vector
withnames(dd[["aa"]])        # Named vector
withnames(dd[2:4,"aa"])      # Named vector
withnames(dd$bb)             # Factor with names
withnames(outer(dd$a,dd$a))  # Both dimensions have names

## But now I am looking for a version that will play nicely with with():

withnames(with(dd, aa))      # No names!
with(dd, withnames(aa))      # No names!
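The with() cases lose the names because with() evaluates the expression with the data frame's columns as ordinary variables, so `aa` is found by plain lookup and none of the three replaced extractor methods is ever called. One possible workaround, sketched below, is to attach the row names to a copy of each column before evaluating. The helper `with_names` is hypothetical and not part of the original post; it assumes base R only.

```r
# Hypothetical helper (not from the original post): like with(), but
# every non-matrix column is given the data frame's row names first.
with_names <- function(data, expr) {
  rn <- row.names(data)
  cols <- lapply(data, function(col) {
    if (is.null(dim(col))) names(col) <- rn
    col
  })
  # Evaluate the unevaluated expression with the named columns as
  # variables, falling back to the caller's environment for the rest.
  eval(substitute(expr), cols, enclos = parent.frame())
}

dd <- data.frame(aa = 1:6, bb = letters[c(1, 3, 2, 3, 3, 1)],
                 row.names = LETTERS[1:6])

with_names(dd, aa)       # named integer vector
with_names(dd, aa * 2)   # arithmetic keeps the names
```

This copies every column on each call, which may matter for wide data frames; it trades efficiency for not having to intercept the extractor functions at all.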