Eeles, Christopher
2020-Sep-24 21:45 UTC
[Rd] How to use `[` without evaluating the arguments.
Hello R-devel, I am currently attempting to implement an API similar to data.table wherein single bracket subsetting can accept an unquoted expression to be evaluated in the context of my object. A simple example from the data.table package looks like this: DT <- data.table(col1 = c('a', 'b', 'c'), col2 = c('x', 'y', 'z')) DT[col1 == 'a'] Where the expression i in DT[i, j] is captured with substitute then evaluated inside the DT object. Reviewing the source code from data.table, it seems that they implemented this feature simple by defining a new S3 method on `[` called `[.data.table`. I tried to replicate this API as follows. I have defined an S4 which contains an S3 class as follows: #' Define an S3 Class #' #' Allows use of S3 methods with new S4 class. This is required to overcome #' limitations of the `[` S4 method. #' setOldClass('long.table') #' LongTable class definition #' #' Define a private constructor method to be used to build a `LongTable` object. #' #' @param drugs [`data.table`] #' @param cells [`data.table`] #' @param assays [`list`] #' @param metadata [`list`] #' #' #' @return [`LongTable`] object containing the assay data from a #' #' @import data.table #' @keywords internal .LongTable <- setClass("LongTable", slots=list(rowData='data.table', colData='data.table', assays='list', metadata='list', .intern='environment'), contains='long.table') #' LongTable constructor method #' #' @param rowData [`data.table`, `data.frame`, `matrix`] A table like object #' coercible to a `data.table` containing the a unique `rowID` column which #' is used to key assays, as well as additional row metadata to subset on. #' @param rowIDs [`character`, `integer`] A vector specifying #' the names or integer indexes of the row data identifier columns. These #' columns will be pasted together to make up the row.names of the #' `LongTable` object. #' @param colData [`data.table`, `data.frame`, `matrix`] A table like object #' coercible to a `data.table` containing the a unique `colID` column which #' is used to key assays, as well as additional column metadata to subset on. #' @param colIDs [`character`, `integer`] A vector specifying #' the names or integer indexes of the col data identifier columns. These #' columns will be pasted together to make up the col.names of the #' `LongTable` object. #' @param assays A [`list`] containing one or more objects coercible to a #' `data.table`, and keyed by rowID and colID corresponding to the rowID and #' colID columns in colData and rowData. #' @param metadata A [`list`] of metadata associated with the `LongTable` #' object being constructed #' @param keep.rownames [`logical` or `character`] Logical: whether rownames #' should be added as a column if coercing to a `data.table`, default is FALSE. #' If TRUE, rownames are added to the column 'rn'. Character: specify a custom #' column name to store the rownames in. #' #' @return [`LongTable`] object #' #' @import data.table #' @export LongTable <- function(rowData, rowIDs, colData, colIDs, assays, metadata=list(), keep.rownames=FALSE) { ## TODO:: Handle missing parameters if (!is(colData, 'data.table')) { colData <- data.table(colData, keep.rownames=keep.rownames) } if (!is(rowData, 'data.table')) { rowData <- data.table(rowData, keep.rownames=keep.rownames) } if (!all(vapply(assays, FUN=is.data.table, FUN.VALUE=logical(1)))) { tryCatch({ assays <- lapply(assays, FUN=data.table, keep.rownames=keep.rownames) }, warning = function(w) { warning(w) }, error = function(e, assays) { message(e) types <- lapply(assays, typeof) stop(paste0('List items are types: ', paste0(types, collapse=', '), '\nPlease ensure all items in the assays list are coerced to data.tables!')) }) } # Initialize the .internals object to store private metadata for a LongTable internals <- new.env() ## TODO:: Implement error handling internals$rowIDs <- if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData)) rowIDs else which(colnames(rowData) %in% rowIDs) lockBinding('rowIDs', internals) internals$colIDs <- if (is.numeric(colIDs) && max(colIDs) < ncol(colData)) colIDs else which(colnames(colData) %in% colIDs) lockBinding('colIDs', internals) # Assemble the pseudo row and column names for the LongTable .pasteColons <- function(...) paste(..., collapse=':') rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$rowIDs] colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$colIDs] return(.LongTable(rowData=rowData, colData=colData, assays=assays, metadata=metadata, .intern=internals)) } I have also defined a subset method as an S3 and S4 generic: #' Subset method for a LongTable object. #' #' Allows use of the colData and rowData `data.table` objects to query based on #' rowID and colID, which is then used to subset all value data.tables stored #' in the dataList slot. #' #' This function is endomorphic, it always returns a LongTable object. #' #' @param x [`LongTable`] The object to subset. #' @param rowQuery [`character`, `numeric`, `logical` or `expression`] #' Character: pass in a character vector of drug names, which will subset the #' object on all row id columns matching the vector. #' #' Numeric or Logical: these select based on the rowKey from the `rowData` #' method for the `LongTable`. #' #' Expression: Accepts valid query statements to the `data.table` i parameter, #' this can be used to make complex queries using the `data.table` API #' for the `rowData` data.table. #' #' @param columnQuery [`character`, `numeric`, `logical` or `expression`] #' Character: pass in a character vector of drug names, which will subset the #' object on all drug id columns matching the vector. #' #' Numeric or Logical: these select base don the rowID from the `rowData` #' method for the `LongTable`. #' #' Expression: Accepts valid query statements to the `data.table` i parameter, #' this can be used to make complex queries using the `data.table` API #' for the `rowData` data.table. #' #' @param values [`character`, `numeric` or `logical`] Optional list of value #' names to subset. Can be used to subset the dataList column further, #' returning only the selected items in the new LongTable. #' #' @return [`LongTable`] A new `LongTable` object subset based on the specified #' parameters. #' #' @importMethodsFrom BiocGenerics subset #' @import data.table #' @export subset.long.table <- function(x, rowQuery, columnQuery, assays) { longTable <- x rm(x) if (!missing(rowQuery)) { if (tryCatch(is.character(rowQuery), error=function(e) FALSE)) { select <- grep('^cellLine[:digit:]*', colnames(rowData(longTable)), value=TRUE) rowQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(rowQuery)), collapse=' | ') rowQuery <- str2lang(rowQueryString) } else { rowQuery <- substitute(rowQuery) } rowDataSubset <- rowData(longTable)[eval(rowQuery), ] } else { rowDataSubset <- rowData(longTable) } if (!missing(columnQuery)) { if (tryCatch(is.character(columnQuery), error=function(e) FALSE)) { select <- grep('^drug[:digit:]*', colnames(colData(longTable)), value=TRUE) columnQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(columnQuery)), collapse=' | ') columnQuery <- str2lang(columnQueryString) } else { columnQuery <- substitute(columnQuery) } colDataSubset <- colData(longTable)[eval(columnQuery), ] } else { colDataSubset <- colData(longTable) } rowKeys <- rowDataSubset$rowKey colKeys <- colDataSubset$colKey if (missing(assays)) { assays <- assayNames(longTable) } keepAssays <- assayNames(longTable) %in% assays assayData <- lapply(assays(longTable)[keepAssays], FUN=.filterLongDataTable, indexList=list(rowKeys, colKeys)) return(LongTable(colData=colDataSubset, colIDs=longTable at .intern$colIDs , rowData=rowDataSubset, rowIDs=longTable at .intern$rowIDs, assays=assayData, metadata=metadata(longTable))) } setMethod('subset', 'LongTable', subset.long.table) Everything behaves as I expect when calling the subset function. For example subset(longTable, cellLine1 == 'VCAP) Successfully returns while also working with character, integer or boolean based indexing. The issue arises when I try to implement the '[' method. I have tried a number of different approaches, but none of them has been successful. My current approach is as follows: `[.long.table` <- function(x, i, j) eval(substitute(subset(x, i, j))) This function works as expected in most cases, for example. longTable[c(1,2,3), c(1,2,3,] `[.long.table`(longTable, cellLine1 == 'VCAP') Both work normally. However, when I try using `[` like an operator: longTable[cellLine1 == 'VCAP', ] I get the error 'Error: object 'cellLine1' not found'. This suggests to me that instead of passing the expression into the function `[`, it is trying to evaluate the expression before dispatching a method. Given that similar syntax works fine with data.table, and I believe also in the tibble tidyverse package, I am quite confused. If you have any recommendations on how I can prevent evaluation prior to method dispatch, or of a work around that would produce the same API using a different approach, it would be appreciated. Thanks for your assistance. Best, --- Christopher Eeles Software Developer BHK Laboratory<http://www.bhklab.ca/> Princess Margaret Cancer Centre<https://www.pmgenomics.ca/pmgenomics/> University Health Network<http://www.uhn.ca/> This e-mail may contain confidential and/or privileged i...{{dropped:22}}
Hugh Parsonage
2020-Sep-25 12:18 UTC
[Rd] How to use `[` without evaluating the arguments.
This works as expected: "[.foo" <- function(x, i, j) { sx <- substitute(x) si <- substitute(i) sj <- substitute(j) 100 * length(sx) + 10 * length(si) + length(sj) } x <- 1:10 class(x) <- "foo" x[y == z, a(x)] #> [1] 132 Note in your implementation you ask the function evaluate the expression. You may have been intending to recompose the calls from the substituted values of x, i, j and evaluate this new call. On Fri, 25 Sep 2020 at 20:02, Eeles, Christopher <Christopher.Eeles at uhnresearch.ca> wrote:> > Hello R-devel, > > I am currently attempting to implement an API similar to data.table wherein single bracket subsetting can accept an unquoted expression to be evaluated in the context of my object. > > A simple example from the data.table package looks like this: > > > DT <- data.table(col1 = c('a', 'b', 'c'), col2 = c('x', 'y', 'z')) > DT[col1 == 'a'] > > Where the expression i in DT[i, j] is captured with substitute then evaluated inside the DT object. > > Reviewing the source code from data.table, it seems that they implemented this feature simple by defining a new S3 method on `[` called `[.data.table`. I tried to replicate this API as follows. > > I have defined an S4 which contains an S3 class as follows: > > > #' Define an S3 Class > #' > #' Allows use of S3 methods with new S4 class. This is required to overcome > #' limitations of the `[` S4 method. > #' > setOldClass('long.table') > > #' LongTable class definition > #' > #' Define a private constructor method to be used to build a `LongTable` object. > #' > #' @param drugs [`data.table`] > #' @param cells [`data.table`] > #' @param assays [`list`] > #' @param metadata [`list`] > #' > #' > #' @return [`LongTable`] object containing the assay data from a > #' > #' @import data.table > #' @keywords internal > .LongTable <- setClass("LongTable", > slots=list(rowData='data.table', > colData='data.table', > assays='list', > metadata='list', > .intern='environment'), > contains='long.table') > > #' LongTable constructor method > #' > #' @param rowData [`data.table`, `data.frame`, `matrix`] A table like object > #' coercible to a `data.table` containing the a unique `rowID` column which > #' is used to key assays, as well as additional row metadata to subset on. > #' @param rowIDs [`character`, `integer`] A vector specifying > #' the names or integer indexes of the row data identifier columns. These > #' columns will be pasted together to make up the row.names of the > #' `LongTable` object. > #' @param colData [`data.table`, `data.frame`, `matrix`] A table like object > #' coercible to a `data.table` containing the a unique `colID` column which > #' is used to key assays, as well as additional column metadata to subset on. > #' @param colIDs [`character`, `integer`] A vector specifying > #' the names or integer indexes of the col data identifier columns. These > #' columns will be pasted together to make up the col.names of the > #' `LongTable` object. > #' @param assays A [`list`] containing one or more objects coercible to a > #' `data.table`, and keyed by rowID and colID corresponding to the rowID and > #' colID columns in colData and rowData. > #' @param metadata A [`list`] of metadata associated with the `LongTable` > #' object being constructed > #' @param keep.rownames [`logical` or `character`] Logical: whether rownames > #' should be added as a column if coercing to a `data.table`, default is FALSE. > #' If TRUE, rownames are added to the column 'rn'. Character: specify a custom > #' column name to store the rownames in. > #' > #' @return [`LongTable`] object > #' > #' @import data.table > #' @export > LongTable <- function(rowData, rowIDs, colData, colIDs, assays, > metadata=list(), keep.rownames=FALSE) { > > ## TODO:: Handle missing parameters > > if (!is(colData, 'data.table')) { > colData <- data.table(colData, keep.rownames=keep.rownames) > } > > if (!is(rowData, 'data.table')) { > rowData <- data.table(rowData, keep.rownames=keep.rownames) > } > > if (!all(vapply(assays, FUN=is.data.table, FUN.VALUE=logical(1)))) { > tryCatch({ > assays <- lapply(assays, FUN=data.table, keep.rownames=keep.rownames) > }, warning = function(w) { > warning(w) > }, error = function(e, assays) { > message(e) > types <- lapply(assays, typeof) > stop(paste0('List items are types: ', > paste0(types, collapse=', '), > '\nPlease ensure all items in the assays list are > coerced to data.tables!')) > }) > } > > # Initialize the .internals object to store private metadata for a LongTable > internals <- new.env() > > ## TODO:: Implement error handling > internals$rowIDs <- > if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData)) > rowIDs > else > which(colnames(rowData) %in% rowIDs) > lockBinding('rowIDs', internals) > > internals$colIDs <- > if (is.numeric(colIDs) && max(colIDs) < ncol(colData)) > colIDs > else > which(colnames(colData) %in% colIDs) > lockBinding('colIDs', internals) > > # Assemble the pseudo row and column names for the LongTable > .pasteColons <- function(...) paste(..., collapse=':') > rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$rowIDs] > colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$colIDs] > > return(.LongTable(rowData=rowData, colData=colData, > assays=assays, metadata=metadata, > .intern=internals)) > } > > I have also defined a subset method as an S3 and S4 generic: > > > #' Subset method for a LongTable object. > #' > #' Allows use of the colData and rowData `data.table` objects to query based on > #' rowID and colID, which is then used to subset all value data.tables stored > #' in the dataList slot. > #' > #' This function is endomorphic, it always returns a LongTable object. > #' > #' @param x [`LongTable`] The object to subset. > #' @param rowQuery [`character`, `numeric`, `logical` or `expression`] > #' Character: pass in a character vector of drug names, which will subset the > #' object on all row id columns matching the vector. > #' > #' Numeric or Logical: these select based on the rowKey from the `rowData` > #' method for the `LongTable`. > #' > #' Expression: Accepts valid query statements to the `data.table` i parameter, > #' this can be used to make complex queries using the `data.table` API > #' for the `rowData` data.table. > #' > #' @param columnQuery [`character`, `numeric`, `logical` or `expression`] > #' Character: pass in a character vector of drug names, which will subset the > #' object on all drug id columns matching the vector. > #' > #' Numeric or Logical: these select base don the rowID from the `rowData` > #' method for the `LongTable`. > #' > #' Expression: Accepts valid query statements to the `data.table` i parameter, > #' this can be used to make complex queries using the `data.table` API > #' for the `rowData` data.table. > #' > #' @param values [`character`, `numeric` or `logical`] Optional list of value > #' names to subset. Can be used to subset the dataList column further, > #' returning only the selected items in the new LongTable. > #' > #' @return [`LongTable`] A new `LongTable` object subset based on the specified > #' parameters. > #' > #' @importMethodsFrom BiocGenerics subset > #' @import data.table > #' @export > subset.long.table <- function(x, rowQuery, columnQuery, assays) { > > longTable <- x > rm(x) > > if (!missing(rowQuery)) { > if (tryCatch(is.character(rowQuery), error=function(e) FALSE)) { > select <- grep('^cellLine[:digit:]*', colnames(rowData(longTable)), value=TRUE) > rowQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(rowQuery)), collapse=' | ') > rowQuery <- str2lang(rowQueryString) > } else { > rowQuery <- substitute(rowQuery) > } > rowDataSubset <- rowData(longTable)[eval(rowQuery), ] > } else { > rowDataSubset <- rowData(longTable) > } > > if (!missing(columnQuery)) { > if (tryCatch(is.character(columnQuery), error=function(e) FALSE)) { > select <- grep('^drug[:digit:]*', colnames(colData(longTable)), value=TRUE) > columnQueryString <- paste0(paste0(select, ' %in% ', .variableToCodeString(columnQuery)), collapse=' | ') > columnQuery <- str2lang(columnQueryString) > } else { > columnQuery <- substitute(columnQuery) > } > colDataSubset <- colData(longTable)[eval(columnQuery), ] > } else { > colDataSubset <- colData(longTable) > } > > rowKeys <- rowDataSubset$rowKey > colKeys <- colDataSubset$colKey > > if (missing(assays)) { assays <- assayNames(longTable) } > keepAssays <- assayNames(longTable) %in% assays > > assayData <- lapply(assays(longTable)[keepAssays], > FUN=.filterLongDataTable, > indexList=list(rowKeys, colKeys)) > > return(LongTable(colData=colDataSubset, colIDs=longTable at .intern$colIDs , > rowData=rowDataSubset, rowIDs=longTable at .intern$rowIDs, > assays=assayData, metadata=metadata(longTable))) > } > > setMethod('subset', 'LongTable', subset.long.table) > > Everything behaves as I expect when calling the subset function. For example > > subset(longTable, cellLine1 == 'VCAP) > > Successfully returns while also working with character, integer or boolean based indexing. > > The issue arises when I try to implement the '[' method. I have tried a number of different approaches, but none of them has been successful. My current approach is as follows: > > > `[.long.table` <- function(x, i, j) eval(substitute(subset(x, i, j))) > > This function works as expected in most cases, for example. > > > longTable[c(1,2,3), c(1,2,3,] > `[.long.table`(longTable, cellLine1 == 'VCAP') > > Both work normally. > > However, when I try using `[` like an operator: > > longTable[cellLine1 == 'VCAP', ] > > I get the error 'Error: object 'cellLine1' not found'. > > This suggests to me that instead of passing the expression into the function `[`, it is trying to evaluate the expression before dispatching a method. > > Given that similar syntax works fine with data.table, and I believe also in the tibble tidyverse package, I am quite confused. > > If you have any recommendations on how I can prevent evaluation prior to method dispatch, or of a work around that would produce the same API using a different approach, it would be appreciated. > > Thanks for your assistance. > > > Best, > > --- > Christopher Eeles > Software Developer > BHK Laboratory<http://www.bhklab.ca/> > Princess Margaret Cancer Centre<https://www.pmgenomics.ca/pmgenomics/> > University Health Network<http://www.uhn.ca/> > > > > > This e-mail may contain confidential and/or privileged i...{{dropped:22}} > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel