thr3ads.net - R devel - [Rd] How to use `[` without evaluating the arguments. [Sep 2020]

If this information is useful, please help other people find it:
Share via:

Eeles, Christopher

2020-Sep-24 21:45 UTC

[Rd] How to use `[` without evaluating the arguments.

Hello R-devel,

I am currently attempting to implement an API similar to data.table wherein
single bracket subsetting can accept an unquoted expression to be evaluated in
the context of my object.

A simple example from the data.table package looks like this:


DT <- data.table(col1 = c('a', 'b', 'c'), col2 =
c('x', 'y', 'z'))
DT[col1 == 'a']

Where the expression i in DT[i, j] is captured with substitute then evaluated
inside the DT object.

Reviewing the source code from data.table, it seems that they implemented this
feature simple by defining a new S3 method on `[` called `[.data.table`. I tried
to replicate this API as follows.

I have defined an S4 which contains an S3 class as follows:


#' Define an S3 Class
#'
#' Allows use of S3 methods with new S4 class. This is required to overcome
#' limitations of the `[` S4 method.
#'
setOldClass('long.table')

#' LongTable class definition
#'
#' Define a private constructor method to be used to build a `LongTable`
object.
#'
#' @param drugs [`data.table`]
#' @param cells [`data.table`]
#' @param assays [`list`]
#' @param metadata [`list`]
#'
#'
#' @return [`LongTable`] object containing the assay data from a
#'
#' @import data.table
#' @keywords internal
.LongTable <- setClass("LongTable",
                       slots=list(rowData='data.table',
                                  colData='data.table',
                                  assays='list',
                                  metadata='list',
                                  .intern='environment'),
                       contains='long.table')

#' LongTable constructor method
#'
#' @param rowData [`data.table`, `data.frame`, `matrix`] A table like object
#'   coercible to a `data.table` containing the a unique `rowID` column
which
#'   is used to key assays, as well as additional row metadata to subset on.
#' @param rowIDs [`character`, `integer`] A vector specifying
#'   the names or integer indexes of the row data identifier columns. These
#'   columns will be pasted together to make up the row.names of the
#'   `LongTable` object.
#' @param colData [`data.table`, `data.frame`, `matrix`] A table like object
#'   coercible to a `data.table` containing the a unique `colID` column
which
#'   is used to key assays, as well as additional column metadata to subset
on.
#' @param colIDs [`character`, `integer`] A vector specifying
#'   the names or integer indexes of the col data identifier columns. These
#'   columns will be pasted together to make up the col.names of the
#'   `LongTable` object.
#' @param assays A [`list`] containing one or more objects coercible to a
#'   `data.table`, and keyed by rowID and colID corresponding to the rowID
and
#'   colID columns in colData and rowData.
#' @param metadata A [`list`] of metadata associated with the `LongTable`
#'   object being constructed
#' @param keep.rownames [`logical` or `character`] Logical: whether rownames
#'   should be added as a column if coercing to a `data.table`, default is
FALSE.
#'   If TRUE, rownames are added to the column 'rn'. Character:
specify a custom
#'   column name to store the rownames in.
#'
#' @return [`LongTable`] object
#'
#' @import data.table
#' @export
LongTable <- function(rowData, rowIDs, colData, colIDs, assays,
                      metadata=list(), keep.rownames=FALSE) {

    ## TODO:: Handle missing parameters

    if (!is(colData, 'data.table')) {
        colData <- data.table(colData, keep.rownames=keep.rownames)
    }

    if (!is(rowData, 'data.table')) {
        rowData <- data.table(rowData, keep.rownames=keep.rownames)
    }

    if (!all(vapply(assays, FUN=is.data.table, FUN.VALUE=logical(1)))) {
        tryCatch({
            assays <- lapply(assays, FUN=data.table,
keep.rownames=keep.rownames)
        }, warning = function(w) {
            warning(w)
        }, error = function(e, assays) {
            message(e)
            types <- lapply(assays, typeof)
            stop(paste0('List items are types: ',
                        paste0(types, collapse=', '),
                        '\nPlease ensure all items in the assays list are
                        coerced to data.tables!'))
        })
    }

    # Initialize the .internals object to store private metadata for a LongTable
    internals <- new.env()

    ## TODO:: Implement error handling
    internals$rowIDs <-
        if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData))
            rowIDs
        else
            which(colnames(rowData) %in% rowIDs)
    lockBinding('rowIDs', internals)

    internals$colIDs <-
        if (is.numeric(colIDs) && max(colIDs) < ncol(colData))
            colIDs
        else
            which(colnames(colData) %in% colIDs)
    lockBinding('colIDs', internals)

    # Assemble the pseudo row and column names for the LongTable
    .pasteColons <- function(...) paste(..., collapse=':')
    rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))),
.SDcols=internals$rowIDs]
    colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))),
.SDcols=internals$colIDs]

    return(.LongTable(rowData=rowData, colData=colData,
                      assays=assays, metadata=metadata,
                      .intern=internals))
}

I have also defined a subset method as an S3 and S4 generic:


#' Subset method for a LongTable object.
#'
#' Allows use of the colData and rowData `data.table` objects to query based
on
#'  rowID and colID, which is then used to subset all value data.tables
stored
#'  in the dataList slot.
#'
#' This function is endomorphic, it always returns a LongTable object.
#'
#' @param x [`LongTable`] The object to subset.
#' @param rowQuery [`character`, `numeric`, `logical` or `expression`]
#'  Character: pass in a character vector of drug names, which will subset
the
#'      object on all row id columns matching the vector.
#'
#'  Numeric or Logical: these select based on the rowKey from the `rowData`
#'      method for the `LongTable`.
#'
#'  Expression: Accepts valid query statements to the `data.table` i
parameter,
#'      this can be used to make complex queries using the `data.table` API
#'      for the `rowData` data.table.
#'
#' @param columnQuery [`character`, `numeric`, `logical` or `expression`]
#'  Character: pass in a character vector of drug names, which will subset
the
#'      object on all drug id columns matching the vector.
#'
#'  Numeric or Logical: these select base don the rowID from the `rowData`
#'      method for the `LongTable`.
#'
#'  Expression: Accepts valid query statements to the `data.table` i
parameter,
#'      this can be used to make complex queries using the `data.table` API
#'      for the `rowData` data.table.
#'
#' @param values [`character`, `numeric` or `logical`] Optional list of
value
#'      names to subset. Can be used to subset the dataList column further,
#'      returning only the selected items in the new LongTable.
#'
#' @return [`LongTable`] A new `LongTable` object subset based on the
specified
#'      parameters.
#'
#' @importMethodsFrom BiocGenerics subset
#' @import data.table
#' @export
subset.long.table <- function(x, rowQuery, columnQuery, assays) {

    longTable <- x
    rm(x)

    if (!missing(rowQuery)) {
        if (tryCatch(is.character(rowQuery), error=function(e) FALSE)) {
            select <- grep('^cellLine[:digit:]*',
colnames(rowData(longTable)), value=TRUE)
            rowQueryString <- paste0(paste0(select, ' %in% ',
.variableToCodeString(rowQuery)), collapse=' | ')
            rowQuery <- str2lang(rowQueryString)
        } else {
            rowQuery <- substitute(rowQuery)
        }
        rowDataSubset <- rowData(longTable)[eval(rowQuery), ]
    } else {
        rowDataSubset <- rowData(longTable)
    }

    if (!missing(columnQuery)) {
        if (tryCatch(is.character(columnQuery), error=function(e) FALSE)) {
            select <- grep('^drug[:digit:]*',
colnames(colData(longTable)), value=TRUE)
            columnQueryString <- paste0(paste0(select, ' %in% ',
.variableToCodeString(columnQuery)), collapse=' | ')
            columnQuery <- str2lang(columnQueryString)
        } else {
            columnQuery <- substitute(columnQuery)
        }
        colDataSubset <- colData(longTable)[eval(columnQuery), ]
    } else {
        colDataSubset <- colData(longTable)
    }

    rowKeys <- rowDataSubset$rowKey
    colKeys <- colDataSubset$colKey

    if (missing(assays)) { assays <- assayNames(longTable) }
    keepAssays <- assayNames(longTable) %in% assays

    assayData <- lapply(assays(longTable)[keepAssays],
                     FUN=.filterLongDataTable,
                     indexList=list(rowKeys, colKeys))

    return(LongTable(colData=colDataSubset, colIDs=longTable at .intern$colIDs ,
                     rowData=rowDataSubset, rowIDs=longTable at .intern$rowIDs,
                     assays=assayData, metadata=metadata(longTable)))
}

setMethod('subset', 'LongTable', subset.long.table)

Everything behaves as I expect when calling the subset function. For example

subset(longTable, cellLine1 == 'VCAP)

Successfully returns while also working with character, integer or boolean based
indexing.

The issue arises when I try to implement the '[' method. I have tried a
number of different approaches, but none of them has been successful. My current
approach is as follows:


`[.long.table` <- function(x, i, j) eval(substitute(subset(x, i, j)))

This function works as expected in most cases, for example.


longTable[c(1,2,3), c(1,2,3,]
`[.long.table`(longTable, cellLine1 == 'VCAP')

Both work normally.

However, when I try using `[` like an operator:

longTable[cellLine1 == 'VCAP', ]

I get the error 'Error: object 'cellLine1' not found'.

This suggests to me that instead of passing the expression into the function
`[`, it is trying to evaluate the expression before dispatching a method.

Given that similar syntax works fine with data.table, and I believe also in the
tibble tidyverse package, I am quite confused.

If you have any recommendations on how I can prevent evaluation prior to method
dispatch, or of a work around that would produce the same API using a different
approach, it would be appreciated.

Thanks for  your assistance.


Best,

---
Christopher Eeles
Software Developer
BHK Laboratory<http://www.bhklab.ca/>
Princess Margaret Cancer Centre<https://www.pmgenomics.ca/pmgenomics/>
University Health Network<http://www.uhn.ca/>




This e-mail may contain confidential and/or privileged i...{{dropped:22}}

Hugh Parsonage

2020-Sep-25 12:18 UTC

head link

[Rd] How to use `[` without evaluating the arguments.

This works as expected:

"[.foo" <- function(x, i, j) {
  sx <- substitute(x)
  si <- substitute(i)
  sj <- substitute(j)
  100 * length(sx) + 10 * length(si) + length(sj)
}

x <- 1:10
class(x) <- "foo"
x[y == z, a(x)]
#> [1] 132

Note in your implementation you ask the function evaluate the
expression. You may have been intending to recompose the calls from
the substituted values of x, i, j and evaluate this new call.

On Fri, 25 Sep 2020 at 20:02, Eeles, Christopher
<Christopher.Eeles at uhnresearch.ca> wrote:>
> Hello R-devel,
>
> I am currently attempting to implement an API similar to data.table wherein
single bracket subsetting can accept an unquoted expression to be evaluated in
the context of my object.
>
> A simple example from the data.table package looks like this:
>
>
> DT <- data.table(col1 = c('a', 'b', 'c'), col2 =
c('x', 'y', 'z'))
> DT[col1 == 'a']
>
> Where the expression i in DT[i, j] is captured with substitute then
evaluated inside the DT object.
>
> Reviewing the source code from data.table, it seems that they implemented
this feature simple by defining a new S3 method on `[` called `[.data.table`. I
tried to replicate this API as follows.
>
> I have defined an S4 which contains an S3 class as follows:
>
>
> #' Define an S3 Class
> #'
> #' Allows use of S3 methods with new S4 class. This is required to
overcome
> #' limitations of the `[` S4 method.
> #'
> setOldClass('long.table')
>
> #' LongTable class definition
> #'
> #' Define a private constructor method to be used to build a
`LongTable` object.
> #'
> #' @param drugs [`data.table`]
> #' @param cells [`data.table`]
> #' @param assays [`list`]
> #' @param metadata [`list`]
> #'
> #'
> #' @return [`LongTable`] object containing the assay data from a
> #'
> #' @import data.table
> #' @keywords internal
> .LongTable <- setClass("LongTable",
>                        slots=list(rowData='data.table',
>                                   colData='data.table',
>                                   assays='list',
>                                   metadata='list',
>                                   .intern='environment'),
>                        contains='long.table')
>
> #' LongTable constructor method
> #'
> #' @param rowData [`data.table`, `data.frame`, `matrix`] A table like
object
> #'   coercible to a `data.table` containing the a unique `rowID` column
which
> #'   is used to key assays, as well as additional row metadata to
subset on.
> #' @param rowIDs [`character`, `integer`] A vector specifying
> #'   the names or integer indexes of the row data identifier columns.
These
> #'   columns will be pasted together to make up the row.names of the
> #'   `LongTable` object.
> #' @param colData [`data.table`, `data.frame`, `matrix`] A table like
object
> #'   coercible to a `data.table` containing the a unique `colID` column
which
> #'   is used to key assays, as well as additional column metadata to
subset on.
> #' @param colIDs [`character`, `integer`] A vector specifying
> #'   the names or integer indexes of the col data identifier columns.
These
> #'   columns will be pasted together to make up the col.names of the
> #'   `LongTable` object.
> #' @param assays A [`list`] containing one or more objects coercible to
a
> #'   `data.table`, and keyed by rowID and colID corresponding to the
rowID and
> #'   colID columns in colData and rowData.
> #' @param metadata A [`list`] of metadata associated with the
`LongTable`
> #'   object being constructed
> #' @param keep.rownames [`logical` or `character`] Logical: whether
rownames
> #'   should be added as a column if coercing to a `data.table`, default
is FALSE.
> #'   If TRUE, rownames are added to the column 'rn'. Character:
specify a custom
> #'   column name to store the rownames in.
> #'
> #' @return [`LongTable`] object
> #'
> #' @import data.table
> #' @export
> LongTable <- function(rowData, rowIDs, colData, colIDs, assays,
>                       metadata=list(), keep.rownames=FALSE) {
>
>     ## TODO:: Handle missing parameters
>
>     if (!is(colData, 'data.table')) {
>         colData <- data.table(colData, keep.rownames=keep.rownames)
>     }
>
>     if (!is(rowData, 'data.table')) {
>         rowData <- data.table(rowData, keep.rownames=keep.rownames)
>     }
>
>     if (!all(vapply(assays, FUN=is.data.table, FUN.VALUE=logical(1)))) {
>         tryCatch({
>             assays <- lapply(assays, FUN=data.table,
keep.rownames=keep.rownames)
>         }, warning = function(w) {
>             warning(w)
>         }, error = function(e, assays) {
>             message(e)
>             types <- lapply(assays, typeof)
>             stop(paste0('List items are types: ',
>                         paste0(types, collapse=', '),
>                         '\nPlease ensure all items in the assays list
are
>                         coerced to data.tables!'))
>         })
>     }
>
>     # Initialize the .internals object to store private metadata for a
LongTable
>     internals <- new.env()
>
>     ## TODO:: Implement error handling
>     internals$rowIDs <-
>         if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData))
>             rowIDs
>         else
>             which(colnames(rowData) %in% rowIDs)
>     lockBinding('rowIDs', internals)
>
>     internals$colIDs <-
>         if (is.numeric(colIDs) && max(colIDs) < ncol(colData))
>             colIDs
>         else
>             which(colnames(colData) %in% colIDs)
>     lockBinding('colIDs', internals)
>
>     # Assemble the pseudo row and column names for the LongTable
>     .pasteColons <- function(...) paste(..., collapse=':')
>     rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))),
.SDcols=internals$rowIDs]
>     colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))),
.SDcols=internals$colIDs]
>
>     return(.LongTable(rowData=rowData, colData=colData,
>                       assays=assays, metadata=metadata,
>                       .intern=internals))
> }
>
> I have also defined a subset method as an S3 and S4 generic:
>
>
> #' Subset method for a LongTable object.
> #'
> #' Allows use of the colData and rowData `data.table` objects to query
based on
> #'  rowID and colID, which is then used to subset all value data.tables
stored
> #'  in the dataList slot.
> #'
> #' This function is endomorphic, it always returns a LongTable object.
> #'
> #' @param x [`LongTable`] The object to subset.
> #' @param rowQuery [`character`, `numeric`, `logical` or `expression`]
> #'  Character: pass in a character vector of drug names, which will
subset the
> #'      object on all row id columns matching the vector.
> #'
> #'  Numeric or Logical: these select based on the rowKey from the
`rowData`
> #'      method for the `LongTable`.
> #'
> #'  Expression: Accepts valid query statements to the `data.table` i
parameter,
> #'      this can be used to make complex queries using the `data.table`
API
> #'      for the `rowData` data.table.
> #'
> #' @param columnQuery [`character`, `numeric`, `logical` or
`expression`]
> #'  Character: pass in a character vector of drug names, which will
subset the
> #'      object on all drug id columns matching the vector.
> #'
> #'  Numeric or Logical: these select base don the rowID from the
`rowData`
> #'      method for the `LongTable`.
> #'
> #'  Expression: Accepts valid query statements to the `data.table` i
parameter,
> #'      this can be used to make complex queries using the `data.table`
API
> #'      for the `rowData` data.table.
> #'
> #' @param values [`character`, `numeric` or `logical`] Optional list of
value
> #'      names to subset. Can be used to subset the dataList column
further,
> #'      returning only the selected items in the new LongTable.
> #'
> #' @return [`LongTable`] A new `LongTable` object subset based on the
specified
> #'      parameters.
> #'
> #' @importMethodsFrom BiocGenerics subset
> #' @import data.table
> #' @export
> subset.long.table <- function(x, rowQuery, columnQuery, assays) {
>
>     longTable <- x
>     rm(x)
>
>     if (!missing(rowQuery)) {
>         if (tryCatch(is.character(rowQuery), error=function(e) FALSE)) {
>             select <- grep('^cellLine[:digit:]*',
colnames(rowData(longTable)), value=TRUE)
>             rowQueryString <- paste0(paste0(select, ' %in% ',
.variableToCodeString(rowQuery)), collapse=' | ')
>             rowQuery <- str2lang(rowQueryString)
>         } else {
>             rowQuery <- substitute(rowQuery)
>         }
>         rowDataSubset <- rowData(longTable)[eval(rowQuery), ]
>     } else {
>         rowDataSubset <- rowData(longTable)
>     }
>
>     if (!missing(columnQuery)) {
>         if (tryCatch(is.character(columnQuery), error=function(e) FALSE)) {
>             select <- grep('^drug[:digit:]*',
colnames(colData(longTable)), value=TRUE)
>             columnQueryString <- paste0(paste0(select, ' %in% ',
.variableToCodeString(columnQuery)), collapse=' | ')
>             columnQuery <- str2lang(columnQueryString)
>         } else {
>             columnQuery <- substitute(columnQuery)
>         }
>         colDataSubset <- colData(longTable)[eval(columnQuery), ]
>     } else {
>         colDataSubset <- colData(longTable)
>     }
>
>     rowKeys <- rowDataSubset$rowKey
>     colKeys <- colDataSubset$colKey
>
>     if (missing(assays)) { assays <- assayNames(longTable) }
>     keepAssays <- assayNames(longTable) %in% assays
>
>     assayData <- lapply(assays(longTable)[keepAssays],
>                      FUN=.filterLongDataTable,
>                      indexList=list(rowKeys, colKeys))
>
>     return(LongTable(colData=colDataSubset, colIDs=longTable at
.intern$colIDs ,
>                      rowData=rowDataSubset, rowIDs=longTable at
.intern$rowIDs,
>                      assays=assayData, metadata=metadata(longTable)))
> }
>
> setMethod('subset', 'LongTable', subset.long.table)
>
> Everything behaves as I expect when calling the subset function. For
example
>
> subset(longTable, cellLine1 == 'VCAP)
>
> Successfully returns while also working with character, integer or boolean
based indexing.
>
> The issue arises when I try to implement the '[' method. I have
tried a number of different approaches, but none of them has been successful. My
current approach is as follows:
>
>
> `[.long.table` <- function(x, i, j) eval(substitute(subset(x, i, j)))
>
> This function works as expected in most cases, for example.
>
>
> longTable[c(1,2,3), c(1,2,3,]
> `[.long.table`(longTable, cellLine1 == 'VCAP')
>
> Both work normally.
>
> However, when I try using `[` like an operator:
>
> longTable[cellLine1 == 'VCAP', ]
>
> I get the error 'Error: object 'cellLine1' not found'.
>
> This suggests to me that instead of passing the expression into the
function `[`, it is trying to evaluate the expression before dispatching a
method.
>
> Given that similar syntax works fine with data.table, and I believe also in
the tibble tidyverse package, I am quite confused.
>
> If you have any recommendations on how I can prevent evaluation prior to
method dispatch, or of a work around that would produce the same API using a
different approach, it would be appreciated.
>
> Thanks for  your assistance.
>
>
> Best,
>
> ---
> Christopher Eeles
> Software Developer
> BHK Laboratory<http://www.bhklab.ca/>
> Princess Margaret Cancer
Centre<https://www.pmgenomics.ca/pmgenomics/>
> University Health Network<http://www.uhn.ca/>
>
>
>
>
> This e-mail may contain confidential and/or privileged i...{{dropped:22}}
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Possibly Parallel Threads

Search for more reasonably related threads

R devel - Sep 2020 - How to use `[` without evaluating the arguments.

[Rd] How to use `[` without evaluating the arguments.

[Rd] How to use `[` without evaluating the arguments.

Possibly Parallel Threads