Eric Lecoutre
2007-Dec-20 14:15 UTC
[R] custom subset method / handling columns selection as logic in '...' parameter
Dear R-helpers & Bioconductor,

Sorry for cross-posting; this concerns R programming applied in a Bioconductor context. Also sorry for this long message, I am trying to be complete in my request.

I am trying to write a subset method for a specific class (ExpressionSet from Bioconductor) that allows more flexible selection than the "[" method.

The scheme I have in mind is the following:

subset.ExpressionSet <- function(x, subset, ...){

}

I will use the subset argument for rows (genes), as in the default method.

Now I would like to allow selecting columns (samples) based on the phenotypic data, which provides detailed information about the columns.

Basically, the first function I have written allows the following:

> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, V2=value2)
# subset=NULL takes all rows

Here there are two conditions on two variables of the data.frame encapsulated in ExpressionSetObject. (To be complete, conditions can be given on more than two columns, since they are applied to the phenotypic data.frame, which holds all the variables.) Simplifying a little, this returns roughly:

ExpressionSetObject[, V1==value1 & V2==value2]

This is nice, as I can already handle any number of conditions on variable values thanks to '...'. The first step is

conditions <- list(...)

and the conditions are then handled later in the code.

Nevertheless, those conditions are basic (one value each). I would like to handle arbitrary conditions, such as:

V1 %in% c(value1, value2)

Simple equality conditions would then be written V2==value2 instead of V2=value2.

My real problem is that I don't know how to turn '...' into an object containing those conditions that can be used later.
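One way to keep each condition unevaluated is to take the call returned by `substitute(list(...))` and drop its first element (the function name `list`); what remains is a plain list of language objects, one per argument. The sketch below uses only base R; the helper name `capture_conditions` and the toy data frame are hypothetical, for illustration only.

```r
# Sketch: capture each argument passed through '...' as an unevaluated
# expression, so it can be evaluated later against some data.
# 'capture_conditions' is a hypothetical helper, not an existing function.
capture_conditions <- function(...) {
  # substitute(list(...)) gives the call list(x == 1, y %in% 1:2);
  # as.list() turns that call into a list whose first element is the
  # symbol 'list', so [-1] keeps only the arguments themselves
  as.list(substitute(list(...)))[-1]
}

conds <- capture_conditions(x == 1, y %in% 1:2)
length(conds)       # 2
conds[[1]]          # x == 1   (a call, not yet evaluated)
class(conds[[2]])   # "call"

# Later, each condition can be evaluated with a data.frame's columns
# visible as variables (toy data, hypothetical)
df <- data.frame(x = c(1, 2, 1), y = c(1, 1, 5))
hits <- lapply(conds, eval, envir = df)  # one logical vector per condition
keep <- Reduce(`&`, hits)                # TRUE where every condition holds
df[keep, ]
```

Here `keep` is `c(TRUE, FALSE, FALSE)`: only the first row has `x == 1` and `y` in `1:2`.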
My attempt that seems the nearest is:

> foo <- function(...){
+   as.expression(substitute(list(...)))
+ }
> foo(x==1, y%in%1:2)
expression(list(x == 1, y %in% 1:2))

whereas I would like to have something like

list(expression(x==1), expression(y %in% 1:2))

with those expressions being evaluated later in the context of my specific object.

Are there any existing functions where '...' is already handled the way I want, so that I can mimic them?

Thanks for any insight.

Eric

---

For those who have Biobase available, here is my current subset function and a demo case that explains a bit more.

library(Biobase)
example(ExpressionSet)  # create sample object
print(expressionSet)

# now my subset function as it is

subset.ExpressionSet <- function(x, subset=NULL, verbose=TRUE, ...){
  # subset is used to subset on rows
  # ... is used to make multiple conditions on columns based on pData
  # list of conditions is handled in ...
  stopifnot(is(x, "ExpressionSet"))
  phenoData <- pData(x)
  listCriteria <- list(...)
  if (is.null(subset)) subset <- rep(TRUE, nrow(exprs(x)))
  subset <- subset & !is.na(subset)
  retainedCriteria <- list()
  tmp <- sapply(names(listCriteria), function(critname) {
    if (!critname %in% colnames(phenoData)) {
      if (verbose) cat("\n*** subsetCompounds: Dropped criteria:", critname,
                       "not in phenoData of object\n")
    } else {
      # [[ ]] (not [ ]) extracts the criterion's values rather than a one-element list
      if (is.null(listCriteria[[critname]]))
        listCriteria[[critname]] <- unique(phenoData[, critname])
      retainedCriteria[[critname]] <<- phenoData[, critname] %in% listCriteria[[critname]]
    }
  })
  criteriaValues <- do.call("cbind", retainedCriteria)
  # keep a column (sample) only if it satisfies all criteria
  selectedColumns <- rownames(phenoData)[apply(criteriaValues, 1, all)]
  ## cbind(phenoData, criteriaValues)
  out <- x[subset, selectedColumns]
  if (verbose) cat('\n', length(selectedColumns), ' columns selected (',
                   paste(selectedColumns, collapse=' '), ')\n', sep='')
  invisible(out)
}

# looking at phenotypic data associated with the sample expressionSet
> pData(expressionSet)
     sex    type score
A Female Control  0.75
B   Male    Case  0.40
C   Male Control  0.73
D   Male    Case  0.42
E Female    Case  0.93
F   Male Control  0.22
G   Male    Case  0.96
H   Male    Case  0.79
I Female    Case  0.37
J   Male Control  0.63
K   Male    Case  0.26
L Female Control  0.36
M   Male    Case  0.41
N   Male    Case  0.80
O Female    Case  0.10
P Female Control  0.41
Q Female    Case  0.16
R   Male Control  0.72
S   Male    Case  0.17
T Female    Case  0.74
U   Male Control  0.35
V Female Control  0.77
W   Male Control  0.27
X   Male Control  0.98
Y Female    Case  0.94
Z Female    Case  0.32

# now the sample use
> (subset1 = subset(expressionSet, sex="Male", type="Control"))
7 columns selected (C F J R U W X)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 7 samples
  element names: exprs, se.exprs
phenoData
  sampleNames: C, F, ..., X (7 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (500 total)
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2

# what I would like to allow in use:
(subset2 = subset(expressionSet, sex=="Male", score > 0.75))  # note the == instead of =
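Base R's `subset.data.frame` already handles its own `subset` argument this way: it captures the expression with `substitute()` and evaluates it with `eval()`, using the data frame as the environment and the caller's frame as the enclosure. The sketch below applies the same idea to every argument in '...'. It runs on a plain data.frame standing in for `pData(expressionSet)`, so Biobase is not needed; the function name `select_samples` is hypothetical and not part of Biobase.

```r
# Sketch: choose rows of a phenotype data.frame (standing in for pData(x))
# from arbitrary conditions passed through '...', in the spirit of base
# subset.data.frame. 'select_samples' is a hypothetical name.
select_samples <- function(pheno, ..., verbose = TRUE) {
  # unevaluated conditions, e.g. sex == "Male", score > 0.75
  conds <- as.list(substitute(list(...)))[-1]
  # columns of 'pheno' are visible as variables; anything else
  # (e.g. a threshold variable) is looked up in the caller's frame
  env <- parent.frame()
  keep <- rep(TRUE, nrow(pheno))
  for (cond in conds) {
    hit <- eval(cond, pheno, env)
    keep <- keep & hit & !is.na(hit)   # treat NA as "not selected"
  }
  selected <- rownames(pheno)[keep]
  if (verbose)
    cat(length(selected), " columns selected (",
        paste(selected, collapse = " "), ")\n", sep = "")
  selected
}

# a few rows of the pData(expressionSet) table above
pheno <- data.frame(
  sex   = c("Female", "Male", "Male", "Male", "Male", "Male"),
  type  = c("Control", "Case", "Control", "Case", "Case", "Control"),
  score = c(0.75, 0.40, 0.73, 0.96, 0.79, 0.98),
  row.names = c("A", "B", "C", "G", "H", "X")
)

sel <- select_samples(pheno, sex == "Male", score > 0.75)
# prints: 3 columns selected (G H X)

thr <- 0.9
sel2 <- select_samples(pheno, score > thr, verbose = FALSE)  # "G" "X"
```

The sketch places `...` before `verbose`. With the posted signature `function(x, subset=NULL, verbose=TRUE, ...)`, an unnamed condition such as `sex=="Male"` would be matched positionally to `subset` and `score > 0.75` to `verbose`, so a method built along these lines would likely need `...` ahead of `subset` and `verbose`, which must then be supplied by name. Inside such a method, the selected names could be used as in the posted function, `x[subset, selected]`.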
Martin Morgan
2007-Dec-20 14:46 UTC
Eric --

Please don't cross-post.

Please simplify your example so that others do not have to work hard to understand what you are asking.

See additional response on the Bioconductor mailing list.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793