Anne-Marie Madore
2009-Jan-27 15:15 UTC
[R] Problem with RMA using limma, oligo and pdInfoBuilder packages
Hi, I am a Ph.D. student from Québec, Canada. I’m a beginner with R and Bioconductor. Until now the only experience I have is in analyzing microarray data using affy and limma packages. Now I am trying to analyze Rat Gene 10 st arrays and I would like to run RMA analysis and Smyth moderated t test on those arrays. Since no cdf official package is available for those arrays, after reading many of the questions and responses on this mailing list, I decided to use pdInfoBuilder, oligo and limma packages to run analysis. The problem is, at the end, I get expression and differential expression measured for all probe separately but not the calculated expression representing all probe of each gene. When I run RMA, I got only two steps, Background correcting and Normalizing but not Calculating expression. Do you know how I can get differential expression calculated for each gene? I don’t know if the problem is in the package I built or if I can use some code to answer this question. I list all codes used to build and install the package “pd.ragene.1.0.st.v1” and used to analyze expression arrays below. Many thanks for your help, Anne-Marie Madore ## building the package> library(Biobase)Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type ''openVignette()''. To cite Bioconductor, see ''citation("Biobase")'' and for packages ''citation(pkgname)''.> library(pdInfoBuilder)Loading required package: RSQLite Loading required package: DBI Loading required package: affxparser Loading required package: oligo Loading required package: splines Loading required package: preprocessCore Loading required package: AnnotationDbi Loading required package: oligoClasses oligo Package - Series 1.5.x> setwd("D:/Anne-Marie/Doctorat/puces ADN macrophages/puces rat/AnnieDube/Analyse")> transFile <-"RaGene-1_0-st-v1.na27.rn4.transcript.csv1/RaGene-1_0-st-v1.na27.rn4.transcr ipt.csv"> probeFile <- "RaGene-1_0-st-v1.probe.tab/RaGene-1_0-st-v1.probe.tab"> clfFile <- "RaGene-1_0-st-v1.r4.clf/RaGene-1_0-st-v1.r4.clf"> pgfFile <- "RaGene-1_0-st-v1.r4.pgf/RaGene-1_0-st-v1.r4.pgf"> pkg <- new("AffyGenePDInfoPkgSeed", author="Anne-Marie Madore",email="anne-marie.madore.1@ulaval.ca", version="0.0.1", + genomebuild="RefSeq April 3, 2007, GenBank® January 25, 2007, Rat Ensembl transcripts April 3, 2007 ", + biocViews="AnnotationData", pgfFile=pgfFile, clfFile=clfFile, transFile=transFile, probeFile=probeFile)> makePdInfoPackage(pkg, destDir=".")Creating package in ./pd.ragene.1.0.st.v1 loadUnitsByBatch took 50.51 sec loadAffyCsv took 12.73 sec loadAffySeqCsv took 57.62 sec DB sort, index creation took 24.75 sec [1] TRUE Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type ''NULL'' 2: In is.na(x) : is.na() applied to non-(list or vector) of type ''NULL'' ## installing the package in cmd command shell Microsoft Windows [version 6.0.6001] Copyright (c) 2006 Microsoft Corporation. Tous droits réservés. C:\Users\Anne-Marie Madore>cd c:\Program Files\R\R-2.8.1\bin c:\Program Files\R\R-2.8.1\bin>R CMD INSTALL pd.ragene.1.0.st.v1 installing to ''c:/PROGRA~1/R/R-28~1.1/library'' ---------- Making package pd.ragene.1.0.st.v1 ------------ adding build stamp to DESCRIPTION installing NAMESPACE file and metadata installing R files installing inst files preparing package pd.ragene.1.0.st.v1 for lazy loading Loading required package: RSQLite Loading required package: DBI Loading required package: oligoClasses Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type ''openVignette()''. To cite Bioconductor, see ''citation("Biobase")'' and for packages ''citation(pkgname)''. no man files in this package installing indices installing help adding MD5 sums * DONE (pd.ragene.1.0.st.v1) ## If I run a check (R CMD check pd.ragene.st.v1) I get three warning messages and one note: 1. * checking R files for non-ASCII characters ... WARNING Found the following files with non-ASCII characters: all.R Portable packages must use only ASCII characters in their R code, except perhaps in comments. 2. * checking whether the name space can be loaded with stated dependencies ... WARNING Error in initDbConnection() : could not find function "dbConnect" Error: .onLoad failed in ''loadNamespace'' for ''pd.ragene.1.0.st.v1'' Execution halted A namespace must be able to be loaded with just the base namespace loaded: otherwise if the namespace gets loaded by a saved object, the session will be unable to start. Probably some imports need to be declared in the NAMESPACE file. 3. * checking R code for possible problems ... NOTE closeDb: no visible binding for global variable ''dbCon'' 4. * checking for missing documentation entries ... WARNING Undocumented code objects: pd.ragene.1.0.st.v1 All user-level objects in a package should have documentation entries. See the chapter ''Writing R documentation files'' in manual ''Writing R Extensions''. ## analyzing the package> library("pd.ragene.1.0.st.v1")> library(oligo)> library(limma)> library(genefilter)Loading required package: survival> library(geneplotter)Loading required package: lattice Loading required package: annotate Loading required package: xtable KernSmooth 2.22 installed Copyright M. P. Wand 1997> cel.files <- list.celfiles(".", full.names = TRUE)> basename(cel.files)[1] "AD_Ctrl_1.CEL" "AD_Ctrl_2.CEL" "AD_Ctrl_3.CEL" "AD_Ctrl_5.CEL" [5] "AD_Ctrl_6.CEL" "AD_Traite_10.CEL" "AD_Traite_11.CEL" "AD_Traite_7.CEL" [9] "AD_Traite_8.CEL" "AD_Traite_9.CEL"> test <- read.celfiles(cel.files)Platform design info loaded.> phenoData(test) <- read.AnnotatedDataFrame("phenoData.txt", header = TRUE,row.name=1)> class(test)[1] "GeneFeatureSet" attr(,"package") [1] "oligoClasses"> class(phenoData)[1] "standardGeneric" attr(,"package") [1] "methods"> eset <- rma(test)Background correcting Normalizing> e <- exprs(eset)> index1 <- 1:5> index2 <- 6:10> d <- rowMeans(e[, index1]) - rowMeans(e[, index2])> design <- model.matrix(~factor(eset$Key))> fit <- lmFit(eset, design)> ebayes <- eBayes(fit)> sample <- row.names(ebayes)> Pvalue <- ebayes$p.value[,2]> Mean1 <- rowMeans(e[,index1])> Mean2 <- rowMeans(e[,index2])> sd1 <- apply(e[,index1], 1, "sd")> sd2 <- apply(e[,index2], 1, "sd")> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN macrophages/pucesrat/Annie Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head=TRUE, sep=",")> all <- merge(csv, Pvalue, by.x="sample", by.y=0, all=T)> all2 <- merge(all, Mean1, by.x="sample", by.y=0, all=T)> all3 <- merge(all2, Mean2, by.x="sample", by.y=0, all=T)> all4 <- merge(all3, sd1, by.x="sample", by.y=0, all=T)> all5 <- merge(all4, sd2, by.x="sample", by.y=0, all=T)> all6 <- merge(all5, d, by.x="sample", by.y=0, all=T)> write.table(data.frame(all6), file="ULTIMETESTtout.xls", sep="\t",col.names = NA)> dim(all6)[1] 240410 24>> # I also try using slightly the same way as I used with affy package>> pd <- read.AnnotatedDataFrame("pheno.txt", header = TRUE, row.names = 1)> pData(pd)Phenotype AD_Ctrl_1.CEL Control AD_Ctrl_2.CEL Control AD_Ctrl_3.CEL Control AD_Ctrl_5.CEL Control AD_Ctrl_6.CEL Control AD_Traite_7.CEL Traited AD_Traite_8.CEL Traited AD_Traite_9.CEL Traited AD_Traite_10.CEL Traited AD_Traite_11.CEL Traited> a <- read.celfiles(filenames = rownames(pData(pd)), phenoData = pd,verbose = TRUE) Platform design info loaded.> eset <- rma(a)Background correcting Normalizing> exprs.eset <- exprs(eset)> index1 <- 1:5> index2 <- 6:10> d <- (rowMeans(exprs.eset[,index1]) - rowMeans(exprs.eset[,index2]))> population.groups <- factor (c(rep("Control",5), rep ("Traited",5)))> design <- model.matrix (~population.groups)> fit <- lmFit (eset, design)> fit.eBayes <- eBayes (fit)> sample <- row.names(fit.eBayes)> Pvalue <- fit.eBayes$p.value[,2]> Mean1 <- rowMeans(e[,index1])> Mean2 <- rowMeans(e[,index2])> sd1 <- apply(e[,index1], 1, "sd")> sd2 <- apply(e[,index2], 1, "sd")> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN macrophages/pucesrat/Annie Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head=TRUE, sep=",")> all <- merge(csv, Pvalue, by.x="sample", by.y=0, all=T)> all2 <- merge(all, Mean1, by.x="sample", by.y=0, all=T)> all3 <- merge(all2, Mean2, by.x="sample", by.y=0, all=T)> all4 <- merge(all3, sd1, by.x="sample", by.y=0, all=T)> all5 <- merge(all4, sd2, by.x="sample", by.y=0, all=T)> all6 <- merge(all5, d, by.x="sample", by.y=0, all=T)> write.table(data.frame(all6), file="TESTtout.xls", sep="\t", col.names NA)> dim(all6)[1] 240410 24> sessionInfo()R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] geneplotter_1.20.0 annotate_1.20.1 xtable_1.5-4 [4] lattice_0.17-17 genefilter_1.22.0 survival_2.34-1 [7] limma_2.16.3 pd.ragene.1.0.st.v1_0.0.1 pdInfoBuilder_1.6.0 [10] oligo_1.6.0 oligoClasses_1.4.0 AnnotationDbi_1.4.2 [13] preprocessCore_1.4.0 affxparser_1.14.2 RSQLite_0.7-1 [16] DBI_0.2-4 Biobase_2.2.1 loaded via a namespace (and not attached): [1] grid_2.8.1 KernSmooth_2.22-22 RColorBrewer_1.0-2 [[alternative HTML version deleted]]
Martin Morgan
2009-Jan-27 17:08 UTC
[R] Problem with RMA using limma, oligo and pdInfoBuilder packages
Anne-Marie Madore <anne-marie.madore.1 at ulaval.ca> writes:> Hi, > > > > I am a Ph.D. student from Qu?bec, Canada. I?m a beginner with R and > Bioconductor. Until now the only experience I have is in analyzingPlease ask Bioconductor questions on the Bioconductor mailing list. http://bioconductor.org/docs/mailList.html Martin> microarray data using affy and limma packages. Now I am trying to analyze > Rat Gene 10 st arrays and I would like to run RMA analysis and Smyth > moderated t test on those arrays. Since no cdf official package is available > for those arrays, after reading many of the questions and responses on this > mailing list, I decided to use pdInfoBuilder, oligo and limma packages to > run analysis. The problem is, at the end, I get expression and differential > expression measured for all probe separately but not the calculated > expression representing all probe of each gene. When I run RMA, I got only > two steps, Background correcting and Normalizing but not Calculating > expression. Do you know how I can get differential expression calculated for > each gene? I don?t know if the problem is in the package I built or if I can > use some code to answer this question. I list all codes used to build and > install the package ?pd.ragene.1.0.st.v1? and used to analyze expression > arrays below. > > > > Many thanks for your help, > > > > Anne-Marie Madore > > > > > > > > > > ## building the package > > > >> library(Biobase) > > Loading required package: tools > > > > Welcome to Bioconductor > > > > Vignettes contain introductory material. To view, type > > 'openVignette()'. To cite Bioconductor, see > > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > > >> library(pdInfoBuilder) > > Loading required package: RSQLite > > Loading required package: DBI > > Loading required package: affxparser > > Loading required package: oligo > > Loading required package: splines > > Loading required package: preprocessCore > > Loading required package: AnnotationDbi > > Loading required package: oligoClasses > > oligo Package - Series 1.5.x > >> setwd("D:/Anne-Marie/Doctorat/puces ADN macrophages/puces rat/Annie > Dube/Analyse") > >> transFile <- > "RaGene-1_0-st-v1.na27.rn4.transcript.csv1/RaGene-1_0-st-v1.na27.rn4.transcr > ipt.csv" > >> probeFile <- "RaGene-1_0-st-v1.probe.tab/RaGene-1_0-st-v1.probe.tab" > >> clfFile <- "RaGene-1_0-st-v1.r4.clf/RaGene-1_0-st-v1.r4.clf" > >> pgfFile <- "RaGene-1_0-st-v1.r4.pgf/RaGene-1_0-st-v1.r4.pgf" > >> pkg <- new("AffyGenePDInfoPkgSeed", author="Anne-Marie Madore", > email="anne-marie.madore.1 at ulaval.ca", version="0.0.1", > > + genomebuild="RefSeq April 3, 2007, GenBank? January 25, 2007, Rat Ensembl > transcripts April 3, 2007 ", > > + biocViews="AnnotationData", pgfFile=pgfFile, clfFile=clfFile, > transFile=transFile, probeFile=probeFile) > >> makePdInfoPackage(pkg, destDir=".") > > Creating package in ./pd.ragene.1.0.st.v1 > > loadUnitsByBatch took 50.51 sec > > loadAffyCsv took 12.73 sec > > loadAffySeqCsv took 57.62 sec > > DB sort, index creation took 24.75 sec > > [1] TRUE > > Warning messages: > > 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' > > 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' > > > > > > ## installing the package in cmd command shell > > > > Microsoft Windows [version 6.0.6001] > > Copyright (c) 2006 Microsoft Corporation. Tous droits r?serv?s. > > > > C:\Users\Anne-Marie Madore>cd c:\Program Files\R\R-2.8.1\bin > > > > c:\Program Files\R\R-2.8.1\bin>R CMD INSTALL pd.ragene.1.0.st.v1 > > installing to 'c:/PROGRA~1/R/R-28~1.1/library' > > > > > > ---------- Making package pd.ragene.1.0.st.v1 ------------ > > adding build stamp to DESCRIPTION > > installing NAMESPACE file and metadata > > installing R files > > installing inst files > > preparing package pd.ragene.1.0.st.v1 for lazy loading > > Loading required package: RSQLite > > Loading required package: DBI > > Loading required package: oligoClasses > > Loading required package: Biobase > > Loading required package: tools > > > > Welcome to Bioconductor > > > > Vignettes contain introductory material. To view, type > > 'openVignette()'. To cite Bioconductor, see > > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > > > no man files in this package > > installing indices > > installing help > > adding MD5 sums > > > > * DONE (pd.ragene.1.0.st.v1) > > > > > > ## If I run a check (R CMD check pd.ragene.st.v1) I get three warning > messages and one note: > > > > 1. * checking R files for non-ASCII characters ... WARNING > Found the following files with non-ASCII characters: all.R Portable packages > must use only ASCII characters in their R code, except perhaps in comments. > > 2. * checking whether the name space can be loaded with stated > dependencies ... WARNING > Error in initDbConnection() : could not find function "dbConnect" Error: > .onLoad failed in 'loadNamespace' for 'pd.ragene.1.0.st.v1' Execution halted > A namespace must be able to be loaded with just the base namespace loaded: > otherwise if the namespace gets loaded by a saved object, the session will > be unable to start. > Probably some imports need to be declared in the NAMESPACE file. > > 3. * checking R code for possible problems ... NOTE > closeDb: no visible binding for global variable 'dbCon' > > 4. * checking for missing documentation entries ... WARNING > Undocumented code objects: > pd.ragene.1.0.st.v1 > All user-level objects in a package should have documentation entries. See > the chapter 'Writing R documentation files' in manual 'Writing R > Extensions'. > > > > > > ## analyzing the package > > > >> library("pd.ragene.1.0.st.v1") > >> library(oligo) > >> library(limma) > >> library(genefilter) > > Loading required package: survival > >> library(geneplotter) > > Loading required package: lattice > > Loading required package: annotate > > Loading required package: xtable > > KernSmooth 2.22 installed > > Copyright M. P. Wand 1997 > >> cel.files <- list.celfiles(".", full.names = TRUE) > >> basename(cel.files) > > [1] "AD_Ctrl_1.CEL" "AD_Ctrl_2.CEL" "AD_Ctrl_3.CEL" > "AD_Ctrl_5.CEL" > > [5] "AD_Ctrl_6.CEL" "AD_Traite_10.CEL" "AD_Traite_11.CEL" > "AD_Traite_7.CEL" > > [9] "AD_Traite_8.CEL" "AD_Traite_9.CEL" > >> test <- read.celfiles(cel.files) > > Platform design info loaded. > >> phenoData(test) <- read.AnnotatedDataFrame("phenoData.txt", header = TRUE, > row.name=1) > >> class(test) > > [1] "GeneFeatureSet" > > attr(,"package") > > [1] "oligoClasses" > >> class(phenoData) > > [1] "standardGeneric" > > attr(,"package") > > [1] "methods" > >> eset <- rma(test) > > Background correcting > > Normalizing > >> e <- exprs(eset) > >> index1 <- 1:5 > >> index2 <- 6:10 > >> d <- rowMeans(e[, index1]) - rowMeans(e[, index2]) > >> design <- model.matrix(~factor(eset$Key)) > >> fit <- lmFit(eset, design) > >> ebayes <- eBayes(fit) > >> sample <- row.names(ebayes) > >> Pvalue <- ebayes$p.value[,2] > >> Mean1 <- rowMeans(e[,index1]) > >> Mean2 <- rowMeans(e[,index2]) > >> sd1 <- apply(e[,index1], 1, "sd") > >> sd2 <- apply(e[,index2], 1, "sd") > >> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN macrophages/puces > rat/Annie > Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head=TRUE, > sep=",") > >> all <- merge(csv, Pvalue, by.x="sample", by.y=0, all=T) > >> all2 <- merge(all, Mean1, by.x="sample", by.y=0, all=T) > >> all3 <- merge(all2, Mean2, by.x="sample", by.y=0, all=T) > >> all4 <- merge(all3, sd1, by.x="sample", by.y=0, all=T) > >> all5 <- merge(all4, sd2, by.x="sample", by.y=0, all=T) > >> all6 <- merge(all5, d, by.x="sample", by.y=0, all=T) > >> write.table(data.frame(all6), file="ULTIMETESTtout.xls", sep="\t", > col.names = NA) > >> dim(all6) > > [1] 240410 24 > >> > >> # I also try using slightly the same way as I used with affy package > >> > >> pd <- read.AnnotatedDataFrame("pheno.txt", header = TRUE, row.names = 1) > >> pData(pd) > > Phenotype > > AD_Ctrl_1.CEL Control > > AD_Ctrl_2.CEL Control > > AD_Ctrl_3.CEL Control > > AD_Ctrl_5.CEL Control > > AD_Ctrl_6.CEL Control > > AD_Traite_7.CEL Traited > > AD_Traite_8.CEL Traited > > AD_Traite_9.CEL Traited > > AD_Traite_10.CEL Traited > > AD_Traite_11.CEL Traited > >> a <- read.celfiles(filenames = rownames(pData(pd)), phenoData = pd, > verbose = TRUE) > > Platform design info loaded. > >> eset <- rma(a) > > Background correcting > > Normalizing > >> exprs.eset <- exprs(eset) > >> index1 <- 1:5 > >> index2 <- 6:10 > >> d <- (rowMeans(exprs.eset[,index1]) - rowMeans(exprs.eset[,index2])) > >> population.groups <- factor (c(rep("Control",5), rep ("Traited",5))) > >> design <- model.matrix (~population.groups) > >> fit <- lmFit (eset, design) > >> fit.eBayes <- eBayes (fit) > >> sample <- row.names(fit.eBayes) > >> Pvalue <- fit.eBayes$p.value[,2] > >> Mean1 <- rowMeans(e[,index1]) > >> Mean2 <- rowMeans(e[,index2]) > >> sd1 <- apply(e[,index1], 1, "sd") > >> sd2 <- apply(e[,index2], 1, "sd") > >> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN macrophages/puces > rat/Annie > Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head=TRUE, > sep=",") > >> all <- merge(csv, Pvalue, by.x="sample", by.y=0, all=T) > >> all2 <- merge(all, Mean1, by.x="sample", by.y=0, all=T) > >> all3 <- merge(all2, Mean2, by.x="sample", by.y=0, all=T) > >> all4 <- merge(all3, sd1, by.x="sample", by.y=0, all=T) > >> all5 <- merge(all4, sd2, by.x="sample", by.y=0, all=T) > >> all6 <- merge(all5, d, by.x="sample", by.y=0, all=T) > >> write.table(data.frame(all6), file="TESTtout.xls", sep="\t", col.names > NA) > >> dim(all6) > > [1] 240410 24 > >> sessionInfo() > > R version 2.8.1 (2008-12-22) > > i386-pc-mingw32 > > > > locale: > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] splines tools stats graphics grDevices utils datasets > methods base > > > > other attached packages: > > [1] geneplotter_1.20.0 annotate_1.20.1 xtable_1.5-4 > > > [4] lattice_0.17-17 genefilter_1.22.0 survival_2.34-1 > > > [7] limma_2.16.3 pd.ragene.1.0.st.v1_0.0.1 pdInfoBuilder_1.6.0 > > > [10] oligo_1.6.0 oligoClasses_1.4.0 AnnotationDbi_1.4.2 > > > [13] preprocessCore_1.4.0 affxparser_1.14.2 RSQLite_0.7-1 > > > [16] DBI_0.2-4 Biobase_2.2.1 > > > > loaded via a namespace (and not attached): > > [1] grid_2.8.1 KernSmooth_2.22-22 RColorBrewer_1.0-2 > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793