Dear all,

I have run into a problem when running some code implemented in the Bioconductor panp package (applied to my own expression data), whereby gene expression values of known true-negative probesets (x) are interpolated onto present/absent p-values (y) between 0 and 1 using the approxfun() function from the stats package. Under R 2.8 everything worked fine; after updating to R 2.11.1, however, I get unexpected output (explained below).

Please correct me if I am wrong, but as far as I understand, the yleft and yright arguments set the extreme values of the interpolated y-values in case the input x-values (to which the approxfun is applied) fall outside range(x). So if I run approxfun with yleft = 1 and yright = 0, with y-values between 0 and 1, then I should never get any values higher than 1. However, this is not the case, as this code example illustrates:

> ### define the x-values used to construct the approxfun; basically these
> ### are 2000 expression values ranging from ~3 to 7:
> xNeg <- NegExprs[, 1]
> xNeg <- sort(xNeg, decreasing = TRUE)
>
> ### generate 2000 y-values between 0 and 1:
> yNeg <- seq(0, 1, 1/(length(xNeg) - 1))
>
> ### define yleft and yright as well as the rule to clarify what should
> ### happen if input x-values lie outside range(x):
> interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2)
Warning message:
In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) :
  collapsing to unique 'x' values

> ### apply the approxfun to expression data that range from ~2.9 to 13.9
> ### and can therefore lie outside range(xNeg):
> PV <- sapply(AllExprs[, 1], interp)
> range(PV)
[1]    0.000 6208.932
> summary(PV)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03

So the resulting PV object contains values ranging from 0 to about 6209, the latter of which lies far outside yleft and is nowhere near the extreme y-values used to set up the interp function. This seems wrong to me; from what I can tell, yleft and yright are simply ignored.

I have attached a few histograms that visualize the distributions of the objects xNeg, yNeg, AllExprs[, 1] (the input x-values) and PV (the output), so that it is easier to make sense of the data structures.

Does anyone have an explanation for this, or can anyone tell me how to fix the problem?

Thanks a million for any help, best,
Sam

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] panp_1.18.0 affy_1.26.1 Biobase_2.8.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0 preprocessCore_1.10.0

--
-----------------------------------------------------
Samuel Wuest
Smurfit Institute of Genetics
Trinity College Dublin
Dublin 2, Ireland
Phone: +353-1-896 2444
Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
Email: wuests at tcd.ie
------------------------------------------------------
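[For reference, a minimal toy sketch of the behaviour described above, using made-up values rather than the NegExprs/AllExprs data from the post: yleft and yright are supposed to be returned for any input that falls below min(x) or above max(x), so the result should never leave [0, 1].]

## toy example, made-up values (not Sam's data)
x <- 3:7
y <- seq(1, 0, length.out = length(x))        # y runs from 1 down to 0 as x increases
f <- approxfun(x, y, yleft = 1, yright = 0)   # clamp outside range(x)
f(c(2, 3, 5, 7, 14))
## expected: 1.0 1.0 0.5 0.0 0.0  -- never outside [0, 1]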
The plots did not come through; see the posting guide for which attachments are allowed. It will be easier for us to help if you can send reproducible code (something we can copy and paste to run, then examine, edit, etc.). Try to find a subset of your data for which the problem still occurs, then send that data if possible, or similar simulated data if you cannot send the original.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Samuel Wuest
> Sent: Wednesday, August 25, 2010 8:20 AM
> To: r-help at r-project.org
> Subject: [R] approxfun-problems (yleft and yright ignored)
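[In the spirit of Greg's request, a self-contained, simulated stand-in for the posted code could look like the following. The values are drawn at random to mimic the ranges Sam describes and are not taken from NegExprs/AllExprs, so it may or may not trigger the same misbehaviour on a given R version.]

set.seed(42)
xNeg <- sort(runif(2000, 3, 7), decreasing = TRUE)   # simulated negative-control expression values, ~3 to 7
yNeg <- seq(0, 1, 1/(length(xNeg) - 1))              # 2000 y-values between 0 and 1
allX <- runif(20000, 2.9, 13.9)                      # simulated full expression range

interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2)
PV <- interp(allX)
range(PV)   # with yleft = 1 and yright = 0 this should stay within [0, 1]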
Gregory Ryslik wrote:
> Hi,
>
> Thank you for the help! Would this imply then that if my "answers" and
> "predicted" are both matrices, I need to first make them into factors? I was
> hoping to avoid that step...

Why are they matrices? What is the additional dimension? And: what should
become of the additional dimension? With a 2d reference and prediction, do you
want to produce 3d or 4d confusion "matrices"?

> Thank you again!

You are welcome.

Claudia

> Kind regards,
> Greg
>
> On Oct 8, 2010, at 10:04 AM, Claudia Beleites wrote:
>
>> Gregory Ryslik wrote:
>>> Hi, I played with the table option but I seem to be only able to get
>>> counts for numbers that exist. For example, if I don't have any 4's that
>>> are predicted, that number is skipped!
>>
>> Well, you need to tell the function that there _could_ be a 4:
>>
>>> ref <- factor (1 : 3)
>>> ref <- factor (1 : 4)
>>> pred <- factor (c (1 : 3, 1), levels = levels (ref))
>>> ref
>> [1] 1 2 3 4
>> Levels: 1 2 3 4
>>> pred
>> [1] 1 2 3 1
>> Levels: 1 2 3 4
>>> table (ref, pred)
>>    pred
>> ref 1 2 3 4
>>   1 1 0 0 0
>>   2 0 1 0 0
>>   3 0 0 1 0
>>   4 1 0 0 0
>>
>> Claudia
>>
>>> Thanks,
>>> Greg
>>> Sent via BlackBerry by AT&T
>>>
>>> -----Original Message-----
>>> From: Claudia Beleites <cbeleites at units.it>
>>> Date: Fri, 08 Oct 2010 15:38:31
>>> To: Gregory Ryslik <rsaber at comcast.net>
>>> Cc: R Help <r-help at r-project.org>
>>> Subject: Re: [R] confusion matrix
>>>
>>> Dear Greg,
>>>
>>> If it is only the NA that worries you: function table can deal with that.
>>> ? table
>>> and: example (table)
>>>
>>> If you want to make a confusion matrix that works also with fractional
>>> answers (e.g. 50% A, 50% B, a.k.a. soft classification), then you can
>>> contact me and become a test user of a package that I'm just writing (you
>>> can also wait until it is published to CRAN, but that will take a while).
>>>
>>> Best regards,
>>>
>>> Claudia
>>>
>>> Gregory Ryslik wrote:
>>>> Hi Everyone,
>>>>
>>>> In follow up to my previous question, I wrote some code that correctly
>>>> makes a confusion matrix as I need it. However, it only works when the
>>>> numbers are between 1 and n. If the possible outcomes are between 0 and n,
>>>> then I can't reference row "0" of the matrix and the code breaks. Does
>>>> anyone have any easy fixes for this? I've attached the entire code to this
>>>> email.
>>>>
>>>> As always, thank you for your help!
>>>> Greg
>>>>
>>>> Code:
>>>>
>>>> answers <- matrix(c(4,2,1,3,2,1), nrow = 6)
>>>> mat1 <- matrix(c(3,3,4,NA,4,2), nrow = 6)
>>>> mat2 <- matrix(c(3,2,1,4,2,3), nrow = 6)
>>>> mat3 <- matrix(c(4,2,2,2,1,1), nrow = 6)
>>>> mat4 <- matrix(c(4,2,1,3,1,4), nrow = 6)
>>>> mat5 <- matrix(c(2,3,1,4,2,3), nrow = 6)
>>>> matrixlist <- list(mat1, mat2, mat3, mat4, mat5)
>>>> predicted.values <- matrix(unlist(matrixlist), nrow = dim(mat1)[1])
>>>>
>>>> confusion.matrix <- matrix(0, nrow = length(as.vector(unique(answers))),
>>>>                            ncol = length(as.vector(unique(answers))))
>>>> for (i in 1:dim(predicted.values)[1]) {
>>>>   for (j in 1:dim(predicted.values)[2]) {
>>>>     predicted.value <- predicted.values[i, j]
>>>>     if (!is.na(predicted.value)) {
>>>>       true.value <- answers[i, ]
>>>>       confusion.matrix[true.value, predicted.value] <-
>>>>         confusion.matrix[true.value, predicted.value] + 1
>>>>     }
>>>>   }
>>>> }
>>>> class.error <- diag(1 - prop.table(confusion.matrix, 1))
>>>> confusion.matrix <- cbind(confusion.matrix, class.error)
>>>> confusion.data.frame <- as.data.frame(confusion.matrix)
>>>> names(confusion.data.frame)[1:length(as.vector(unique(answers)))] <-
>>>>   1:length(as.vector(unique(answers)))
>>>> names(confusion.data.frame)[length(as.vector(unique(answers))) + 1] <-
>>>>   "class.error"

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it
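[The thread never spells out the fix Greg asks for, but a hedged sketch along the lines of Claudia's table() suggestion could look like the following, reusing `answers` and `predicted.values` from the code above. The assumption that each column of predicted.values is one set of predictions for `answers` is mine; with factors and an explicit, shared set of levels, labels such as 0 or never-predicted classes are handled as factor levels rather than as row indices, so the indexing problem disappears.]

## collect every label that occurs anywhere (could include 0); NA is dropped by sort()
all.levels <- sort(unique(c(answers, predicted.values)))

## line up the reference labels against the 5 prediction columns (assumption: columns are parallel predictions)
ref  <- factor(rep(answers, ncol(predicted.values)), levels = all.levels)
pred <- factor(predicted.values,                     levels = all.levels)

confusion.matrix <- table(ref, pred)        # NA predictions are simply not counted
class.error <- 1 - diag(prop.table(confusion.matrix, 1))
confusion.data.frame <- cbind(as.data.frame.matrix(confusion.matrix), class.error)
confusion.data.frame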