Tim Howard
2009-Sep-24 13:09 UTC
[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
All,

I'm trying again with a slightly more generic version of my first question. I can extract the plotted values from hist(), boxplot(), and even plot.randomForest(). Observe:

# get some data
dat <- rnorm(100)
# grab histogram data
hdat <- hist(dat)
hdat    # provides details of the hist output

# grab boxplot data
bdat <- boxplot(dat)
bdat    # provides details of the boxplot output

# the same works for randomForest
library(randomForest)
data(mtcars)
RFdat <- plot(randomForest(mpg ~ ., mtcars, keep.forest=FALSE, ntree=100), log="y")
RFdat

## But, I can't use this method in ROCR
library(ROCR)
data(ROCR.xval)
RCdat <- plot(perf, avg="threshold")
RCdat
## output: NULL

Does anyone have any tricks for piping or extracting these data? Or, perhaps for steering me in another direction?

Thanks,
Tim

From: "Tim Howard" <tghoward at gw.dec.state.ny.us>
Subject: [R] ROCR.plot methods, cross validation averaging
To: <osander at mpi-sb.mpg.de>, <tobias.sing at mpi-sb.mpg.de>, <r-help at r-project.org>
Message-ID: <4ABA1079.6D16.00D5.0 at gw.dec.state.ny.us>
Content-Type: text/plain; charset=US-ASCII

Dear R-help and ROCR developers (Tobias Sing and Oliver Sander) -

I think my first question is generic and could apply to many methods, which is why I'm directing this initially to R-help as well as Tobias and Oliver.

Question 1. The plot function in ROCR will average your cross validation data if asked. I'd like to use that averaged data to find a "best" cutoff but I can't figure out how to grab the actual data that get plotted. A simple redirect of the plot (such as test <- plot(mydata)) doesn't do it.

Question 2. I am asking ROCR to average lists with varying lengths for each list entry. See my example below. None of the ROCR examples have data structured in this manner. Can anyone speak to whether the averaging methods in ROCR allow for this? If I can't easily grab the data as desired from Question 1, can someone help me figure out how to average the lists, by threshold, similarly?

Question 3. If my cross validation data happen to have a list entry whose length = 2, ROCR errors out. Please see the second part of my example. Any suggestions?

# reproducible examples exemplifying my questions
## part one ##
library(ROCR)
data(ROCR.xval)
# set up data so it looks more like my real data
sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
testSet <- ROCR.xval
# do the extraction
for (i in 1:length(ROCR.xval[[1]])){
  y <- sample(c(1:350),sampSize[i])
  testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
  testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
}
# now massage the data using ROCR, set up for a ROC plot
# if it errors out here, run the above sample again.
pred <- prediction(testSet$predictions, testSet$labels)
perf <- performance(pred,"tpr","fpr")
# create the ROC plot, averaging by cutoff value
plot(perf, avg="threshold")
# check out the structure of the data
str(perf)
# note the ragged edges of the list and that I assume averaging
# whether it be vertical, horizontal, or threshold, somehow
# accounts for this?

## part two ##
# add a list entry with only two values
perf@x.values[[1]] <- c(0,1)
perf@y.values[[1]] <- c(0,1)
perf@alpha.values[[1]] <- c(Inf,0)

plot(perf, avg="threshold")

## output results in an error with this message
# Error in if (from == to) rep.int(from, length.out) else as.vector(c(from, :
#   missing value where TRUE/FALSE needed

Thanks in advance for your help
Tim Howard
New York Natural Heritage Program
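A note on the first question: whether an assignment such as x <- plot(...) captures anything depends on what the plotting function returns. hist() and boxplot() return their computed summaries, while plot() on plain vectors returns NULL. The following is a minimal sketch in base R only; plot_and_return() is a hypothetical helper made up for illustration and is not part of ROCR or any other package.

```r
# hist() and boxplot() hand back the values they computed;
# plot = FALSE skips the drawing and just returns them.
set.seed(1)
dat  <- rnorm(100)
hdat <- hist(dat, plot = FALSE)      # object of class "histogram"
bdat <- boxplot(dat, plot = FALSE)   # list with stats, n, conf, out, ...

# Hypothetical wrapper: draw first, then return the plotted values
# invisibly so that an assignment can capture them.
plot_and_return <- function(x, y, ...) {
  plot(x, y, ...)
  invisible(data.frame(x = x, y = y))
}

pdf(NULL)  # null graphics device, so the sketch also runs non-interactively
vals <- plot_and_return(1:5, (1:5)^2, type = "l")
invisible(dev.off())

str(vals)  # the values that were drawn
```

The same pattern (finish the function with invisible(<data>)) is what makes hdat <- hist(dat) work even when the histogram is drawn.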
David Winsemius
2009-Sep-24 13:25 UTC
[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
On Sep 24, 2009, at 9:09 AM, Tim Howard wrote:

> ## But, I can't use this method in ROCR
> library(ROCR)
> data(ROCR.xval)
> RCdat <- plot(perf, avg="threshold")

That code throws an object not found error. Perhaps you defined perf earlier?

David

> RCdat
> ## output: NULL

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
David Winsemius
2009-Sep-24 13:43 UTC
[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
On Sep 24, 2009, at 9:09 AM, Tim Howard wrote:

> Does anyone have any tricks for piping or extracting these data?
> Or, perhaps for steering me in another direction?

After looking at the examples in ROCR, my guess is that you really ought to examine the perf object itself. It's an S4 object, so access to its internals is a bit different. In the example performance object I just created, the values in the y.values slot would be obtainable with:

perf@y.values

There is also help from:

?"plot-methods"

--
David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
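To illustrate the S4 slot access described above without needing ROCR installed, here is a small sketch using a made-up class. ToyPerf and its contents are assumptions for illustration only; the slot names x.values, y.values and alpha.values are simply the ones that appear in Tim's example.

```r
library(methods)

# Hypothetical stand-in for a performance object: three list-valued
# slots, one list entry per cross-validation run.
setClass("ToyPerf",
         representation(x.values     = "list",
                        y.values     = "list",
                        alpha.values = "list"))

toy <- new("ToyPerf",
           x.values     = list(c(0, 0.5, 1), c(0, 1)),
           y.values     = list(c(0, 0.8, 1), c(0, 1)),
           alpha.values = list(c(Inf, 0.6, 0.1), c(Inf, 0.2)))

isS4(toy)                    # confirms this is an S4 object
slotNames(toy)               # which slots are available
toy@y.values[[1]]            # direct slot access with @
slot(toy, "y.values")[[2]]   # same access, slot name given as a string
lengths(toy@alpha.values)    # ragged: runs may differ in length
```

With a real performance object, str(perf) and slotNames(perf) should show which slots are there before any of them are pulled out with @.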
Tobias Sing
2009-Sep-24 13:57 UTC
[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
Tim,

if I understand correctly, you are trying to get the numerical values of averaged cross-validation curves. Unfortunately, the plot function of ROCR does not return anything in the current version (it's a good suggestion to change this).

If you want a quick fix, you could change the plot.performance function of ROCR to return the values you want.

Kind regards,
Tobias

On Thu, Sep 24, 2009 at 3:09 PM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:

> Does anyone have any tricks for piping or extracting these data?
> Or, perhaps for steering me in another direction?
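Since the plot call returns nothing, another route is to average the curves by hand. The sketch below is one possible way to do threshold averaging over ragged lists in base R: interpolate each run onto a shared cutoff grid, then take row means. It is not ROCR's own code, and the thread does not confirm that ROCR averages the same way. avg_by_threshold(), n_grid and the toy data are assumptions, as is the use of tpr minus fpr as the "best" cutoff criterion. Runs with fewer than two finite cutoffs are simply skipped, which is one way to handle an entry like the one in part two, not a diagnosis of the error.

```r
# x, y, alpha: lists with one numeric vector per cross-validation run
avg_by_threshold <- function(x, y, alpha, n_grid = 50) {
  runs <- Map(function(xv, yv, av) {
    fin <- is.finite(av)                          # drop the Inf cutoff
    data.frame(x = xv[fin], y = yv[fin], a = av[fin])
  }, x, y, alpha)
  # linear interpolation needs at least two points per run
  runs <- Filter(function(r) nrow(r) >= 2, runs)

  all_a <- unlist(lapply(runs, `[[`, "a"))
  grid  <- seq(max(all_a), min(all_a), length.out = n_grid)

  # rule = 2 holds the end values outside a run's own cutoff range
  interp <- function(r, col) {
    approx(r$a, r[[col]], xout = grid, rule = 2, ties = mean)$y
  }
  data.frame(cutoff = grid,
             fpr = rowMeans(sapply(runs, interp, col = "x")),
             tpr = rowMeans(sapply(runs, interp, col = "y")))
}

# Toy ragged data; the third run has only two values, one of them Inf
x     <- list(c(0, 0.2, 1), c(0, 0.1, 0.4, 1), c(0, 1))
y     <- list(c(0, 0.7, 1), c(0, 0.5, 0.9, 1), c(0, 1))
alpha <- list(c(Inf, 0.6, 0.2), c(Inf, 0.8, 0.5, 0.1), c(Inf, 0))

avg  <- avg_by_threshold(x, y, alpha)
best <- avg$cutoff[which.max(avg$tpr - avg$fpr)]  # one possible criterion
head(avg)
best
```

With a performance object such as the one in Tim's example, the call would presumably be avg_by_threshold(perf@x.values, perf@y.values, perf@alpha.values).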