drflxms
2008-Sep-01 09:27 UTC
[R] convenient way to calculate specificity, sensitivity and accuracy from raw data
Dear R-colleagues,

this is a question from an R-newbie medical doctor:

I am evaluating data on inter-observer reliability in endoscopy. 20 medical doctors judged 42 videos, filling out a multiple-choice survey for each video. The overall data is organized in the classical way: observations (items from the multiple-choice survey) as columns, each case (identified by the two columns "number of medical doctor" and "number of video") in a row. In addition there is a medical doctor number 21 who is assumed to be the gold standard.

As a measure of inter-observer agreement I calculated kappa according to Fleiss and simple agreement in percent using the routines "kappam.fleiss" and "agree" from the irr package. Everything worked fine so far.

Now I'd like to calculate specificity, sensitivity and accuracy for each item (compared to the gold standard), as these are well-known and easy-to-understand quantities for medical doctors.

Unfortunately I haven't found a feasible way to do this in R so far. All solutions I found describe the calculation of specificity, sensitivity and accuracy from a contingency table / confusion matrix only. For me it is very difficult to create such contingency tables / confusion matrices from the raw data I have.

So I started to do it in Excel by hand - a lot of work! If I keep on doing this, I'll miss the deadline. So maybe someone can help me out:

It would be very convenient if there were a way to calculate specificity, sensitivity and accuracy from the very same data.frames I created for the calculation of kappa and agreement. In these data.frames, which were generated from the overall data table described above using the "reshape" package, we have the judging medical doctors in the columns and the videos in the rows. The cells contain the coded answer options from the multiple-choice survey. Please see a simple example with answer options 0/1 (copied from the R console) below:

   video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1      1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
2      2 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
3      3 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
4      4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
5      5 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  1  0
6      6 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0
7      7 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
8      8 0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0  0
9      9 0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0  0  0  1  0
10    10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
11    11 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
12    12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
13    13 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
14    14 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
15    15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
16    16 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
17    17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
18    18 0 0 0 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
19    19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
20    20 0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
21    21 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  1
22    22 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
23    23 0 1 0 0 1 0 1 0 0  1  0  0  1  1  0  0  1  0  0  0  0
24    24 0 0 0 0 0 0 0 0 0  0  0  0  1  1  1  1  0  1  0  0  1
25    25 0 0 0 0 0 0 0 0 0  0  0  1  0  0  1  1  0  0  0  0  0
26    26 0 0 0 0 0 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0  0
27    27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
28    28 0 1 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
29    29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
30    30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
31    31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
32    32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
33    33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
34    34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
35    35 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0
36    36 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
37    37 0 1 1 0 1 0 0 1 0  0  0  0  1  1  1  0  1  0  0  1  1
38    38 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
39    39 0 1 0 0 1 0 0 1 0  1  1  0  1  1  0  0  1  1  0  1  1
40    40 1 1 1 1 1 0 1 0 0  0  0  1  1  1  1  0  0  1  0  0  1
41    41 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  1
42    42 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0

What I did in Excel is: creating the very same tables using pivot charts, comparing columns 1-20 to column 21 (gold standard), and summing up the count of values that are identical to column 21. I repeated this for each answer option. From the results, one can easily calculate specificity, sensitivity and accuracy.

How can I do this, or something similar leading to the same results, in R? I'd appreciate any kind of help very much!

Greetings from Munich,
Felix
Dimitris Rizopoulos
2008-Sep-01 10:16 UTC
[R] convenient way to calculate specificity, sensitivity and accuracy from raw data
try something like this:

dat <- read.table(textConnection("video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9 9 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 0 1 0
10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 18 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
19 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 20 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 23 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0
24 24 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1
25 25 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
26 26 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
27 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28 28 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 35 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37 37 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1
38 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39 39 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 1 1
40 40 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1
41 41 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"), header = TRUE)
closeAllConnections()

# column X21 holds the gold standard
goldstand <- dat$X21
prev <- sum(goldstand)    # videos the gold standard coded 1
cprev <- sum(!goldstand)  # videos the gold standard coded 0
n <- prev + cprev

# one result per column after "video"; a doctor whose answers (or a gold
# standard that) use only one level gives no 2 x 2 table and yields NULL
lapply(dat[-1], function(x){
    tab <- table(x, goldstand)
    cS <- colSums(tab)
    if(nrow(tab) > 1 && ncol(tab) > 1) {
        out <- c(sp = tab[1,1], sn = tab[2,2]) / cS
        c(out, ac = (out[1] * cprev + out[2] * prev) / n)
    }
})

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://perswww.kuleuven.be/dimitris_rizopoulos/
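A possible variation on the reply above, offered only as a minimal sketch: when a doctor never picks one of the two answer options, table() returns a single row and that doctor is skipped. Fixing the factor levels keeps every table 2 x 2. The helper name rater_stats and the toy data are made up for illustration, and answer 1 is assumed to be the "positive" finding.

```r
# Hypothetical helper: sensitivity (sn), specificity (sp) and accuracy (ac)
# of one rater against the gold standard, treating answer 1 as positive.
rater_stats <- function(x, gold) {
  # Fixed levels keep the table 2 x 2 even if a rater never answers 1
  tab <- table(factor(x, levels = 0:1), factor(gold, levels = 0:1))
  c(sn = tab["1", "1"] / sum(tab[, "1"]),
    sp = tab["0", "0"] / sum(tab[, "0"]),
    ac = sum(diag(tab)) / sum(tab))
}

# Toy data (not from the survey): two raters and a gold standard
toy <- data.frame(r1   = c(1, 0, 1, 0, 0, 0),
                  r2   = c(0, 0, 0, 0, 0, 0),
                  gold = c(1, 1, 1, 0, 0, 0))

# One column of results per rater
res <- sapply(toy[c("r1", "r2")], rater_stats, gold = toy$gold)
print(res)
```

With the real data, the same call would presumably be sapply(dat[paste0("X", 1:20)], rater_stats, gold = dat$X21), assuming the column names produced by read.table above.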
bartjoosen
2008-Sep-01 10:22 UTC
[R] convenient way to calculate specificity, sensitivity and accuracy from raw data
Dear Felix,

I have no idea about the calculation of your accuracy, sensitivity, etc., but here is how to get the sums:

dat <- read.table(file="clipboard") # read in your data as data frame dat
# per video: number of doctors (columns 2-21) whose answer equals the
# gold standard (column 22)
dat$comp <- apply(dat,1,function(x) sum(x[-c(1,22)]==as.numeric(x[22])))

Good luck

Bart
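The per-video counts from the reply above can also be obtained without apply(), and they add up to a pooled accuracy (the share of all doctor-by-video answers that match the gold standard). This is only a sketch on made-up toy data; the column names X1 to X3 and gold are assumptions for the example.

```r
# Toy data (not from the survey): three raters and a gold standard
toy <- data.frame(video = 1:4,
                  X1   = c(0, 1, 0, 1),
                  X2   = c(0, 0, 0, 1),
                  X3   = c(1, 1, 0, 0),
                  gold = c(0, 1, 0, 1))

raters <- as.matrix(toy[, c("X1", "X2", "X3")])

# The gold vector is recycled down each column, so every rater column
# is compared with the gold standard video by video
agree_per_video <- rowSums(raters == toy$gold)

# Pooled accuracy over all rater-by-video answers
overall_accuracy <- sum(agree_per_video) / length(raters)

print(agree_per_video)
print(overall_accuracy)
```

Note that agreement counts alone give accuracy only; sensitivity and specificity additionally need the counts split by the gold-standard answer.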
Gabor Grothendieck
2008-Sep-01 11:31 UTC
[R] convenient way to calculate specificity, sensitivity and accuracy from raw data
Try this:

pairs <- data.frame(pred = unlist(DF[2:21]), lab = DF[,22])

library(caret)
pred <- factor(pairs$pred)
lab <- factor(pairs$lab)
table(pred, lab)
sensitivity(pred, lab)
specificity(pred, lab)

# self-contained version, with the data read from a string
Lines <- "video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9 9 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 0 1 0
10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 18 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
19 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 20 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 23 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0
24 24 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1
25 25 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
26 26 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
27 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28 28 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 35 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37 37 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1
38 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39 39 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 1 1
40 40 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1
41 41 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"

DF <- read.table(textConnection(Lines), header = TRUE)

# stack the 20 doctors' answers; the gold standard is recycled for each doctor
pairs <- data.frame(pred = factor(unlist(DF[2:21])), lab = factor(DF[,22]))
head(pairs) # look at first few rows

# predictions and gold standard reference labels
pred <- pairs$pred
lab <- pairs$lab

# confusion matrix
table(pred, lab)

library(caret)
sensitivity(pred, lab)
specificity(pred, lab)

See ?sensitivity and ?specificity and specify the third arg if you want the second level to represent positive rather than the first.
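To illustrate the remark above about which level counts as positive, here is a minimal base-R sketch on made-up toy ratings; the helper name hit_rate is hypothetical. With caret loaded, I believe the equivalent calls are sensitivity(pred, lab, positive = "1") and specificity(pred, lab, negative = "0"), but please check ?sensitivity for your installed version.

```r
# Toy pooled ratings (made up for illustration), coded 0/1
pred <- factor(c(1, 0, 1, 0, 0, 1, 0, 0), levels = 0:1)
lab  <- factor(c(1, 1, 1, 0, 0, 0, 0, 0), levels = 0:1)
tab  <- table(pred, lab)   # rows: rating, columns: gold standard

# Hypothetical helper: share of gold-standard cases at `level`
# that were also rated as `level`
hit_rate <- function(tab, level) tab[level, level] / sum(tab[, level])

# Treating "1" as the positive answer
sens_pos1 <- hit_rate(tab, "1")
spec_pos1 <- hit_rate(tab, "0")

# Treating "0" as the positive answer simply swaps the two numbers
sens_pos0 <- hit_rate(tab, "0")
spec_pos0 <- hit_rate(tab, "1")

print(c(sens_pos1 = sens_pos1, spec_pos1 = spec_pos1))
```

For 0/1-coded findings where 1 means "present", the first convention is usually the one medical readers expect.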