drflxms
2010-Nov-29 13:32 UTC
[R] selecting only corresponding categories from a confusion matrix
Dear R colleagues,

as a result of my calculations regarding the inter-observer-variability in bronchoscopy, I get a confusion matrix like the following:

       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
10    14  0    0    8  4
100    4  0    0    0  0
1000  23  7   12   10  5
1001   0  0    4    0  0
1010   4  0    0    3  0
1011   1  0    1    0  2
11     0  0    3    3  1
110    1  0    0    0  0
1100   2  0    0    0  0
1110   1  0    0    0  0

The first column represents the categories found among observers, the top row represents the categories found by the reference ("goldstandard").

I am looking for a way (general algorithm) to extract a data.frame with only the corresponding categories among observers and reference from the above confusion matrix. "Corresponding" means in this case, that a category has been chosen by both: observers and reference.

In this example corresponding categories would be simply all categories that have been chosen by the reference (0,1,1001,1010,11), but generally there might also occur categories which are found by the reference only (and not among observers - in the first column).

So the solution-dataframe for the above example would look like:

       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
1001   0  0    4    0  0
1010   4  0    0    3  0
11     0  0    3    3  1

All the categories found among observers only, were omitted.

If the solution algorithm would include a method to list the omitted categories and to count their number as well as the number of omitted cases, it would be just perfect for me.

I'd be happy to read from you soon! Thanks in advance for any kind of help with this.

Greetings from snowy Munich,
Felix
David Winsemius
2010-Nov-29 13:49 UTC
[R] selecting only corresponding categories from a confusion matrix
On Nov 29, 2010, at 8:32 AM, drflxms wrote:

> Dear R colleagues,
>
> as a result of my calculations regarding the inter-observer-variability in bronchoscopy, I get a confusion matrix like the following:
>
>        0  1 1001 1010 11
> 0    609 11   54   36  6
> 1      1  2    6    0  2
> 10    14  0    0    8  4
> 100    4  0    0    0  0
> 1000  23  7   12   10  5
> 1001   0  0    4    0  0
> 1010   4  0    0    3  0
> 1011   1  0    1    0  2
> 11     0  0    3    3  1
> 110    1  0    0    0  0
> 1100   2  0    0    0  0
> 1110   1  0    0    0  0
>
> The first column represents the categories found among observers, the top row represents the categories found by the reference ("goldstandard").
> I am looking for a way (general algorithm) to extract a data.frame with only the corresponding categories among observers and reference from the above confusion matrix. "Corresponding" means in this case, that a category has been chosen by both: observers and reference.
> In this example corresponding categories would be simply all categories that have been chosen by the reference (0,1,1001,1010,11), but generally there might also occur categories which are found by the reference only (and not among observers - in the first column).
> So the solution-dataframe for the above example would look like:
>
>        0  1 1001 1010 11
> 0    609 11   54   36  6
> 1      1  2    6    0  2
> 1001   0  0    4    0  0
> 1010   4  0    0    3  0
> 11     0  0    3    3  1

I wasn't able to follow the confusing, er, confusion matrix explanation but it appears from a comparison of the input and output that you just want row indices that are the column names:

> mtx[colnames(mtx), ]
       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
1001   0  0    4    0  0
1010   4  0    0    3  0
11     0  0    3    3  1

> # and the omitted

> mtx[!rownames(mtx) %in% colnames(mtx), ]
      0 1 1001 1010 11
10   14 0    0    8  4
100   4 0    0    0  0
1000 23 7   12   10  5
1011  1 0    1    0  2
110   1 0    0    0  0
1100  2 0    0    0  0
1110  1 0    0    0  0

> # and their number:

> NROW(mtx[!rownames(mtx) %in% colnames(mtx), ])
[1] 7

> All the categories found among observers only, were omitted.
>
> If the solution algorithm would include a method to list the omitted categories and to count their number as well as the number of omitted cases, it would be just perfect for me.

David Winsemius, MD
West Hartford, CT
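
The row-indexing approach in the reply works for the posted matrix because every reference category also occurs among the observers. In the general case the question mentions (a category chosen by the reference only), `mtx[colnames(mtx), ]` would stop with a "subscript out of bounds" error, and the count of omitted cases is not covered. The following is a minimal sketch of one way to handle both, not the method proposed in the thread. It assumes the confusion matrix is held in `mtx` with character row and column names; the names `common`, `agree`, `omitted_rows`, `omitted_cols` and `n_omitted_cases` are made up for illustration.

```r
# Confusion matrix from the post: rows = observers, columns = reference.
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")
  )
)

# Categories chosen by both observers (rows) and reference (columns).
common <- intersect(rownames(mtx), colnames(mtx))

# Sub-matrix of corresponding categories; drop = FALSE keeps the
# matrix shape even if only a single category is shared.
agree <- mtx[common, common, drop = FALSE]

# Categories that appear on one side only.
omitted_rows <- setdiff(rownames(mtx), colnames(mtx))  # observers only
omitted_cols <- setdiff(colnames(mtx), rownames(mtx))  # reference only

# Cases dropped by the restriction: everything outside the kept block
# (this includes reference-only columns, if there are any).
n_omitted_cases <- sum(mtx) - sum(agree)

omitted_rows          # "10" "100" "1000" "1011" "110" "1100" "1110"
length(omitted_rows)  # 7
n_omitted_cases       # 95

# As a data.frame; check.names = FALSE keeps "0", "1", ... as column names.
agree_df <- data.frame(agree, check.names = FALSE)
```

Using `intersect()` on both dimensions means a reference-only category is dropped from the columns rather than causing an error, and `omitted_cols` lists it. For the posted data `omitted_cols` is empty, so the result matches the output shown in the reply.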