Dear R help list,
I have a training dataset that looks like Table 1.
I have an unknown dataset that looks like Table 2.
I want a program that searches the training dataset and identifies which
category (type1, type2 or type3) the unknown sample belongs to.
If the unknown does not belong to any of the categories, the program should
tell me that as well.
The real dataset has 600 variables and 50 sample types.
I tried linear discriminant analysis (lda in the MASS package) and its
predict function. It works well, but I think lda always assigns an unknown
to one of the types in the training dataset.
Most of my unknowns will not be from any category in the training dataset,
and I want to avoid false-positive identifications.
Table 1: Three types and 10 variables
      type1 type1 type1 type2 type2 type2 type3 type3 type3
var1     24    28    25    50    51    46    18    20    16
var2      4     5     4     9     8     9    10     9    10
var3      7     7     7    12    12    12     9     6     6
var4      4     5     4    10    12     9     2     2     2
var5      4     5     4    10     9    10     3     2     3
var6      5     4     5     2     3     2     1     3     5
var7      5     4     5     7     7     7     3     3     3
var8      3     4     3    10    10     8     4     2     4
var9      3     4     3     2     2     2     2     2     2
var10     3     3     3     4     4     4     3     1     2
Table 2
      unknown
var1       23
var2        4
var3        7
var4        4
var5        4
var6        6
var7        5
var8        3
var9        3
var10       3
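
For anyone who wants to reproduce the setup, here is a minimal sketch of the lda/predict approach described above, using the numbers from Table 1 and Table 2. It assumes the tables are transposed so that each row is a sample and each column a variable; the object names (train, type, unknown, fit, pred) are made up for illustration, and this is not necessarily the exact code that was run.

```r
library(MASS)  # provides lda() and its predict() method

# Table 1, transposed: one row per sample, one column per variable
train <- data.frame(
  var1  = c(24, 28, 25, 50, 51, 46, 18, 20, 16),
  var2  = c( 4,  5,  4,  9,  8,  9, 10,  9, 10),
  var3  = c( 7,  7,  7, 12, 12, 12,  9,  6,  6),
  var4  = c( 4,  5,  4, 10, 12,  9,  2,  2,  2),
  var5  = c( 4,  5,  4, 10,  9, 10,  3,  2,  3),
  var6  = c( 5,  4,  5,  2,  3,  2,  1,  3,  5),
  var7  = c( 5,  4,  5,  7,  7,  7,  3,  3,  3),
  var8  = c( 3,  4,  3, 10, 10,  8,  4,  2,  4),
  var9  = c( 3,  4,  3,  2,  2,  2,  2,  2,  2),
  var10 = c( 3,  3,  3,  4,  4,  4,  3,  1,  2)
)

# Sample types, in the column order of Table 1
type <- factor(rep(c("type1", "type2", "type3"), each = 3))

# Table 2, transposed: a single unknown sample
unknown <- data.frame(
  var1 = 23, var2 = 4, var3 = 7, var4 = 4, var5 = 4,
  var6 = 6,  var7 = 5, var8 = 3, var9 = 3, var10 = 3
)

# With fewer samples than variables, lda() may warn that the variables
# are collinear; the warning is suppressed here only to keep the output short.
fit <- suppressWarnings(lda(train, grouping = type))

pred <- predict(fit, unknown)
pred$class      # predicted type: always one of the training levels
pred$posterior  # posterior probabilities over the training types
```

The posterior probabilities are normalized over the training types only, so they sum to 1 for every unknown. A high posterior therefore does not by itself show that the unknown resembles any of the training types.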
Thanks
RS