Sudeepa Bhattacharyya
2010-Apr-06 18:43 UTC
[R] Need help on predictive modeling of count data
Hi,

I have a data set of compounds and their C13 chemical shift data. Each compound is classed as either Strong or Weak. There are 100 feature variables, each a different C13 bin. The value in each C13 bin is a count (the number of times a chemical shift falls in that bin), which is 0, 1, 2 or 3. The data set looks like this:

compound  Class   C13Bin1  C13Bin2  C13Bin3  C13Bin4  ...  C13Bin100
A         Strong  0        0        0        2        ...  1
B         Weak    0        1        3        2        ...  0
C         Strong  0        1        0        0        ...  0
D         Weak    0        1        0        0        ...  0
E         Strong  0        0        0        3        ...  1
F         Strong  1        0        1        2        ...  0
G         Strong  2        0        0        0        ...  3
H         Weak    0        1        3        1        ...  0

I have 100 observations (compounds) and 100 different C13 bins, and I would like to use the bins for predictive modeling. I am not familiar with applying predictive modeling techniques (supervised or unsupervised) to zero-inflated count data like this. Can anybody help me with this? I use R, but I am not an advanced user.

Thanks very much,
T. Bhatta (please call me Bhatta)
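Before choosing a model, it can help to load the excerpt and check how zero-heavy it really is. The sketch below is an illustration only. It is written in Python rather than R so it can be run as-is, and it covers just the 8 compounds and 5 bins shown in the table. It assumes the fifth value in each row belongs to C13Bin100, and it does not fill in the elided bins. The helper names `zero_fraction` and `class_means` are hypothetical. The sketch reports the overall share of zero counts and the mean count per bin for each class.

```python
from collections import defaultdict

# Excerpt of the data set from the post: 8 of the 100 compounds and the 5
# visible C13 bins. Assumption: the fifth value in each row is C13Bin100.
# The elided bins (C13Bin5 ... C13Bin99) are deliberately not filled in.
BINS = ["C13Bin1", "C13Bin2", "C13Bin3", "C13Bin4", "C13Bin100"]
DATA = {
    "A": ("Strong", [0, 0, 0, 2, 1]),
    "B": ("Weak",   [0, 1, 3, 2, 0]),
    "C": ("Strong", [0, 1, 0, 0, 0]),
    "D": ("Weak",   [0, 1, 0, 0, 0]),
    "E": ("Strong", [0, 0, 0, 3, 1]),
    "F": ("Strong", [1, 0, 1, 2, 0]),
    "G": ("Strong", [2, 0, 0, 0, 3]),
    "H": ("Weak",   [0, 1, 3, 1, 0]),
}


def zero_fraction(rows):
    """Share of count cells equal to 0 across (class, counts) rows."""
    cells = [v for _, counts in rows for v in counts]
    return sum(1 for v in cells if v == 0) / len(cells)


def class_means(data):
    """Mean count in each bin, computed separately for each class label."""
    sums = defaultdict(lambda: [0] * len(BINS))
    sizes = defaultdict(int)
    for label, counts in data.values():
        sizes[label] += 1
        for j, v in enumerate(counts):
            sums[label][j] += v
    return {label: [s / sizes[label] for s in sums[label]] for label in sums}


if __name__ == "__main__":
    print(f"overall zero fraction: {zero_fraction(DATA.values()):.3f}")
    for label, means in sorted(class_means(DATA).items()):
        print(label, dict(zip(BINS, (round(m, 2) for m in means))))
```

On this excerpt, 23 of the 40 visible cells are zero. Bins whose class means differ a lot (here, for example, C13Bin3 and C13Bin100) are the kind of features a classifier would lean on. The full 100 x 100 table would have to be checked the same way before drawing any conclusions.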