Dear Colleagues,

Please help distribute the following announcement of a data mining competition.

Thanks,
-XG

============================
Data Mining Competition 2008
============================

Website: http://dms.stat.ucf.edu/competition08/home.htm

Department of Statistics & Actuarial Science
University of Central Florida

ANNOUNCEMENT

The Data Mining program at the University of Central Florida (UCF) is announcing a data mining competition on marketing response analysis in collaboration with BlueCross BlueShield of Florida (BCBSFL). The purpose of this project is to develop a predictive model that can generate a list of potential responders in a future promotional mailing campaign. The response/target variable is binary (0-1), with the value 1 indicating a response in the previous mail campaign. Most of the explanatory variables (inputs) used in this study come from census data; the rest come from a list data vendor. All input variables have been renamed X1, X2, ... for data security and privacy reasons.

DATASET DOWNLOADS

The datasets are available in two formats: SAS and comma-separated values (CSV). Registration is required to download them; after registering, please select whichever format is more convenient for you.

Format   Training                         Test
SAS      training.sas7bdat (392.53 MB)    test.sas7bdat (43.89 MB)
CSV      training.csv (257.00 MB)         test.csv (28.55 MB)

PARTICIPATION AND AWARDS

This competition is open to anyone interested. Please review the following rules carefully and contact us with any questions at data.mining.2008 at gmail.com.

Please build your model using the training data set and use it to obtain a predicted probability of response for each individual in the test sample. Two deliverables must be submitted by 5:00 pm (Eastern Time) on 3/31/2008 in order to participate in the contest.

  - A data set with two columns: one is the ID and the other is your predicted probability of response (not a 0-1 predicted outcome).
  - A one-page write-up that contains your contact information and a brief description of your modeling methods and approaches. The contact information should list the names, titles, academic degrees, affiliations, and locations (city, state, and country, if international) of all authors.

The top three winners will be selected according to their predicted probabilities on the test sample data. All participants will be ranked using the following two model performance measures.

  - Criterion 1: area under the receiver operating characteristic (ROC) curve.
  - Criterion 2: percentage of responders caught among the 10,000 individuals with the highest predicted response probabilities.

The final ranking will be based on the sum of these two separate ranks. In the case of a tie (e.g., Tom ranks No. 1 on Criterion 1 and No. 3 on Criterion 2, while Jerry ranks No. 2 on both criteria), the participant with the better rank on Criterion 1 (i.e., Tom) wins.

The prizes are sponsored by BCBSFL: a cash prize of $1,000 will be awarded to the best performer, $500 to the second, and $250 to the third. The three winning individuals or teams will also be invited to present their results at the Fourth Annual Business Intelligence Symposium in Orlando, FL on April 11, 2008. Award plates will be presented to the winners during the symposium. The work can be completed by an individual or a group, but for a winning team only one individual will be invited to present the work at the Symposium.

IMPORTANT DATES

February 08, 2008     Competition announced
March 31, 2008        Submissions due by 5:00 pm (Eastern Time)
April 02, 2008        Announcement of winners
April 11-12, 2008     Fourth Annual Business Intelligence Symposium in Orlando, FL

===============================
Xiaogang Su, Ph.D.
Associate Professor / Undergraduate Coordinator
Department of Statistics and Actuarial Science
University of Central Florida
Orlando, FL 32816
(407) 823-2940 [O]
xiaosu at mail.ucf.edu
http://pegasus.cc.ucf.edu/~xsu/
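
Note on the ranking rules: the two criteria and the tie-break described in the announcement can be sketched in Python as below. This is only an illustrative sketch, not the organizers' scoring code. It assumes that Criterion 2 means the share of all test-set responders that fall among the top k predictions (k = 10,000 in the contest), and that equal values on a criterion share the better rank. The function names and the third participant "Spike" are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores so they share an average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def capture_rate(labels, scores, k):
    """Share of all responders found among the k highest-scored individuals.

    Assumption: this is one reading of Criterion 2; the contest uses k = 10000.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / sum(labels)


def final_ranking(results):
    """results maps participant -> (auc, capture_rate); returns names, best first."""
    def ranks_by(pos):
        # Rank 1 = largest value; equal values share the better rank (assumption).
        values = sorted((v[pos] for v in results.values()), reverse=True)
        return {name: values.index(v[pos]) + 1 for name, v in results.items()}

    r1, r2 = ranks_by(0), ranks_by(1)
    # Order by the sum of the two ranks, then by the Criterion 1 rank on ties.
    return sorted(results, key=lambda name: (r1[name] + r2[name], r1[name]))


if __name__ == "__main__":
    y = [0, 0, 1, 1]                 # toy responses, not contest data
    p = [0.1, 0.4, 0.35, 0.8]        # toy predicted probabilities
    print(auc(y, p), capture_rate(y, p, k=2))
    # Made-up scores reproducing the Tom/Jerry tie from the announcement.
    print(final_ranking({"Tom": (0.80, 0.30),
                         "Jerry": (0.78, 0.35),
                         "Spike": (0.70, 0.40)}))
```

In the toy ranking all three participants have the same rank sum, so the order is decided entirely by the Criterion 1 rank, which is the tie-break the announcement describes.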