jude.ryan at ubs.com
2009-May-12 19:35 UTC
[R] How do I extract the scoring equations for neural networks and support vector machines?
Sorry for these multiple postings. I solved the problem by using na.omit() to drop records with missing values for the time being. I will worry about imputation, etc. later.

I calculated the sum of squared errors for 3 models: linear regression, neural network, and support vector machine. This is the first run. Without doing any parameter tuning on the SVM or playing around with the number of nodes in the hidden layer of the neural network, I found that the SVM had the lowest sum of squared errors, followed by the neural network, with regression being last. This probably indicates that the data has non-linear patterns.

I have a couple of questions.

1) Besides sum of squared errors, are there any other metrics that can be used to compare these 3 models? AIC, BIC, etc. can be used for regressions, but I am not sure whether they can be used for SVMs and neural networks.

2) Is there any easy way to extract the scoring equations for SVMs and neural networks? Using the R objects I can always score new data manually, but the model will need to be implemented in a production environment. When the model gets implemented in production (could be the mainframe) I will need equations that can be coded in any language (COBOL or SAS on the mainframe). Also, getting the scoring equations for all 3 models will let me create an ensemble model where the predicted value could be the average of the predictions from the SVM, neural network and linear regression. If the ensemble model has the smallest sum of squared errors, this would be the model I would use.

I have SAS Enterprise Miner as well and can get a scoring equation for the neural network (I don't have SVM), but the scoring code that SAS EM generates sucks and I would much rather extract a scoring equation from R. I am using nnet() for the neural network.
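For the nnet() model, one way to get a scoring equation is to write out the forward pass by hand from the fitted weights. The sketch below is base R only and rests on assumptions: the weights are taken to be in the order printed by summary(coreaff.nn1) in the original post quoted further down (bias and inputs into each hidden unit, then bias, hidden units and skip-layer inputs into the output), the hidden units are logistic, and the output is linear (linout = T, skip = T). score_nnet_skip() is a hypothetical helper name, not a function from nnet.

```r
# Hand-written scoring for a single-output nnet with skip-layer connections.
# ASSUMPTION: wts is ordered as in summary(): for each hidden unit
# (bias, i1..in), then for the output (bias, h1..hk, i1..in).
score_nnet_skip <- function(x, wts, n_in, n_hid) {
  x <- as.matrix(x)
  n_hid_wts <- n_hid * (n_in + 1)
  stopifnot(ncol(x) == n_in, n_hid >= 1,
            length(wts) == n_hid_wts + 1 + n_hid + n_in)
  # One column of weights per hidden unit; rows are bias, i1..in
  w_hid <- matrix(wts[seq_len(n_hid_wts)], nrow = n_in + 1)
  w_out <- wts[-seq_len(n_hid_wts)]
  # Logistic activation on the hidden layer
  hid <- plogis(cbind(1, x) %*% w_hid)
  # Linear output: bias + hidden contributions + skip-layer contributions
  drop(w_out[1] +
       hid %*% w_out[1 + seq_len(n_hid)] +
       x %*% w_out[1 + n_hid + seq_len(n_in)])
}

# Toy check with made-up weights: 2 inputs, 1 hidden unit.
# Hidden unit = plogis(0) = 0.5, so the score is 1 + 2*0.5 + 3*1 + 4*1.
toy_score <- score_nnet_skip(matrix(c(1, 1), nrow = 1),
                             wts = c(0, 0, 0, 1, 2, 3, 4),
                             n_in = 2, n_hid = 1)
```

If the assumed ordering is right, score_nnet_skip(x, coreaff.nn1$wts, 7, 2), with x holding the seven predictors in formula order, should reproduce predict(coreaff.nn1). That check is worth running before porting the same arithmetic to SAS or COBOL.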
Thanks in advance,
Jude Ryan

________________________________
From: Ryan, Jude
Sent: Tuesday, May 12, 2009 1:23 PM
To: 'r-help at r-project.org'
Cc: juderyan61 at yahoo.com
Subject: FW: neural network not using all observations

As a follow-up to my email below:

The input data frame to nnet() has dimensions:

> dim(coreaff.trn.nn)
[1] 5088    8

And the predictions from the neural network (35 records are dropped - see email below for more details) have dimensions:

> pred <- predict(coreaff.nn1)
> dim(pred)
[1] 5053    1

So the following line of R code does not work, as the dimensions are different:

> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)
Error: dims [product 5053] do not match the length of object [5088]
In addition: Warning message:
In coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1) :
  longer object length is not a multiple of shorter object length

While:

> dim(pred)
[1] 5053    1
> tail(pred)
          [,1]
5083  664551.9
5084  552170.6
5085  684834.3
5086 1215282.5
5087 1116302.2
5088  658112.1

shows that the last row name of pred is 5088, which matches the number of rows in coreaff.trn.nn, the input data frame to the neural network.

I tried using row() to identify the 35 records that were dropped (or not scored). The code I tried was:

> coreaff.trn.nn.subset <- coreaff.trn.nn[row(coreaff.trn.nn) == row(pred), ]
Error in row(coreaff.trn.nn) == row(pred) : non-conformable arrays

But I am not doing something right. pred has dimension = 1 and row() requires an object of dimension = 2. So using cbind() I bound a column of sequence numbers to pred to make the dimension = 2, but that did not help.

Basically, if I can identify the 5,053 records that the neural network made predictions for, in the data frame of 5,088 records (coreaff.trn.nn) used by the neural network, then I can compare the predictions to the actual values, and compare the predictive power of the neural network to the predictive power of the linear regression model.
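The mismatch described above can be handled by matching on row names instead of position. The sketch below is a minimal illustration on made-up data, using lm() as a stand-in so that it runs with base R alone. It assumes that predict() on the nnet fit labels its rows with the row names of the records that were kept, as the tail(pred) output suggests. The data frame trn and its columns are hypothetical.

```r
# Toy data standing in for coreaff.trn.nn (hypothetical values)
set.seed(1)
trn <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
trn$x1[c(3, 7)] <- NA                 # two records with a missing predictor

# Rows with no missing values, and the positions of the ones dropped
kept    <- complete.cases(trn)
dropped <- which(!kept)

# The formula interface drops incomplete rows (na.action = na.omit)
fit  <- lm(y ~ x1 + x2, data = trn, na.action = na.omit)
pred <- predict(fit)                  # named by the row names that were kept

# Pull the matching actual values by row name, then compute the SSE
actual <- trn[names(pred), "y"]
sse    <- sum((actual - pred)^2)
```

For the neural network the analogous lookup would be coreaff.trn.nn[rownames(pred), "hh.iast.y"], under the same row-name assumption.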
Any idea how I can extract the 5,053 records that the neural network made predictions for from the data frame (5,088 records) used to train the neural network?

Thanks in advance,
Jude

________________________________
From: Ryan, Jude
Sent: Tuesday, May 12, 2009 11:11 AM
To: 'r-help at r-project.org'
Cc: juderyan61 at yahoo.com
Subject: neural network not using all observations

I am exploring neural networks (adding non-linearities) to see if I can get more predictive power than a linear regression model I built. I am using the function nnet and following the example of Venables and Ripley, in Modern Applied Statistics with S, on pages 246 to 249. I have standardized variables (z-scores) such as assets, age and tenure. I have other variables that are binary (0 or 1). For example, max_acc_ownr_nwrth_n_med has a value of 1 if the client's net worth is above the median net worth and a value of 0 otherwise. These are derived variables I created and variables that the regression algorithm has found to be predictive. A regression on the same variables shown below gives me an R-squared of about 0.12. I am trying to increase the predictive power of this regression model with a neural network, being careful to avoid overfitting.
Similar to Venables and Ripley, I used the following code:

> library(nnet)
> dim(coreaff.trn.nn)
[1] 5088    8
> head(coreaff.trn.nn)
  hh.iast.y WC_Total_Assets all_assets_per_hh         age      tenure
1   3059448      -0.4692186        -0.4173532 -0.06599001 -1.04747935
2   4899746       3.4854334         4.0111164 -0.06599001 -0.72540200
3    727333      -0.2677357        -0.4177944 -0.30136473 -0.40332465
4    443138      -0.5295170        -0.6999646 -0.14444825 -1.04747935
5    484253      -0.6112205        -0.7306664  0.64013414  0.07979137
6    799054       0.6580506         1.1763114  0.24784295  0.07979137
  max_acc_ownr_liq_asts_n_med max_acc_ownr_nwrth_n_med max_acc_ownr_ann_incm_n_med
1                           0                        1                           0
2                           1                        1                           1
3                           1                        1                           1
4                           0                        0                           0
5                           1                        0                           0
6                           0                        1                           1
> coreaff.nn1 <- nnet(hh.iast.y ~ WC_Total_Assets + all_assets_per_hh + age + tenure + max_acc_ownr_liq_asts_n_med +
+     max_acc_ownr_nwrth_n_med + max_acc_ownr_ann_incm_n_med, coreaff.trn.nn, size = 2, decay = 1e-3,
+     linout = T, skip = T, maxit = 1000, Hess = T)
# weights:  26
initial  value 12893652845419998.000000
iter  10 value 6352515847944854.000000
final  value 6287104424549762.000000
converged
> summary(coreaff.nn1)
a 7-2-1 network with 26 weights
options were - skip-layer connections  linear output units  decay=0.001
     b->h1     i1->h1     i2->h1     i3->h1     i4->h1     i5->h1     i6->h1     i7->h1
 -21604.84   -2675.80   -5001.90   -1240.16    -335.44  -12462.51  -13293.80   -9032.34
     b->h2     i1->h2     i2->h2     i3->h2     i4->h2     i5->h2     i6->h2     i7->h2
 210841.52   47296.92   58100.43  -13819.10   -9195.80  117088.99  131939.57  106994.47
      b->o      h1->o      h2->o      i1->o      i2->o      i3->o      i4->o      i5->o      i6->o      i7->o
1115190.67  894123.33 -417269.57   89621.84  170268.12   44833.63   59585.05  112405.30  437581.05  244201.69
> sum((hh.iast.y - predict(coreaff.nn1))^2)
Error: object "hh.iast.y" not found

So I try:

> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)
Error: dims [product 5053] do not match the length of object [5088]
In addition: Warning message:
In coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1) :
  longer object length is not a multiple of shorter object length

Doing a little debugging:

> pred <- predict(coreaff.nn1)
> dim(pred)
[1] 5053    1
> dim(coreaff.trn.nn)
[1] 5088    8

So it looks like the number of records (cases) in pred is 5,053 and the number of records in the input dataset is 5,088. It looks like the neural network is dropping 35 records. Does anyone have any idea of why it would do this? It is most probably because those 35 records are "bad" data, a pretty common occurrence in the real world. Does anyone know how I can identify the dropped records? If I can do this I can get the dimensions of the input dataset to be 5,053 and then:

> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)

would work.

A summary of my dataset, from summary(coreaff.trn.nn), with one variable per row:

Variable                     Min.        1st Qu.     Median      Mean        3rd Qu.     Max.
hh.iast.y                    0           565520      834164      1060244     1207181     45003160
WC_Total_Assets              -6.970e-01  -5.387e-01  -3.160e-01  2.948e-13   1.127e-01   1.332e+01
all_assets_per_hh            -8.918e-01  -6.147e-01  -3.718e-01  3.204e-12   1.891e-01   4.011e+00
age                          -4.617e+00  -4.583e-01  9.093e-02   -1.884e-11  5.617e-01   5.818e+00
tenure                       -1.209e+00  -7.254e-01  -2.423e-01  -3.302e-12  5.629e-01   4.267e+00
max_acc_ownr_liq_asts_n_med  0.0000      0.0000      0.0000      0.4951      1.0000      1.0000
max_acc_ownr_nwrth_n_med     0.0         0.0         0.5         0.5         1.0         1.0
max_acc_ownr_ann_incm_n_med  0.0000      0.0000      0.0000      0.3634      1.0000      1.0000
NA's : 3.500e+01 (35 missing values)

Since I am writing this post, I have a few other questions. I know I can compare 2 regression models using:

anova(model1, model2)

Will this work if one of the models is a regression model and the other model is a neural network? I have not reached the point in building a neural network to try this yet. If not, is there any other way I can compare the performance of a regression model and neural network?
If not I may have to resort to programming to do this. I can probably use predict() to get one vector for the regression model and another for the neural network and then compare these predictions against the actual values.

Is there any R package that can produce lift charts (ROC curves, gains tables, etc.), the K-S statistic, etc., that can be used to quantify the performance of a predictive model (as done in database marketing)? If so, such a package could be used to compare a regression model and a neural network.

Another question: can any of the neural network packages in R (nnet, AMORE, neural, neuralnet, or others I do not know about) do variable selection (the way the regression methods do)? Or must I do this manually, looking at the weights and pruning the network by eliminating weights close to zero (at all the layers in the network)?

Thanks in advance,
Jude

___________________________________________
Jude Ryan
Director, Client Analytical Services
Strategy & Business Development
UBS Financial Services Inc.
1200 Harbor Boulevard, 4th Floor
Weehawken, NJ 07086-6791
Tel. 201-352-1935
Fax 201-272-2914
Email: jude.ryan at ubs.com
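The predict()-based comparison described in the message above does not depend on the model class, so it can be sketched in base R. The example below is only an illustration: fit_metrics() is a hypothetical helper, and the actual values and predictions are made-up numbers standing in for holdout data scored by a regression, a neural network and an SVM. It also shows the simple averaging ensemble mentioned in the latest post.

```r
# Model-agnostic error metrics from actual values and predictions
fit_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  c(SSE  = sum(err^2),
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2))
}

# Hypothetical holdout values and predictions from three models
actual <- c(10, 12, 15, 11, 18)
p_lm   <- c(11, 11, 14, 12, 16)
p_nn   <- c(10, 13, 15, 10, 17)
p_svm  <- c(10, 12, 16, 11, 17)
p_ens  <- (p_lm + p_nn + p_svm) / 3   # simple average ensemble

# One column of metrics per model
results <- sapply(list(lm = p_lm, nnet = p_nn, svm = p_svm, ensemble = p_ens),
                  fit_metrics, actual = actual)
print(round(results, 3))
```

Computing these on records held out from training, instead of on the training data itself, gives a fairer comparison between a flexible model and a linear one.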