AURORA GONZALEZ VIDAL
2017-Apr-10 12:27 UTC
[R] GA package integer hyperparameters optimization
Hello everybody,

I am using the GA package [1] to optimize the hyperparameters of an SVM, as is done in this example:
http://stackoverflow.com/questions/32026436/how-to-optimize-parameters-using-genetic-algorithms

However, when I try to adapt the example to random forest, the optimization takes a very long time. This might be because the hyperparameters of random forest are integers (ntree, mtry, nodesize), but I don't know whether there is a way to specify that in the algorithm.

Any suggestions would be very much appreciated.

Thank you!

The code:

library(GA)
library(randomForest)

data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Set up the data for cross-validation
K <- 5  # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data  = Data[fold_inds == i, , drop = FALSE]))

# Given the values of the parameters 'ntree', 'mtry' and 'nodesize',
# return the RMSE of the model over the test data
evalParamsRF <- function(train_data, test_data, ntree, mtry, nodesize) {
  # Train
  model <- randomForest(V4 ~ ., data = train_data, ntree = ntree,
                        mtry = mtry, nodesize = nodesize,
                        proximity = TRUE)
  # Test (square root added so that the value really is an RMSE)
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
  return(rmse)
}

fitnessFuncRF <- function(x, Lst_CV_Data) {
  # Retrieve the RF parameters
  ntree_val    <- x[1]
  mtry_val     <- x[2]
  nodesize_val <- x[3]

  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParamsRF(train_data, test_data, ntree_val, mtry_val, nodesize_val)))

  # As fitness measure, return minus the average RMSE (over the
  # cross-validation folds), so that by maximizing fitness we are
  # minimizing the RMSE
  return(-mean(rmse_vals))
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFuncRF, lst_CV_data,
              names = names(theta_min),
              min = theta_min, max = theta_max,
              popSize = 50, maxiter = 10)

summary(results)
summary(results)$solution

Links:
------
[1] https://cran.r-project.org/web/packages/GA/index.html

------
Aurora González Vidal
Ph.D. student in Data Analytics for Energy Efficiency
Faculty of Computer Sciences
University of Murcia

@. aurora.gonzalez2 at um.es
T. 868 88 7866
sae.saiblogs.inf.um.es
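
On the integer question raised in this post, one possible workaround (a sketch, not something taken from the GA documentation) is to keep type = "real-valued" and round the chromosome to integers inside the fitness function. The helpers decode_params and make_integer_fitness below are hypothetical names introduced only for illustration, and toy_fitness is a cheap stand-in for the cross-validated random forest fitness. Because rounding maps many chromosomes to the same integer triple, the sketch also caches results so that each triple is evaluated once; note that for a stochastic model such as random forest this freezes a single noisy evaluation per triple.

```r
# Round a real-valued chromosome to integers and clamp it to the bounds.
# 'decode_params' is a hypothetical helper, not part of the GA package.
decode_params <- function(x, lower, upper) {
  vals <- pmin(pmax(round(x), lower), upper)
  storage.mode(vals) <- "integer"
  names(vals) <- names(lower)
  vals
}

# Wrap a fitness function that expects integer parameters so that it can be
# driven by a real-valued search; results are cached per integer triple.
make_integer_fitness <- function(fitness_int, lower, upper) {
  cache <- new.env()
  function(x, ...) {
    p <- decode_params(x, lower, upper)
    key <- paste(p, collapse = "_")
    if (is.null(cache[[key]])) {
      cache[[key]] <- fitness_int(p, ...)
    }
    cache[[key]]
  }
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Toy stand-in for the expensive fitness (assumption: maximum at 500, 4, 10).
toy_fitness <- function(p) -sum((p - c(500, 4, 10))^2)
fit <- make_integer_fitness(toy_fitness, theta_min, theta_max)

# Optional: run the search only if the GA package is installed.
# Assumption: a recent GA version, where the bounds are passed as
# 'lower'/'upper' (the post above uses the older 'min'/'max' names).
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued", fitness = fit,
                lower = theta_min, upper = theta_max,
                names = names(theta_min),
                popSize = 20, maxiter = 10, monitor = FALSE, seed = 1)
  print(decode_params(res@solution[1, ], theta_min, theta_max))
}
```

To apply this to the random forest case, toy_fitness would be replaced by a function that calls evalParamsRF over the folds with the already rounded ntree, mtry and nodesize values. The solution reported by ga() is still real-valued, so it should be passed through decode_params before use.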