Johan Lassen
2019-Aug-14 09:23 UTC
[R] Isolation forest using "solitude" package: help to predict
Dear community,

I would like to know if someone can help clarify how to predict anomaly scores on new data sets using the "solitude" package. A simple model can be trained using:

library(solitude)

# Training the model:
iris_train <- iris[1:100, ]
model <- isolation_forest(iris_train[, 1:4], seed = 100, num.trees = 100, importance = "none")

# The anomaly scores of a new test data set can be calculated by:
iris_test <- iris[100:150, ]
predicted_anomalies <- predict(model, iris_test[, 1:4], type = "anomaly_score")

# The challenge is how to predict the anomaly scores for a data set with fewer
# observations than the training data set.
# Example: using a subset of just 11 observations instead of the 51 observations
# above results in anomaly scores that are smaller:
iris_test <- iris[100:110, ]
predicted_anomalies <- predict(model, iris_test[, 1:4], type = "anomaly_score")

Does anyone know how to predict anomaly scores that are normalised with respect to sample size using the solitude package for R?

Thanks in advance!
Johan

-- 
Johan Lassen
"In the cities people live in time - in the mountains people live in space" (Buddhist monk).
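
For reference, in the score definition from the original isolation forest paper (Liu, Ting and Zhou, 2008), the normalising constant c(n) depends on the number of observations used to build each tree, not on the number of rows being scored. The base R sketch below illustrates that formula only. It does not use or reproduce solitude's internals, and `c_factor`, `anomaly_score`, `avg_depth` and `n_train` are hypothetical names introduced here for illustration.

```r
# Sketch only: textbook isolation forest score s(x, n) = 2^(-E[h(x)] / c(n)).
# Not solitude's implementation; all names below are hypothetical.

# Average path length of an unsuccessful binary search tree lookup among n points.
c_factor <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  harmonic <- log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n - 1)
  2 * harmonic - 2 * (n - 1) / n
}

# avg_depth: average path length of each observation across the trees.
# n_train:   number of observations used to build each tree (assumed known).
anomaly_score <- function(avg_depth, n_train) {
  stopifnot(n_train >= 2)
  2^(-avg_depth / c_factor(n_train))
}

avg_depth <- c(3.2, 5.1, 7.8)  # made-up depths, for illustration only
scores <- anomaly_score(avg_depth, n_train = 100)
print(scores)
```

Under this definition the score for a given average depth is the same whether 11 or 51 rows are scored together, because only `n_train` enters the normalisation. Whether average depths can be extracted from a solitude model in this form is an assumption that would need checking against the package documentation.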