For versions 1.6.1 and 2.0.0 of Spark, GaussianMixture is under the mllib
namespace, not ml. Try this instead:
envir$model <- "org.apache.spark.mllib.clustering.GaussianMixture"
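
As a quick check that the class name above resolves, here is a minimal
sketch that creates the mllib object on its own and sets a few parameters.
It assumes a local Spark install that sparklyr can find, and it only
constructs and configures the estimator; it does not fit a model.

```r
library(sparklyr)
library(dplyr)

# Assumes a local Spark installation that sparklyr can find.
sc <- spark_connect(master = "local")

# Instantiate the mllib class through its no-argument constructor.
gmm <- invoke_new(sc, "org.apache.spark.mllib.clustering.GaussianMixture")

# The setters return the same object, so the calls can be chained.
# Integer arguments need the L suffix so they reach the JVM as Int.
gmm <- gmm %>%
  invoke("setK", 3L) %>%
  invoke("setMaxIterations", 100L) %>%
  invoke("setConvergenceTol", 1e-4)

# Read one setting back from the JVM object.
invoke(gmm, "getK")

spark_disconnect(sc)
```

Note that the mllib run() method expects an RDD of Vectors rather than a
DataFrame, so fitting from a sparklyr table would likely need an extra
conversion step that is not shown here.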
Best, Javier
On Sun, Oct 9, 2016 at 1:47 PM, Axel Urbiz <axel.urbiz at gmail.com> wrote:
> Hi All,
>
> Just started to experiment with "sparklyr" and already loving it.
>
> I'm trying to build an extension by constructing an R wrapper to Spark's
> Gaussian Mixtures. My attempt is below, and so is the error message. Not
> sure if this is possible to do, and if so, what is wrong with my code.
>
> Any hints would be much appreciated.
>
> Best,
> Axel.
>
> -----
>
> library(sparklyr)
> library(dplyr)
> sc <- spark_connect(master = "local")
>
> x <- copy_to(sc, iris)
> x <- x %>% select(Petal_Width, Petal_Length)
>
> # set params
> k <- 3
> iter.max <- 100
> features <- dplyr::tbl_vars(x)
> compute.cost <- TRUE
> tolerance <- 1e-4
> ml.options <- ml_options()
>
> df <- spark_dataframe(x)
> sc <- spark_connection(df)
> df <- ml_prepare_features(
>   x = df,
>   features = features,
>   envir = environment()
>   # ml.options = ml.options
> )
> envir <- new.env(parent = emptyenv())
> envir$id <- ml.options$id.column
> df <- df %>%
>   sdf_with_unique_id(envir$id) %>%
>   spark_dataframe()
> tdf <- ml_prepare_dataframe(df, features, ml.options = ml.options,
>   envir = envir)
> envir$model <- "org.apache.spark.ml.clustering.GaussianMixture"
> gmm <- invoke_new(sc, envir$model)
> >Error: failed to invoke spark command
> >16/10/09 16:35:35 ERROR <init> on org.apache.spark.ml.clustering.GaussianMixture failed