| spark.kmeans {SparkR} | R Documentation |
Fits a k-means clustering model against a SparkDataFrame, similarly to R's kmeans().
Users can call summary to print a summary of the fitted model, predict to make
predictions on new data, and write.ml/read.ml to save and load fitted models.
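spark.kmeans is described as similar to R's kmeans(), so a local run of stats::kmeans() on the iris data may help orient readers. This is only an analogy. Treating k and maxIter as the counterparts of kmeans()'s centers and iter.max is an assumption, and the two implementations will generally not produce identical clusters.

```r
# Local, single-machine analogue using base R's stats::kmeans().
# Assumption: `centers` plays the role of k and `iter.max` the role of maxIter.
set.seed(42)
x <- iris[, c("Sepal.Length", "Sepal.Width")]
fit <- stats::kmeans(x, centers = 4, iter.max = 20)

fit$centers        # one row per cluster center
fit$size           # number of rows assigned to each cluster
head(fit$cluster)  # cluster label (1..4) of the first few rows
```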
spark.kmeans(data, formula, ...)
## S4 method for signature 'SparkDataFrame,formula'
spark.kmeans(data, formula, k = 2,
maxIter = 20, initMode = c("k-means||", "random"))
## S4 method for signature 'KMeansModel'
summary(object, ...)
## S4 method for signature 'KMeansModel'
predict(object, newData)
## S4 method for signature 'KMeansModel,character'
write.ml(object, path, overwrite = FALSE)
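The initMode argument in the signature above selects how the starting centers are chosen. The base-R sketch below illustrates the two ideas on a local matrix and is not SparkR's implementation. init_random() and init_kmeanspp() are hypothetical helpers, and the sequential k-means++ seeding shown is only the idea that "k-means||" parallelizes, not Spark's exact procedure.

```r
# Hypothetical helpers, for illustration only: they run locally on an R
# matrix and are not part of SparkR.

# "random"-style seeding: k rows drawn uniformly at random without replacement.
init_random <- function(x, k) {
  x[sample.int(nrow(x), k), , drop = FALSE]
}

# Sequential k-means++ seeding. Spark's "k-means||" is a parallel variant
# of this idea; this is not Spark's exact procedure.
init_kmeanspp <- function(x, k) {
  n <- nrow(x)
  centers <- x[sample.int(n, 1), , drop = FALSE]  # first center: uniform
  while (nrow(centers) < k) {
    # squared distance from every row to its nearest already-chosen center
    d2 <- apply(x, 1, function(row) min(colSums((t(centers) - row)^2)))
    # next center drawn with probability proportional to d2
    centers <- rbind(centers, x[sample.int(n, 1, prob = d2), , drop = FALSE])
  }
  centers
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
start <- init_kmeanspp(x, k = 3)
# the seeds can serve as starting centers for a local k-means run
fit <- stats::kmeans(x, centers = start, iter.max = 20)
fit$size
```

Rows already chosen have a squared distance of zero, so k-means++ never picks the same point twice, which keeps the seeds distinct as stats::kmeans() requires.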
| Argument | Description |
|---|---|
| data | a SparkDataFrame for training. |
| formula | a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Because k-means is unsupervised, spark.kmeans uses no response variable: only the terms on the right-hand side of '~' are used as features. |
| ... | additional argument(s) passed to the method. |
| k | number of clusters (centers). |
| maxIter | maximum number of iterations. |
| initMode | the initialization algorithm used to choose the starting centers: "k-means\|\|" (the default) or "random". |
| object | a fitted k-means model. |
| newData | a SparkDataFrame on which to make predictions. |
| path | the directory where the model is saved. |
| overwrite | whether to overwrite the output path if it already exists. The default, FALSE, throws an exception if the output path exists. |
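To see which feature columns the supported formula operators produce, base R's model.matrix() can be applied to a local data.frame. This sketch rests on an assumption: that Spark's formula handling treats these operators the same way base R does for numeric columns. The local iris columns use dots (Sepal.Length) rather than the underscores that appear after createDataFrame().

```r
# '.' expands to all columns, '-' removes a term, '+' adds a term,
# ':' forms an interaction (for numeric columns, their product).
# '- 1' drops the intercept column that base R's model.matrix() adds by default.
f_all      <- ~ . - Species - 1
f_two      <- ~ Sepal.Length + Sepal.Width - 1
f_interact <- ~ Sepal.Length:Sepal.Width - 1

colnames(model.matrix(f_all, iris))
colnames(model.matrix(f_two, iris))
colnames(model.matrix(f_interact, iris))
```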
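spark.kmeans returns a fitted k-means model.
summary returns a list describing the model, including the cluster centers (coefficients), the number of rows in each cluster (size), and the cluster assignment of each training row (cluster).
predict returns a SparkDataFrame containing newData plus a prediction column that holds each row's assigned cluster.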
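Conceptually, a fitted k-means model labels a point by finding its nearest cluster center. The base-R sketch below shows that rule with squared Euclidean distance. assign_clusters() is a hypothetical helper, not a SparkR function; it returns 1-based R indices, whereas Spark's prediction column numbers clusters from 0.

```r
# Hypothetical helper: assign each row of `x` to the nearest row of `centers`
# by squared Euclidean distance (1-based cluster indices).
assign_clusters <- function(x, centers) {
  apply(x, 1, function(row) which.min(colSums((t(centers) - row)^2)))
}

centers <- rbind(c(0, 0), c(10, 10))
newdata <- rbind(c(1, 1), c(9, 8), c(-2, 0))
assign_clusters(newdata, centers)
# → 1 2 1
```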
spark.kmeans since 2.0.0
summary(KMeansModel) since 2.0.0
predict(KMeansModel) since 2.0.0
write.ml(KMeansModel, character) since 2.0.0
## Not run:
library(SparkR)
sparkR.session()
data(iris)
# createDataFrame() renames the iris columns, e.g. Sepal.Length -> Sepal_Length
df <- createDataFrame(iris)
model <- spark.kmeans(df, Sepal_Length ~ Sepal_Width, k = 4, initMode = "random")
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "Sepal_Length", "prediction"))

# save the fitted model to the given path
path <- "path/to/model"
write.ml(model, path)

# read the saved model back and print its summary
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)