| kmeans_tidiers {broom} | R Documentation |
These methods summarize the results of k-means clustering into three
tidy forms. tidy describes the center and size of each cluster,
augment adds the cluster assignments to the original data, and
glance summarizes the total, within-cluster, and between-cluster
sums of squares of the clustering.
## S3 method for class 'kmeans'
tidy(x, col.names = paste0("x", 1:ncol(x$centers)), ...)
## S3 method for class 'kmeans'
augment(x, data, ...)
## S3 method for class 'kmeans'
glance(x, ...)
x |
kmeans object |
col.names |
Names for the columns holding each dimension of the cluster centers in the tidy output; defaults to x1, x2, ... |
... |
extra arguments, not used |
data |
Original data (required for augment) |
All tidying methods return a data.frame without rownames.
The structure depends on the method chosen.
tidy returns one row per cluster, with one column for each
dimension in the data describing the center, followed by
size |
The size of each cluster |
withinss |
The within-cluster sum of squares |
cluster |
A factor describing the cluster from 1:k |
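The columns listed above map directly onto components of the object returned by stats::kmeans. The following is a rough base-R sketch of that mapping, not broom's actual implementation; the names m, fit, and tidy_like are illustrative only.

```r
# Illustrative sketch: rebuild a tidy()-style table from a kmeans object.
# `m`, `fit`, and `tidy_like` are hypothetical names, not part of broom.
set.seed(1)
m <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),   # 20 points near (0, 0)
           matrix(rnorm(40, mean = 6), ncol = 2))   # 20 points near (6, 6)
fit <- kmeans(m, centers = 2)

# One row per cluster: the center coordinates first ...
centers_df <- as.data.frame(fit$centers)
names(centers_df) <- paste0("x", seq_len(ncol(fit$centers)))  # same default as col.names

# ... followed by size, withinss, and a factor cluster label.
tidy_like <- data.frame(centers_df,
                        size = fit$size,
                        withinss = fit$withinss,
                        cluster = factor(seq_len(nrow(fit$centers))))
rownames(tidy_like) <- NULL   # tidying methods return no rownames
tidy_like
```

Each row describes one cluster, so the size column sums to the number of rows in the input data.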
augment returns the original data with one extra column:
.cluster |
The cluster assigned by the k-means algorithm |
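As a rough illustration of what the extra column contains, the assignment vector stored in the kmeans object can be bound onto the original data by hand. This is a base-R sketch under the assumption that the rows of the data are in the same order as when the model was fit; dat, fit, and augmented are illustrative names, and this is not broom's own code.

```r
# Illustrative sketch: attach cluster assignments to the original data.
# `dat`, `fit`, and `augmented` are hypothetical names, not part of broom.
set.seed(1)
dat <- data.frame(x1 = c(rnorm(20, mean = 0), rnorm(20, mean = 6)),
                  x2 = c(rnorm(20, mean = 0), rnorm(20, mean = 6)))
fit <- kmeans(dat, centers = 2)

# fit$cluster holds one integer label per input row, in input order.
augmented <- cbind(dat, .cluster = factor(unname(fit$cluster)))
head(augmented)
```

The original columns are left unchanged; only .cluster is added.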
glance returns a one-row data.frame with the columns
totss |
The total sum of squares |
tot.withinss |
The total within-cluster sum of squares |
betweenss |
The total between-cluster sum of squares |
iter |
The number of (outer) iterations |
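The four columns correspond to scalar components of the kmeans object, and the sums of squares are related: the total equals the within-cluster total plus the between-cluster part. A minimal base-R sketch of both points follows; m, fit, and glance_like are illustrative names, and the code is not broom's implementation.

```r
# Illustrative sketch: a one-row summary built from a kmeans object.
# `m`, `fit`, and `glance_like` are hypothetical names, not part of broom.
set.seed(1)
m <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
fit <- kmeans(m, centers = 2)

glance_like <- data.frame(totss = fit$totss,
                          tot.withinss = fit$tot.withinss,
                          betweenss = fit$betweenss,
                          iter = fit$iter)
glance_like

# The total sum of squares splits into within- and between-cluster parts.
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
# tot.withinss is the sum of the per-cluster withinss values.
all.equal(fit$tot.withinss, sum(fit$withinss))
```

The ratio betweenss / totss is a common quick check of how much of the variation the clustering explains.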
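library(broom)  # provides tidy(), augment(), and glance()
library(dplyr)
library(ggplot2)
set.seed(2014)
# True cluster centers in two dimensions, with the number of points to draw around each
centers <- data.frame(cluster = factor(1:3), size = c(100, 150, 50),
                      x1 = c(5, 0, -3), x2 = c(-1, 1, -2))
# Simulate normally distributed points around each center
points <- centers %>% group_by(cluster) %>%
  do(data.frame(x1 = rnorm(.$size[1], .$x1[1]),
                x2 = rnorm(.$size[1], .$x2[1])))
# Drop the grouping first so that select() returns only the two numeric columns
k <- kmeans(points %>% ungroup() %>% dplyr::select(x1, x2), 3)
# One row per cluster, one row per observation, and a one-row summary
tidy(k)
head(augment(k, points))
glance(k)
# Plot the points colored by assigned cluster, labeling each fitted center
ggplot(augment(k, points), aes(x1, x2)) +
  geom_point(aes(color = .cluster)) +
  geom_text(aes(label = cluster), data = tidy(k), size = 10)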