This function tests multiple k-means clustering solutions, optionally after applying principal component analysis (PCA) for dimensionality reduction.
Models are always run on the standardized data without PCA.
If pc_range is provided, additional models are run on the first N principal components, once for each N in pc_range.
Results include cluster sizes, silhouette statistics, and within-cluster sum of squares (WSS).
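Use mean_sil and total_wss to compare cluster configurations: a higher mean silhouette indicates better-separated clusters, while total WSS generally falls as the number of clusters grows, so look for the point where its decrease levels off.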
Usage
kmeans_grid(data, pc_range = NULL, k_range = 2:6, id_var = NULL)
Arguments
- data
A data frame of numeric variables to cluster. Non-numeric columns are dropped.
- pc_range
Optional numeric vector of principal component counts. If NULL (the default), only the models without PCA are run.
- k_range
Numeric vector of cluster counts to test. Defaults to 2:6.
- id_var
Optional ID variable to drop before clustering. Defaults to NULL.
Value
A data frame summarizing results for each combination of PCs and clusters.
Details
The function automatically excludes the variable named "weight" from the clustering input.
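The procedure described on this page can be approximated with the rough sketch below. It is not the source of kmeans_grid(): the helper name kmeans_grid_sketch, the seed argument, the nstart value, and the pcs, k, and sizes columns are assumptions made for illustration. Only mean_sil and total_wss are names taken from this page. The sketch relies on stats::kmeans, stats::prcomp, and cluster::silhouette.

```r
# Illustrative sketch only; names other than mean_sil and total_wss are assumed.
library(cluster)  # provides silhouette()

kmeans_grid_sketch <- function(data, pc_range = NULL, k_range = 2:6,
                               id_var = NULL, seed = 1) {
  # Drop the ID variable, the "weight" column, and any non-numeric columns
  data <- data[, !(names(data) %in% c(id_var, "weight")), drop = FALSE]
  data <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]

  x <- scale(data)              # standardize each column (mean 0, sd 1)
  inputs <- list(none = x)      # the no-PCA model is always included
  if (!is.null(pc_range)) {
    scores <- prcomp(x)$x       # principal component scores
    for (n in pc_range) {
      inputs[[as.character(n)]] <- scores[, seq_len(n), drop = FALSE]
    }
  }

  rows <- list()
  for (pcs in names(inputs)) {
    m <- inputs[[pcs]]
    d <- dist(m)                # Euclidean distances for the silhouette
    for (k in k_range) {
      set.seed(seed)            # assumed: fixed seed for reproducible starts
      fit <- kmeans(m, centers = k, nstart = 10)
      sil <- silhouette(fit$cluster, d)
      rows[[length(rows) + 1]] <- data.frame(
        pcs = pcs,
        k = k,
        sizes = paste(fit$size, collapse = "/"),
        mean_sil = mean(sil[, "sil_width"]),
        total_wss = fit$tot.withinss
      )
    }
  }
  do.call(rbind, rows)
}

# Usage on a built-in dataset; the non-numeric Species column is dropped
res <- kmeans_grid_sketch(iris, pc_range = 2:3, k_range = 2:4)
res[order(-res$mean_sil), ]     # configurations ranked by mean silhouette
```

Ranking by mean_sil alone is a simplification; in practice it is usually read alongside total_wss and the cluster sizes before settling on a configuration.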
Examples
# K-means with k = 2 to 5, run without PCA and on the first 2 to 4 PCs
kmeans_results <- kmeans_grid(data = data_clust, pc_range = 2:4, k_range = 2:5)
#> Error: These packages are required but not installed: dplyr