cratepy.clustering.clusteringalgs.KMeansSK
- class KMeansSK(init='k-means++', n_init=10, max_iter=300, tol=0.0001, random_state=None, algorithm='lloyd', n_clusters=None)
Bases: ClusteringAlgorithm
K-Means clustering algorithm (wrapper).
Documentation: see the scikit-learn KMeans documentation (sklearn.cluster.KMeans).
- perform_clustering(self, data_matrix):
Perform cluster analysis and get cluster label of each dataset item.
Constructor.
- Parameters:
n_clusters (int, default=None) – Number of clusters to find.
init ({'k-means++', 'random', ndarray, callable}, default='k-means++') – Method for centroid initialization.
n_init (int, default=10) – Number of times K-Means is run with different centroid seeds.
max_iter (int, default=300) – Maximum number of iterations.
tol (float, default=1e-4) – Convergence tolerance (based on the Frobenius norm of the difference in the cluster centers of two consecutive iterations).
random_state ({int, RandomState}, default=None) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
algorithm ({'lloyd', 'elkan'}, default='lloyd') – K-Means algorithm to use. 'lloyd' is the classical EM-style algorithm; 'elkan' uses the triangle inequality to skip redundant distance computations, which can be faster on some datasets.
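To make the roles of max_iter, tol and random_state concrete, here is a minimal NumPy sketch of a Lloyd-style iteration. It is an illustration only, not the scikit-learn or cratepy implementation: the function names lloyd_kmeans and assign_labels are hypothetical, the initialization is a simple 'random'-style pick of dataset items, and the stopping rule compares the plain Frobenius norm of the centroid shift against tol, whereas a library may scale the tolerance differently.

```python
import numpy as np


def assign_labels(data_matrix, centers):
    # Distance of every item to every centroid, shape (n_items, n_clusters).
    dists = np.linalg.norm(data_matrix[:, None, :] - centers[None, :, :], axis=2)
    # Each item gets the index of its nearest centroid.
    return dists.argmin(axis=1)


def lloyd_kmeans(data_matrix, n_clusters, max_iter=300, tol=1e-4, random_state=None):
    """Illustrative Lloyd iteration (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(random_state)
    # 'random'-style initialization: pick n_clusters distinct items as centroids.
    idx = rng.choice(data_matrix.shape[0], size=n_clusters, replace=False)
    centers = data_matrix[idx].astype(float)
    for _ in range(max_iter):
        # Assignment step.
        labels = assign_labels(data_matrix, centers)
        # Update step: move each centroid to the mean of its items
        # (an empty cluster keeps its previous centroid).
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = data_matrix[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        # Stopping rule: Frobenius norm of the centroid shift between iterations.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift <= tol:
            break
    # Labels consistent with the final centroids.
    return assign_labels(data_matrix, centers), centers


# Two well-separated groups of three items each, shape (n_items, n_features).
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = lloyd_kmeans(data, n_clusters=2, random_state=0)
```

Passing an int as random_state fixes the initial centroid pick, so repeated calls return the same labels. n_init is not modeled here; a library would repeat the whole procedure with different seeds and keep the best run.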
List of Public Methods
perform_clustering – Perform cluster analysis and get cluster label of each dataset item.
Methods
- __init__(init='k-means++', n_init=10, max_iter=300, tol=0.0001, random_state=None, algorithm='lloyd', n_clusters=None)
Constructor.
- Parameters:
n_clusters (int, default=None) – Number of clusters to find.
init ({'k-means++', 'random', ndarray, callable}, default='k-means++') – Method for centroid initialization.
n_init (int, default=10) – Number of times K-Means is run with different centroid seeds.
max_iter (int, default=300) – Maximum number of iterations.
tol (float, default=1e-4) – Convergence tolerance (based on the Frobenius norm of the difference in the cluster centers of two consecutive iterations).
random_state ({int, RandomState}, default=None) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
algorithm ({'lloyd', 'elkan'}, default='lloyd') – K-Means algorithm to use. 'lloyd' is the classical EM-style algorithm; 'elkan' uses the triangle inequality to skip redundant distance computations, which can be faster on some datasets.
- perform_clustering(data_matrix)
Perform cluster analysis and get cluster label of each dataset item.
- Parameters:
data_matrix (numpy.ndarray (2d)) – Data matrix containing the required data to perform the cluster analysis (numpy.ndarray of shape (n_items, n_features)).
- Returns:
cluster_labels – Cluster label (int) assigned to each dataset item.
- Return type:
numpy.ndarray (1d)