cratepy.clustering.clusteringalgs.AgglomerativeSP
- class AgglomerativeSP(t, method='ward', metric='euclidean', criterion='maxclust', n_clusters=None)
Bases:
AgglomerativeAlgorithm
Agglomerative clustering algorithm (wrapper).
Documentation: see the SciPy hierarchical clustering documentation (scipy.cluster.hierarchy).
- perform_clustering(self, data_matrix):
Perform cluster analysis and get cluster label of each dataset item.
- _linkage_matrix
Linkage matrix associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged to form a new cluster containing Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].
- Type:
numpy.ndarray (2d)
List of Public Methods
get_linkage_matrix – Get hierarchical agglomerative clustering linkage matrix.
perform_clustering – Perform cluster analysis and get cluster label of each dataset item.
Methods
- __init__(t, method='ward', metric='euclidean', criterion='maxclust', n_clusters=None)
Constructor.
- Parameters:
n_clusters (int, default=None) – The number of clusters to find.
t ({int, float}) – Scalar parameter associated to the criterion used to form a flat clustering. Threshold (float) with criterion in {‘inconsistent’, ‘distance’, ‘monocrit’} or maximum number of clusters with criterion in {‘maxclust’, ‘maxclust_monocrit’}.
method ({'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'}, default='ward') – Linkage criterion.
metric ({str, function}, default='euclidean') – Distance metric to use when the input data matrix is a numpy.ndarray of observation vectors, otherwise ignored. Options: {‘cityblock’, ‘euclidean’, ‘cosine’, …}.
criterion (str, {'inconsistent', 'distance', 'maxclust', 'monocrit', 'maxclust_monocrit'}, default='maxclust') – Criterion used to form a flat clustering (i.e., perform a horizontal cut in the hierarchical tree).
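How the meaning of t depends on criterion can be sketched directly with SciPy. This assumes the wrapper forwards t and criterion to scipy.cluster.hierarchy.fcluster, which the source does not show. The data and variable names are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative data: two well-separated groups of three 2D points each.
data_matrix = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

# Build the hierarchical tree with the default method and metric listed above.
Z = linkage(data_matrix, method='ward', metric='euclidean')

# criterion='maxclust': t is the maximum number of flat clusters (int).
labels_maxclust = fcluster(Z, t=2, criterion='maxclust')

# criterion='distance': t is a threshold (float) on the merge distance;
# items stay in the same flat cluster only below that distance.
labels_distance = fcluster(Z, t=1.0, criterion='distance')
```

On this data both cuts separate the two groups, but in general a distance threshold does not fix the number of clusters in advance, whereas 'maxclust' bounds it.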
- get_linkage_matrix()
Get hierarchical agglomerative clustering linkage matrix.
- Returns:
linkage_matrix – Linkage matrix associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged to form a new cluster containing Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].
- Return type:
numpy.ndarray (2d)
Notes
The hierarchical agglomerative clustering linkage matrix follows the definition used by the SciPy agglomerative clustering algorithm (see the SciPy documentation for scipy.cluster.hierarchy.linkage).
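The row layout described above can be read off a small example. The sketch below builds a linkage matrix with scipy.cluster.hierarchy.linkage rather than through this class. The helper describe_merges is hypothetical, and single linkage is used only so that the merge distances are easy to follow.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four 1D items; items 0 and 1 are close, as are items 2 and 3.
data_matrix = np.array([[0.0], [1.0], [10.0], [12.0]])
n = data_matrix.shape[0]

# Linkage matrix Z of shape (n-1, 4).
Z = linkage(data_matrix, method='single', metric='euclidean')


def describe_merges(Z, n):
    """Hypothetical helper: unpack each row of Z into a readable tuple.

    Returns (left index, right index, distance, size, new cluster index),
    where the cluster formed at iteration i receives index n + i.
    """
    merges = []
    for i, (left, right, dist, size) in enumerate(Z):
        merges.append((int(left), int(right), float(dist), int(size), n + i))
    return merges


merges = describe_merges(Z, n)
for left, right, dist, size, new_index in merges:
    print(f"merge {left} + {right} at distance {dist:.1f} "
          f"-> cluster {new_index} with {size} items")
```

Indices below n refer to original dataset items. The final merge joins two clusters with indices >= n, that is, clusters created by earlier rows of Z.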
- perform_clustering(data_matrix)
Perform cluster analysis and get cluster label of each dataset item.
- Parameters:
data_matrix (numpy.ndarray (2d)) – Data matrix containing the required data to perform the cluster analysis (numpy.ndarray of shape (n_items, n_features)).
- Returns:
cluster_labels – Cluster label (int) assigned to each dataset item.
- Return type:
numpy.ndarray (1d)
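The input and output shapes of this method can be illustrated with a stand-in function. This is a minimal sketch under the assumption that the wrapper calls SciPy's linkage followed by fcluster; cluster_labels_sketch is a hypothetical name and not part of cratepy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_labels_sketch(data_matrix, t, method='ward', metric='euclidean',
                          criterion='maxclust'):
    """Hypothetical stand-in: data matrix (n_items, n_features) -> labels (n_items,)."""
    linkage_matrix = linkage(data_matrix, method=method, metric=metric)
    return fcluster(linkage_matrix, t=t, criterion=criterion)


# Illustrative data matrix: 10 items with 3 features, drawn around two centers.
rng = np.random.default_rng(0)
data_matrix = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 3)),
    rng.normal(10.0, 0.1, size=(5, 3)),
])

cluster_labels = cluster_labels_sketch(data_matrix, t=2)
print(cluster_labels.shape)  # one integer label per dataset item
```

Labels returned by SciPy's fcluster start at 1 rather than 0. Whether the cratepy wrapper relabels them is not stated in the source.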