cratepy.clustering.clusteringalgs.AgglomerativeSP

class AgglomerativeSP(t, method='ward', metric='euclidean', criterion='maxclust', n_clusters=None)

Bases: AgglomerativeAlgorithm

Agglomerative clustering algorithm (wrapper).

Documentation: see the SciPy hierarchical clustering reference (scipy.cluster.hierarchy).

perform_clustering(self, data_matrix): Perform cluster analysis and get cluster label of each dataset item.

_linkage_matrix

Linkage matrix Z associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged into a new cluster that contains Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].

Type: numpy.ndarray (2d)

Constructor.

Parameters:

n_clusters (int, default=None) – The number of clusters to find.
t ({int, float}) – Scalar parameter associated with the criterion used to form a flat clustering: a threshold (float) when criterion is one of {‘inconsistent’, ‘distance’, ‘monocrit’}, or the maximum number of clusters (int) when criterion is one of {‘maxclust’, ‘maxclust_monocrit’}.
method ({'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'}, default='ward') – Linkage criterion.
metric ({str, function}, default='euclidean') – Distance metric to use when the input data matrix is a numpy.ndarray of observation vectors, otherwise ignored. Options: {‘cityblock’, ‘euclidean’, ‘cosine’, …}.
criterion (str, {'inconsistent', 'distance', 'maxclust', 'monocrit', 'maxclust_monocrit'}, default='maxclust') – Criterion used to form a flat clustering (i.e., perform a horizontal cut in the hierarchical tree).
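
The interplay between criterion and t can be illustrated with SciPy directly. The following is a minimal sketch, assuming the wrapper forwards these parameters to scipy.cluster.hierarchy.linkage and scipy.cluster.hierarchy.fcluster (an assumption based on the class name and the note on the linkage matrix, not something this page confirms). The sample data is invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative data only: two well-separated groups of 2D points.
data_matrix = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

# Build the hierarchical tree once, using the default method and metric
# listed above.
linkage_matrix = linkage(data_matrix, method='ward', metric='euclidean')

# criterion='maxclust': t is the maximum number of flat clusters (int).
labels_maxclust = fcluster(linkage_matrix, t=2, criterion='maxclust')

# criterion='distance': t is a distance threshold (float); items stay in the
# same flat cluster only if they are joined below that distance in the tree.
labels_distance = fcluster(linkage_matrix, t=1.0, criterion='distance')

print(labels_maxclust)
print(labels_distance)
```

For this data both cuts separate the same two groups, since the within-group merge distances are far below 1.0 and the final merge is far above it.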

List of Public Methods

`get_linkage_matrix`	Get hierarchical agglomerative clustering linkage matrix.
`perform_clustering`	Perform cluster analysis and get cluster label of each dataset item.

Methods

get_linkage_matrix()

Get hierarchical agglomerative clustering linkage matrix.

Returns: linkage_matrix – Linkage matrix Z associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged into a new cluster that contains Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].
Return type: numpy.ndarray (2d)

Notes

The hierarchical agglomerative clustering linkage matrix follows the definition used by the SciPy agglomerative clustering algorithm (see scipy.cluster.hierarchy.linkage).
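
The indexing convention (indices below n are original items, index n + i is the cluster formed at row i) can be unpacked with a short sketch. The helper merged_cluster_members below is hypothetical and not part of cratepy; the linkage matrix is built with SciPy on invented one-dimensional data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def merged_cluster_members(linkage_matrix):
    """Hypothetical helper: list the original items in each merged cluster."""
    n = linkage_matrix.shape[0] + 1
    # Indices 0..n-1 denote the original items (singleton clusters).
    members = {i: [i] for i in range(n)}
    for i, (left, right, _dist, count) in enumerate(linkage_matrix):
        merged = members[int(left)] + members[int(right)]
        # The fourth column stores the size of the newly formed cluster.
        assert len(merged) == int(count)
        # Row i forms the cluster with index n + i.
        members[n + i] = sorted(merged)
    return [members[n + i] for i in range(n - 1)]


# Illustrative data only: four items on a line.
data_matrix = np.array([[0.0], [1.0], [10.0], [11.5]])
Z = linkage(data_matrix, method='single', metric='euclidean')

merges = merged_cluster_members(Z)
for i, items in enumerate(merges):
    print(f"row {i}: distance {Z[i, 2]:.1f}, items {items}")
```

The last row always contains all n items, because the final merge joins the two remaining clusters into the root of the tree.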

perform_clustering(data_matrix)

Perform cluster analysis and get cluster label of each dataset item.

Parameters: data_matrix (numpy.ndarray (2d)) – Data matrix containing the required data to perform the cluster analysis (numpy.ndarray of shape (n_items, n_features)).
Returns: cluster_labels – Cluster label (int) assigned to each dataset item.
Return type: numpy.ndarray (1d)
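
Since the returned array holds one label per dataset item, a common follow-up step is to group item indices by label. The sketch below uses a hand-written label array as a stand-in for the output of perform_clustering; the values are hypothetical and assume integer labels like those produced by SciPy's fcluster.

```python
import numpy as np

# Hypothetical cluster labels for six dataset items (one label per item).
cluster_labels = np.array([1, 1, 2, 1, 3, 2])

# Map each cluster label to the indices of the items assigned to it.
clusters = {
    int(label): np.flatnonzero(cluster_labels == label).tolist()
    for label in np.unique(cluster_labels)
}

for label, item_indices in clusters.items():
    print(f"cluster {label}: items {item_indices}")
```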