cratepy.clustering.clusteringalgs.AgglomerativeSP

class AgglomerativeSP(t, method='ward', metric='euclidean', criterion='maxclust', n_clusters=None)[source]

Bases: AgglomerativeAlgorithm

Agglomerative clustering algorithm (wrapper).

Documentation: see the SciPy hierarchical clustering documentation (scipy.cluster.hierarchy).

perform_clustering(self, data_matrix):

Perform cluster analysis and get cluster label of each dataset item.

_linkage_matrix

Linkage matrix Z associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged into a new cluster that contains Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].

Type:

numpy.ndarray (2d)

Constructor.

Parameters:
  • n_clusters (int, default=None) – The number of clusters to find.

  • t ({int, float}) – Scalar parameter associated with the criterion used to form a flat clustering: a threshold (float) when criterion is in {'inconsistent', 'distance', 'monocrit'}, or the maximum number of clusters (int) when criterion is in {'maxclust', 'maxclust_monocrit'}.

  • method ({'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'}, default='ward') – Linkage criterion.

  • metric ({str, function}, default='euclidean') – Distance metric to use when the input data matrix is a numpy.ndarray of observation vectors, otherwise ignored. Options: {‘cityblock’, ‘euclidean’, ‘cosine’, …}.

  • criterion (str, {'inconsistent', 'distance', 'maxclust', 'monocrit', 'maxclust_monocrit'}, default='maxclust') – Criterion used to form a flat clustering (i.e., perform a horizontal cut in the hierarchical tree).

List of Public Methods

get_linkage_matrix

Get hierarchical agglomerative clustering linkage matrix.

perform_clustering

Perform cluster analysis and get cluster label of each dataset item.

Methods

__init__(t, method='ward', metric='euclidean', criterion='maxclust', n_clusters=None)[source]

Constructor.

Parameters:
  • n_clusters (int, default=None) – The number of clusters to find.

  • t ({int, float}) – Scalar parameter associated with the criterion used to form a flat clustering: a threshold (float) when criterion is in {'inconsistent', 'distance', 'monocrit'}, or the maximum number of clusters (int) when criterion is in {'maxclust', 'maxclust_monocrit'}.

  • method ({'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'}, default='ward') – Linkage criterion.

  • metric ({str, function}, default='euclidean') – Distance metric to use when the input data matrix is a numpy.ndarray of observation vectors, otherwise ignored. Options: {‘cityblock’, ‘euclidean’, ‘cosine’, …}.

  • criterion (str, {'inconsistent', 'distance', 'maxclust', 'monocrit', 'maxclust_monocrit'}, default='maxclust') – Criterion used to form a flat clustering (i.e., perform a horizontal cut in the hierarchical tree).
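The role of t under the 'distance' criterion can be illustrated with a minimal pure-Python sketch. This is not the SciPy or CRATE implementation: the function name cut_by_distance and the toy linkage matrix are hypothetical, and the sketch assumes that merge distances never decrease from one iteration to the next (which holds for linkages such as 'single', 'complete', 'average', 'weighted' and 'ward', but not necessarily for 'centroid' or 'median'). Label numbering is by first appearance and may differ from what a library returns.

```python
def cut_by_distance(linkage_matrix, t):
    """Cut a hierarchical tree horizontally at distance t (illustrative sketch).

    Assumes merge distances are non-decreasing along the rows of the
    linkage matrix. Returns one label (starting at 1) per dataset item.
    """
    n = len(linkage_matrix) + 1          # number of original dataset items
    parent = list(range(n))              # union-find forest over the items

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # rep[j] is one representative item of cluster index j (original or merged).
    rep = list(range(n))
    for left, right, distance, _size in linkage_matrix:
        a, b = find(rep[int(left)]), find(rep[int(right)])
        if distance <= t:
            parent[b] = a                # keep only merges below the cut
        rep.append(a)                    # cluster index n + i

    # Number the flat clusters in order of first appearance.
    labels, seen = [], {}
    for item in range(n):
        root = find(item)
        if root not in seen:
            seen[root] = len(seen) + 1
        labels.append(seen[root])
    return labels


# Hypothetical linkage matrix for n = 4 items.
toy_linkage = [
    [0, 1, 0.5, 2],   # items 0 and 1 merge at distance 0.5 -> cluster 4
    [2, 3, 1.0, 2],   # items 2 and 3 merge at distance 1.0 -> cluster 5
    [4, 5, 3.0, 4],   # clusters 4 and 5 merge at distance 3.0 -> cluster 6
]

labels_low = cut_by_distance(toy_linkage, t=0.75)
labels_mid = cut_by_distance(toy_linkage, t=2.0)
labels_high = cut_by_distance(toy_linkage, t=5.0)
```

Raising t moves the horizontal cut up the tree, so fewer and larger flat clusters are obtained; with 'maxclust' the cut height is instead chosen so that no more than t clusters remain.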

get_linkage_matrix()[source]

Get hierarchical agglomerative clustering linkage matrix.

Returns:

linkage_matrix – Linkage matrix Z associated with the hierarchical agglomerative clustering (numpy.ndarray of shape (n-1, 4), where n is the number of dataset items). At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1], separated by distance Z[i, 2], are merged into a new cluster that contains Z[i, 3] original dataset items. Any cluster index j >= n refers to the cluster formed in Z[j-n, :].

Return type:

numpy.ndarray (2d)

Notes

The hierarchical agglomerative clustering linkage matrix follows the definition used by the SciPy agglomerative clustering algorithm (see scipy.cluster.hierarchy.linkage).
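The indexing convention of the linkage matrix (cluster index j >= n refers to row j-n) can be made concrete with a short sketch that expands every cluster index into the original dataset items it contains. The helper name cluster_members and the toy matrix are hypothetical and only serve the illustration; the matrix is given as a plain list of rows so that the example has no dependencies.

```python
def cluster_members(linkage_matrix):
    """Map every cluster index to the sorted list of original items it contains."""
    n = len(linkage_matrix) + 1                  # n items give n - 1 merges
    members = {j: [j] for j in range(n)}         # indices 0..n-1 are single items
    for i, (left, right, _distance, size) in enumerate(linkage_matrix):
        merged = sorted(members[int(left)] + members[int(right)])
        # The fourth column stores the number of original items in the new cluster.
        if len(merged) != int(size):
            raise ValueError(f"row {i}: inconsistent cluster size")
        members[n + i] = merged                  # row i forms cluster index n + i
    return members


# Hypothetical linkage matrix for n = 4 items.
toy_linkage = [
    [0, 1, 0.5, 2],   # row 0 forms cluster 4 = {0, 1}
    [2, 3, 1.0, 2],   # row 1 forms cluster 5 = {2, 3}
    [4, 5, 3.0, 4],   # row 2 forms cluster 6 = {0, 1, 2, 3}
]

members = cluster_members(toy_linkage)
```

Here members[6] lists all four items because row 2 merges clusters 4 and 5, which were themselves formed in rows 0 and 1.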

perform_clustering(data_matrix)[source]

Perform cluster analysis and get cluster label of each dataset item.

Parameters:

data_matrix (numpy.ndarray (2d)) – Data matrix containing the required data to perform the cluster analysis (numpy.ndarray of shape (n_items, n_features)).

Returns:

cluster_labels – Cluster label (int) assigned to each dataset item.

Return type:

numpy.ndarray (1d)
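As a rough illustration of what a wrapper with these parameters might delegate to, the sketch below calls SciPy directly with the constructor defaults (method='ward', metric='euclidean', criterion='maxclust'). It is an assumption that the class combines scipy.cluster.hierarchy.linkage and scipy.cluster.hierarchy.fcluster in this way; the toy data matrix is invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data matrix of shape (n_items, n_features): two well-separated groups.
data_matrix = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [5.0, 5.0],
    [5.1, 5.0],
    [5.0, 5.1],
])

# Build the hierarchical tree; the result has shape (n_items - 1, 4).
linkage_matrix = linkage(data_matrix, method='ward', metric='euclidean')

# Horizontal cut of the tree: keep at most t = 2 flat clusters.
cluster_labels = fcluster(linkage_matrix, t=2, criterion='maxclust')
```

SciPy's fcluster returns one integer label per row of the data matrix, numbered from 1. Whether the wrapper returns these labels unchanged is not stated in this page.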