qsarify.clustering module

Clustering Module

This module provides functions for clustering features by hierarchical clustering and for calculating the cophenetic correlation coefficient of each linkage. The cophenetic correlation coefficient measures how well a dendrogram preserves the original pairwise distances: it is the correlation between the pairwise distances in feature space and the cophenetic distances, that is, the dendrogram heights at which each pair is first merged into the same cluster. The coefficient is calculated for each linkage method with scipy.cluster.hierarchy.cophenet, and the method with the highest coefficient is used to cluster the features.
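The linkage comparison described above can be sketched with SciPy alone. This is a minimal illustration, not qsarify's implementation: the helper name compare_linkages, the use of the correlation distance (1 - Pearson r) between feature columns, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def compare_linkages(X, links=("average", "complete", "single")):
    """Return {linkage name: cophenetic correlation coefficient}.

    Hypothetical helper: features are the columns of X, and the distance
    between two features is assumed to be 1 - Pearson r.
    """
    # Condensed pairwise distances between features (columns of X).
    condensed = pdist(X.T, metric="correlation")
    scores = {}
    for name in links:
        Z = linkage(condensed, method=name)
        # With the original distances supplied, cophenet returns the
        # coefficient and the condensed cophenetic distance matrix.
        c, _ = cophenet(Z, condensed)
        scores[name] = float(c)
    return scores


# Synthetic data: three base features plus three noisy copies of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(60, 3))
X = np.column_stack([base, base + 0.1 * rng.normal(size=(60, 3))])

scores = compare_linkages(X)
best = max(scores, key=scores.get)  # linkage with the highest coefficient
print(scores, best)
```

A coefficient closer to 1 means the dendrogram heights track the original distances more faithfully, which is why the highest-scoring linkage is the one selected.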

qsarify.clustering.cophenetic(X_data)

Calculate the cophenetic correlation coefficient of linkages

Parameters:
  • X_data (pandas DataFrame) – shape = (n_samples, m_features)

  • method (str) – method for linkage generation, default = 'corr' (Pearson correlation)

Return type:

None

class qsarify.clustering.featureCluster(X_data, method='corr', link='average', cut_d=3)

Bases: object

Cluster features using hierarchical clustering

Parameters:
  • X_data (pandas DataFrame) – shape = (n_samples, n_features)

  • link (str) – linkage method: 'average' (default), 'complete', or 'single'

  • cut_d (int) – depth in cluster (dendrogram), default = 3

Sub functions:
  • set_cluster(self)

  • cluster_dist(self)

cluster_dist()

Show dendrogram of hierarchical clustering

Return type:

None

set_cluster(verbose=False, graph=False)

Cluster features using hierarchical clustering

Parameters:
  • verbose (bool) – print cluster information, default = False

  • graph (bool) – show dendrogram, default = False

Returns:

cludict – cluster information of features as a dictionary

Return type:

dict
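To illustrate what a feature-to-cluster dictionary might look like, here is a rough sketch built directly on SciPy. It is not the featureCluster implementation: the helper cluster_features, the distance threshold cut (which is not the same thing as cut_d), the correlation distance, and the layout of the returned dictionary (feature name mapped to cluster label) are all assumptions for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_features(X, names, link="average", cut=0.5):
    """Map each feature name to a flat cluster label.

    Hypothetical helper: `cut` is a distance threshold on the dendrogram,
    and the feature distance is assumed to be 1 - Pearson r.
    """
    # Condensed pairwise distances between features (columns of X).
    condensed = pdist(X.T, metric="correlation")
    Z = linkage(condensed, method=link)
    # Cut the dendrogram: features merged below `cut` share a label.
    labels = fcluster(Z, t=cut, criterion="distance")
    return {name: int(label) for name, label in zip(names, labels)}


# Synthetic data: f0/f1 are near-duplicates, as are f2/f3.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([
    a,
    a + 0.05 * rng.normal(size=200),
    b,
    b + 0.05 * rng.normal(size=200),
])
names = ["f0", "f1", "f2", "f3"]

cludict = cluster_features(X, names)
print(cludict)
```

In this sketch, strongly correlated features end up with the same label, so one representative per cluster can be chosen when reducing redundant descriptors.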