qsarify.clustering module
Clustering Module
This module contains functions for clustering features by hierarchical clustering and for calculating the cophenetic correlation coefficient of each linkage. The cophenetic correlation coefficient measures how well a dendrogram preserves the original pairwise distances: it is the correlation between the pairwise distances in feature space and the cophenetic distances, i.e. the dendrogram heights at which each pair is first merged into one cluster. The coefficient is calculated for each linkage method with scipy.cluster.hierarchy.cophenet, and the method with the highest coefficient is used to cluster the features.
- qsarify.clustering.cophenetic(X_data)[source]
Calculate the cophenetic correlation coefficient of linkages
- Parameters:
X_data (pandas DataFrame, shape = (n_samples, m_features)) –
method (str, method for linkage generation, default = 'corr' (Pearson correlation)) –
- Return type:
None
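The comparison of linkage methods described above can be sketched directly with SciPy. This is a minimal illustration, not the qsarify implementation: the helper name cophenetic_by_linkage, the toy data, and the choice of 1 - Pearson correlation as the distance between features are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def cophenetic_by_linkage(X_data, methods=("average", "complete", "single")):
    """Hypothetical helper: cophenetic correlation coefficient per linkage method."""
    # Features are the objects being clustered, so transpose to
    # (n_features, n_samples) before computing pairwise distances.
    # metric="correlation" gives 1 - Pearson correlation (an assumption here).
    dist = pdist(X_data.T.values, metric="correlation")
    scores = {}
    for method in methods:
        Z = linkage(dist, method=method)
        # cophenet returns (coefficient, condensed cophenetic distances).
        coeff, _ = cophenet(Z, dist)
        scores[method] = float(coeff)
    return scores


# Toy data: two pairs of nearly collinear features, the pairs being uncorrelated.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = pd.DataFrame({
    "a1": np.sin(t),
    "a2": np.sin(t) + 0.05 * np.sin(3 * t),
    "b1": np.cos(t),
    "b2": np.cos(t) + 0.05 * np.cos(3 * t),
})

scores = cophenetic_by_linkage(X)
best_link = max(scores, key=scores.get)  # linkage with the highest coefficient
print(scores, best_link)
```

The linkage method stored in best_link is the one that would then be passed on as the link argument when clustering the features.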
- class qsarify.clustering.featureCluster(X_data, method='corr', link='average', cut_d=3)[source]
Bases:
object
Cluster features using hierarchical clustering
- Parameters:
X_data (pandas DataFrame, shape = (n_samples, n_features)) –
link (str, kind of linkage method: 'average', 'complete', or 'single'; default = 'average') –
cut_d (int, cut depth in the cluster dendrogram, default = 3) –
Sub functions: set_cluster(self), cluster_dist(self)
- set_cluster(verbose=False, graph=False)[source]
Cluster features using hierarchical clustering
- Parameters:
verbose (bool, print cluster information, default = False) –
graph (bool, show dendrogram, default = False) –
- Returns:
cludict
- Return type:
dict, cluster information of features as a dictionary
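A rough sketch of the kind of result set_cluster produces, written with SciPy alone. It is not the qsarify implementation: the function cluster_features, the threshold parameter, and the feature-name-to-label layout of the dictionary are assumptions. In particular, how cut_d maps onto a dendrogram cut is not stated above, so a distance threshold is used here as a stand-in.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_features(X_data, link="average", threshold=0.5):
    """Hypothetical helper: map each feature name to a cluster label."""
    # Cluster the columns (features), using 1 - Pearson correlation as distance.
    dist = pdist(X_data.T.values, metric="correlation")
    Z = linkage(dist, method=link)
    # Cut the dendrogram at a fixed distance (stand-in for cut_d, see above).
    labels = fcluster(Z, t=threshold, criterion="distance")
    return {name: int(label) for name, label in zip(X_data.columns, labels)}


# Toy data: two pairs of nearly collinear features, the pairs being uncorrelated.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = pd.DataFrame({
    "a1": np.sin(t),
    "a2": np.sin(t) + 0.05 * np.sin(3 * t),
    "b1": np.cos(t),
    "b2": np.cos(t) + 0.05 * np.cos(3 * t),
})

cludict = cluster_features(X, link="average", threshold=0.5)
print(cludict)
```

In this toy case the strongly correlated features a1 and a2 share one label and b1 and b2 share another, which is the grouping a correlation-based feature clustering is meant to recover.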