qsarify.feature_selection_multi module

Multi-Processing Feature Selection Module

This module provides functions for feature selection guided by the output of the clustering module. It implements a genetic algorithm for feature selection, using reflection.
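One way to picture clustering output guiding the search is sketched below. It is an illustration only, not the module's code: it assumes cluster_info maps each feature name to a cluster label, and the helpers random_candidate and mutate, along with the descriptor names, are hypothetical. A candidate draws each feature from a different cluster, and a mutation swaps one feature for a feature from a cluster that is not yet represented.

```python
import random


def random_candidate(cluster_info, component, rng):
    """Draw `component` features, each from a different cluster (hypothetical helper)."""
    # Group feature names by their cluster label.
    clusters = {}
    for feature, label in cluster_info.items():
        clusters.setdefault(label, []).append(feature)
    if component > len(clusters):
        raise ValueError("component exceeds the number of clusters")
    # Pick distinct clusters first, then one feature inside each of them.
    chosen = rng.sample(sorted(clusters), component)
    return [rng.choice(sorted(clusters[label])) for label in chosen]


def mutate(candidate, cluster_info, rng):
    """Swap one feature for a feature from a cluster not yet represented."""
    used = {cluster_info[f] for f in candidate}
    pool = sorted(f for f, label in cluster_info.items() if label not in used)
    if not pool:
        # Every cluster is already represented; return an unchanged copy.
        return list(candidate)
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.choice(pool)
    return child


# Assumed format: feature name -> cluster label (illustrative values only).
cluster_info = {
    "MolWt": 1, "HeavyAtomCount": 1,
    "LogP": 2,
    "TPSA": 3, "NumHDonors": 3,
    "RingCount": 4,
}
rng = random.Random(0)
candidate = random_candidate(cluster_info, 3, rng)
child = mutate(candidate, cluster_info, rng)
print(candidate, child)
```

In a genetic loop, each child would be scored by the learning algorithm and kept only if it beats a member of the current bank; that scoring step is left out here.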

class qsarify.feature_selection_multi.Evolution(evolve)

Bases: object

Initializes the evolution class with the learning algorithm to be used

evolve(cluster_info, cluster, X_data, y_data, e_mlr)

qsarify.feature_selection_multi.selection(X_data, y_data, cluster_info, model='regression', learning=500000, bank=200, component=4, interval=1000, cores=95)

Forward feature selection using cophenetically correlated data on multiple cores.

Parameters:
  • X_data (pandas DataFrame, shape = (n_samples, n_features)) –

  • y_data (pandas DataFrame, shape = (n_samples,)) –

  • cluster_info (dictionary returned by clustering.featureCluster.set_cluster()) –

  • model (default="regression", otherwise "classification") –

  • learning (default=500000, number of overall models to be trained) –

  • bank (default=200, number of models to be trained in each iteration) –

  • component (default=4, number of features to be selected) –

  • interval (optional, default=1000) – print the current scoring and selected features once every interval iterations

  • cores (optional, default=(mp.cpu_count()*2)-1) – number of processes to use for multiprocessing. The default is twice the number of cores minus 1, which assumes SMT, Hyper-Threading, or something similar. The value 95 shown in the signature is presumably this expression as evaluated on the machine that built the documentation. If you have a large number of cores, you may want to set this to a lower number to avoid memory issues.

Return type:

list containing the best feature set selected
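The default for cores can be reproduced and capped before calling selection. The sketch below is an assumption-laden illustration rather than part of qsarify: pick_cores and its max_processes argument are hypothetical names, and only the expression (mp.cpu_count()*2)-1 comes from the parameter description above.

```python
import multiprocessing as mp


def pick_cores(max_processes=None):
    """Return the documented default process count, optionally capped (hypothetical helper)."""
    # Documented default: twice the logical CPU count, minus one.
    default = (mp.cpu_count() * 2) - 1
    if max_processes is None:
        return default
    # Never go below one process, never above the documented default.
    return max(1, min(default, max_processes))


print(pick_cores())                 # documented default on this machine
print(pick_cores(max_processes=8))  # capped, for example to limit memory use
```

The returned value could then be passed as the cores argument. A lower cap trades speed for memory, since each extra process presumably holds its own working data.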