qsarify.feature_selection_multi module
Multi-Processing Feature Selection Module
This module provides functions that perform feature selection guided by the clustering module’s output, and implements a genetic algorithm for feature selection using reflection.
- class qsarify.feature_selection_multi.Evolution(evolve)
Bases: object
Initializes the Evolution class with the learning algorithm to be used.
- qsarify.feature_selection_multi.selection(X_data, y_data, cluster_info, model='regression', learning=500000, bank=200, component=4, interval=1000, cores=95)
Forward feature selection using cophenetically correlated data on multiple cores.
- Parameters:
X_data (pandas DataFrame, shape = (n_samples, n_features)) –
y_data (pandas DataFrame, shape = (n_samples,)) –
cluster_info (dictionary returned by clustering.featureCluster.set_cluster()) –
model (default="regression", otherwise "classification") –
learning (default=500000, number of overall models to be trained) –
bank (default=200, number of models to be trained in each iteration) –
component (default=4, number of features to be selected) –
interval (optional, default=1000) – print the current scoring and selected features every interval
cores (optional, default=(mp.cpu_count()*2)-1) – number of processes to use for multiprocessing. The default is twice the number of cores minus 1, which assumes you have SMT, HT, or something similar (the 95 shown in the signature is presumably this expression as evaluated on the machine that built the documentation). If you have a large number of cores, you may want to set this to a lower number to avoid memory issues.
- Return type:
list, result of selected best feature set
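The default for cores and the advice to lower it on large machines can be sketched as follows. This is a minimal illustration using only the standard library, not code from qsarify: the helper names documented_default_cores and capped_cores are hypothetical, and the commented-out call at the end assumes X_data, y_data and cluster_info have already been prepared as described in the parameter list.

```python
import multiprocessing as mp


def documented_default_cores() -> int:
    """Return the default described above: twice the CPU count, minus one."""
    return mp.cpu_count() * 2 - 1


def capped_cores(limit: int) -> int:
    """Hypothetical helper: cap the process count to reduce memory pressure."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return min(limit, documented_default_cores())


if __name__ == "__main__":
    # Show the documented default next to a capped value for this machine.
    print("documented default:", documented_default_cores())
    print("capped at 8:", capped_cores(8))

    # Passing the capped value on (requires qsarify; inputs are placeholders):
    # from qsarify.feature_selection_multi import selection
    # best = selection(X_data, y_data, cluster_info, cores=capped_cores(8))
```

Capping against the documented default rather than a fixed number keeps the value sensible on small machines, where the default may already be below the cap.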