3. FEATURE SELECTION (MEDIUM-HIGH IMPORTANCE - Third Priority) (3/5)
├── Filter Methods (Univariate) (4/5)
│   ├── Statistical Tests (4/5)
│   │   ├── Chi-Square Test (4/5)
│   │   │   ├── sklearn.feature_selection.chi2
│   │   │   ├── sklearn.feature_selection.SelectKBest(chi2)
│   │   │   └── ✓ For categorical features vs categorical target
│   │   ├── ANOVA F-Test (4/5)
│   │   │   ├── sklearn.feature_selection.f_classif
│   │   │   ├── sklearn.feature_selection.f_regression
│   │   │   └── ✓ For numerical features vs categorical/numerical target
│   │   ├── Mutual Information (4/5)
│   │   │   ├── sklearn.feature_selection.mutual_info_classif
│   │   │   ├── sklearn.feature_selection.mutual_info_regression
│   │   │   └── ✓ Captures non-linear relationships
│   │   └── Kendall's Tau (2/5)
│   │       ├── scipy.stats.kendalltau
│   │       └── ✓ Non-parametric correlation measure
│   ├── Correlation-Based (4/5)
│   │   ├── Pearson Correlation (4/5)
│   │   │   ├── pandas.DataFrame.corr()
│   │   │   ├── numpy.corrcoef()
│   │   │   ├── scipy.stats.pearsonr()
│   │   │   └── ✓ Linear relationships, normally distributed data
│   │   ├── Spearman Correlation (4/5)
│   │   │   ├── scipy.stats.spearmanr()
│   │   │   ├── pandas.DataFrame.corr(method='spearman')
│   │   │   └── ✓ Monotonic relationships, rank-based
│   │   └── Kendall Correlation (2/5)
│   │       ├── scipy.stats.kendalltau()
│   │       └── ✓ Robust to outliers, small sample sizes
│   └── Variance-Based (4/5)
│       ├── Low Variance Filter (4/5)
│       │   ├── sklearn.feature_selection.VarianceThreshold
│       │   └── ✓ Removes features with low variance (near-constant)
│       └── High Correlation Filter (4/5)
│           ├── Custom implementation with pandas.DataFrame.corr()
│           └── ✓ Removes highly correlated features (multicollinearity)
│
├── Wrapper Methods (Model-Based) (3/5)
│   ├── Forward Selection (3/5)
│   │   ├── sklearn.feature_selection.SequentialFeatureSelector(direction='forward')
│   │   ├── mlxtend.feature_selection.SequentialFeatureSelector
│   │   └── ✓ Starts empty, adds features iteratively
│   ├── Backward Elimination (3/5)
│   │   ├── sklearn.feature_selection.SequentialFeatureSelector(direction='backward')
│   │   ├── mlxtend.feature_selection.SequentialFeatureSelector
│   │   └── ✓ Starts with all features, removes iteratively
│   ├── Recursive Feature Elimination (RFE) (4/5)
│   │   ├── sklearn.feature_selection.RFE
│   │   ├── sklearn.feature_selection.RFECV
│   │   └── ✓ Recursively eliminates least important features
│   └── Genetic Algorithms (2/5)
│       ├── sklearn_genetic.GAFeatureSelectionCV (sklearn-genetic-opt package)
│       ├── DEAP
│       └── ✓ Evolutionary approach to feature selection
│
├── Embedded Methods (Intrinsic) (4/5)
│   ├── Tree-Based Importance (5/5)
│   │   ├── Random Forest Importance (5/5)
│   │   │   ├── sklearn.ensemble.RandomForestClassifier.feature_importances_
│   │   │   ├── sklearn.ensemble.RandomForestRegressor.feature_importances_
│   │   │   └── ✓ Gini/entropy-based importance, handles interactions
│   │   ├── Extra Trees Importance (4/5)
│   │   │   ├── sklearn.ensemble.ExtraTreesClassifier.feature_importances_
│   │   │   └── ✓ More randomized than Random Forest
│   │   ├── XGBoost Importance (5/5)
│   │   │   ├── xgboost.XGBClassifier.feature_importances_
│   │   │   ├── xgboost.plot_importance()
│   │   │   └── ✓ Gain, weight, cover importance metrics
│   │   ├── LightGBM Importance (5/5)
│   │   │   ├── lightgbm.LGBMClassifier.feature_importances_
│   │   │   ├── lightgbm.plot_importance()
│   │   │   └── ✓ Split-based importance, fast training
│   │   └── CatBoost Importance (4/5)
│   │       ├── catboost.CatBoostClassifier.feature_importances_
│   │       └── ✓ Handles categorical features natively
│   └── Regularization-Based (4/5)
│       ├── L1 Regularization (Lasso) (5/5)
│       │   ├── sklearn.linear_model.Lasso
│       │   └── ✓ Drives coefficients to zero, performs feature selection
│       ├── L2 Regularization (Ridge) (4/5)
│       │   ├── sklearn.linear_model.Ridge
│       │   └── ✓ Shrinks coefficients, prevents overfitting, no explicit selection
│       └── Elastic Net (L1+L2) (4/5)
│           ├── sklearn.linear_model.ElasticNet
│           └── ✓ Combines L1 and L2, robust to correlated features
│
└── Hybrid Methods (3/5)
    ├── SelectFromModel (3/5)
    │   ├── sklearn.feature_selection.SelectFromModel
    │   └── ✓ Uses model's feature importance/coefficients to select features
    └── Boruta Algorithm (2/5)
        ├── boruta.BorutaPy
        └── ✓ All-relevant feature selection using Random Forest
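A minimal sketch of the filter branch above, chaining a low-variance filter with a univariate `SelectKBest` scorer. The dataset, the `mutual_info_classif` scorer, and `k=10` are illustrative choices, not prescriptions from the pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)  # example dataset: (569, 30)

# Low variance filter: the default threshold of 0.0 drops only constant columns
vt = VarianceThreshold(threshold=0.0)
X_vt = vt.fit_transform(X)

# Univariate filter: keep the k features with the highest mutual information,
# which (unlike Pearson correlation) also captures non-linear dependence
skb = SelectKBest(score_func=mutual_info_classif, k=10)
X_new = skb.fit_transform(X_vt, y)

print(X.shape, X_vt.shape, X_new.shape)
```

Swapping `score_func` for `chi2` (non-negative features only) or `f_classif` reuses the same `SelectKBest` scaffold for the other statistical tests listed.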
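The wrapper branch can be sketched with `RFE`: the estimator is refit repeatedly and the least important feature (smallest coefficient here) is eliminated each round until the requested count remains. The iris dataset, `LogisticRegression`, and `n_features_to_select=2` are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # example dataset: 4 features

# Recursively drop the weakest feature until only 2 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask over the original features
print(rfe.ranking_)   # rank 1 = selected; higher ranks were eliminated earlier
```

`RFECV` follows the same pattern but picks the number of features by cross-validation, and `SequentialFeatureSelector(direction='forward')` / `'backward'` covers the forward-selection and backward-elimination entries with the same fit/transform API.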
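The embedded and hybrid branches meet in `SelectFromModel`: an L1-regularized model is fit once, and features whose coefficients the penalty drove to (near) zero are discarded. The diabetes dataset and `alpha=1.0` are illustrative assumptions; the kept-feature count depends on both:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)  # example dataset: 10 features

# The L1 penalty drives some coefficients exactly to zero
lasso = Lasso(alpha=1.0).fit(X, y)

# SelectFromModel keeps only features whose coefficients survive the penalty
selector = SelectFromModel(lasso, prefit=True)
X_sel = selector.transform(X)

print(f"kept {X_sel.shape[1]} of {X.shape[1]} features")
```

The same wrapper accepts any estimator exposing `coef_` or `feature_importances_`, so a fitted `RandomForestClassifier` or `XGBClassifier` slots in for tree-based importance selection without changing the surrounding code.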