3. FEATURE SELECTION (MEDIUM-HIGH IMPORTANCE - Third Priority) (3/5)
│ ├── Filter Methods (Univariate) (4/5)
│ │ ├── Statistical Tests (4/5)
│ │ │ ├── Chi-Square Test (4/5)
│ │ │ │ ├── sklearn.feature_selection.chi2
│ │ │ │ ├── sklearn.feature_selection.SelectKBest(chi2)
│ │ │ │ └── ✓ For non-negative features (counts, frequencies, one-hot categoricals) vs categorical target
│ │ │ ├── ANOVA F-Test (4/5)
│ │ │ │ ├── sklearn.feature_selection.f_classif
│ │ │ │ ├── sklearn.feature_selection.f_regression
│ │ │ │ └── ✓ Numerical features; f_classif for a categorical target, f_regression for a numerical one
│ │ │ ├── Mutual Information (4/5)
│ │ │ │ ├── sklearn.feature_selection.mutual_info_classif
│ │ │ │ ├── sklearn.feature_selection.mutual_info_regression
│ │ │ │ └── ✓ Captures non-linear relationships
│ │ │ └── Kendall's Tau (2/5)
│ │ │ ├── scipy.stats.kendalltau
│ │ │ └── ✓ Non-parametric rank correlation (also listed under Correlation-Based below)
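The univariate tests above share the same SelectKBest interface; a minimal sketch on a toy dataset (k=2 is an arbitrary illustration, not a recommendation):

```python
# Illustrative sketch: univariate filter selection with SelectKBest.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative numerical features

# Chi-square: requires non-negative feature values, categorical target.
chi = SelectKBest(score_func=chi2, k=2).fit(X, y)

# ANOVA F-test: scores each numerical feature against the class labels.
anova = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Mutual information: also captures non-linear dependence.
mi = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)

X_anova = anova.transform(X)  # keeps the 2 highest-scoring features
```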
│ │ ├── Correlation-Based (4/5)
│ │ │ ├── Pearson Correlation (4/5)
│ │ │ │ ├── pandas.DataFrame.corr()
│ │ │ │ ├── numpy.corrcoef()
│ │ │ │ ├── scipy.stats.pearsonr()
│ │ │ │ └── ✓ Linear relationships, normally distributed data
│ │ │ ├── Spearman Correlation (4/5)
│ │ │ │ ├── scipy.stats.spearmanr()
│ │ │ │ ├── pandas.DataFrame.corr(method='spearman')
│ │ │ │ └── ✓ Monotonic relationships, rank-based
│ │ │ └── Kendall Correlation (2/5)
│ │ │ ├── scipy.stats.kendalltau()
│ │ │ └── ✓ Robust to outliers, small sample sizes
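The three correlation measures can be compared side by side; a small sketch on synthetic columns (the column names are made up for illustration):

```python
# Illustrative sketch: Pearson vs Spearman vs Kendall on synthetic data.
import numpy as np
import pandas as pd
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
target = pd.Series(2.0 * x + 1.0)           # exact linear function of x
df = pd.DataFrame({"linear": x, "noise": rng.normal(size=100)})

pearson = df.corrwith(target)               # Pearson by default
rho, _ = spearmanr(df["linear"], target)    # rank-based, monotonic
tau, _ = kendalltau(df["linear"], target)   # ordinal agreement, robust
```

All three equal 1.0 for the perfectly linear column, while the noise column scores near zero under each measure.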
│ │ └── Variance-Based (4/5)
│ │ ├── Low Variance Filter (4/5)
│ │ │ ├── sklearn.feature_selection.VarianceThreshold
│ │ │ └── ✓ Removes features with low variance (near-constant)
│ │ └── High Correlation Filter (4/5)
│ │ ├── Custom implementation with pandas.DataFrame.corr()
│ │ └── ✓ Removes highly correlated features (multicollinearity)
│ │
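The two variance-based filters chain naturally; a sketch of VarianceThreshold plus the custom pandas correlation filter mentioned above (column names and the 0.95 cutoff are illustrative):

```python
# Illustrative sketch: drop near-constant columns, then one of each
# highly correlated pair.
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(42)
signal = rng.normal(size=200)
df = pd.DataFrame({
    "constant": np.ones(200),                          # zero variance
    "signal": signal,
    "dup": signal + rng.normal(scale=0.01, size=200),  # near-duplicate
})

# 1) Low variance filter: removes features with variance below the threshold.
vt = VarianceThreshold(threshold=1e-8).fit(df)
kept = list(df.columns[vt.get_support()])

# 2) High correlation filter: scan the upper triangle of |corr| and drop
#    one column from each pair above the cutoff.
corr = df[kept].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
selected = [c for c in kept if c not in to_drop]
```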
│ ├── Wrapper Methods (Model-Based) (3/5)
│ │ ├── Forward Selection (3/5)
│ │ │ ├── sklearn.feature_selection.SequentialFeatureSelector(direction='forward')
│ │ │ ├── mlxtend.feature_selection.SequentialFeatureSelector
│ │ │ └── ✓ Starts empty, adds features iteratively
│ │ ├── Backward Elimination (3/5)
│ │ │ ├── sklearn.feature_selection.SequentialFeatureSelector(direction='backward')
│ │ │ ├── mlxtend.feature_selection.SequentialFeatureSelector
│ │ │ └── ✓ Starts with all features, removes iteratively
│ │ ├── Recursive Feature Elimination (RFE) (4/5)
│ │ │ ├── sklearn.feature_selection.RFE
│ │ │ ├── sklearn.feature_selection.RFECV
│ │ │ └── ✓ Recursively eliminates least important features
│ │ └── Genetic Algorithms (2/5)
│ │ ├── sklearn_genetic.GAFeatureSelectionCV (from the sklearn-genetic-opt package)
│ │ ├── DEAP
│ │ └── ✓ Evolutionary approach to feature selection
│ │
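A sketch of forward selection and RFE sharing one estimator (the make_classification settings and n_features_to_select=3 are arbitrary illustrations):

```python
# Illustrative sketch: wrapper methods pick feature subsets by refitting a model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
est = LogisticRegression(max_iter=1000)

# Forward selection: start empty, greedily add the feature that most
# improves the cross-validated score.
sfs = SequentialFeatureSelector(est, n_features_to_select=3,
                                direction="forward").fit(X, y)

# RFE: start with all features, repeatedly drop the one with the
# smallest coefficient magnitude.
rfe = RFE(est, n_features_to_select=3).fit(X, y)
```

Both selectors expose the chosen subset via a boolean mask, so they slot into a Pipeline the same way.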
│ ├── Embedded Methods (Intrinsic) (4/5)
│ │ ├── Tree-Based Importance (5/5)
│ │ │ ├── Random Forest Importance (5/5)
│ │ │ │ ├── sklearn.ensemble.RandomForestClassifier.feature_importances_
│ │ │ │ ├── sklearn.ensemble.RandomForestRegressor.feature_importances_
│ │ │ │ └── ✓ Gini/entropy-based importance, handles interactions
│ │ │ ├── Extra Trees Importance (4/5)
│ │ │ │ ├── sklearn.ensemble.ExtraTreesClassifier.feature_importances_
│ │ │ │ └── ✓ More randomized than Random Forest
│ │ │ ├── XGBoost Importance (5/5)
│ │ │ │ ├── xgboost.XGBClassifier.feature_importances_
│ │ │ │ ├── xgboost.plot_importance()
│ │ │ │ └── ✓ Gain, weight, cover importance metrics
│ │ │ ├── LightGBM Importance (5/5)
│ │ │ │ ├── lightgbm.LGBMClassifier.feature_importances_
│ │ │ │ ├── lightgbm.plot_importance()
│ │ │ │ └── ✓ Split-based importance, fast training
│ │ │ └── CatBoost Importance (4/5)
│ │ │ ├── catboost.CatBoostClassifier.feature_importances_
│ │ │ └── ✓ Handles categorical features natively
│ │ └── Regularization-Based (4/5)
│ │ ├── L1 Regularization (Lasso) (5/5)
│ │ │ ├── sklearn.linear_model.Lasso
│ │ │ └── ✓ Drives coefficients to zero, performs feature selection
│ │ ├── L2 Regularization (Ridge) (4/5)
│ │ │ ├── sklearn.linear_model.Ridge
│ │ │ └── ✓ Shrinks coefficients, prevents overfitting, no explicit selection
│ │ └── Elastic Net (L1+L2) (4/5)
│ │ ├── sklearn.linear_model.ElasticNet
│ │ └── ✓ Combines L1 and L2, robust to correlated features
│ │
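Both embedded families above in one sketch: tree importances come for free after fitting, while L1 zeroes out coefficients (alpha=0.1 and the synthetic coefficients are illustrative choices):

```python
# Illustrative sketch: embedded selection via tree importances and L1.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Tree-based: impurity (Gini) importances, one score per feature, summing to 1.
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# L1 (Lasso): the penalty drives uninformative coefficients exactly to zero.
# Standardize first, since the L1 penalty is scale-sensitive.
rng = np.random.default_rng(0)
Xr = rng.normal(size=(200, 5))
yr = 3.0 * Xr[:, 0] - 2.0 * Xr[:, 1] + rng.normal(scale=0.1, size=200)
lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(Xr), yr)
selected = np.flatnonzero(lasso.coef_)  # indices of surviving features
```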
│ └── Hybrid Methods (3/5)
│ ├── SelectFromModel (3/5)
│ │ ├── sklearn.feature_selection.SelectFromModel
│ │ └── ✓ Uses model's feature importance/coefficients to select features
│ └── Boruta Algorithm (2/5)
│ ├── boruta.BorutaPy (from the boruta_py project; `from boruta import BorutaPy`)
│ └── ✓ All-relevant feature selection using Random Forest
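A sketch of SelectFromModel (threshold="mean" is one common choice, not the only one); BorutaPy follows a similar fit/transform pattern but needs the external boruta package, so only the sklearn path is shown:

```python
# Illustrative sketch: SelectFromModel keeps features whose importance
# clears a threshold derived from a fitted estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=1)

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=1),
    threshold="mean",   # keep features above the mean importance
).fit(X, y)
X_reduced = selector.transform(X)
```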