5. MODEL TRAINING & EVALUATION (CRITICAL - Fifth Priority) (5/5)
│   ├── 5.1 Model Training (5/5)
│   │   ├── Data Splitting (Train/Test/Validation) (5/5)
│   │   │   ├── sklearn.model_selection.train_test_split
│   │   │   └── ✓ Essential for unbiased evaluation of model performance
│   │   ├── Model Fitting (5/5)
│   │   │   ├── model.fit(X_train, y_train)
│   │   │   └── ✓ The core process of learning patterns from training data
│   │   └── Ensemble Methods (4/5)
│   │       ├── Bagging (e.g., Random Forest) (4/5)
│   │       │   ├── sklearn.ensemble.BaggingClassifier
│   │       │   └── ✓ Training multiple models independently and averaging their predictions
│   │       ├── Boosting (e.g., AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost) (5/5)
│   │       │   ├── sklearn.ensemble.AdaBoostClassifier, sklearn.ensemble.GradientBoostingClassifier
│   │       │   ├── xgboost
│   │       │   ├── lightgbm
│   │       │   ├── catboost
│   │       │   └── ✓ Sequentially building models, each correcting the errors of its predecessors
│   │       └── Stacking (3/5)
│   │           ├── sklearn.ensemble.StackingClassifier
│   │           └── ✓ Training a meta-model on the predictions of multiple base models
│   │
│   ├── 5.2 Model Evaluation (Metrics) (5/5)
│   │   ├── Classification Metrics (5/5)
│   │   │   ├── Accuracy (5/5)
│   │   │   │   ├── sklearn.metrics.accuracy_score
│   │   │   │   └── ✓ Overall fraction of correct predictions
│   │   │   ├── Precision, Recall, F1-Score (5/5)
│   │   │   │   ├── sklearn.metrics.precision_score
│   │   │   │   ├── sklearn.metrics.recall_score
│   │   │   │   ├── sklearn.metrics.f1_score
│   │   │   │   └── ✓ Per-class performance measures, essential for imbalanced data
│   │   │   ├── ROC AUC (4/5)
│   │   │   │   ├── sklearn.metrics.roc_auc_score
│   │   │   │   └── ✓ Measures a classifier's ability to distinguish between classes
│   │   │   └── Confusion Matrix (5/5)
│   │   │       ├── sklearn.metrics.confusion_matrix
│   │   │       └── ✓ Summarizes correct and incorrect predictions per class
│   │   └── Regression Metrics (5/5)
│   │       ├── Mean Squared Error (MSE), Root Mean Squared Error (RMSE) (5/5)
│   │       │   ├── sklearn.metrics.mean_squared_error
│   │       │   └── ✓ Common regression metrics; squaring penalizes larger errors more heavily
│   │       ├── Mean Absolute Error (MAE) (4/5)
│   │       │   ├── sklearn.metrics.mean_absolute_error
│   │       │   └── ✓ Less sensitive to outliers than MSE
│   │       └── R-squared ($R^2$) (4/5)
│   │           ├── sklearn.metrics.r2_score
│   │           └── ✓ Proportion of variance in the dependent variable predictable from the independent variables
│   │
│   └── 5.3 Overfitting/Underfitting Diagnosis (5/5)
│       ├── Learning Curves (4/5)
│       │   ├── sklearn.model_selection.learning_curve
│       │   └── ✓ Visualizing model performance as training-set size increases
│       ├── Validation Curves (4/5)
│       │   ├── sklearn.model_selection.validation_curve
│       │   └── ✓ Visualizing model performance as a hyperparameter varies
│       └── Bias-Variance Trade-off (5/5)
│           ├── Conceptual understanding
│           └── ✓ Balancing model complexity to minimize generalization error

Illustrative code sketches for the branches above follow.
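A minimal sketch of the Data Splitting and Model Fitting steps from 5.1. The breast cancer dataset and LogisticRegression are illustrative choices, not part of the tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% for unbiased evaluation; stratify keeps class ratios intact.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# fit() learns patterns from the training split only.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```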
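A sketch of the three Ensemble Methods from 5.1, using scikit-learn only; GradientBoostingClassifier stands in for the boosting family here, and xgboost, lightgbm, and catboost expose compatible fit/predict wrappers that could be swapped in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Bagging: many models (decision trees by default) trained independently
# on bootstrap samples; predictions are averaged / majority-voted.
bagging = BaggingClassifier(n_estimators=100, random_state=42)

# Boosting: models built sequentially, each correcting its predecessors.
boosting = GradientBoostingClassifier(random_state=42)

# Stacking: a meta-model learns from the base models' predictions.
stacking = StackingClassifier(
    estimators=[("bag", bagging), ("boost", boosting)],
    final_estimator=LogisticRegression(max_iter=5000),
)

for name, clf in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    clf.fit(X_train, y_train)
    print(f"{name:>8}: test accuracy = {clf.score(X_test, y_test):.3f}")
```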
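The 5.2 classification metrics applied to the model and held-out split from the first sketch (run that one first); note that roc_auc_score needs predicted scores or probabilities, not hard labels:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Assumes `model`, `X_test`, `y_test` from the splitting sketch above.
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # scores for the positive class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_proba))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```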
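The 5.2 regression metrics on a synthetic dataset (make_regression and LinearRegression are illustrative assumptions); RMSE is taken as the square root of MSE, which keeps the sketch portable across scikit-learn versions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))  # same units as the target
print("MAE :", mean_absolute_error(y_test, y_pred))
print("R^2 :", r2_score(y_test, y_pred))
```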
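A sketch of the 5.3 diagnostics; sweeping LogisticRegression's C (inverse regularization strength) in the validation curve is an assumption for illustration. Reading the curves: a large, persistent gap between training and validation scores points to overfitting (high variance), while two low, converged curves point to underfitting (high bias):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve, validation_curve

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Learning curve: performance vs. training-set size.
sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)
print("train:", train_scores.mean(axis=1).round(3))
print("valid:", val_scores.mean(axis=1).round(3))

# Validation curve: performance vs. one hyperparameter value.
train_scores, val_scores = validation_curve(
    model, X, y, param_name="C", param_range=np.logspace(-3, 3, 7), cv=5
)
print("valid vs C:", val_scores.mean(axis=1).round(3))
```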