5. MODEL TRAINING & EVALUATION (CRITICAL - Fifth Priority) (5/5)
│ ├── 5.1 Model Training (5/5)
│ │ ├── Data Splitting (Train/Test/Validation) (5/5)
│ │ │ ├── sklearn.model_selection.train_test_split
│ │ │ └── ✓ Essential for unbiased evaluation of performance on data the model has never seen (sketch after this subsection)
│ │ ├── Model Fitting (5/5)
│ │ │ ├── model.fit(X_train, y_train)
│ │ │ └── ✓ The core process of learning patterns from training data
│ │ └── Ensemble Methods (4/5)
│ │   ├── Bagging (e.g., Random Forest) (4/5)
│ │   │ ├── sklearn.ensemble.BaggingClassifier
│ │   │ └── ✓ Training multiple models independently on bootstrap samples and averaging their predictions (ensemble sketch after this subsection)
│ │   ├── Boosting (e.g., AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost) (5/5)
│ │   │ ├── sklearn.ensemble.AdaBoostClassifier
│ │   │ ├── sklearn.ensemble.GradientBoostingClassifier
│ │   │ ├── xgboost
│ │   │ ├── lightgbm
│ │   │ ├── catboost
│ │   │ └── ✓ Sequentially building models so each one corrects the errors of its predecessors
│ │   └── Stacking (3/5)
│ │     ├── sklearn.ensemble.StackingClassifier
│ │     └── ✓ Training a meta-model on the predictions of multiple base models (stacking sketch after this subsection)
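A minimal sketch of the splitting and fitting steps above. The synthetic dataset and logistic-regression model are illustrative assumptions, not prescribed by the roadmap:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

# Hold out 20% for final evaluation; stratify to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Optionally carve a validation set out of the training portion for tuning,
# keeping the test set untouched until the final evaluation.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42
)

# model.fit learns patterns from the training split only.
model = LogisticRegression(max_iter=1_000)
model.fit(X_train, y_train)
print(f"Validation accuracy: {model.score(X_val, y_val):.3f}")
```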
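A sketch contrasting bagging and boosting on the same synthetic data; the estimator choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged.
bagging = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Boosting: each new tree is fit to the errors of the ensemble so far.
boosting = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

for name, clf in [("bagging", bagging), ("random forest", forest), ("boosting", boosting)]:
    print(f"{name}: test accuracy {clf.score(X_test, y_test):.3f}")
```

The third-party boosting libraries listed above (xgboost, lightgbm, catboost) ship scikit-learn-compatible estimators (XGBClassifier, LGBMClassifier, CatBoostClassifier) that drop into the same fit/predict pattern.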
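And a stacking sketch; the base estimators and meta-model are arbitrary choices for illustration. StackingClassifier fits the final_estimator on cross-validated predictions of the base models rather than on the raw features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base models are fit on the training data; the meta-model (final_estimator)
# is fit on their out-of-fold predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"Stacked test accuracy: {stack.score(X_test, y_test):.3f}")
```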
│ │
│ ├── 5.2 Model Evaluation (Metrics) (5/5)
│ │ ├── Classification Metrics (5/5)
│ │ │ ├── Accuracy (5/5)
│ │ │ │ ├── sklearn.metrics.accuracy_score
│ │ │ │ └── ✓ Fraction of correct predictions; can be misleading on imbalanced classes
│ │ │ ├── Precision, Recall, F1-Score (5/5)
│ │ │ │ ├── sklearn.metrics.precision_score
│ │ │ │ ├── sklearn.metrics.recall_score
│ │ │ │ ├── sklearn.metrics.f1_score
│ │ │ │ └── ✓ For evaluating classification performance beyond accuracy, especially on imbalanced classes (metrics sketch after this subsection)
│ │ │ ├── ROC AUC (4/5)
│ │ │ │ ├── sklearn.metrics.roc_auc_score
│ │ │ │ └── ✓ Measures a classifier's ability to rank positive examples above negative ones across all thresholds
│ │ │ └── Confusion Matrix (5/5)
│ │ │   ├── sklearn.metrics.confusion_matrix
│ │ │   └── ✓ Tabulates true vs. predicted classes, showing which errors the model makes
│ │ └── Regression Metrics (5/5)
│ │   ├── Mean Squared Error (MSE), Root Mean Squared Error (RMSE) (5/5)
│ │   │ ├── sklearn.metrics.mean_squared_error
│ │   │ └── ✓ Standard regression metrics; squaring penalizes large errors more heavily, and RMSE restores the target's units (sketch after this subsection)
│ │   ├── Mean Absolute Error (MAE) (4/5)
│ │   │ ├── sklearn.metrics.mean_absolute_error
│ │   │ └── ✓ Less sensitive to outliers than MSE
│ │   └── R-squared (R²) (4/5)
│ │     ├── sklearn.metrics.r2_score
│ │     └── ✓ Proportion of variance in the dependent variable that is predictable from the independent variables
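A sketch computing the classification metrics above; the imbalanced synthetic dataset and logistic-regression classifier are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# 80/20 class imbalance makes the accuracy-vs-precision/recall gap visible.
X, y = make_classification(n_samples=1_000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probabilities, needed for ROC AUC

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))
print("roc auc  :", roc_auc_score(y_test, y_score))  # takes scores, not labels
print(confusion_matrix(y_test, y_pred))  # rows: true class, cols: predicted class
```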
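And the regression metrics, again on synthetic data. RMSE is taken here as the square root of mean_squared_error, which works across scikit-learn versions (newer releases also provide a dedicated root_mean_squared_error):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))  # same units as the target variable
print("MAE :", mean_absolute_error(y_test, y_pred))
print("R^2 :", r2_score(y_test, y_pred))
```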
│ │
│ └── 5.3 Overfitting/Underfitting Diagnosis (5/5)
│   ├── Learning Curves (4/5)
│   │ ├── sklearn.model_selection.learning_curve
│   │ └── ✓ Visualizing model performance as the training set grows (sketch at the end of this section)
│   ├── Validation Curves (4/5)
│   │ ├── sklearn.model_selection.validation_curve
│   │ └── ✓ Visualizing model performance across a hyperparameter's range (sketch at the end of this section)
│   └── Bias-Variance Trade-off (5/5)
│     ├── Conceptual understanding
│     └── ✓ Balancing model complexity to minimize generalization error
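A sketch of diagnosing over/underfitting with learning_curve; the random-forest estimator and synthetic data are illustrative, and scores are printed rather than plotted to keep the example dependency-free:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2_000, random_state=0)

# Score the model at increasing training-set sizes with 5-fold CV.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0),
    X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:5d}  train={tr:.3f}  validation={va:.3f}")
# A large, persistent train/validation gap suggests overfitting;
# low scores on both suggest underfitting.
```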
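A matching sketch with validation_curve, sweeping a decision tree's max_depth (an illustrative choice). The same run makes the bias-variance trade-off concrete: shallow trees underfit (high bias), deep trees overfit (high variance), and the validation score peaks in between:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1_000, random_state=0)

# Sweep tree depth: low depth -> high bias, high depth -> high variance.
depths = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X, y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.3f}  validation={va:.3f}")
# Pick the depth where the validation score peaks: the best observed
# balance between bias and variance for this model family.
```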