🤖 Practical aspects of building and deploying machine learning systems

Based on Andrew Ng's book Machine Learning Yearning. Created by Vinoo Nedungadan

Is your model performing poorly?

📊 EVALUATE PERFORMANCE METRICS
└── Training error high (relative to target or human-level error)?
    ├── YES → HIGH BIAS PROBLEM
    │   ├── Solutions:
    │   │   ├── 🔧 Increase model complexity
    │   │   │   ├── Add more layers (neural networks)
    │   │   │   ├── Add more features
    │   │   │   └── Use more sophisticated architecture
    │   │   ├── 📉 Reduce regularization
    │   │   │   ├── Decrease L1/L2 penalty
    │   │   │   └── Reduce dropout rate
    │   │   └── ⏰ Train longer / optimize better
    │   │       ├── More epochs, until convergence
    │   │       └── Tune the learning rate or optimizer
    │   └── ⚠️ Risk: May increase variance
    │
    └── NO → Train-dev error >> training error? (train-dev = held-out data from the training distribution; if train and dev share a distribution, dev error works too)
        ├── YES → HIGH VARIANCE PROBLEM
        │   ├── Solutions:
        │   │   ├── 📈 Get more training data
        │   │   │   ├── Collect new samples
        │   │   │   ├── Data augmentation
        │   │   │   └── Synthetic data generation
        │   │   ├── 🔧 Reduce model complexity (the book ranks this below regularization)
        │   │   │   ├── Fewer layers/parameters
        │   │   │   ├── Feature selection
        │   │   │   └── Simpler architecture
        │   │   ├── 📈 Increase regularization
        │   │   │   ├── Higher L1/L2 penalty
        │   │   │   ├── Increase dropout
        │   │   │   └── Early stopping
        │   │   └── 🎲 Ensemble methods
        │   │       ├── Bagging
        │   │       ├── Random forests
        │   │       └── Model averaging
        │   └── ⚠️ Risk: May increase bias
        │
        └── NO → Dev error >> train-dev error?
            ├── YES → DATA MISMATCH PROBLEM
            │   ├── Solutions:
            │   │   ├── 🔄 Analyze train vs dev/test distributions
            │   │   ├── 📊 Keep a train-dev set to track the gap
            │   │   ├── 🎯 Make training data more representative
            │   │   ├── 🔧 Domain adaptation techniques
            │   │   └── 📈 Collect more dev-like training data
            │   └── 📋 Error Analysis Required
            │
            └── NO → Test error >> dev error?
                ├── YES → OVERFIT TO DEV SET → Get a bigger or fresh dev set
                └── NO → All gaps small
                    ├── Meeting targets? → SATISFACTORY PERFORMANCE
                    │   └── 🎉 Model is working well!
                    └── Still short of targets? → TARGETS TOO AMBITIOUS
                        ├── 📊 Reassess human-level performance
                        ├── 🎯 Adjust success metrics
                        └── 🔍 Estimate Bayes (optimal) error
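The core of the decision tree above can be sketched as a small function. This is a minimal illustration, not code from the book: it assumes error rates are fractions between 0 and 1, that a train-dev set (held-out examples from the training distribution) is available, and that a gap counts as "large" when it exceeds a tolerance `tol`. The names `diagnose` and `tol`, and the default of 0.02, are hypothetical choices for this sketch.

```python
def diagnose(target_err: float, train_err: float, train_dev_err: float,
             dev_err: float, tol: float = 0.02) -> str:
    """Return the first problem found, checked in the same order as the tree.

    Assumption: all errors are fractions in [0, 1]; `tol` is a hypothetical
    threshold for what counts as a "large" gap.
    """
    # Bias: training error sits well above the target (e.g. human-level) error.
    if train_err - target_err > tol:
        return "high bias"
    # Variance: error grows on unseen data drawn from the *training* distribution.
    if train_dev_err - train_err > tol:
        return "high variance"
    # Data mismatch: error grows again once we move to the dev/test distribution.
    if dev_err - train_dev_err > tol:
        return "data mismatch"
    return "satisfactory"


print(diagnose(0.01, 0.10, 0.11, 0.12))   # high bias
print(diagnose(0.01, 0.01, 0.09, 0.10))   # high variance
print(diagnose(0.01, 0.01, 0.015, 0.10))  # data mismatch
```

The function stops at the first problem it finds, mirroring the tree. Real systems often have several problems at once, so a practical loop is to address the top one, retrain, and run the check again.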

🔍 Detailed Diagnostic Framework

1. Bias vs Variance Identification

🧪 DIAGNOSTIC TESTS
├── Compare Training vs Dev Error (examples assume optimal error ≈ 0%)
│   ├── Training Error: 8%, Dev Error: 10% → Small gap: variance acceptable; judge bias against the target
│   ├── Training Error: 15%, Dev Error: 16% → High bias problem
│   └── Training Error: 1%, Dev Error: 11% → High variance problem
│
├── Learning Curves Analysis
│   ├── Training & dev error converge at a high value → High bias
│   ├── Large gap between training & dev error → High variance
│   └── Dev error still falling as the training set grows → More data will likely help
│
└── Human-Level Performance Comparison
    ├── Training error >> human-level error → Focus on (avoidable) bias
    ├── Training error ≈ human-level error → Focus on variance
    └── Model beats human-level → Human-level no longer estimates optimal error; progress usually slows
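Using human-level error as a stand-in for optimal error, the comparison reduces to two subtractions: avoidable bias is training error minus human-level error, and variance is dev error minus training error. A minimal sketch; `decompose` is a hypothetical helper, and all errors are in percentage points.

```python
def decompose(human_err: float, train_err: float, dev_err: float) -> dict:
    """Split dev error into avoidable bias and variance (percentage points).

    Human-level error is used as a proxy for optimal (Bayes) error, which is
    an assumption, not a guarantee.
    """
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    # Work on whichever gap is larger first.
    focus = "bias" if avoidable_bias > variance else "variance"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


# The three training/dev examples, assuming human-level error of 0%.
for train, dev in [(8.0, 10.0), (15.0, 16.0), (1.0, 11.0)]:
    print(train, dev, decompose(0.0, train, dev))
```

With human-level error near 0%, the 8%/10% case is mostly a bias problem (8 points of avoidable bias against 2 of variance); if human-level error were 7.5%, the same numbers would point to variance instead.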

2. Error Analysis Methodology

🔎 ERROR ANALYSIS PROCESS
├── Manual Error Examination
│   ├── 📋 Categorize error types (e.g., on ~100 misclassified dev examples)
│   ├── 📊 Count frequency of each error type
│   ├── 🎯 Focus on most common errors first
│   └── 💡 Generate improvement hypotheses
│
├── Ceiling Analysis (for multi-component pipelines)
│   ├── 🔧 Replace one component's output with perfect (ground-truth) output
│   ├── 📈 Measure the overall system improvement
│   ├── 🔄 Repeat for each component
│   └── 📊 Prioritize based on potential gains
│
└── Ablation Studies
    ├── ➖ Remove features/components
    ├── 📉 Measure performance drop
    ├── 🎯 Keep most impactful components
    └── 🗑️ Remove less useful parts
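The counting step of manual error examination can be sketched with `collections.Counter`. The tags and counts below are hypothetical toy data for 100 reviewed dev-set errors at a 10% dev error rate. One example may carry several tags, so the fractions can sum to more than 100%. The "best case" value assumes a category is fixed completely, so it is an upper bound on the gain, not a prediction.

```python
from collections import Counter


def tally(reviewed: list[set[str]], dev_error: float) -> list[tuple[str, float, float]]:
    """Rank error categories from a manual review of misclassified dev examples.

    Returns (tag, fraction of reviewed errors, best-case dev error if every
    error with that tag were fixed), most frequent tag first.
    """
    if not reviewed:
        return []
    counts = Counter(tag for tags in reviewed for tag in tags)
    n = len(reviewed)
    rows = []
    for tag, c in counts.most_common():
        frac = c / n
        rows.append((tag, frac, dev_error * (1 - frac)))
    return rows


# Hypothetical review of 100 misclassified dev examples (dev error = 10%).
reviewed = ([{"dog"}] * 5 + [{"blurry"}] * 40 + [{"blurry", "great cat"}] * 20
            + [{"great cat"}] * 23 + [set()] * 12)

for tag, frac, best in tally(reviewed, dev_error=10.0):
    print(f"{tag:10s} {frac:.0%} of errors -> dev error at best {best:.1f}%")
```

In this toy data, fixing every dog-related mistake could lower dev error from 10% to at best 9.5%, while the blurry-image category has far more headroom, so it would be the first candidate to work on.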

3. Data Strategy Decisions

📊 DATA MANAGEMENT
├── Train/Dev/Test Split Strategy
│   ├── 🎯 Dev & test from the same distribution (the one you care about)
│   ├── 📊 Size based on total data available
│   │   ├── Small dataset: 60/20/20
│   │   ├── Medium dataset: 70/15/15
│   │   └── Large dataset (e.g., millions of examples): 98/1/1
│   └── 🔄 Shuffle before splitting (when all data comes from that distribution)
│
├── Data Mismatch Solutions
│   ├── 📊 Create a train-dev set to measure the mismatch
│   ├── 🔍 Identify distribution differences
│   ├── 🎯 Make training data more representative
│   └── 📈 Synthesize training data to match dev/test
│
└── When to Get More Data
    ├── ✅ High variance problem identified
    ├── ✅ Learning curves show benefit
    ├── ✅ Cost-effective to collect
    └── ❌ High bias problem (won't help much)
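A minimal sketch of the split strategy using only the standard library. `split` and `carve_train_dev` are hypothetical helpers. They assume every item comes from the distribution you care about (shuffling differently distributed training data into dev/test would defeat the purpose), and the fixed `seed` keeps the split reproducible.

```python
import random


def split(items, fractions=(0.98, 0.01, 0.01), seed=0):
    """Shuffle, then split into train/dev/test by the given fractions.

    Assumes all items come from the target distribution. The default 98/1/1
    corresponds to the large-dataset case.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * fractions[0])
    n_dev = int(len(items) * fractions[1])
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # remainder absorbs rounding
    return train, dev, test


def carve_train_dev(train, size, seed=0):
    """Hold out `size` training examples as a train-dev set: same distribution
    as training, but never trained on. Returns (new_train, train_dev)."""
    train = list(train)
    random.Random(seed).shuffle(train)
    return train[size:], train[:size]


train, dev, test = split(range(1000), (0.6, 0.2, 0.2))
train, train_dev = carve_train_dev(train, size=50)
print(len(train), len(train_dev), len(dev), len(test))
```

Comparing error on `train_dev` with error on `dev` is what separates a variance problem from a data mismatch problem.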

4. Model Optimization Strategies

⚙️ OPTIMIZATION APPROACHES
├── Architecture Changes
│   ├── 🧠 For High Bias:
│   │   ├── Deeper networks
│   │   ├── Wider networks
│   │   ├── More complex models
│   │   └── Advanced architectures
│   │
│   └── 🎯 For High Variance:
│       ├── Simpler models
│       ├── Fewer parameters
│       ├── Regularized architectures
│       └── Ensemble methods
│
├── Hyperparameter Tuning
│   ├── 🔍 Grid/Random search
│   ├── 📊 Bayesian optimization
│   ├── 🎯 Focus on most impactful parameters
│   └── 📈 Use dev set for validation
│
└── Advanced Techniques
    ├── 🔄 Transfer learning
    ├── 📈 Multi-task learning
    ├── 🎯 End-to-end vs pipeline
    └── 🧠 Attention mechanisms
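Random search, one of the tuning options listed, can be sketched as follows. The search space, the `evaluate` callback, and the toy `fake_dev_error` objective are all hypothetical. In practice `evaluate` would train a model with the given configuration and return its dev-set error. Sampling the learning rate on a log scale is a common choice because its useful values span several orders of magnitude.

```python
import math
import random


def random_search(evaluate, n_trials=20, seed=0):
    """Sample configurations at random and keep the one with the lowest dev error.

    `evaluate(config) -> dev_error` is supplied by the caller (hypothetical);
    lower is better. The search space below is illustrative only.
    """
    rng = random.Random(seed)
    best_config, best_err = None, math.inf
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in 1e-4 .. 1e-1
            "dropout": rng.uniform(0.0, 0.5),
            "num_layers": rng.randint(1, 6),
        }
        err = evaluate(config)
        if err < best_err:
            best_config, best_err = config, err
    return best_config, best_err


def fake_dev_error(cfg):
    """Toy stand-in for "train, then measure dev error" (hypothetical)."""
    return abs(math.log10(cfg["learning_rate"]) + 2.5) + 0.1 * abs(cfg["dropout"] - 0.2)


best, err = random_search(fake_dev_error, n_trials=50)
print(best, round(err, 3))
```

Scoring on the dev set rather than the test set keeps the test set as an unbiased final check.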

🎓 Key Principles from Machine Learning Yearning

The ML Strategy Loop

1. 📊 Define metrics → What success looks like
2. 🔍 Diagnose problems → Bias/variance/data mismatch
3. 🛠️ Apply solutions → Based on diagnosis
4. 📈 Measure improvement → Validate on dev set
5. 🔄 Iterate → Repeat until satisfactory

Critical Insights

🎯 Single Number Evaluation Metric

Lets you rank competing ideas at a glance, which keeps the team aligned and decisions fast
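One common way to collapse precision and recall into a single number is their harmonic mean, the F1 score. A minimal sketch; the precision/recall values for the two classifiers are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative classifiers: A has better recall, B has better precision.
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}
for name, (p, r) in classifiers.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
# A: F1 = 0.924
# B: F1 = 0.910
```

With two metrics it is unclear which classifier is better; with one number, A wins immediately.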

📊 Representative Dev/Test Sets

Should reflect real-world distribution you care about

🔍 Error Analysis First

Systematically examining misclassified dev examples beats intuition when deciding what to improve

📈 Human-Level Performance

Serves as a proxy for optimal (Bayes) error, showing how much bias is avoidable

⚡ Rapid Iteration

Build a basic system quickly, then improve it; fast idea → experiment cycles beat slow perfection

📊 Satisficing vs Optimizing

With several metrics, optimize one and require the others only to meet a threshold (e.g., maximize accuracy subject to running time ≤ 100 ms)
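A sketch of "optimize one, satisfice the others": maximize accuracy among the models whose latency fits a budget. The candidate numbers and the 100 ms threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # optimizing metric: higher is better
    latency_ms: float  # satisficing metric: only needs to meet the threshold


def pick(candidates, max_latency_ms=100.0):
    """Return the most accurate candidate within the latency budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


models = [
    Candidate("A", 0.90, 80),
    Candidate("B", 0.92, 95),
    Candidate("C", 0.95, 1500),
]
print(pick(models).name)  # B: C is more accurate but misses the 100 ms budget
```

Once a model clears the threshold, extra speed earns it nothing; only accuracy decides among the feasible models.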

Common Pitfalls to Avoid

❌ Optimizing the wrong metric - Ensure your metric aligns with business goals

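❌ Mismatched dev/test distributions - Dev and test sets must come from the same distribution, the one you care about; training data may differ, but then use a train-dev set to detect data mismatch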

❌ Dev set too small - It must be large enough to detect the improvements you care about; 1,000 to 10,000 examples is common

❌ Changing dev/test sets arbitrarily - Keep them fixed so results stay comparable, but update them (or the metric) if they stop reflecting your real goal

❌ Assuming more data always helps - It mainly helps with high variance and does little for high bias

❌ Skipping error analysis - Always analyze errors before optimizing
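To make the "dev set too small" pitfall concrete, here is a rough binomial estimate of how noisy a measured accuracy is on a dev set of n examples. It assumes independent examples and ignores that comparing two models on the same dev set is a paired comparison, so treat it as an order-of-magnitude guide only.

```python
import math


def accuracy_std_error(accuracy: float, n: int) -> float:
    """Binomial standard error of an accuracy measured on n independent examples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


for n in (100, 1_000, 10_000):
    se = accuracy_std_error(0.90, n)
    print(f"n={n:>6}: accuracy 90% ± {se:.2%} (1 std. error)")
```

At 90% accuracy, 100 examples give roughly ±3 points of noise, so a 0.1-point improvement is invisible. Even 10,000 examples still leave about ±0.3 points.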