2. FEATURE ENGINEERING (HIGH IMPORTANCE - Second Priority) (4/5)
│ ├── Categorical Encoding (4/5)
│ │ ├── One-Hot Encoding (5/5)
│ │ │ ├── sklearn.preprocessing.OneHotEncoder
│ │ │ ├── pandas.get_dummies()
│ │ │ └── ✓ Best for nominal categories, avoids ordinal assumptions
│ │ ├── Label Encoding (4/5)
│ │ │ ├── sklearn.preprocessing.LabelEncoder
│ │ │ └── ✓ Intended for encoding target labels (y); codes follow sorted order, so use OrdinalEncoder for ordered features
│ │ ├── Ordinal Encoding (4/5)
│ │ │ ├── sklearn.preprocessing.OrdinalEncoder
│ │ │ └── ✓ Preserves ordinal relationships when the order is given through the categories parameter
│ │ ├── Target Encoding (3/5)
│ │ │ ├── category_encoders.TargetEncoder
│ │ │ ├── category_encoders.MEstimateEncoder
│ │ │ └── ✓ Uses target statistics, good for high-cardinality categories; fit on training data only to avoid target leakage
│ │ ├── Binary Encoding (2/5)
│ │ │ ├── category_encoders.BinaryEncoder
│ │ │ └── ✓ Reduces dimensionality compared to one-hot
│ │ ├── Hash Encoding (2/5)
│ │ │ ├── category_encoders.HashingEncoder
│ │ │ └── ✓ Fixed-size output, handles unseen categories
│ │ └── Frequency Encoding (2/5)
│ │   ├── category_encoders.CountEncoder
│ │   └── ✓ Replaces categories with their occurrence count (relative frequency with normalize=True)
│ │
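
The encoders above can be compared on a toy frame. This is a minimal sketch, assuming hypothetical columns "color" (nominal) and "size" (ordinal) that are not part of the outline; frequency encoding is done with plain pandas here rather than category_encoders to keep the example dependency-light.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: "color" is nominal, "size" has a natural order.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot: one indicator column per category. With handle_unknown="ignore",
# categories unseen during fit become all-zero rows at transform time.
onehot = OneHotEncoder(handle_unknown="ignore")
color_encoded = onehot.fit_transform(df[["color"]]).toarray()

# Ordinal: pass the order explicitly; without it the categories are sorted
# alphabetically ("large" < "medium" < "small"), which is not the real order.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_encoded = ordinal.fit_transform(df[["size"]])

# Frequency encoding: replace each category with its relative frequency.
color_freq = df["color"].map(df["color"].value_counts(normalize=True))
```

In a real project the encoders would be fit on the training split only and then applied to validation and test data.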
│ ├── Numerical Transformations (4/5)
│ │ ├── Scaling & Normalization (5/5)
│ │ │ ├── StandardScaler (Z-score) (5/5)
│ │ │ │ ├── sklearn.preprocessing.StandardScaler
│ │ │ │ └── ✓ Mean=0, Std=1, best for normally distributed data
│ │ │ ├── MinMaxScaler (5/5)
│ │ │ │ ├── sklearn.preprocessing.MinMaxScaler
│ │ │ │ └── ✓ Scales to [0,1] range, preserves the shape of the distribution, sensitive to outliers
│ │ │ ├── RobustScaler (4/5)
│ │ │ │ ├── sklearn.preprocessing.RobustScaler
│ │ │ │ └── ✓ Uses median and IQR, robust to outliers
│ │ │ ├── MaxAbsScaler (3/5)
│ │ │ │ ├── sklearn.preprocessing.MaxAbsScaler
│ │ │ │ └── ✓ Scales by maximum absolute value, preserves sparsity
│ │ │ └── Normalizer (3/5)
│ │ │   ├── sklearn.preprocessing.Normalizer
│ │ │   └── ✓ Scales individual samples to unit norm
│ │ ├── Distribution Transformation (4/5)
│ │ │ ├── Log Transform (4/5)
│ │ │ │ ├── numpy.log1p() (computes log(1 + x), accepts zeros)
│ │ │ │ ├── numpy.log() (strictly positive values only)
│ │ │ │ └── ✓ Reduces right skewness of non-negative data
│ │ │ ├── Square Root Transform (3/5)
│ │ │ │ ├── numpy.sqrt()
│ │ │ │ └── ✓ Moderate skewness reduction
│ │ │ ├── Box-Cox Transform (3/5)
│ │ │ │ ├── scipy.stats.boxcox()
│ │ │ │ ├── sklearn.preprocessing.PowerTransformer(method='box-cox')
│ │ │ │ └── ✓ Power transformation with lambda chosen by maximum likelihood, strictly positive data only
│ │ │ └── Yeo-Johnson Transform (3/5)
│ │ │   ├── sklearn.preprocessing.PowerTransformer(method='yeo-johnson')
│ │ │   └── ✓ Handles both positive and negative values
│ │ └── Binning/Discretization (4/5)
│ │   ├── Equal-Width Binning (4/5)
│ │   │ ├── sklearn.preprocessing.KBinsDiscretizer(strategy='uniform')
│ │   │ ├── pandas.cut()
│ │   │ └── ✓ Equal-width intervals, bin counts may be unequal
│ │   ├── Equal-Frequency Binning (4/5)
│ │   │ ├── sklearn.preprocessing.KBinsDiscretizer(strategy='quantile')
│ │   │ ├── pandas.qcut()
│ │   │ └── ✓ Approximately equal number of samples per bin
│ │   └── K-Means Binning (3/5)
│ │     ├── sklearn.preprocessing.KBinsDiscretizer(strategy='kmeans')
│ │     └── ✓ Clusters each feature's values into bins using 1D K-means
│ │
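
The effect of the scalers, the log transform, and equal-width binning can be seen on a single feature with one outlier. This is a minimal sketch on made-up values, not a recommendation for any particular dataset.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, RobustScaler, StandardScaler

# Hypothetical feature column with one large outlier (100.0).
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Z-score scaling: the outlier inflates the mean and the standard deviation.
standard = StandardScaler().fit_transform(x)

# Robust scaling: centers on the median (3) and divides by the IQR (4 - 2 = 2).
robust = RobustScaler().fit_transform(x)

# log1p compresses the right tail and is defined at zero.
logged = np.log1p(x)

# Equal-width binning into 2 bins over [1, 100]; the outlier gets its own bin.
binner = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
bins = binner.fit_transform(x)
```

Comparing `standard` and `robust` shows why the median/IQR variant is often preferred when outliers are present: the four typical values stay within [-1, 0.5] under RobustScaler.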
│ ├── Feature Creation (4/5)
│ │ ├── Polynomial Features (4/5)
│ │ │ ├── sklearn.preprocessing.PolynomialFeatures
│ │ │ └── ✓ Creates polynomial and interaction terms
│ │ ├── Interaction Features (4/5)
│ │ │ ├── sklearn.preprocessing.PolynomialFeatures(interaction_only=True)
│ │ │ └── ✓ Only interaction terms, no polynomial terms
│ │ ├── Domain-Specific Features (5/5)
│ │ │ ├── Date/Time Features (5/5)
│ │ │ │ ├── pandas.Series.dt.year, pandas.Series.dt.month, pandas.Series.dt.dayofweek
│ │ │ │ ├── featuretools.primitives (e.g., Year, Month, Weekday)
│ │ │ │ └── ✓ Extract temporal patterns and cyclical features
│ │ │ ├── Text Features (TF-IDF, N-grams) (4/5)
│ │ │ │ ├── sklearn.feature_extraction.text.TfidfVectorizer
│ │ │ │ ├── sklearn.feature_extraction.text.CountVectorizer
│ │ │ │ └── ✓ Convert text to numerical features
│ │ │ └── Geospatial Features (3/5)
│ │ │   ├── geopy.distance
│ │ │   └── ✓ Distance calculations, coordinate transformations
│ │ └── Automated Feature Engineering (2/5)
│ │   ├── featuretools.dfs() (relational data)
│ │   ├── tsfresh.extract_features() (time series)
│ │   └── ✓ Automatically generates features from relational or time-series data
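
Two of the feature-creation items above, date/time features and interaction terms, can be sketched as follows. The timestamps and the matrix X are hypothetical, and the sine/cosine month encoding is one common way to represent cyclical features, assumed here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical timestamps.
ts = pd.Series(pd.to_datetime(["2024-01-15", "2024-06-30", "2024-12-25"]))

# Calendar components via the Series.dt accessor.
dates = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "dayofweek": ts.dt.dayofweek,  # Monday=0 ... Sunday=6
})

# Cyclical encoding: map the month onto a circle so that December and
# January end up close together instead of 11 units apart.
angle = 2 * np.pi * (dates["month"] - 1) / 12
dates["month_sin"] = np.sin(angle)
dates["month_cos"] = np.cos(angle)

# Interaction-only terms: keeps x0, x1 and adds x0*x1, but no squares.
X = np.array([[2.0, 3.0], [4.0, 5.0]])
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X)
```

Dropping `interaction_only=True` would additionally produce the squared columns x0^2 and x1^2, which matches the Polynomial Features entry rather than the Interaction Features one.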