Machine Learning Cheat Sheet

[Figure: scikit-learn algorithm cheat sheet. A flowchart that starts from the number of samples (more than 50, otherwise get more data) and, depending on whether you are predicting a category, a quantity, or structure, or are just looking, and on whether labeled data is available, leads to an estimator in one of four groups: classification, regression, clustering, or dimensionality reduction.]

Supervised Learning

Supervised learning models map inputs to outputs and attempt to apply patterns learned from past data to unseen data. They are either regression models, which predict a continuous variable such as a stock price, or classification models, which predict a binary or multi-class variable such as whether a customer will churn. The section below covers two popular types of supervised learning models: linear models and tree-based models.

Linear Models

In a nutshell, linear models fit a best-fit line (or, with several features, a hyperplane) to the training data and use it to predict unseen data. They assume that the output is a linear combination of the features. This section lists commonly used linear models in machine learning, along with their advantages and disadvantages.
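As a rough illustration of what "best-fit line" means, here is a minimal sketch of ordinary least squares with a single feature, written in plain Python. The function names and the toy data are hypothetical and only serve the example; in practice a library estimator would be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """The prediction is a linear combination of the feature and a constant."""
    return intercept + slope * x


# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```

Because the toy data lies exactly on a line, the fit recovers that line; with noisy data the same formulas give the line that minimizes the squared error.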

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Linear Regression | A simple algorithm that models a linear relationship between inputs and a continuous numerical output variable | 1. Stock price prediction<br>2. Predicting housing prices<br>3. Predicting customer lifetime value | 1. Explainable method<br>2. Results are interpretable through the fitted coefficients<br>3. Faster to train than most other machine learning models | 1. Assumes linearity between inputs and output<br>2. Sensitive to outliers<br>3. Can overfit with small, high-dimensional data |
| Logistic Regression | A simple algorithm that models a linear relationship between inputs and the log-odds of a categorical output (1 or 0) | 1. Predicting credit risk score<br>2. Customer churn prediction | 1. Interpretable and explainable<br>2. Less prone to overfitting when using regularization<br>3. Applicable for multi-class predictions | 1. Assumes a linear relationship between inputs and the log-odds of the output<br>2. Can overfit with small, high-dimensional data |
| Ridge Regression | Part of the regression family. It adds an L2 penalty that shrinks the coefficients of features with low predictive value closer to zero. Can be used for classification or regression | 1. Predictive maintenance for automobiles<br>2. Sales revenue prediction | 1. Less prone to overfitting<br>2. Best suited where data suffer from multicollinearity<br>3. Explainable and interpretable | 1. All the predictors are kept in the final model<br>2. Doesn't perform feature selection |
| Lasso Regression | Part of the regression family. It adds an L1 penalty that can shrink the coefficients of features with low predictive value exactly to zero. Can be used for classification or regression | 1. Predicting housing prices<br>2. Predicting clinical outcomes based on health data | 1. Less prone to overfitting<br>2. Can handle high-dimensional data<br>3. Performs feature selection as part of fitting | 1. Can lead to poor interpretability, because its choice among highly correlated variables can be arbitrary |
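The difference between ridge ("closer to zero") and lasso ("to zero") can be sketched with a single centered feature and no intercept, where both penalized fits have a closed form. This is a simplified illustration under those assumptions, not how library solvers work in general; the function names, the penalty strength `lam`, and the toy data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Move value toward zero by threshold, and clip to exactly zero if it crosses."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ols_coef(xs, ys):
    """Minimizes sum((y - b*x)^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_coef(xs, ys, lam):
    """Minimizes sum((y - b*x)^2) + lam * b^2: the coefficient shrinks but stays nonzero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_coef(xs, ys, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + lam * |b|: the coefficient can become exactly zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx


# Hypothetical centered feature with a weak and a strong relationship to the target
xs = [-1.0, 0.0, 1.0]
weak_ys = [-0.1, 0.0, 0.1]
strong_ys = [-2.0, 0.0, 2.0]

weak_ridge = ridge_coef(xs, weak_ys, lam=1.0)      # smaller than OLS, still nonzero
weak_lasso = lasso_coef(xs, weak_ys, lam=1.0)      # dropped to zero
strong_lasso = lasso_coef(xs, strong_ys, lam=1.0)  # shrunk, but kept
```

With the same penalty, the weak feature keeps a small ridge coefficient while lasso removes it entirely, which is why lasso is said to perform feature selection and ridge is not.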

Pasted image 20241123174020.png

Tree-Based Models

In a nutshell, tree-based models predict by applying a series of "if-then" rules organized as decision trees. This section lists commonly used tree-based models in machine learning, along with their advantages and disadvantages.
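A single "if-then" rule is the smallest possible tree, often called a decision stump. The sketch below learns one threshold on one feature by counting misclassifications. It is a deliberately simplified illustration: real decision trees consider many features, use impurity measures such as Gini, and split recursively. The function names and the toy data are hypothetical.

```python
def fit_stump(xs, labels):
    """Find the threshold that misclassifies the fewest points.

    The rule is fixed as: predict 1 if x > threshold, else 0
    (an assumption made to keep the sketch short).
    Returns None if there are fewer than two distinct feature values.
    """
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values
    candidates = [(low + high) / 2 for low, high in zip(distinct, distinct[1:])]
    best_threshold, best_errors = None, len(xs) + 1
    for threshold in candidates:
        errors = sum((x > threshold) != bool(label) for x, label in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict_stump(x, threshold):
    """Apply the learned if-then rule."""
    return 1 if x > threshold else 0


# Hypothetical toy data: one numeric feature and a binary label
xs = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(xs, labels)
```

A full decision tree repeats this search inside each resulting group, which is also why deep trees can end up memorizing the training data.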

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Decision Tree | Decision Tree models learn decision rules on the features to produce predictions. Can be used for classification or regression | 1. Customer churn prediction<br>2. Credit score modeling<br>3. Disease prediction | 1. Explainable and interpretable<br>2. Can handle missing values | 1. Prone to overfitting<br>2. Sensitive to outliers |
| Random Forests | An ensemble learning method that combines the output of multiple decision trees | 1. Credit score modeling<br>2. Predicting housing prices | 1. Reduces overfitting<br>2. Often more accurate than a single decision tree | 1. Training complexity can be high<br>2. Not very interpretable |
| Gradient Boosting Regression | Gradient Boosting Regression employs boosting to build a predictive model from an ensemble of weak learners | 1. Predicting car emissions<br>2. Predicting ride-hailing fare amount | 1. Often more accurate than other regression models<br>2. Can handle multicollinearity<br>3. Can handle non-linear relationships | 1. Sensitive to outliers, which can lead to overfitting<br>2. Computationally expensive and has high complexity |
| XGBoost | A gradient boosting algorithm that is efficient and flexible. Can be used for both classification and regression tasks | 1. Churn prediction<br>2. Claims processing in insurance | 1. Provides accurate results<br>2. Captures non-linear relationships | 1. Hyperparameter tuning can be complex<br>2. May not perform well on sparse datasets |
| LightGBM Regressor | A gradient boosting framework that is designed to be more efficient than other implementations | 1. Predicting flight time for airlines<br>2. Predicting cholesterol levels based on health data | 1. Can handle large amounts of data<br>2. Computationally efficient with fast training speed<br>3. Low memory usage | 1. Can overfit due to leaf-wise splitting and high sensitivity to the training data<br>2. Hyperparameter tuning can be complex |
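The idea of building a model "from an ensemble of weak learners" can be sketched for squared-error regression: start from the mean, then repeatedly fit a one-split tree to the current residuals and add a damped version of it to the prediction. This is a minimal sketch under those assumptions, not the implementation used by XGBoost or LightGBM; all function names, the learning rate, the number of rounds, and the toy data are hypothetical.

```python
def fit_regression_stump(xs, targets):
    """Fit a one-split regression tree by minimizing squared error.

    Assumes at least two distinct feature values.
    Returns (threshold, left_value, right_value).
    """
    distinct = sorted(set(xs))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=100, learning_rate=0.1):
    """Each round fits a stump to the residuals (the negative gradient of squared error)."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosting_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical non-linear toy data: y = x^2
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]
model = fit_boosting(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [boosting_predict(model, x) for x in xs])
```

Each stump on its own is a poor model, but the sum of many small corrections follows the curve closely on the training data. The same mechanism explains the overfitting risk listed in the table: with enough rounds the ensemble will also chase outliers.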

Unsupervised Learning

Unsupervised learning is about discovering general patterns in data without labeled outputs. The most popular example is clustering, or segmenting, customers and users. This type of segmentation generalizes well and can be applied broadly, for example to documents, companies, and genes. Unsupervised learning includes clustering models, which learn how to group similar data points together, and association algorithms, which find rules relating data points that tend to occur together.

Clustering Models

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| K-Means | K-Means is one of the most widely used clustering approaches. It partitions the data into K clusters based on Euclidean distances | 1. Customer segmentation<br>2. Recommendation systems | 1. Scales to large datasets<br>2. Simple to implement and interpret<br>3. Results in tight clusters | 1. Requires the expected number of clusters from the beginning<br>2. Has trouble with varying cluster sizes and densities |
| Hierarchical Clustering | A "bottom-up" approach where each data point starts as its own cluster, and the two closest clusters are then merged iteratively | 1. Fraud detection<br>2. Document clustering based on similarity | 1. There is no need to specify the number of clusters<br>2. The resulting dendrogram is informative | 1. Doesn't always result in the best clustering<br>2. Not suitable for large datasets due to high complexity |
| Gaussian Mixture Models | A probabilistic model for modeling normally distributed clusters within a dataset | 1. Customer segmentation<br>2. Recommendation systems | 1. Computes a probability for an observation belonging to a cluster<br>2. Can identify overlapping clusters<br>3. Often more accurate than K-Means when clusters overlap | 1. Requires complex tuning<br>2. Requires setting the number of expected mixture components or clusters |
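To make the K-Means row concrete, here is a minimal sketch of the usual two-step loop (assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of its points). Taking the first `k` points as initial centroids is a simplification chosen to keep the example deterministic; library implementations typically use smarter initialization. The function names and the toy points are hypothetical.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iters=20):
    # Simplistic initialization: use the first k points as centroids
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(n_iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # Move the centroid to the coordinate-wise mean of its members
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated groups in 2D
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Note that `k` has to be passed in, which is the first disadvantage listed for K-Means in the table above.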