Machine Learning is the engine behind modern AI — from spam filters to self-driving cars to large language models. This page covers the full ML landscape: mathematical foundations, classical algorithms, deep learning, NLP, computer vision, reinforcement learning, and production deployment.
For data wrangling and EDA see Data Science. For game-specific AI see Game AI. For Python libraries see PyTorch, Scikit Learn, NumPy, Pandas.
History & The AI Landscape
How Machine Learning Was Born
The seeds of machine learning were planted in 1943, when Warren McCulloch and Walter Pitts modeled the first mathematical neuron. In 1950, Alan Turing asked “Can machines think?” and proposed the Turing Test. The field was formally founded at the Dartmouth Conference (1956); John McCarthy had coined the term “Artificial Intelligence” in the 1955 proposal for that workshop.
Early AI was rule-based (expert systems) — humans encoded knowledge as if-then rules. This worked in narrow domains but collapsed under real-world complexity. You can’t write rules for every way a face can look.
The shift came with machine learning: instead of programming rules, you feed algorithms data and let them discover the rules themselves. Frank Rosenblatt’s Perceptron (1957) was one of the first learning algorithms. Backpropagation was developed in the 1970s and popularized in 1986 by Rumelhart, Hinton, and Williams, which made training multi-layer networks practical.
Two AI winters (periods of lost funding and interest) followed unrealistic expectations. The modern renaissance began at the 2012 ImageNet competition, where AlexNet, a deep convolutional network trained on GPUs, cut the top-5 classification error from about 26% to about 15%. Much of the deep learning progress since then builds on the recipe it demonstrated: deep networks, large labeled datasets, and GPU compute.
2017: The Transformer (Attention Is All You Need, Vaswani et al.) changed NLP forever. 2018: BERT, GPT-1. 2020: GPT-3. 2022: ChatGPT. 2023: GPT-4, Claude, Gemini. The large language model era had arrived.
The AI Hierarchy
graph TD
AI["🤖 Artificial Intelligence\nMachines simulating human intelligence"]
ML["📊 Machine Learning\nLearning from data"]
DL["🧠 Deep Learning\nMulti-layer neural networks"]
GenAI["✨ Generative AI\nCreate text, images, code, audio"]
LLM["💬 Large Language Models\nGPT, Claude, Gemini, Llama"]
CV["👁️ Computer Vision\nCNN, YOLO, SAM, CLIP"]
RL["🎮 Reinforcement Learning\nTrial and error from rewards"]
AI --> ML --> DL --> GenAI
DL --> LLM
DL --> CV
ML --> RL
Branch                   What It Does                  Key Models                      Used In
─────────────────────────────────────────────────────────────
Supervised Learning      Learn from labeled examples   Random Forest, XGBoost, SVM     Classification, regression
Unsupervised Learning    Find hidden structure         K-Means, PCA, VAE               Clustering, anomaly detection
Deep Learning            Learn via neural networks     CNN, RNN, Transformer           Vision, NLP, speech
Generative AI            Create new content            GPT, Stable Diffusion, DALL-E   Text, images, code
Reinforcement Learning   Learn from rewards            PPO, DQN, AlphaZero             Games, robotics, trading
Computer Vision          Understand images/video       ResNet, YOLO, SAM               Detection, segmentation
NLP                      Understand language           BERT, GPT, T5                   Chatbots, translation, summarization
AI Timeline
timeline
title Milestones in AI & Machine Learning
1950 : Turing Test proposed
1956 : Dartmouth Conference — AI coined
1957 : Perceptron — first learning algorithm
1986 : Backpropagation popularized
1997 : Deep Blue beats Kasparov at chess
1998 : LeNet-5 — CNN for digit recognition
2006 : Hinton's deep belief nets revive deep learning
2012 : AlexNet — ImageNet breakthrough
2014 : GANs invented by Goodfellow
2017 : Transformer — Attention Is All You Need
2018 : BERT and GPT-1
2020 : GPT-3 — 175B parameters
2021 : DALL-E — text to image
2022 : ChatGPT — 100M users in 2 months
2023 : GPT-4, Claude, Llama, Mistral, Gemini
2024 : Multimodal AI, AI agents, reasoning models
Introduction
Machine Learning is the practice of building systems that improve automatically through experience. Instead of explicitly programming every decision, you show the system examples and let it learn the pattern — then apply that pattern to new data it has never seen.
ML Knowledge Map
mindmap
root((Machine Learning))
Mathematical Foundations
Linear Algebra
Calculus Gradients
Probability Statistics
Information Theory
Classical ML
Supervised Learning
Unsupervised Learning
Ensemble Methods
Feature Engineering
Deep Learning
Neural Networks
CNNs Vision
RNNs LSTMs
Transformers
Attention Mechanism
Generative AI
Large Language Models
Diffusion Models
GANs VAEs
Prompt Engineering
RAG Systems
Specialized
Computer Vision
NLP
Reinforcement Learning
Time Series
Anomaly Detection
Production MLOps
Training Pipelines
Model Serving
Monitoring Drift
Experiment Tracking
Mathematical Foundations
Linear Algebra — The Language of Data
Data in ML is almost always represented as vectors, matrices, and tensors. A grayscale image is a matrix of pixel values. A sentence is a matrix of token embeddings (tokens × embedding dimension), so a batch of sentences is a 3D tensor and a batch of color images is a 4D one.
Core concepts:
─────────────────────────────────────────────────────────────
Vector: x = [x₁, x₂, ..., xₙ] (1D array — one sample's features)
Matrix: X ∈ ℝ^(m×n) (m samples, n features)
Tensor: 3D+ array (batch × height × width × channels)
Dot product: W·x = Σ wᵢxᵢ (core operation in every neural layer)
Matrix multiply: Y = XW (linear transformation)
Transpose: Xᵀ (rows become columns)
Eigenvalues: In PCA, variance along each principal component (the eigenvectors of the covariance matrix give the directions)
SVD: Matrix factorization for recommendations, compression
Linear algebra in NumPy — the foundation
import numpy as np

# Data matrix: 100 samples, 5 features
X = np.random.randn(100, 5)

# Weight vector: 5 features → 1 output
W = np.random.randn(5)
b = 0.0

# Forward pass: linear prediction (dot product + bias)
y_pred = X @ W + b          # (100, 5) @ (5,) → (100,)

# Covariance matrix (used in PCA)
cov = np.cov(X.T)           # (5, 5): how features vary together

# Eigendecomposition → principal components
# eigh is meant for symmetric matrices like cov: real eigenvalues, ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
NumPy is the Python library that makes linear algebra fast and readable. It’s the foundation layer that PyTorch, Scikit Learn, and Pandas all build on.
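The SVD entry in the concepts list is easiest to see as low-rank approximation. A minimal NumPy sketch, assuming a small synthetic "ratings" matrix built to be nearly rank 3 (the helper name `low_rank_approx` is illustrative): keeping only the top-k singular values gives the best rank-k approximation in the Frobenius norm (the Eckart-Young theorem), which is the idea behind SVD-based compression and matrix-factorization recommenders.

```python
import numpy as np

def low_rank_approx(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A via truncated SVD (illustrative helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # keep the top-k components

rng = np.random.default_rng(0)
# Synthetic 50 users × 20 items matrix: rank 3 plus a little noise
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(50, 20))

A3 = low_rank_approx(A, k=3)
rel_err = np.linalg.norm(A - A3) / np.linalg.norm(A)
print(f"rank-3 relative error: {rel_err:.4f}")   # small, because A is nearly rank 3
```

Storing `U[:, :3]`, `s[:3]`, and `Vt[:3]` takes 50·3 + 3 + 3·20 = 213 numbers instead of 1,000, which is where the compression comes from.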
Calculus — How Models Learn
Neural networks learn by computing the gradient of the loss function with respect to every parameter — how much does changing this weight improve the prediction? Then they nudge all weights in the direction that reduces loss.
This is gradient descent, the core optimization algorithm behind neural networks and most other differentiable models (tree ensembles such as Random Forest are trained differently).
Gradient Descent:
─────────────────────────────────────────────────────────────
Loss L = function of predictions vs ground truth
Gradient ∂L/∂w = how much L changes when w changes
Update: w ← w - α · (∂L/∂w)
α = learning rate (step size)
Variants:
SGD — update per single sample (noisy but fast)
Mini-batch SGD — update per batch of 32/64/128 samples (standard)
Adam — adaptive learning rates per parameter (default choice)
AdaGrad — accumulates squared gradients, so per-parameter steps shrink over time
RMSProp — exponential moving average of squared gradients
Manual gradient descent — the concept
import numpy as np

# Toy data: y ≈ 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3 * X + 2 + rng.normal(0, 0.1, size=200)

# Simple linear regression: y = wx + b
w, b = 0.0, 0.0
alpha = 0.01                            # learning rate
epochs = 100

for epoch in range(epochs):
    y_pred = w * X + b                  # forward pass
    loss = ((y_pred - y) ** 2).mean()   # MSE loss

    # Gradients of the MSE (calculus / chain rule)
    dL_dw = (2 * (y_pred - y) * X).mean()
    dL_db = (2 * (y_pred - y)).mean()

    # Update parameters
    w -= alpha * dL_dw
    b -= alpha * dL_db

    if epoch % 10 == 0:
        print(f"Epoch {epoch:3d} | Loss: {loss:.4f}")
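The Adam entry in the optimizer list can be written out in a few lines. A minimal NumPy sketch of the Adam update rule with its usual default betas, applied to the same kind of 1-D linear regression as the manual gradient-descent snippet; the data, the learning rate, and the helper name `adam_fit` are illustrative assumptions.

```python
import numpy as np

def adam_fit(X, y, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Fit y ≈ w*x + b by minimizing MSE with Adam (illustrative helper)."""
    params = np.zeros(2)        # [w, b]
    m = np.zeros(2)             # 1st moment: moving average of gradients
    v = np.zeros(2)             # 2nd moment: moving average of squared gradients
    for t in range(1, steps + 1):
        w, b = params
        err = w * X + b - y
        grad = np.array([(2 * err * X).mean(), (2 * err).mean()])   # dL/dw, dL/db
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        params -= lr * m_hat / (np.sqrt(v_hat) + eps)   # adaptive per-parameter step
    return params

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3 * X + 2 + rng.normal(0, 0.1, size=200)
w, b = adam_fit(X, y)
print(f"w={w:.3f}, b={b:.3f}")   # should land close to the true values 3 and 2
```

Dividing by `sqrt(v_hat)` is what makes the learning rate adaptive: parameters with consistently large gradients take proportionally smaller steps.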
Probability & Statistics — Uncertainty in ML
ML models are probabilistic — they output confidence scores, not certainties. Understanding probability is what separates an engineer who tunes parameters from one who understands why a model behaves the way it does.
Concept                                 Role in ML
─────────────────────────────────────────────────────────────
Probability distributions               Model outputs, label noise, data generation
Bayes’ Theorem                          Foundation of Naive Bayes, Bayesian neural nets
MLE (Maximum Likelihood Estimation)     Training objective for most supervised models
KL Divergence                           Loss term in VAEs, knowledge distillation
Entropy                                 Decision tree splitting criterion (information gain)
Cross-entropy                           Classification loss function, the most common in practice
Central Limit Theorem                   Why mini-batch gradients are usable estimates of the full gradient (noise shrinks as 1/√batch size)
Hypothesis testing                      Evaluating whether model A is truly better than model B
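Three rows of this table are tied together by one identity: for a true distribution p and a predicted distribution q, cross-entropy H(p, q) = H(p) + KL(p‖q). A small NumPy sketch with made-up probabilities; the function names are illustrative, not from any library.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0                                  # treat 0 · log 0 as 0
    return -np.sum(p[nz] * np.log(p[nz]))

def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

def kl_divergence(p, q):
    return cross_entropy(p, q) - entropy(p)     # KL(p‖q) = H(p, q) - H(p)

# One-hot label (true class = 2) vs a model's predicted probabilities
p = [0.0, 0.0, 1.0]
q = [0.1, 0.2, 0.7]
print(cross_entropy(p, q))   # -log(0.7) ≈ 0.357, the usual classification loss
print(kl_divergence(p, q))   # same value here, because a one-hot label has zero entropy
```

This is why minimizing cross-entropy against fixed labels is the same as minimizing KL divergence: H(p) does not depend on the model.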
The ML Pipeline — End to End
Full Workflow
graph LR
DC["📥 Data Collection\nAPIs · Scraping\nDatabases · Sensors"]
DW["🧹 Data Wrangling\nMissing values\nOutliers · Types"]
EDA["🔍 EDA\nDistributions\nCorrelations\nVisualization"]
FE["⚙️ Feature Engineering\nScaling · Encoding\nSelection · Creation"]
MT["🏋️ Model Training\nAlgorithm selection\nTrain / Val / Test split"]
ME["📊 Model Evaluation\nMetrics · Confusion matrix\nROC-AUC · Cross-val"]
HT["🔧 Hyperparameter Tuning\nGrid · Random · Bayesian search"]
DEP["🚀 Deployment\nAPI · Batch · Edge"]
MON["📡 Monitoring\nDrift detection\nModel refresh"]
DC --> DW --> EDA --> FE --> MT --> ME --> HT --> DEP --> MON --> MT
Data Preprocessing
Complete preprocessing pipeline with sklearn
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, RobustScaler,
    OneHotEncoder, OrdinalEncoder, LabelEncoder,
)
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")
X, y = df.drop("target", axis=1), df["target"]

# Split first: NEVER fit a scaler/encoder on the full dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Numeric features: impute missing → scale
numeric_features = X.select_dtypes(include=[np.number]).columns.tolist()
numeric_transformer = Pipeline([
    ("imputer", KNNImputer(n_neighbors=5)),   # smarter than mean fill
    ("scaler", StandardScaler()),             # zero mean, unit variance
])

# Categorical features: impute → encode
categorical_features = X.select_dtypes(include=["object", "category"]).columns.tolist()
categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore", sparse_output=False)),
])

# Combine into one transformer
preprocessor = ColumnTransformer([
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

# The preprocessor fits on train data only
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)   # transform only, no fit
Pandas and NumPy handle the data manipulation layer. Scikit Learn provides the preprocessing transformers. Data Science covers EDA and visualization in depth.
Supervised Learning
Regression — Predicting Continuous Values
Regression maps input features to a continuous output: house price from location and size, stock return from market indicators, temperature from humidity and pressure.
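A minimal regression example, assuming synthetic house-price-style data invented for illustration (`size_m2` and `dist_km` are made-up features): ordinary least squares finds the weights that minimize squared error, here solved directly with NumPy's `np.linalg.lstsq` rather than by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
size_m2 = rng.uniform(40, 200, n)     # hypothetical feature: floor area
dist_km = rng.uniform(0, 30, n)       # hypothetical feature: distance to city center
price = 3000 * size_m2 - 5000 * dist_km + 50_000 + rng.normal(0, 10_000, n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([size_m2, dist_km, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)   # least-squares solution

pred = X @ coef
rmse = np.sqrt(np.mean((pred - price) ** 2))
r2 = 1 - np.sum((price - pred) ** 2) / np.sum((price - price.mean()) ** 2)
print(f"coef={coef.round(1)}  RMSE={rmse:,.0f}  R²={r2:.3f}")
```

The recovered coefficients should sit close to the generating values (3000, -5000, 50000), and RMSE close to the injected noise level of 10,000.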
Classification Metrics
Classification metric guide:
─────────────────────────────────────────────────────────────
Accuracy = (TP+TN)/total — misleading on imbalanced data
Precision = TP/(TP+FP) — "when I say positive, am I right?"
Recall = TP/(TP+FN) — "did I catch all the positives?"
F1 = harmonic mean(P, R) — balance between precision and recall
ROC-AUC = area under ROC curve — threshold-independent, a good default for roughly balanced classes
PR-AUC = precision-recall area — better for very imbalanced data
Rule of thumb:
Spam filter → maximize Precision (false positives hurt user experience)
Cancer screen → maximize Recall (false negatives are deadly)
Fraud detect → PR-AUC (very imbalanced positive class)
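These formulas are simple enough to compute from confusion-matrix counts. A minimal sketch with made-up labels (in practice `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score`); it shows why accuracy misleads on imbalanced data.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced toy data: 2 positives out of 10
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_neg = [0] * 10            # trivial "model" that never predicts positive
print(binary_metrics(y_true, always_neg))
# accuracy is 0.8, yet recall is 0.0: every positive was missed
```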
Neural Networks & Deep Learning
A neural network is a stack of linear transformations interleaved with non-linear activations. Each layer learns to detect increasingly abstract features: edges → shapes → faces → identities in a vision model; characters → words → phrases → meaning in a language model.
Neural Network Math:
─────────────────────────────────────────────────────────────
Layer: h = activation(W·x + b)
↑
learned matrix (weights)
Forward pass: input → layer₁ → layer₂ → ... → output → loss
Backward pass: loss → ∂loss/∂W_n → ∂loss/∂W_(n-1) → ... (backprop)
Update: W ← W - α · ∂loss/∂W (gradient descent)
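The forward pass, backward pass, and update above can be written out by hand for a tiny two-layer network. A from-scratch NumPy sketch (not how PyTorch implements autograd) trained on XOR, which no single linear layer can fit; the hidden size, learning rate, and step count are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)         # layer 1: 2 → 8
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)         # layer 2: 8 → 1
alpha = 0.5

for step in range(10000):
    # Forward pass: h = activation(W·x + b), layer by layer
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # binary cross-entropy

    # Backward pass: chain rule from the loss back to every weight
    dz2 = (p - y) / len(X)          # dL/d(logits) for sigmoid + BCE
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)         # tanh'(z) = 1 - tanh(z)²
    dW1, db1 = X.T @ dz1, dz1.sum(0)

    # Update: W ← W - α · ∂L/∂W
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2

print(f"final loss {loss:.4f}, predictions {p.ravel().round(2)}")
```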
PyTorch has a dedicated page covering the full API, custom datasets, distributed training, and ONNX export. For the math underneath see the Mathematical Foundations section above.
Activation Functions
Function     Formula                   Range           Use Case                    Problem
─────────────────────────────────────────────────────────────
Sigmoid      1/(1+e^-x)                (0,1)           Binary output layer         Vanishing gradients
Tanh         (e^x-e^-x)/(e^x+e^-x)     (-1,1)          RNN/LSTM hidden states      Vanishing gradients
ReLU         max(0, x)                 [0,∞)           Hidden layers (default)     Dying ReLU (zero gradient for x<0)
Leaky ReLU   max(0.01x, x)             (-∞,∞)          When ReLU units die         Negative slope is an extra hyperparameter
GELU         x·Φ(x)                    [≈-0.17,∞)      Transformers (BERT, GPT)    Computationally heavier
Softmax      e^xᵢ/Σe^xⱼ                (0,1), sum=1    Multi-class output          Overflows if computed naively (subtract the max first)
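A NumPy sketch of several functions from the table, including the standard fix for softmax overflow: subtracting the maximum logit, which leaves the output unchanged because softmax is shift-invariant. GELU here uses the exact form x·Φ(x), with Φ built from `math.erf`; the function names are illustrative.

```python
import math
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def gelu(x):
    # Exact GELU: x · Φ(x), with standard normal CDF Φ(x) = ½(1 + erf(x/√2))
    x = np.asarray(x, dtype=float)
    erf = np.vectorize(math.erf)
    return x * 0.5 * (1.0 + erf(x / math.sqrt(2.0)))

def softmax(z, axis=-1):
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=axis, keepdims=True)   # prevents exp() overflow
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([1000.0, 1001.0, 1002.0])   # a naive exp(1000) would overflow to inf
print(softmax(logits))                         # ≈ [0.090, 0.245, 0.665]
print(relu(np.array([-2.0, 3.0])), leaky_relu(np.array([-2.0, 3.0])))
```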
Regularization Techniques
Regularization patterns in PyTorch
import torch.nn as nn
import torch.optim as optim

num_features, num_channels = 128, 64   # example layer widths
model = nn.Linear(20, num_features)    # stand-in model for the optimizer example

# ── Dropout: randomly zero activations during training ────────────
# Forces the network to learn redundant representations
nn.Dropout(p=0.5)   # in train mode each activation is zeroed with prob 0.5 (survivors scaled ×2)

# ── Batch Normalization: normalize activations per batch ──────────
# Originally motivated by "internal covariate shift"; in practice it
# stabilizes training and allows higher learning rates
nn.BatchNorm1d(num_features)   # after Linear for tabular
nn.BatchNorm2d(num_channels)   # after Conv2d for images

# ── Weight decay (AdamW decouples it from the gradient update) ────
optimizer = optim.AdamW(model.parameters(), weight_decay=1e-4)

# ── Early stopping: stop when validation loss stops improving ─────
class EarlyStopping:
    def __init__(self, patience=10, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best = float("inf")

    def __call__(self, val_loss) -> bool:
        if val_loss < self.best - self.min_delta:   # meaningful improvement
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience        # True = stop training
Convolutional Neural Networks (CNNs)
How CNNs Work
CNNs learn to detect spatial patterns in images through learned filters (kernels) that slide across the image. Early layers detect edges and gradients. Middle layers detect shapes and textures. Deep layers detect high-level concepts like “dog face” or “wheel.”
The three key operations: Conv2d (detect patterns), MaxPool (downsample + translation invariance), Flatten → Linear (classify based on detected patterns).
Custom CNN + transfer learning with PyTorch
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torchvision.models import resnet50, ResNet50_Weights

# ── Custom CNN from scratch ────────────────────────────────────────
class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: 3×224×224 → 32×112×112
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Block 2: 32×112×112 → 64×56×56
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Block 3: 64×56×56 → 128×28×28
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# ── Transfer Learning: use pretrained ResNet50 ────────────────────
# Usually beats training from scratch unless you have a very large labeled dataset
num_classes = 10   # example value: set to your dataset's number of classes
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Freeze all pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for our number of classes (new layers are trainable)
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes),
)

# Only the new layer's parameters are in the optimizer
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
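The shape comments in `SimpleCNN` follow from the usual output-size formula for convolution and pooling layers (no dilation): out = ⌊(in + 2·padding − kernel) / stride⌋ + 1. A quick check in plain Python; the helper name `conv_out` is illustrative.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d or MaxPool2d layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
for block in range(1, 4):
    size = conv_out(size, kernel=3, stride=1, padding=1)   # 3×3 conv, padding=1: size kept
    size = conv_out(size, kernel=2, stride=2)              # MaxPool2d(2): size halved
    print(f"after block {block}: {size}×{size}")
# prints 112×112, 56×56, 28×28, matching the block comments
```

The same formula explains ResNet's stem: a 7×7 convolution with stride 2 and padding 3 maps 224 to 112.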
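Transformers & Self-Attention
The Transformer architecture (2017) replaced recurrence with self-attention, a mechanism that lets every token in a sequence attend directly to every other token. This captures long-range dependencies that RNNs struggled with, and it lets the whole sequence be processed in parallel instead of step by step.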
Self-Attention Math:
─────────────────────────────────────────────────────────────
Input: sequence of token embeddings X ∈ ℝ^(n × d_model)
Q = X·Wq (Queries — what this token is looking for)
K = X·Wk (Keys — what each token offers)
V = X·Wv (Values — what each token contributes)
Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k) · V
──────────────────────
attention scores: how much
each token attends to each other
Multi-Head: run h attention heads in parallel, concatenate
→ captures different types of relationships simultaneously
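The attention formula translates almost line for line into NumPy. A single-head sketch with random projection weights (the sizes and seed are arbitrary); production implementations add masking, multiple heads, and batching on top of this.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over X (n × d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (n × n): token-to-token affinities
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d_model, d_k = 5, 16, 8                      # 5 tokens, toy dimensions
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                    # (5, 8) (5, 5)
```

Multi-head attention runs h copies of this with separate `Wq`, `Wk`, `Wv` matrices and concatenates the outputs.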
Using HuggingFace Transformers — the practical way
from transformers import (
    AutoTokenizer, AutoModel, AutoModelForSequenceClassification,
    AutoModelForCausalLM, pipeline, BitsAndBytesConfig,
)
import torch

# ── Embeddings: encode text to vectors ────────────────────────────
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def embed(texts: list[str]) -> torch.Tensor:
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Mean pool over the token dimension (padding masked out)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    embeddings = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(embeddings, p=2, dim=1)

vecs = embed(["Machine learning is fascinating", "AI is changing the world"])
similarity = (vecs[0] @ vecs[1]).item()
print(f"Cosine similarity: {similarity:.4f}")

# ── Sentiment classification ───────────────────────────────────────
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english",
                      device=0 if torch.cuda.is_available() else -1)
results = classifier(["I love this product!", "This is terrible."])

# ── LLM inference with 4-bit quantization (runs on a consumer GPU) ─
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
llm_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=quantization_config,
    device_map="auto",
)
llm_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
Fine-Tuning LLMs
LoRA fine-tuning with PEFT — efficient fine-tuning
from peft import LoraConfig, get_peft_model, TaskType
from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq

# LoRA: add small trainable rank-decomposition matrices
# Train <1% of parameters vs full fine-tuning
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank: higher = more capacity, more params
    lora_alpha=32,                         # scaling: updates are multiplied by lora_alpha / r
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],   # which layers to adapt
    bias="none",
)

# base_model: the causal LM to adapt, e.g. llm_model loaded in the previous snippet
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Prints trainable vs. total parameter counts; for a ~7B model with
# these settings the trainable share is roughly 0.1%

training_args = TrainingArguments(
    output_dir                  = "./lora-finetuned",
    num_train_epochs            = 3,
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 4,      # effective batch = 16
    learning_rate               = 2e-4,
    fp16                        = True,
    logging_steps               = 10,
    save_strategy               = "epoch",
    warmup_ratio                = 0.03,
    lr_scheduler_type           = "cosine",
)
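What a LoRA adapter adds to a layer can be sketched without the library. A conceptual NumPy illustration under assumed toy sizes, not PEFT's internals: the frozen weight W is never modified, and the layer output gains a low-rank term (lora_alpha / r) · B·A·x. B starts at zero, so the adapted layer initially behaves exactly like the base layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 512, 512, 16, 32   # toy sizes; r and alpha as in the LoraConfig above

W = rng.normal(0, 0.02, (d_out, d_in))          # frozen pretrained weight (never updated)
A = rng.normal(0, 0.01, (r, d_in))              # trainable, small random init
B = np.zeros((d_out, r))                        # trainable, zero init → no change at start
scaling = lora_alpha / r

def lora_forward(x):
    # y = W x + (alpha / r) · B A x, for a batch x of shape (batch, d_in)
    return x @ W.T + scaling * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
assert np.allclose(lora_forward(x), x @ W.T)    # identical to the base layer before training

full, lora = W.size, A.size + B.size
print(f"trainable: {lora:,} of {full:,} ({100 * lora / full:.2f}%) for this layer")
```

For this toy 512×512 layer the adapter is 6.25% of the weight count; on a 4096×4096 projection, typical of 7B models, the same r = 16 adapter is about 0.8% of that matrix, and the share across the whole model is smaller still because most layers are not adapted.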
RAG — Retrieval-Augmented Generation
RAG is the pattern that grounds LLM responses in your own data, reducing hallucination and keeping knowledge current. Instead of fine-tuning, you retrieve relevant documents at query time and inject them into the prompt.
RAG pipeline — full implementation
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
import anthropic

# ── Step 1: Ingest and embed documents ────────────────────────────
def build_vector_store(docs: list[str]) -> Chroma:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512, chunk_overlap=64,
        separators=["\n\n", "\n", ". ", " "],
    )
    chunks = splitter.create_documents(docs)
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    return Chroma.from_documents(chunks, embeddings)

# ── Step 2: Retrieve relevant chunks + generate ───────────────────
def rag_query(question: str, vector_store: Chroma) -> str:
    # Semantic search: find the top 4 most relevant chunks
    docs = vector_store.similarity_search(question, k=4)
    context = "\n\n".join(f"[{i+1}] {d.page_content}" for i, d in enumerate(docs))

    # Inject context into the prompt
    prompt = f"""Answer the question based on the provided context only.
If the context doesn't contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""

    # Call the LLM with grounded context
    client = anthropic.Anthropic()
    response = client.messages.create(
        model      = "claude-3-5-sonnet-20241022",
        max_tokens = 1024,
        messages   = [{"role": "user", "content": prompt}],
    )
    return response.content[0].text
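Stripped of the LangChain and Chroma wrappers, the retrieval step is a nearest-neighbour search over normalized embeddings followed by prompt assembly. A dependency-light sketch: `toy_embed` is a hypothetical stand-in (a hashed bag of words) for a real embedding model such as the MiniLM encoder used above, so only the ranking mechanics carry over, not the embedding quality.

```python
import re
import zlib
import numpy as np

def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in embedding: hashed bag of words, L2-normalized."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z]+", text.lower()):
        v[zlib.crc32(word.encode()) % dim] += 1.0   # deterministic hash bucket
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    doc_vecs = np.stack([toy_embed(c) for c in chunks])
    scores = doc_vecs @ toy_embed(question)          # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]               # indices of the k best chunks
    return [chunks[i] for i in top]

chunks = [
    "Refunds are issued within 14 days of a return.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 5 to 7 business days.",
]
question = "How many days until refunds are issued?"
context = "\n\n".join(f"[{i+1}] {c}" for i, c in enumerate(retrieve(question, chunks, k=2)))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

A vector database replaces the brute-force `doc_vecs @ query` with an approximate nearest-neighbour index once the corpus is too large to scan.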
Natural Language Processing (NLP)
The NLP Pipeline
graph LR
Raw["📝 Raw Text"]
Clean["🧹 Cleaning\nLowercase · HTML · Unicode"]
Token["✂️ Tokenization\nWord · Subword · BPE"]
Embed["📐 Embedding\nWord2Vec · FastText\nBERT · Sentence-BERT"]
Task["🎯 Downstream Task\nClassify · NER · QA · Generate"]
Raw --> Clean --> Token --> Embed --> Task
spaCy is the production NLP library for tokenization, POS tagging, named entity recognition, and dependency parsing. NLTK covers classical NLP with stemming, WordNet, and BLEU scoring. Both have dedicated pages.
Text Classification & Named Entity Recognition
NLP with spaCy + transformer pipeline
import spacy
from spacy import displacy

# Load transformer-based model for maximum accuracy
nlp = spacy.load("en_core_web_trf")   # or "en_core_web_sm" for speed

# Fictional example sentence
text = "Apple Inc. acquired DeepMind for $500 million in London on January 15, 2024."
doc = nlp(text)

# Named Entity Recognition
for ent in doc.ents:
    print(f"{ent.text:30} → {ent.label_:10} ({spacy.explain(ent.label_)})")
# Typical output (exact spans and labels can vary with the model version):
# Apple Inc.        → ORG    (Companies, agencies, institutions, etc.)
# DeepMind          → ORG    (Companies, agencies, institutions, etc.)
# $500 million      → MONEY  (Monetary values, including unit)
# London            → GPE    (Countries, cities, states)
# January 15, 2024  → DATE   (Absolute or relative dates or periods)

# Dependency parsing
for token in doc:
    print(f"{token.text:15} {token.dep_:10} → {token.head.text}")

# Visualize in the browser (starts a local web server and blocks until stopped)
displacy.serve(doc, style="ent")
Sentiment Analysis & Text Generation
Sentiment analysis and generation pipelines
from transformers import pipeline

# Sentiment analysis: pretrained classifier, no training needed on your side
sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")
results = sentiment([
    "This product is absolutely amazing!",
    "Worst purchase I've ever made.",
    "It's okay, nothing special.",
])
for r in results:
    print(f"{r['label']:10} ({r['score']:.3f})")

# Zero-shot classification: classify into arbitrary categories via an NLI model
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot(
    "The new iPhone has a 48MP camera and 4K video recording.",
    candidate_labels=["technology", "sports", "politics", "entertainment"],
)
print(dict(zip(result["labels"], [f"{s:.3f}" for s in result["scores"]])))

# Text summarization (long_article_text: your input document as a string)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(long_article_text, min_length=30, max_length=130)[0]["summary_text"]
Reinforcement Learning
The RL Framework
RL agents learn by interacting with an environment, receiving rewards or penalties, and gradually discovering the policy (behavior) that maximizes cumulative reward. Unlike supervised learning, there is no labeled dataset — the signal comes from the environment.
The core challenge: exploration vs exploitation — the agent must balance trying new actions (exploration) against sticking with what it knows works (exploitation).
For game-specific RL applications with Unity ML-Agents, see Game AI — Reinforcement Learning section.
Q-Learning from scratch — the classic algorithm
import numpy as np
import gymnasium as gym

# CartPole: balance a pole on a cart
env = gym.make("CartPole-v1")

# Discretize the continuous state space into bins.
# The velocity bounds in env.observation_space are ~±3.4e38, so useful ranges
# are chosen by hand: cart position, cart velocity, pole angle, pole angular velocity
N_BINS = [6, 6, 12, 12]                      # bins per observation dimension
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]
BIN_EDGES = [np.linspace(low, high, n) for (low, high), n in zip(BOUNDS, N_BINS)]

def discretize(obs):
    return tuple(int(np.digitize(o, edges)) for o, edges in zip(obs, BIN_EDGES))

# Q-table: state → action values (np.digitize returns 0..n, hence n + 1)
Q = np.zeros(tuple(n + 1 for n in N_BINS) + (env.action_space.n,))

# Training
alpha = 0.1      # learning rate
gamma = 0.99     # discount factor
epsilon = 1.0    # exploration rate

for episode in range(10000):
    state, _ = env.reset()
    state = discretize(state)
    done, total_reward = False, 0.0

    while not done:
        # Epsilon-greedy: explore or exploit
        if np.random.random() < epsilon:
            action = env.action_space.sample()   # explore
        else:
            action = int(np.argmax(Q[state]))    # exploit

        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_state = discretize(next_obs)
        total_reward += reward

        # Bellman update; a terminal state has no future value to bootstrap from
        target = reward if terminated else reward + gamma * np.max(Q[next_state])
        Q[state + (action,)] += alpha * (target - Q[state + (action,)])
        state = next_state

    epsilon = max(0.01, epsilon * 0.995)   # decay exploration
    if episode % 1000 == 0:
        print(f"Episode {episode:5d} | ε={epsilon:.3f} | return={total_reward:.0f}")
Deep RL Algorithms
Algorithm   Type                     Action Space   Key Idea                                        Used In
─────────────────────────────────────────────────────────────
DQN         Value-based              Discrete       Deep Q-network with replay buffer               Atari games
PPO         Policy gradient          Both           Clipped objective, stable training              Unity ML-Agents, OpenAI Five
SAC         Actor-Critic             Continuous     Entropy regularization, sample efficient        Robotics
A3C/A2C     Actor-Critic             Both           Async workers / advantage estimate              Multi-env training
AlphaZero   MCTS + RL                Discrete       Self-play + tree search                         Chess, shogi, Go
RLHF        RL from human feedback   Both           Human preferences as reward signal              ChatGPT, Claude
Model Evaluation & Selection
Cross-Validation Strategies
Advanced cross-validation patterns
from sklearn.model_selection import (
    KFold, StratifiedKFold, TimeSeriesSplit, cross_validate, learning_curve
)
from sklearn.pipeline import Pipeline
import numpy as np
import xgboost as xgb

# Reuses X, y and `preprocessor` from the preprocessing section; assumes a 0/1 binary target
pipe = Pipeline([("prep", preprocessor), ("model", xgb.XGBClassifier())])

# ── Stratified K-Fold: preserves class balance in each fold ───────
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validate with multiple metrics at once
cv_results = cross_validate(
    pipe, X, y, cv=skf,
    scoring=["accuracy", "f1_weighted", "roc_auc"],
    return_train_score=True,
    n_jobs=-1,
)
for metric in ["accuracy", "f1_weighted", "roc_auc"]:
    train = cv_results[f"train_{metric}"]
    test = cv_results[f"test_{metric}"]
    print(f"{metric:20} train={train.mean():.4f}±{train.std():.4f}"
          f"  test={test.mean():.4f}±{test.std():.4f}")

# ── Learning curve: diagnose bias vs variance ─────────────────────
train_sizes, train_scores, val_scores = learning_curve(
    pipe, X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring="accuracy", n_jobs=-1,
)
# Plot mean scores against train_sizes:
#   large, persistent gap between train and val → high variance (more data or regularization)
#   both scores low and close together          → high bias (need a more expressive model)
Production MLOps
MLOps (Machine Learning Operations) applies DevOps principles to ML: versioning, CI/CD, monitoring, and automation across the full ML lifecycle. Without it, models tend to be deployed once and forgotten, silently degrading as production data drifts away from the data they were trained on.
Serving a model with FastAPI + MLflow
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import mlflow.pyfunc
import pandas as pd

app = FastAPI(title="ML Model API")

# Load the model once at startup, not per request.
# Stage-based URIs like this one are deprecated in newer MLflow releases
# in favor of aliases, e.g. "models:/ChurnPredictor@production"
MODEL_URI = "models:/ChurnPredictor/Production"
model = mlflow.pyfunc.load_model(MODEL_URI)

class PredictionRequest(BaseModel):
    age: float
    tenure: float
    monthly_charges: float
    contract: str

class PredictionResponse(BaseModel):
    prediction: int        # 0 = no churn, 1 = churn
    probability: float     # churn probability 0-1
    model_version: str

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest) -> PredictionResponse:
    try:
        df = pd.DataFrame([request.model_dump()])   # pydantic v2
        # Assumes the logged pyfunc model returns the churn probability;
        # a plain sklearn classifier's predict() would return class labels instead
        proba = float(model.predict(df)[0])
        return PredictionResponse(
            prediction    = int(proba > 0.5),
            probability   = proba,
            model_version = MODEL_URI,   # the registry URI, not a version number
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "ok", "model": MODEL_URI}
Data Drift Detection
Detecting distribution shift in production
# Older Evidently API; newer releases reorganized these imports
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ClassificationPreset
import pandas as pd

# Compare training distribution vs live production data
reference_data = pd.read_parquet("training_data.parquet")
current_data = pd.read_parquet("production_data_last_7d.parquet")

# Data drift report
drift_report = Report(metrics=[DataDriftPreset()])
drift_report.run(reference_data=reference_data, current_data=current_data)
drift_report.save_html("drift_report.html")

# Check programmatically: per-column results live in the DataDriftTable metric
drift_result = drift_report.as_dict()
drift_table = next(
    m["result"] for m in drift_result["metrics"] if m["metric"] == "DataDriftTable"
)
drifted_features = [
    feature
    for feature, stats in drift_table["drift_by_columns"].items()
    if stats["drift_detected"]
]
if drifted_features:
    print(f"⚠️ Drift detected in: {drifted_features}")
    # Trigger retraining pipeline via Kestra or Airflow
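Evidently's drift report bundles per-column statistical tests. To show the kind of quantity such a check computes, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), on synthetic data. PSI is a swapped-in illustration, not necessarily what `DataDriftPreset` runs, and the 0.2 alert threshold is a widely used rule of thumb rather than a standard.

```python
import numpy as np

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    reference, current = np.asarray(reference), np.asarray(current)
    # Interior cut points at reference quantiles: each bin holds ~1/bins of training data
    cuts = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    # np.digitize sends values outside the training range to the first/last bin
    ref_frac = np.bincount(np.digitize(reference, cuts), minlength=bins) / len(reference)
    cur_frac = np.bincount(np.digitize(current, cuts), minlength=bins) / len(current)
    ref_frac, cur_frac = ref_frac + eps, cur_frac + eps   # avoid log(0) for empty bins
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train = rng.normal(50, 10, 10_000)      # a feature's training distribution
same = rng.normal(50, 10, 5_000)        # live data, no drift
shifted = rng.normal(58, 10, 5_000)     # live data, mean shifted by 0.8 std

for name, live in [("same", same), ("shifted", shifted)]:
    score = psi(train, live)
    print(f"{name:8} PSI={score:.3f}  drift={'yes' if score > 0.2 else 'no'}")
```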
Kestra (see DevOps ecosystem) and Celery handle the automated retraining pipelines that drift detection triggers. Automation covers the broader MLOps workflow automation patterns.
Explore Further
ML is built on every layer of the engineering stack. Here is where to go deeper.
The data layer — before modeling, you need clean, well-understood data. Data Science covers the full EDA workflow with Pandas, NumPy, statistical testing, and visualization with Matplotlib, Seaborn, and Plotly. The data pipeline and feature store patterns there feed directly into the preprocessing section of this page.
The deep learning framework — PyTorch has its own dedicated page covering custom datasets, distributed training, ONNX export, and TorchScript. For production vision workloads, OpenCV handles real-time video processing and image augmentation. scikit-image provides classical computer vision algorithms. Scikit Learn covers the classical ML algorithms in depth with tuning examples, pipelines, and ensemble methods.
Scientific computing underneath — SciPy provides the statistical tests, signal processing, and optimization algorithms that ML relies on. SymPy handles symbolic math — useful for deriving and verifying gradients.
NLP tooling — spaCy is the production NLP library for entity recognition, dependency parsing, and text pipelines. NLTK covers classical NLP (stemming, WordNet, BLEU scores). Together they handle everything from quick text preprocessing to full annotation pipelines.
Game AI and RL in engines — Game AI covers Unity ML-Agents, reward design, imitation learning, and how RL connects to FSMs and behavior trees in real game development. It bridges the gap between the RL algorithms on this page and shipping an actual game.
Deploying models to production — DevOps covers containerizing models with Docker, deploying to Kubernetes, and CI/CD pipelines for ML. Automation covers the orchestration layer: training pipeline scheduling, model validation workflows, and automated retraining with kestra. Continuous Monitoring & Logging covers how to monitor model endpoints — latency, error rates, and input distribution drift detection.
The architecture your model runs on — System Design covers model serving at scale: inference caches, async queues for batch prediction, GPU cluster design, and the CAP theorem tradeoffs in distributed training. System Design Scalability & CAP is particularly relevant when your model needs to handle millions of inference requests per day.
Security and responsible AI — Cybersecurity covers adversarial ML attacks (model inversion, membership inference, prompt injection in LLMs) and defenses. Understanding the attack surface of deployed models is increasingly important as AI becomes more widely integrated into critical systems.