- Linear Regression
- Fits a straight line through data by minimising the mean squared error between predictions and actual values.
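
As a rough illustration of the line fit described above, here is a minimal sketch of the closed-form least-squares solution for a single input feature. The function names `fit_line` and `mse` and the toy data are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line's predictions against the targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
error = mse(xs, ys, slope, intercept)
```

On noisy data the error would be greater than zero; the fitted line is still the one with the smallest squared error among all straight lines.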
- Gradient Descent
- The core optimisation algorithm behind most ML models: iteratively steps against the gradient, which is the direction of steepest loss reduction.
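
The update rule can be sketched in a few lines. This is a minimal one-dimensional example, assuming a fixed learning rate and a hand-written gradient; the name `gradient_descent` and the quadratic loss are illustrative choices, not part of any library.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient from a starting point."""
    x = start
    for _ in range(steps):
        # Move opposite to the gradient, scaled by the learning rate.
        x -= learning_rate * grad(x)
    return x


# Illustrative loss f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

A learning rate that is too large (here, anything above 1.0) would make the iterates diverge instead of converge.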
- Neural Network
- Stacked layers of weighted neurons where each forward pass propagates activations and backpropagation adjusts weights to minimise error.
- Overfitting
- The high-variance end of the bias–variance tradeoff: too simple a model underfits, while too complex a model memorises noise and fails to generalise to new data.
- Attention
- The mechanism behind transformers — each token attends to all others via query–key dot products, enabling context-aware representations.
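
A minimal single-head sketch of scaled dot-product attention, written with plain lists so the arithmetic is visible. It assumes the queries, keys, and values are already computed, and it omits the learned projections, masking, and multiple heads used in real transformers. The function names are illustrative.

```python
import math


def _softmax(scores):
    """Normalise scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Query-key dot products, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = _softmax(scores)
        # Mix the value vectors according to the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [2.0, 0.0]],
)
```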
- Softmax
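- Converts raw logit scores into a probability distribution summing to 1. Temperature scaling controls sharpness: lower temperatures concentrate probability on the top logit, higher ones flatten the distribution. The standard final step of an LLM's output layer.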
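
A minimal sketch of softmax with a temperature parameter, assuming the temperature divides the logits before exponentiation (the usual convention). The example logits are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature gives a sharper result."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)   # more mass on the largest logit
neutral = softmax(logits)                  # temperature 1.0
flat = softmax(logits, temperature=2.0)    # closer to uniform
```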
- Embeddings
- Dense vector representations where geometric proximity reflects semantic similarity. The basis of similarity search, RAG retrieval, and representation learning.
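
To illustrate proximity as similarity, here is a minimal sketch of nearest-neighbour lookup by cosine similarity. The three-dimensional vectors are hypothetical values invented for the example; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def nearest(word):
    """Return the other word whose vector is most similar to the given word's."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


closest_to_cat = nearest("cat")
```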
- Tokenization
- BPE subword tokenization — how LLMs split text before processing. Explains why token count ≠ word count and why unusual words are harder to predict.
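
A single BPE training step can be sketched as: count adjacent symbol pairs, then merge the most frequent one. This is a simplified illustration, assuming words are pre-split into characters with known frequencies; the tiny corpus and the function names are made up, and production tokenizers add byte-level handling and many more merges.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)


def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Toy corpus: each word as a tuple of characters, mapped to its frequency.
words = {("l", "o", "w"): 5, ("l", "o", "t"): 2, ("n", "e", "w"): 3}
pair = most_frequent_pair(words)
merged_words = merge_pair(words, pair)
```

Repeating these two steps grows the vocabulary one merge at a time, which is why frequent words end up as single tokens while rare words stay split into several pieces.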
- K-Means Clustering
- Unsupervised learning that alternates between assigning points to nearest centroids and updating centroid positions until convergence.
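
The assign-then-update loop can be sketched as follows. This is a minimal version, assuming the caller supplies the initial centroids as tuples (real implementations usually pick them randomly or with k-means++) and using squared Euclidean distance. The points are made up.

```python
def kmeans(points, centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    clusters = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid changed
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```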
- Decision Tree
- Learns if/else splits that maximise impurity reduction (for example Gini impurity, or entropy-based information gain). Naturally interpretable and the basis of Random Forest and XGBoost.
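
A minimal sketch of how a candidate split might be scored with Gini impurity. The helper names `gini` and `gini_gain` and the labels are illustrative; a full tree learner would evaluate many candidate splits per node and recurse on the best one.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for mixed labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


parent = ["yes", "yes", "no", "no"]
# A perfect split separates the classes completely.
gain = gini_gain(parent, left=["yes", "yes"], right=["no", "no"])
```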
- CNN Filter
- Convolutional filters slide across images computing dot products at each position, building a feature map that detects edges, textures, and shapes.
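
The sliding dot product can be sketched with nested lists. This minimal version assumes no padding and a stride of 1. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The image and the edge-detecting kernel are made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 3x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Responds where intensity increases from left to right (a vertical edge).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
```

The feature map is non-zero only in the column where the dark and bright regions meet, which is the sense in which a filter detects an edge.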