Explore how SVMs find optimal decision boundaries. Learn the theory, understand kernels, and then build intuition with a live, interactive simulator.
A Support Vector Machine (SVM) is a supervised learning algorithm that finds the hyperplane that maximally separates two classes. The data points closest to the boundary are the support vectors — they are the only points that actually define the decision surface.
When data is not linearly separable, SVM uses the kernel trick to implicitly transform the input data into a higher-dimensional feature space where a linear separator can exist. Instead of computing the transformation directly, a kernel function measures similarities between data points efficiently. Common kernels include the linear, polynomial, and RBF (Gaussian) kernels, which allow SVMs to handle complex nonlinear classification problems.
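To make the kernel trick concrete, here is a minimal Python sketch for 2-D inputs. It assumes a homogeneous degree-2 polynomial kernel, K(x, y) = (x · y)², and uses the standard textbook feature map for that kernel. The names `phi` and `poly2_kernel` are illustrative and are not taken from the playground's source.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2, computed without ever building phi(x)."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, y)   # same quantity from the 2-D inputs alone
print(explicit, implicit)       # equal up to floating-point rounding
```

Both routes give the same similarity value, but the kernel never constructs the higher-dimensional vectors. That matters most for kernels whose feature space is very large or infinite.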
An SVM doesn't settle for just any separator: it finds the one with the maximum margin, meaning the largest distance to the nearest training points of either class. A wider margin tends to generalize better to unseen data.
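The margin can be illustrated with a small sketch. It assumes the hyperplane w · x + b = 0 is already known; the values of `w` and `b` below are hand-picked for a toy dataset, not produced by a solver. The code measures each point's distance to the boundary, and the closest points play the role of support vectors.

```python
import math

def decision_value(w, b, x):
    """Signed score w . x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm_w

# Toy, hand-picked data: two classes on either side of the line x1 = 2.
points = [(0.0, 1.0), (1.0, 0.0), (3.0, 0.0), (4.0, 1.0)]
labels = [-1, -1, +1, +1]
w, b = (1.0, 0.0), -2.0  # assumed separator for this toy data

# Every point must lie on the side that matches its label.
all_correct = all(lab * decision_value(w, b, p) > 0
                  for p, lab in zip(points, labels))

distances = [distance_to_hyperplane(w, b, p) for p in points]
closest = min(distances)
support = [p for p, d in zip(points, distances) if math.isclose(d, closest)]
margin_width = 2 * closest  # total gap between the two classes

print(all_correct, support, margin_width)
```

In this toy setup the two middle points are the closest ones. Moving either outer point slightly would leave the boundary unchanged, which is the sense in which only the support vectors define it.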
Kernels let SVMs operate in high-dimensional spaces without ever computing the transformation explicitly. Each kernel defines a different notion of “similarity” between points.
Linear
K(x, y) = x · y
Best when data is (nearly) linearly separable. Fast and interpretable. The decision boundary is a straight line (or, in higher dimensions, a hyperplane).
Polynomial
K(x, y) = (γ x · y + coef0)^d
Captures curved boundaries. The degree d controls flexibility: a higher degree allows more complex shapes but increases the risk of overfitting.
RBF (Gaussian)
K(x, y) = exp(−γ ‖x − y‖²)
The most popular kernel. It corresponds to an implicit mapping into an infinite-dimensional feature space and works well for smooth, nonlinear decision boundaries. A larger γ makes each point's influence more local, which gives a more flexible boundary.
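The three kernel formulas above translate almost directly into code. The following is a minimal sketch in plain Python; the function names and the default values for `gamma`, `coef0`, and `degree` are assumptions chosen for illustration, not the playground's actual settings.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (1.0, 2.0), (2.0, 0.0)
print(linear_kernel(x, y))                 # x . y = 2.0
print(polynomial_kernel(x, y, degree=2))   # (1 * 2 + 1)^2 = 9.0
print(rbf_kernel(x, x))                    # identical points give 1.0
print(rbf_kernel(x, y))                    # decays with squared distance
```

The RBF value is always between 0 and 1 and depends only on the distance between the two points, whereas the linear and polynomial kernels depend on their dot product.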
SVM is fundamentally a binary classifier. To separate K classes (K ≥ 3) we need to combine multiple binary SVMs. There are two standard approaches:
One-vs-All (OvA)
Train K binary classifiers. Each one answers “is this point class i, yes or no?” The final prediction is the class whose classifier outputs the highest decision value (argmax).
One-vs-One (OvO)
Train K(K−1)/2 binary classifiers, one for every pair of classes. Each classifier sees only the points from those two classes. Prediction is by majority vote.
Try both: on most datasets the predictions are nearly identical. The difference becomes visible on datasets where one class is much smaller than the others. OvA tends to struggle there because the small class's classifier is trained against a much larger "rest" set, while each OvO classifier compares only two classes at a time and is usually less affected.
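The two prediction rules, one-vs-all (OvA) with argmax and one-vs-one (OvO) with majority vote, can be sketched as follows. The decision values and pairwise outcomes are hypothetical stand-ins for the outputs of trained binary SVMs, and the helper names are illustrative only.

```python
from collections import Counter
from itertools import combinations

def predict_ova(scores):
    """One-vs-All: `scores` maps each class to the decision value of its
    'this class vs. the rest' classifier; return the argmax class."""
    return max(scores, key=scores.get)

def predict_ovo(pairwise_winner, classes):
    """One-vs-One: query one classifier per pair of classes and return
    the class that collects the most votes."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

classes = ["A", "B", "C"]

# Hypothetical decision values for a single test point (K = 3 classifiers).
ova_scores = {"A": -0.4, "B": 1.3, "C": 0.2}

# Hypothetical winners of the K(K-1)/2 = 3 pairwise classifiers.
pair_results = {("A", "B"): "B", ("A", "C"): "C", ("B", "C"): "B"}

def pairwise_winner(a, b):
    """Stand-in for a trained binary SVM on classes a and b."""
    return pair_results[(a, b)]

n_pairs = len(list(combinations(classes, 2)))
print(predict_ova(ova_scores), predict_ovo(pairwise_winner, classes), n_pairs)
```

A real implementation also needs a rule for tied votes in OvO; this sketch simply returns the first class to reach the top count.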
The full source code — solver, kernels, canvas renderer, and UI — is open and free to explore, fork, or build on.
Launch the interactive playground and build your own intuition for how SVMs work.
Open the Playground