SVM Playground - Learn Support Vector Machines

The Fundamentals

What is a Support Vector Machine?

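A Support Vector Machine (SVM) is a supervised learning algorithm that, for two classes, finds the separating hyperplane with the largest margin, that is, the largest distance to the nearest training points of either class. Those closest points are the support vectors: they are the only points that define the decision surface, and moving any other point (without crossing the margin) leaves the boundary unchanged.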
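
The sketch below shows both ideas on a hypothetical six-point dataset. It uses scikit-learn's `SVC` rather than the playground's own solver, and it sets a very large `C` to approximate the hard-margin case. Refitting on the support vectors alone should reproduce the same boundary.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 on the left, class 1 on the right (linearly separable).
X = np.array([[0, 0], [0, 1], [-1, 0],   # class 0
              [2, 0], [2, 1], [3, 0]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e3).fit(X, y)
print("support vector indices:", clf.support_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Refit on the support vectors only. Points strictly outside the margin,
# such as (-1, 0) and (3, 0), should carry zero weight, so the boundary is unchanged.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e3).fit(X[sv], y[sv])
print("w =", clf_sv.coef_[0], "b =", clf_sv.intercept_[0])
```

Here the maximum-margin boundary is the vertical line x1 = 1, so `w` should come out close to (1, 0) and `b` close to -1 in both fits.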

When data is not linearly separable, an SVM can use the kernel trick to work implicitly in a higher-dimensional feature space where a linear separator may exist. Instead of computing the transformation explicitly, a kernel function computes the inner product between two transformed points directly from the original inputs, and this value serves as a similarity measure. Common kernels include the linear, polynomial, and RBF (Gaussian) kernels, which let SVMs handle complex nonlinear classification problems.
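
To see why the trick works, take a degree-2 polynomial kernel K(x, y) = (x · y)² on 2-D inputs. It equals an ordinary dot product after the explicit map φ(x) = (x1², √2·x1·x2, x2²), so the kernel obtains the 3-D similarity without ever building the 3-D vectors. The plain-NumPy sketch below checks this numerically. The function names `phi` and `poly2_kernel` are illustrative only.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, y):
    """The same similarity computed directly in the original 2-D space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
explicit = np.dot(phi(x), phi(y))   # map to 3-D, then take the dot product
implicit = poly2_kernel(x, y)       # never leaves 2-D
print(explicit, implicit)
```

Because x · y = 5, both values are 25, up to floating-point rounding in the explicit version.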

Key Idea

SVM doesn't just find any separator: it finds the one with the maximum margin, which tends to generalize better to unseen data than a separator that passes close to the training points.
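
In the standard textbook hard-margin form, written here for separable data with labels y_i ∈ {−1, +1}, the maximum-margin idea becomes an optimization problem. Minimizing ‖w‖ is equivalent to maximizing the margin width 2/‖w‖. The support vectors lie on the margin, where the constraint holds with equality. This is a reference formulation and not necessarily the exact objective the playground's solver optimizes.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n,
\qquad
\text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
```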

The Kernel Trick

Kernel types covered

Kernels let SVMs operate in high-dimensional spaces without ever computing the transformation explicitly. Each kernel defines a different notion of “similarity” between points.

Linear

K(x, y) = x · y

Best when data is (nearly) linearly separable. Fast and interpretable. The decision boundary is a straight line (or hyperplane).
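
With the linear kernel, the weight vector w is available explicitly, which is what makes the model interpretable. The sketch below assumes scikit-learn and uses synthetic blob data. In `SVC`, `dual_coef_` stores αᵢyᵢ for each support vector, so w = Σ αᵢyᵢxᵢ should match `coef_`. The decision value for a new point is then just w · x + b.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two synthetic Gaussian blobs in 2-D (illustrative data).
X = np.vstack([rng.normal([-2, -2], 0.8, size=(20, 2)),
               rng.normal([2, 2], 0.8, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With the linear kernel the weight vector is explicit:
# w = sum_i alpha_i * y_i * x_i, where dual_coef_ holds alpha_i * y_i.
w = clf.dual_coef_[0] @ clf.support_vectors_
print("w from dual:", w, " coef_:", clf.coef_[0])

# The decision value is a dot product plus the bias.
x_new = np.array([0.5, -1.0])
print(w @ x_new + clf.intercept_[0], clf.decision_function([x_new])[0])
```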

Polynomial

K(x, y) = (γ x · y + coef0)^d

Captures curved boundaries. The degree d controls flexibility: a higher degree allows more complex shapes but increases the risk of overfitting.
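
The four-point XOR layout shows what the degree buys: no straight line separates its two classes, but a degree-2 polynomial kernel can. This sketch assumes scikit-learn's `SVC`, whose `poly` kernel uses the same (γ x · y + coef0)^d form through the parameters `gamma`, `coef0`, and `degree`. The values γ = 1 and coef0 = 1 are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVC

# XOR layout: no straight line separates the two classes.
X = np.array([[-1, -1], [1, 1], [-1, 1], [1, -1]], dtype=float)
y = np.array([0, 0, 1, 1])

scores = {}
for d in (1, 2):
    # gamma=1 and coef0=1 are illustrative; a large C approximates a hard margin.
    clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=1.0, C=1e3).fit(X, y)
    scores[d] = clf.score(X, y)   # training accuracy
    print(f"degree={d}: training accuracy = {scores[d]:.2f}")
```

Degree 1 cannot reach 100% training accuracy on XOR. Degree 2 can, because its implicit features include the product x1·x2, whose sign alone separates the two classes.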

RBF (Gaussian)

K(x, y) = exp(−γ ‖x − y‖²)

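The most widely used kernel. Its implicit feature space is infinite-dimensional, and it works well for a wide range of smooth, nonlinear decision boundaries. The parameter γ sets how far each training point's influence reaches: a larger γ gives more local, more wiggly boundaries.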
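
The RBF feature space cannot be built explicitly, so the model is evaluated entirely through kernel values: f(x) = Σ αᵢyᵢ K(xᵢ, x) + b, summed over the support vectors. The sketch below assumes scikit-learn and NumPy and recomputes scikit-learn's decision values by hand from the fitted dual coefficients. The circular toy labels and γ = 0.5 are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
# Illustrative nonlinear labels: inside vs. outside a circle of radius 1.2.
y = (np.sum(X**2, axis=1) < 1.2**2).astype(int)

gamma = 0.5  # set explicitly so the manual kernel matches the fitted model
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

# Decision value = sum_i (alpha_i * y_i) * K(sv_i, x) + b, using only support vectors.
X_test = rng.uniform(-2, 2, size=(5, 2))
K = rbf_kernel(clf.support_vectors_, X_test, gamma)   # shape (n_SV, 5)
manual = clf.dual_coef_[0] @ K + clf.intercept_[0]
print(np.allclose(manual, clf.decision_function(X_test)))
```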

Beyond Binary

Multi-class strategies

SVM is fundamentally a binary classifier. To separate K classes (K ≥ 3) we need to combine multiple binary SVMs. There are two standard approaches:


One-vs-All (OvA)

Train K binary classifiers. Each one answers “is this point class i, yes or no?” The final prediction is the class whose classifier outputs the highest decision value (argmax).

Pros: Few classifiers (4 for K=4, 10 for K=10). Each classifier sees all the training data.

Cons: Each binary problem is imbalanced — one class against K−1 others.
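
The following is a minimal OvA sketch. It assumes scikit-learn's `SVC` as the binary learner and four synthetic Gaussian clusters. The helper names `fit_ova` and `predict_ova` are hypothetical. Each model trains on all points, with labels recoded to 1 for class k and 0 otherwise. Prediction takes the argmax of the decision values.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]])   # K = 4 synthetic clusters
X = np.vstack([rng.normal(c, 0.5, size=(25, 2)) for c in centers])
y = np.repeat(np.arange(len(centers)), 25)

def fit_ova(X, y):
    """Hypothetical helper: one binary SVC per class, class k (label 1) vs. the rest (label 0)."""
    classes = np.unique(y)
    models = [SVC(kernel="rbf", C=10.0).fit(X, (y == k).astype(int)) for k in classes]
    return classes, models

def predict_ova(classes, models, X):
    # One column of decision values per classifier; larger means a more confident "yes".
    scores = np.column_stack([m.decision_function(X) for m in models])
    return classes[np.argmax(scores, axis=1)]

classes, models = fit_ova(X, y)
pred = predict_ova(classes, models, X)
print(len(models), "classifiers, training accuracy:", np.mean(pred == y))
```

scikit-learn also packages this scheme as `sklearn.multiclass.OneVsRestClassifier`.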


One-vs-One (OvO)

Train K(K−1)/2 binary classifiers, one for every pair of classes. Each classifier sees only the points from those two classes. Prediction is by majority vote.

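Pros: Each binary problem involves only two classes, so it avoids OvA's one-against-many imbalance. Each problem is also smaller, so each classifier trains faster. This is the strategy scikit-learn's SVC uses internally for multi-class data (LinearSVC uses OvA instead).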

Cons: The number of classifiers grows quadratically with K (45 for K=10). Voting can produce ties.
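
The OvO counterpart below makes the same assumptions: scikit-learn `SVC`, synthetic clusters, and hypothetical helper names `fit_ovo` and `predict_ovo`. It trains one model per pair of classes on only that pair's points, then counts votes. With K = 4 there are 4·3/2 = 6 classifiers.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

rng = np.random.default_rng(1)
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]])   # K = 4 synthetic clusters
X = np.vstack([rng.normal(c, 0.5, size=(25, 2)) for c in centers])
y = np.repeat(np.arange(len(centers)), 25)

def fit_ovo(X, y):
    """Hypothetical helper: one binary SVC per pair of classes, trained on those two classes only."""
    classes = np.unique(y)
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = SVC(kernel="rbf", C=10.0).fit(X[mask], y[mask])
    return classes, models

def predict_ovo(classes, models, X):
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for (a, b), m in models.items():
        winner = m.predict(X)   # each entry is either a or b
        for k in (a, b):
            votes[:, np.searchsorted(classes, k)] += (winner == k)
    # Ties go to the lowest-indexed class here (np.argmax returns the first maximum).
    return classes[np.argmax(votes, axis=1)]

classes, models = fit_ovo(X, y)
pred = predict_ovo(classes, models, X)
K = len(classes)
print(f"{len(models)} classifiers (K(K-1)/2 = {K * (K - 1) // 2}), "
      f"training accuracy: {np.mean(pred == y):.2f}")
```

The lowest-index tie-break is only a simple convention. Real implementations may break ties differently, for example scikit-learn's `sklearn.multiclass.OneVsOneClassifier`.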

Try both: on most datasets the predictions are nearly identical. The difference becomes visible on datasets where one class is much smaller than the others (OvA struggles, OvO usually doesn’t).

Open Source

mo2menwael / SVM-playground

The full source code (solver, kernels, canvas renderer, and UI) is open on GitHub and free to explore, fork, or build on.
