About NN-Visual

Where it started

A few years ago I decided to build a neural network entirely from scratch in Python — no PyTorch, no Keras, just NumPy and the math. The goal was to understand exactly what happens at each step: how the weights update, what the gradients represent, why activation functions change a network's behavior. Working at that level gives you access to everything the higher-level libraries abstract away: intermediate activations, pre-activation values, model weights and biases, per-layer deltas during backprop.
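
To make that concrete, here is a minimal sketch of what "surfacing every intermediate value" can look like. It is not the code behind the site: the layer sizes, the sigmoid activations, the squared-error loss, and the names `forward`, `backward`, `sgd_step`, and `loss` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass that keeps every intermediate value instead of discarding it."""
    z1 = params["W1"] @ x + params["b1"]    # hidden pre-activation
    a1 = sigmoid(z1)                        # hidden activation
    z2 = params["W2"] @ a1 + params["b2"]   # output pre-activation
    a2 = sigmoid(z2)                        # network output
    return {"x": x, "z1": z1, "a1": a1, "z2": z2, "a2": a2}

def loss(a2, y):
    # Squared error (an assumption; any differentiable loss would do).
    return 0.5 * float(np.sum((a2 - y) ** 2))

def backward(cache, y, params):
    """Backprop that returns the per-layer deltas alongside the gradients."""
    # delta = dLoss/dz for each layer; sigmoid'(z) = a * (1 - a)
    delta2 = (cache["a2"] - y) * cache["a2"] * (1 - cache["a2"])
    delta1 = (params["W2"].T @ delta2) * cache["a1"] * (1 - cache["a1"])
    grads = {
        "W2": np.outer(delta2, cache["a1"]), "b2": delta2,
        "W1": np.outer(delta1, cache["x"]),  "b1": delta1,
    }
    return grads, {"delta1": delta1, "delta2": delta2}

def sgd_step(params, grads, lr=0.5):
    # One gradient descent step: move each parameter against its gradient.
    return {name: params[name] - lr * grads[name] for name in params}

# Toy 2-3-1 network with a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(size=(3, 2)), "b1": np.zeros(3),
    "W2": rng.normal(size=(1, 3)), "b2": np.zeros(1),
}
x, y = np.array([0.5, -1.0]), np.array([1.0])

cache = forward(x, params)
grads, deltas = backward(cache, y, params)
new_params = sgd_step(params, grads)
```

Everything a visualization might want to draw (`cache`, `deltas`, `grads`, and the before/after `params`) is an ordinary dictionary of arrays at this point, which is the property the paragraph above is describing.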

Around the same time, a professor I respect mentioned he wished he had a better way to show students how neural networks actually work — not the equations in isolation, not a high-level diagram, but something that bridged the two. The gap between the underlying math and the overall picture is where most students get lost.

I realized I already had what was needed to close that gap. The scratch implementation surfaced every intermediate value at every layer. All that was left was to build an interface that let other students explore those values and build intuition for how they fit together.

How it's evolved

The first version was simple — a graphical visualization of a small network training on toy data and the underlying math that powered it. Over the past couple of years the site has grown to cover more ideas, realistic datasets, and advanced visualizations that enable deeper exploration. You can now step through both forward propagation and backpropagation, tracing how individual weights and biases update while training on real-world datasets such as handwritten digits or car fuel economy data.

The transformer visualizer came later, when the same professor challenged me to create an intuitive visualization of attention mechanisms, which had become the concept everyone wanted to understand but few could explain concretely. The same principle applied: take the intermediate state (query and key vectors, raw attention scores, per-head weights) and make it something you can actually click through. It helps students see for themselves how an attention mechanism connects tokens to one another and builds semantic understanding.
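
As a rough illustration of the intermediate state involved, the sketch below computes multi-head scaled dot-product attention and returns the query and key vectors, raw scores, and per-head weights rather than only the final output. It is an assumption-laden toy, not the visualizer's code: the function name `attention_trace`, the random projection matrices, and the dimensions are hypothetical, and the real visualizers use pretrained weights instead.

```python
import numpy as np

def softmax(s, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def attention_trace(X, Wq, Wk, Wv, n_heads, causal=False):
    """Multi-head attention that returns its intermediate values.

    X is (T, d_model): one row per token. causal=True applies a
    decoder-style mask so a token cannot attend to later tokens.
    """
    T, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):
        # (T, d_model) -> (n_heads, T, d_head)
        return M.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)

    # Raw scores: scaled dot product of every query with every key, per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, T, T)
    if causal:
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)

    weights = softmax(scores)                             # each row sums to 1
    out = (weights @ V).transpose(1, 0, 2).reshape(T, d_model)
    return {"Q": Q, "K": K, "scores": scores, "weights": weights, "out": out}

# Toy example: 4 tokens, model width 8, 2 heads, random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

encoder_style = attention_trace(X, Wq, Wk, Wv, n_heads=2)
decoder_style = attention_trace(X, Wq, Wk, Wv, n_heads=2, causal=True)
```

Comparing `encoder_style["weights"]` with `decoder_style["weights"]` shows the effect of masking directly: in the causal case every entry above the diagonal is zero, so each token's weight is spread only over itself and earlier tokens.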

Who uses it

That professor has been showing it to his students consistently ever since. It has become a regular part of how he introduces deep learning in his courses — something I genuinely didn't expect when I started. Knowing it works as a teaching tool is the most useful and rewarding feedback I could get.

Beyond the classroom, the site is free and open to anyone building intuition for how these models work — students, developers who are new to ML, or people who just want to understand what a gradient descent step actually does to the weights.

Try it

All three visualizers are live and require no setup.

Neural Network Visualizer — configure a network, pick a dataset, and train it. Step through forward and backpropagation to see exactly how each activation and weight changes at every layer.
Attention Visualizer — explore self-attention using real BERT weights. Click any token to step through the Q·K scoring, see multi-head patterns, and compare encoder vs. decoder masking.
Transformer Architecture Visualizer — see the full transformer pipeline with a live GPT-2 inference diagram. Explore how embeddings, stacked attention blocks, and feed-forward layers work together to produce an output distribution.

Get in touch

If you're using this in a course, have feedback, or just want to say hello, I'd like to hear from you.

grantwasserman.com LinkedIn grantmwasserman@gmail.com