Minimal Transformer in PyTorch
A self-contained transformer encoder built from scratch in PyTorch: multi-head scaled dot-product attention, positional encoding, and a feed-forward sublayer, all in under 150 lines with the tensor shapes annotated at every step.
machine-learning, transformers, nlp, deep-learning
A transformer encoder in under 150 lines of PyTorch — every tensor shape printed so you can follow the data through attention.
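Below is a minimal sketch of the pieces this description names: one encoder layer combining multi-head scaled dot-product attention, sinusoidal positional encoding, and a feed-forward sublayer, with the tensor shape noted at each step. It is not the repository's actual code; the class names and default dimensions (`d_model=64`, `n_heads=4`, `d_ff=256`) are illustrative assumptions.

```python
# Illustrative sketch, not the repo's exact implementation.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)                                 # (max_len, d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        return x + self.pe[: x.size(1)]                                # broadcast over batch

class MultiHeadSelfAttention(nn.Module):
    """Multi-head scaled dot-product self-attention."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)                     # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)                         # each (B, T, d_model)
        # split into heads: (B, T, d_model) -> (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)      # (B, h, T, T)
        attn = scores.softmax(dim=-1)                                  # attention weights
        ctx = attn @ v                                                 # (B, h, T, d_head)
        ctx = ctx.transpose(1, 2).contiguous().view(B, T, -1)          # (B, T, d_model)
        return self.out(ctx)

class EncoderLayer(nn.Module):
    """One encoder block: attention and feed-forward sublayers, each with a residual."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        x = self.ln1(x + self.attn(x))                                 # attention sublayer + residual
        x = self.ln2(x + self.ff(x))                                   # feed-forward sublayer + residual
        return x                                                       # (B, T, d_model)

# Demo: batch of 2 sequences, 10 tokens each, 64-dim embeddings.
x = PositionalEncoding(64)(torch.randn(2, 10, 64))                     # (2, 10, 64)
print(EncoderLayer()(x).shape)                                         # torch.Size([2, 10, 64])
```

The shape comments mirror the approach the blurb describes: following a `(batch, tokens, d_model)` tensor through the attention split, the `(B, h, T, T)` score matrix, and back out of the residual sublayers.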