Minimal Transformer in PyTorch
A self-contained transformer encoder built from scratch in PyTorch: multi-head scaled dot-product attention, positional encoding, and a feed-forward sublayer, all in under 150 lines with the tensor shapes annotated at every step.
machine-learning, transformers, nlp, deep-learning
A transformer encoder in under 150 lines of PyTorch — every tensor shape printed so you can follow the data through attention.
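Below is a minimal sketch of the pieces this description names: one encoder layer combining multi-head scaled dot-product attention, sinusoidal positional encoding, and a feed-forward sublayer, with the tensor shape noted at each step. It is not the repository's actual code; the class names and default dimensions (`d_model=64`, `n_heads=4`, `d_ff=256`) are illustrative assumptions.

```python
# Illustrative sketch, not the repo's exact implementation.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)                                 # (max_len, d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        return x + self.pe[: x.size(1)]                                # broadcast over batch

class MultiHeadSelfAttention(nn.Module):
    """Multi-head scaled dot-product self-attention."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)                     # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)                         # each (B, T, d_model)
        # split into heads: (B, T, d_model) -> (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)      # (B, h, T, T)
        attn = scores.softmax(dim=-1)                                  # attention weights
        ctx = attn @ v                                                 # (B, h, T, d_head)
        ctx = ctx.transpose(1, 2).contiguous().view(B, T, -1)          # (B, T, d_model)
        return self.out(ctx)

class EncoderLayer(nn.Module):
    """One encoder block: attention and feed-forward sublayers, each with a residual."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                                              # x: (B, T, d_model)
        x = self.ln1(x + self.attn(x))                                 # attention sublayer + residual
        x = self.ln2(x + self.ff(x))                                   # feed-forward sublayer + residual
        return x                                                       # (B, T, d_model)

# Demo: batch of 2 sequences, 10 tokens each, 64-dim embeddings.
x = PositionalEncoding(64)(torch.randn(2, 10, 64))                     # (2, 10, 64)
print(EncoderLayer()(x).shape)                                         # torch.Size([2, 10, 64])
```

The shape comments mirror the approach the blurb describes: following a `(batch, tokens, d_model)` tensor through the attention split, the `(B, h, T, T)` score matrix, and back out of the residual sublayers.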