BERT Attention Visualiser
Loads bert-base-uncased and extracts per-layer, per-head attention weights for any sentence. Includes a matplotlib heatmap showing how token pairs attend to each other across all 12 layers and 12 heads.
Tags: nlp, bert, transformers, python
Python script that loads bert-base-uncased, runs a forward pass with output_attentions=True, and plots a 12×12 grid of heatmaps, one panel per layer/head combination.