Multi-Head Attention from scratch with PyTorch
A few weeks ago I implemented the Self-Attention mechanism from scratch using PyTorch, and this post is a sequel to that one. If you haven't already, read that article first: Building Self-Attention from scratch. In this post we will explore the steps involved in building Multi-Head Attention, how it differs from Self-Attention, and why it's needed.

What is Multi-Head Attention?

In simple words, Multi-Head Attention is an extension of Self-Attention. The main idea is to apply Self-Attention several times in parallel on the same input sequence, so that each "head" can pick up different, intricate relationships between the tokens that a single attention pass might miss. ...
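To make the "Self-Attention several times in parallel" idea concrete, here is a minimal sketch in PyTorch. It is not the code from this series. It assumes the common scaled dot-product formulation from the Transformer architecture. The class name `MultiHeadAttention` and its parameters `d_model` and `num_heads` are hypothetical names chosen for illustration. Each head works on a `d_model // num_heads` slice of the projected queries, keys and values. The heads' outputs are concatenated and passed through a final linear layer.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttention(nn.Module):
    """Hypothetical minimal multi-head self-attention (illustrative sketch only)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Query/key/value projections for all heads at once (split into heads later)
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Output projection that mixes the concatenated heads
        self.w_o = nn.Linear(d_model, d_model)

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, t, _ = x.shape
        return x.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor):
        b, t, d = x.shape
        q = self._split_heads(self.w_q(x))
        k = self._split_heads(self.w_k(x))
        v = self._split_heads(self.w_v(x))
        # Scaled dot-product attention, computed for every head in parallel
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # (b, h, t, t)
        weights = F.softmax(scores, dim=-1)  # each row sums to 1
        heads = weights @ v  # (b, h, t, d_head)
        # Concatenate heads: (b, h, t, d_head) -> (b, t, d_model)
        concat = heads.transpose(1, 2).contiguous().view(b, t, d)
        return self.w_o(concat), weights


torch.manual_seed(0)
mha = MultiHeadAttention(d_model=8, num_heads=2)
x = torch.randn(1, 5, 8)  # one sequence of 5 tokens, 8 features each
out, attn = mha(x)
print(out.shape)   # torch.Size([1, 5, 8])
print(attn.shape)  # torch.Size([1, 2, 5, 5]), one 5x5 attention map per head
```

Rather than calling a single-head Self-Attention module in a Python loop, this sketch reshapes the projections into a separate "heads" dimension. Batched matrix multiplication then computes all heads at once. The output keeps the same shape as the input, and `attn` holds one attention map per head.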