Table of Contents

- Introduction
- Input Embedding
- Positional Encoding (PE)
- The Encoder
  - Self-attention Mechanism
  - Multi-head Attention Mechanism
  - Feedforward Network
- The Decoder
  - Masked Multi-head Attention
  - Multi-head Attention
  - Feedforward Network
  - Linear and Softmax Layer
- Transformer Training
- Conclusion

Introduction

The Transformer is currently one of the most popular architectures for NLP. We can periodically...