Transformer

The Transformer is a deep learning architecture introduced in the 2017 paper “Attention Is All You Need.” It relies on self-attention mechanisms to relate every position in a sequence to every other position, so all tokens can be processed in parallel; recurrent neural networks (RNNs), by contrast, must process tokens sequentially, which made them slower to train and harder to scale for natural language processing. The Transformer is the foundation for most modern large language models, including GPT, Claude, and Gemini.
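
The core operation, scaled dot-product self-attention, can be sketched in a few lines. This is a minimal single-head illustration, not code from the paper; the projection matrices `Wq`, `Wk`, `Wv` and the dimensions are illustrative assumptions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: illustrative projection matrices (d_model, d_k).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token attends to every token at once -- no sequential recurrence.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                        # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Because the score matrix is computed in one batched matrix product, the whole sequence is handled in parallel, which is the efficiency advantage over recurrent architectures noted above.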