From the course: Generative AI: Working with Large Language Models

Transformer: Architecture overview

- [Instructor] You're probably wondering what the transformer architecture looks like. So let me head over to the "Attention Is All You Need" paper and show you. We'll divide this architecture into two sections so that we can understand each component. The left half of the diagram is known as an encoder and the right-hand side is known as a decoder. We feed an English sentence such as "I like NLP" into the encoder at the bottom of the diagram. The transformer can act as a translator from English to German, and so the output from the decoder at the top of the diagram is the German translation, "ich mag NLP." The transformer is not made up of a single encoder, but rather a stack of six encoders, paired with a stack of six decoders. Each of these parts can be used independently depending on the task. Encoder-decoder models are good for generative tasks such as translation or summarization. Examples of such encoder-decoder models are Facebook's BART model and Google's T5. Encoder-only models are good for tasks that require understanding of the input, such as sentence classification and named entity recognition. Examples include the family of BERT models such as BERT, RoBERTa, and DistilBERT, amongst others. Decoder-only models are good for generative tasks such as text generation. Examples include the GPT family such as GPT, GPT-2, and GPT-3. In fact, all of the models after GPT-3 that we look at in this course are decoder models. So in summary, transformers are made up of encoders and decoders, and the tasks we can perform will depend on whether we use either or both components.
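
To make the three configurations concrete, here is a minimal sketch using the Hugging Face transformers library. This library and the specific model checkpoints (t5-small, distilbert-base-uncased-finetuned-sst-2-english, gpt2) are illustrative choices, not something introduced in the lesson itself; any comparable encoder-decoder, encoder-only, and decoder-only checkpoints would demonstrate the same split of tasks.

```python
# Sketch only: illustrates encoder-decoder, encoder-only, and decoder-only
# models via the Hugging Face pipeline API. Model names are example choices.
from transformers import pipeline

# Encoder-decoder model (T5): generative tasks such as translation.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("I like NLP."))
# e.g. [{'translation_text': 'Ich mag NLP.'}]

# Encoder-only model (DistilBERT): tasks that require understanding the input,
# such as sentence classification.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("I like NLP."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Decoder-only model (GPT-2): open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=20))
```

The point of the sketch is only that the task dictates which half of the architecture you reach for: translation uses both encoder and decoder, classification needs only the encoder, and free-form generation needs only the decoder.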
