From the course: GPT-4 Foundations: Building AI-Powered Apps

What are large language models (LLMs)?

- [Narrator] So what do ChatGPT and GPT-4 have in common? You might have been wondering, since they're presented as very similar things. Well, they're both large language models, or LLMs for short. But what does that mean? A large language model is a machine learning model, or ML model for short, that has been trained on large amounts of data and is good at performing language-related tasks. Because of their size, LLMs require powerful computers to run.

So you might ask: if ChatGPT and GPT-4 are large language models, are there any others out there? The answer is yes. GPT-4 is an LLM developed at OpenAI. There's also a recent one called Claude, released by Anthropic, and Google has its own large language model, called Bard. In fact, there are dozens of large language models released by different companies. Some are proprietary, like GPT-4, and others are open source, meaning anyone can download them and use them commercially.

Now, let's go through a brief history of large language models. The origin of modern LLMs is in 2017, when Google released the Transformer model. On the graph here, the numbers underneath the models represent their sizes. For example, GPT-2 has 1.5 billion parameters. Generally, the larger the model, the more powerful it is. Many models were released. Then GPT-3 came out in 2020, which was the beginning of the true LLM moment for us. And now we're at GPT-4. So large language models have been around for at least six years, but we've only really seen widespread adoption in the past three.

So large language models have been around for a little while, but how do they work? Models like GPT-4 work by predicting the next word in a sentence. Let's take a look at an example. We might have a phrase like "the cat wore a," and then we ask our model to predict what the next word is.
And it might be "hat," or it might be "sweater." Similarly, when you ask a large language model like GPT-4 a question, it predicts the next series of words based on the probability of what might come next. So for the question "What is two plus two?" it might predict "Two plus two equals four."

Now, how does it know what to predict next? Let's take a look at the training data. To train an LLM, we have three basic steps. First, we take lots and lots of data, filled with billions or even trillions of words. Then, for each sentence in our data set, we have the model predict the next word. Finally, we compare the prediction to the actual next word, so it's a self-correcting system.

Now, where do we get all this data? From the internet. From what we know, large language models are trained on sources like Reddit, Wikipedia, and Common Crawl. Common Crawl is a general crawl of the internet: many web pages that have been indexed and had data gathered about them. However, each LLM provider keeps its training data secret, as it's a competitive advantage.

There are definitely risks with these models making guesses, since their answers aren't based on truth; they're based on predicting what they've seen. We can imagine that comments or posts made on social media might have some pretty nasty stuff in them, so the companies who make large language models need to make a significant effort to reduce some of the toxicity that exists on the internet today.

LLMs might seem magical with their abilities, but now we know how they work when they answer a question: they guess a likely completion to the prompt, and those guesses are based on what they've seen before in their training data. So, to answer the question, what do ChatGPT and GPT-4 have in common? They're LLMs trained on internet data that learn to predict the next part of a phrase.
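The "the cat wore a" example above can be sketched in a few lines of Python. This is purely illustrative: the words and probabilities are made up for this example, not taken from a real model, but the idea is the same — the model assigns a probability to each candidate next word and picks among them.

```python
import random

# Hypothetical next-word probabilities for the prompt "the cat wore a"
# (invented for illustration; a real LLM computes these over its whole
# vocabulary).
next_word_probs = {
    "hat": 0.55,      # most likely continuation
    "sweater": 0.30,  # also plausible
    "collar": 0.10,
    "banana": 0.05,   # unlikely, but not impossible
}

def predict_next_word(probs):
    """Sample one next word according to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

prompt = "the cat wore a"
print(prompt, predict_next_word(next_word_probs))
```

Running this will usually print "the cat wore a hat," sometimes "sweater," and only rarely the low-probability options, which is why the same prompt can give different completions on different runs.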
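The three training steps described above — gather data, predict the next word, compare to the actual word and self-correct — can also be sketched with a toy stand-in. Here a simple word-count (bigram) model plays the role of the LLM; a real LLM is a neural network adjusted with gradient descent, not counts, and this tiny corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Step 1: gather text (a toy corpus standing in for billions of words).
corpus = "the cat wore a hat . the dog wore a sweater .".split()

counts = defaultdict(Counter)  # word -> how often each word follows it

def predict(word):
    """Step 2: predict the most frequent follower seen so far."""
    followers = counts[word]
    return followers.most_common(1)[0][0] if followers else None

for prev, nxt in zip(corpus, corpus[1:]):
    guess = predict(prev)   # step 2: the model's current prediction
    if guess != nxt:        # step 3: compare to the actual next word...
        pass                # ...a real model would adjust its weights here
    counts[prev][nxt] += 1  # our toy "adjustment": update the counts

print(predict("wore"))  # the most likely follower of "wore" is "a"
```

Each pass through the loop is the self-correcting step: the model guesses, the guess is checked against the word that actually came next, and the model is nudged toward the right answer.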
There are many large language models out there, and many of them work in similar ways.
