From the course: Learning Excel: Data Analysis

Use the normal distribution - Microsoft Excel Tutorial

From the course: Learning Excel: Data Analysis

Start my 1-month free trial

Use the normal distribution

- [Instructor] The data distribution that you are most likely to use during your analysis is the normal distribution. The normal curve or bell curve has the shape shown in this chart. This chart indicates probabilities for a curve with an average or mean of 100 and a standard deviation of 20. The mean is usually indicated by the Greek letter mu and the standard deviation by the Greek letter sigma. the normal curve has some very useful properties. The first is that approximately 68% of all values will occur within plus or minus one standard deviation. So with our mean of 100, that would mean that about 68% of the values would fall within 20 above or below the average. So 80 to 120. 95% of values will be within two standard deviations, So 60 to 140, and 99.7% within three standard deviations plus or minus, so that's 40 to 160. To see how to work with these values in Excel, We'll switch over to our practice workbook. I've switched over to Excel and my sample file is 04_01_Normal, and you can find it in the chapter four folder of the exercise files collection. I use the values in columns A and B to create the graph of the curve that you see at the bottom right. But let's ask some numerical questions of our data. For example, we can calculate the probability of getting exactly 92. So we have an average of 100, standard aviation of 20, 92 is close to the middle, so let's calculate the probability of getting exactly that value if we're generating random numbers. So I'll click in cell E1 and then type an equal sign. And the function we use is NORM.DIST and as you might guess, that stands for normal distribution. The value we're working with our X is 92. So I'll type that in, then a comma. The mean is in B1, comma, standard deviation in B2, then a comma. And we are looking for the probability mass function which is also called a point probability. And that means that for the last argument, I need to select FALSE so I highlight that. Press tab to accept it, type a right parenthesis and enter. And we see the probability of getting exactly 92 is 1.84%. And that might seem pretty low, but remember, within three standard deviations, we go from 40 to 160. So the fact that 92 is as probable as it is at a random selection is an indication of how close to the average it is. Now let's calculate the probability of getting 92 or more. And I will do it incorrectly the first time and then show you how to fix what is a very common mistake. So in E2, I'll type equal, NORM.DIST. As before our X is 92, the mean is in B1, standard deviation in B2 and then a comma, but now we do want to look for the accumulative distribution function. And that's because we're looking for 92 or more. So we want a spread of values instead of a single point probability. So I highlight TRUE, type a right parenthesis, and again, this is going to be an incorrect result. I get 34.46 of getting 92 or more. And here's why that's wrong. 92 is to the left of the mean. And if you look at the normal curve, half the values are greater than the mean and the other half are less than the mean. So the fact that our calculation shows only 34.46% of values are greater than 92, which is less than the mean, must be incorrect. The way to fix this error is to subtract that calculation from one. So I will double click in cell E2, and then I will add one minus our previous calculation. Now, when I press tab, I get 65.54% and that makes a lot more sense because 92 is approximately here, I've highlighted 90, and I'll just leave the mouse pointer there to show you the approximate point. You can see that about 65.54% of the values are to the right so our calculation makes intuitive sense. We can also ask about percentages of values within a distribution. So let's say, what is the value inside this curve or as part of this data distribution that 33% of values are below? So I will click in cell H1 and type an equal sign. We can't use NORM.DIST for this calculation but we can use a different function, NORM.INV and this is the inverse of the normal distribution where we got probabilities with NORM.DIST, the inverse of that gives us values based on a probability. So our probability is 33% then a comma, our mean is still in B1, standard deviation is still in B2. We don't need any other arguments so I'll type a right parenthesis and enter. And we get 91.2. And this again makes sense. About 33% of our values are below 91, which again is here on the curve, approximately, and that will show that about 33% of the values are to the left so our value checks out. If I want to find the value for which 90% of values in this curve are above, Then I can do NORM.INV. And if you're suspecting that we need to do one minus something, as we did with probability of 92 or more, you are correct but we put it in a different place. So in H2 I'll type an equal sign, NORM.INV. The result we would get by typing in 90% would be to return a value that 90% of values are below. So we need to subtract the percentage from one as part of the probability calculation. So for the first argument, I'll type 1 minus 90%, then a comma, B1 for the mean, B2 for the standard deviation, right parenthesis and enter, and we get 74.37 approximately. And again, that makes sense. If I go down to 74, that's 72, oh, there's 74. We can see where it lies on the curve and it makes sense that about 90% of values, including the fat part of the curve in the middle would be above the return value of about 74.37. So as you can see, you can do a lot with the normal curve, especially with the functions NORM.DIST and NORM.INV.

Contents