From the course: Machine Learning with Logistic Regression in Excel, R, and Power BI

Calculating linear regression

- Games like basketball are great examples of logistic regression models, but how do we know that's the case. In our basketball dataset, I added a new sheet to note at linear X one, where I added the independent variable X one for the points scored in the first column and the dependent variable Y for the game outcome in the second column. Notice the values and the outcome the win-loss column are either one, which means they won or zero which means they lost. We could set win to zero and loss to one. We just need to be consistent with the rest of the calculations in the model, as per the way we originally classified them. It looks like this is an example of logistic regression, but let's see what this model looks like by running the standard linear regression calculations on these numbers. If you look at the formulas in these cells, you can see they referenced the training data set. We don't have to put our models in separate sheets within Excel, but I find doing this keeps my work more organized. Let's start by selecting points scored as the independent variable we'll analyze against the win-loss dependent variable outcome in a two dimensional space. Well, first create a scatter plot. Notice the points are tightly packed into a range of values for the points scored along the X axis, but there's only two outcomes for the y-axis zero and one, and they're clustered together quite closely. We also see an overlap between the points scored in the outcome of the game. This occurs because sometimes even if a team scores a lot of points, if they're playing a tougher opponent, this may not be enough points to win the game. If we add a trend line, the scatter plot by selecting the plus icon and selecting the trend line, we see that the line is a linear line through the points. We see it estimates the relationship between the independent variable and the X-axis and the dependent variable on the Y-axis. It's a straight line. It doesn't appear to be a good fit for our model. We want our model to show the relationships between either zero or a one, instead of outside of this range. Now let's calculate the slope for the best fit line for this model to see what it gives us. We'll use the Excel function slope, and we'll then reference our dependent variables, first the known Ys or the known Y. And then we'll reference our known Xs, which is our one independent variable. Let's also calculate the intercept. We use the intercept formula in Excel and we'll reference the same Y and X values that we did for our slope calculation. - We'll make a copy of our scatter plot and put it at the top. If we take a closer look at the trend line compared to the slope and intercept which is calculated, we see the trend line calculates its best fit, using the linear regression slope and intercept values because the linear regression model yields results outside our expected outcome range between zero and one. We know it's not a good fit for our own data. So let's use logistic regression instead. Unfortunately, we can't solve directly using formulas like this for the logistic regression model, but we can model it in Excel, which we'll do using solver.

Contents