From the course: Introduction to Career Skills in Data Analytics

Tips on preparing the data in Excel

From the course: Introduction to Career Skills in Data Analytics

Tips on preparing the data in Excel

- [Instructor] In the life of every data analyst, you reach a point where it's time to prepare the data. This is the part where we clean and transform our data to meet the requirements, and if you haven't heard, we do this a lot. You've profiled your data, you've reviewed all the business rules, and now it's time to dig in and actually get started. I want to work with my sales order data to prepare it for a template to import it into a new system. I typically start with a new blank workbook and I'll use Power Query to connect to my data, and then I'll do my data transformations there. I'll go to my data tab, I'll choose get data, and I'll choose from file. There are several connections here, but because my data is an export that's stored in an Excel workbook, I can choose from file and from workbook. Okay, I'll navigate to my data for template, I'll double click it, and this is what is establishing the connection between my Excel file and Power Query. I'll choose sales orders, and then I have two options. I can go ahead and load the data to this spreadsheet, or I can choose transform. Because I know I have transformations to make, I'll go ahead and choose transform data. Okay, so I'm connected to my data, and I can see my sales order query. I see my query settings and my applied steps. First off, I want to show you that it promoted my headers. Now, what that actually means is it took the first row of information that it saw from my spreadsheet and made that my column headers. And then it changed the type. What that means is that it looked at the second row of information, which was actually the first row of values, and tried to determine what the data types would be based on the values that it sees. Okay, for example, sales order ID. It has numbers. So it automatically translated that as a number. Order date has a date and a time, so it automatically made that date and time. Okay, as part of my requirements, I know that I have to change sales order to be text. So I'll hit the one, two, three, and change it to a text data type. It's asking me do I want to replace the current step or add a new step? I don't want to change how it read every single data type, so I'll go ahead and add a new step, and then on the right hand side, you see my applied steps has a new step where I changed the sales order ID to text. We also know from our technical requirements that we can have sales order number. It is also supposed to be text, but it cannot contain any letters. So I need to remove the S and the O from the front of the data. So what I'll do is I'll highlight that whole column and I can right click and choose replace values. I can also just select a single field and choose replace values. I can highlight the whole column and choose replace values up top. Okay, so I'll choose replace values. No matter what step I choose, the outcome will be the same. So I want to find all of the SOs in this column, and I want to replace them with nothing because I want just the number. I'll take a look at the advanced options. It's asking me do I want to match the entire sale contents or replace using special characters? Neither of these apply. Okay. I'll choose okay, and then immediately, I see sales order number. It's still text, which is appropriate for my requirements, but it no longer contains the S and the O. On the right hand side in my applied steps, I see replaced value. And if I needed to change anything, I could hit the little gear shape, and that takes me right back into my steps. Okay, I'll choose cancel there. Because Power Query keeps all of our steps, it's similar to what people do with recording macros or coding VBA for data cleaning. Except we're not having to code or record. We're just actually performing the actions and it's keeping up with it. Let me show you what I mean. So, let me click on navigation. Notice that the first row contains my column headers. So now when I choose the next step, it shows me that it promoted those headers. Then it changed all the data types based on the data that it sees. And then I started my first step which was changing the sales order ID, and notice I still see the SO until I choose replaced value. That means if my data changes, I can update my data source, and it will reapply all the same steps. Okay. Let's go ahead and change the data types for dates. I don't need the timestamp, and also notice they're all set to midnight anyway. I'll go ahead and hit the dropdown and choose date, choose date again, and then date again. So now I have my dates in order. Perfect. Let's go ahead and work with parsing text. So, first of all, we have an account number. This account number actually really needs to be referred to as the main account GL. So I'll go ahead and double click account number and change it to main account GL. Now each piece of this main account GL actually represents another field of data that I need. So, I need to actually parse this text. I need to split it apart and I'll use what's called a delimiter to do that. Notice there's a dash in between each section. The first thing I'll do is duplicate this field, and that will throw it all the way to the right. That way I can keep the main account GL and then also create the three new fields. I'll right click, I'll choose split column, and I'll choose delimiter. Notice there's several options here. I'll choose delimiter. My delimiter is a dash, although I have multiple options here. Okay, so custom dash, and I do want to split it at each occurrence of the delimiter. All right, I'll go ahead and click okay. Let me scroll over. And I have my three new fields built from the main account GL. Let's go ahead and name these. So I'll call this gentle. This should be labeled GL number. This field will be called account number 'cause that's what it represents. And then this last number here is called category. Okay, perfect. Now, if you look up top, you see what's called M for mashup. This is the language that's keeping all of my steps. If you want to see all of those steps, you can go to the advanced editor, and this is its recording of everything we're completing. Okay. I'll go ahead and close that advanced editor. I also need to remove columns. Now, I can actually right click any column and choose remove. I can keep the columns that I want and then right click and tell it to remove all other columns, or I can go to choose columns up top and then just deselect the ones I do not need. So I do not need territory ID or comments for my final file. I'll go ahead and do okay. Now that all my transformations are made, I can go ahead and close and load this data to my sheet. It tells me that I have 3,500 rows loaded. This is perfect. Okay, great, tells me where my data sources are and the time of my last refresh. These are basic steps that anyone can perform to clean up columns, convert data types, and break text apart.

Contents