From the course: NLP with Python for Machine Learning Essential Training


Implementation: Removing stop words


- [Instructor] So let's jump in where we left off previously. If you're just joining us, go ahead and rerun all the cells prior to this remove stopwords heading. We're almost through the basic cleaning steps now. Thus far, we've removed punctuation and tokenized to create a list of words out of a sentence. The last step in cleaning up this data is to remove stopwords.

Now, we've discussed stopwords previously. They are commonly used words like "the," "but," and "if" that don't contribute much to the meaning of a sentence. So we want to remove them to limit the number of tokens Python actually has to look at when building our model.

For instance, take the sentence "I am learning NLP." After tokenizing, it would have four tokens: "I," "am," "learning," and "NLP." Then, after removing stopwords, instead of a list with four tokens, you're left with just "learning" and "NLP." So it gets across the same message, and now your machine learning model only has to look at half the number of tokens. So I'll use…
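The "I am learning NLP" example above can be sketched in a few lines of Python. This is only a minimal illustration, not the notebook code from the course: the tiny STOPWORDS set and the helper names remove_punct, tokenize, and remove_stopwords are assumptions made for this sketch, and a real stopword list (such as one shipped with an NLP library) would be much longer.

```python
import string

# Hypothetical, deliberately tiny stopword set for illustration only.
# A real list would contain many more words.
STOPWORDS = {"i", "am", "the", "but", "if", "a", "an", "is", "to"}


def remove_punct(text):
    """Drop every ASCII punctuation character from the text."""
    return "".join(ch for ch in text if ch not in string.punctuation)


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stopword set."""
    return [tok for tok in tokens if tok not in STOPWORDS]


sentence = "I am learning NLP."
tokens = tokenize(remove_punct(sentence))
cleaned = remove_stopwords(tokens)

print(tokens)   # ['i', 'am', 'learning', 'nlp']
print(cleaned)  # ['learning', 'nlp']
```

Lowercasing before the comparison matters here: the stopword set holds lowercase words, so "I" would slip through if the tokens kept their original case.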
