From the course: NLP with Python for Machine Learning Essential Training
Implementation: Removing stop words - Python Tutorial
- [Instructor] So let's jump in where we left off. If you're just joining us, go ahead and rerun all the cells before this "Remove stopwords" heading. We're almost through the basic cleaning steps now. So far, we've removed punctuation and tokenized the text to turn a sentence into a list of words. The last step in cleaning up this data is to remove stopwords. We've discussed stopwords previously: they're commonly used words like "the," "but," and "if" that don't contribute much to the meaning of a sentence. We want to remove them to limit the number of tokens Python has to look at when building our model. For instance, take the sentence "I am learning NLP." After tokenizing, it has four tokens: "I," "am," "learning," and "NLP." After removing stopwords, instead of a list with four tokens, you're left with just "learning" and "NLP." That still gets the same message across, and now your machine learning model only has to look at half as many tokens. So I'll use…
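The three cleaning steps described above (removing punctuation, tokenizing, removing stopwords) can be sketched in plain Python as follows. This is a minimal illustration, not the course's own notebook code: the function names and the tiny `STOP_WORDS` set are hypothetical, chosen so the example runs without downloading anything. A real project would typically use a fuller list, such as NLTK's `stopwords.words("english")`, which requires downloading the stopwords corpus first. The sketch also assumes tokens are lowercased, so that "I" matches the lowercase entry "i" in the stop word set.

```python
import re
import string
from typing import List

# Hypothetical, deliberately tiny stop word set for illustration only.
# A fuller list (e.g. NLTK's English stopwords) would normally be used instead.
STOP_WORDS = {"i", "am", "the", "but", "if", "a", "an", "and", "is", "to"}


def remove_punct(text: str) -> str:
    """Drop every ASCII punctuation character from the text."""
    return "".join(ch for ch in text if ch not in string.punctuation)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on runs of non-word characters."""
    # Filter out empty strings that re.split produces at leading/trailing separators.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Keep only the tokens that are not in the stop word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def clean_text(text: str) -> List[str]:
    """Run all three cleaning steps in order."""
    return remove_stopwords(tokenize(remove_punct(text)))


tokens = tokenize(remove_punct("I am learning NLP."))
print(tokens)                    # ['i', 'am', 'learning', 'nlp']
print(remove_stopwords(tokens))  # ['learning', 'nlp']
```

Using a `set` for the stop words keeps each membership check fast on average, which matters once you apply this to every message in a dataset rather than to a single sentence.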