Python NLTK | nltk.tokenize.word_tokenize()

Last Updated : 12 Jun, 2019

With the help of the nltk.tokenize.word_tokenize() method, we are able to extract the tokens from a string of characters by using the tokenize.word_tokenize() method. It actually splits the text into individual words and punctuation marks, so each word (not each syllable) becomes a separate token.

Syntax : tokenize.word_tokenize(text)
Return : Return the list of word and punctuation tokens.
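
Note : word_tokenize() relies on NLTK's pre-trained Punkt tokenizer models, which have to be downloaded once before the method can be used. A minimal sketch of that one-time setup is shown below (the resource name 'punkt' is the standard identifier; very recent NLTK releases may additionally ask for 'punkt_tab'):

# one-time download of the Punkt tokenizer models used by word_tokenize()
import nltk
nltk.download('punkt')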

Example #1 :
In this example we can see that by using the tokenize.word_tokenize() method, we are able to extract the word tokens from a stream of words or sentences. Because the input here is a single word with no punctuation, the result is a list containing just that one token.




# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Create a string input
gfg = "Antidisestablishmentarianism"

# Use the word_tokenize method
geek = word_tokenize(gfg)

print(geek)


Output :

['Antidisestablishmentarianism']
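
Since a single word produces only one token, here is an additional sketch with an invented sample sentence to show the behaviour on longer input; word_tokenize() puts every word and the trailing punctuation mark into its own token:

# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# A sample sentence used purely for illustration
sentence = "GeeksforGeeks is a computer science portal."

# Each word and the final period become separate tokens
print(word_tokenize(sentence))
# expected output : ['GeeksforGeeks', 'is', 'a', 'computer', 'science', 'portal', '.']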

Example #2 :




# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Create a string input
gfg = "Gametophyte"

# Use the word_tokenize method
geek = word_tokenize(gfg)

print(geek)


Output :

['Gametophyte']
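
As one more sketch (again with an invented input string), word_tokenize() follows the Penn Treebank conventions, so contractions are split into separate tokens:

# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Contractions such as "Don't" are split into "Do" and "n't"
print(word_tokenize("Don't hesitate to practice!"))
# expected output (approximately) : ['Do', "n't", 'hesitate', 'to', 'practice', '!']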


Similar Reads

Python NLTK | nltk.tokenize.TabTokenizer()
Python NLTK | nltk.tokenize.SpaceTokenizer()
Python NLTK | nltk.tokenize.StanfordTokenizer()
Python NLTK | nltk.tokenize.mwe()
Python NLTK | nltk.WhitespaceTokenizer
Python NLTK | nltk.tokenize.LineTokenizer
Python NLTK | nltk.tokenize.SExprTokenizer()
Python | NLTK nltk.tokenize.ConditionalFreqDist()
Python NLTK | nltk.TweetTokenizer()
NLP | Training a tokenizer and filtering stopwords in a sentence