Python NLTK | nltk.tokenize.word_tokenize()

Last Updated : 12 Jun, 2019

With the help of the nltk.tokenize.word_tokenize() method, we are able to extract the tokens from a string of characters by using the tokenize.word_tokenize() method. It actually splits the text into individual words and punctuation marks, so each word (not each syllable) becomes a separate token.

Syntax : tokenize.word_tokenize(text)
Return : Return the list of word and punctuation tokens.
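
Note : word_tokenize() relies on NLTK's pre-trained Punkt tokenizer models, which have to be downloaded once before the method can be used. A minimal sketch of that one-time setup is shown below (the resource name 'punkt' is the standard identifier; very recent NLTK releases may additionally ask for 'punkt_tab'):

# one-time download of the Punkt tokenizer models used by word_tokenize()
import nltk
nltk.download('punkt')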

Example #1 :
In this example we can see that by using the tokenize.word_tokenize() method, we are able to extract the word tokens from a stream of words or sentences. Because the input here is a single word with no punctuation, the result is a list containing just that one token.




# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Create a string input
gfg = "Antidisestablishmentarianism"

# Use the word_tokenize method
geek = word_tokenize(gfg)

print(geek)


Output :

['Antidisestablishmentarianism']
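
Since a single word produces only one token, here is an additional sketch with an invented sample sentence to show the behaviour on longer input; word_tokenize() puts every word and the trailing punctuation mark into its own token:

# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# A sample sentence used purely for illustration
sentence = "GeeksforGeeks is a computer science portal."

# Each word and the final period become separate tokens
print(word_tokenize(sentence))
# expected output : ['GeeksforGeeks', 'is', 'a', 'computer', 'science', 'portal', '.']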

Example #2 :




# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Create a string input
gfg = "Gametophyte"

# Use the word_tokenize method
geek = word_tokenize(gfg)

print(geek)


Output :

['Gametophyte']
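
As one more sketch (again with an invented input string), word_tokenize() follows the Penn Treebank conventions, so contractions are split into separate tokens:

# import word_tokenize() method from nltk
from nltk.tokenize import word_tokenize

# Contractions such as "Don't" are split into "Do" and "n't"
print(word_tokenize("Don't hesitate to practice!"))
# expected output (approximately) : ['Do', "n't", 'hesitate', 'to', 'practice', '!']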


Similar Reads

Python NLTK | nltk.tokenize.TabTokenizer()
Python NLTK | nltk.tokenize.SpaceTokenizer()
Python NLTK | nltk.tokenize.StanfordTokenizer()
Python NLTK | nltk.tokenize.mwe()
Python NLTK | nltk.WhitespaceTokenizer
Python NLTK | nltk.tokenize.LineTokenizer
Python NLTK | nltk.tokenize.SExprTokenizer()
Python | NLTK nltk.tokenize.ConditionalFreqDist()
Python NLTK | nltk.TweetTokenizer()
NLP | Training a tokenizer and filtering stopwords in a sentence