Creating a Word Cloud With Python
Introduction
In Data Visualization, word clouds are used to display textual data in a specific shape. The more frequent a word is in a chunk of text, the larger it will appear in the word cloud. This can provide interesting insights in datasets, such as determining which metatags have the most prominence on a web page. Other uses for word clouds include displaying what topics are usually covered in speeches and excerpts.
In this article, we will learn how to create, mask, and display the following word cloud in Python:
It’s a word cloud of the late and great David Bowie filled with words from his songs with “star” in the lyrics!
We will use the following libraries:
- The wordcloud module will generate our word cloud using a text file.
- NumPy will be used to mask, or shape, an image to be applied to the word cloud.
- Pillow, a fork of Python Imaging Library (PIL), will be used for making a copy of the original image to work with.
- Modules from the os library, including the os.path module, will allow us to access the folders and files in our project’s directory.
- Matplotlib will ultimately be used to display our new word cloud.
Note: With the exception of the os library, the rest will need to be installed to complete this tutorial. Using pip for Python 3, the following command can be run to install them:
    pip3 install numpy pillow wordcloud matplotlib
This tutorial assumes familiarity with the Python programming language, including importing libraries and using some of Python’s built-in functions.
Let’s get started with the first step!
Step 1: Project setup
To begin, we will create the root directory for our project. After launching the terminal or command prompt, we’ll run the following commands to create a new my_wordcloud directory and change to it:
    mkdir my_wordcloud
    cd my_wordcloud
We’re now working from the newly created my_wordcloud directory. Next, let’s create a new file called my_wordcloud.py:
    touch my_wordcloud.py
    ls
Running the ls command confirms we successfully created the my_wordcloud.py file in the correct directory:
    my_wordcloud.py
Note: The file that will ultimately import the wordcloud library must not be named wordcloud.py. Otherwise, when executing the file, errors will be thrown due to circular imports.
For this project, we’ll need the file path of the directory that contains our script while creating the word cloud. For this step, let’s open the my_wordcloud.py file and write the following:
    import os
    current_directory = os.path.dirname(__file__)
In the snippet above, we passed the __file__ attribute to the os.path.dirname() function to return the path string of the parent directory, my_wordcloud.
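Depending on the Python version and how the script is launched, __file__ may hold a relative path, in which case os.path.dirname() can return a relative or even empty string. The following is a minimal sketch of one way to guard against that, assuming we want an absolute directory path; the script_directory helper and the example path are hypothetical and not part of the tutorial’s code:

```python
import os


def script_directory(file_path):
    """Return the absolute directory that contains file_path.

    Hypothetical helper for illustration only.
    """
    # abspath() resolves a relative path against the current working directory
    return os.path.dirname(os.path.abspath(file_path))


# Example with a made-up relative path; in the tutorial's script,
# __file__ would be passed instead.
example_directory = script_directory(os.path.join("my_wordcloud", "my_wordcloud.py"))
print(example_directory)
```

Inside my_wordcloud.py, the equivalent call would be script_directory(__file__).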
Let’s save the file and proceed to the next step.
Step 2: Find an image to mask with
Finding a suitable image for the word cloud can be challenging at first. There are a few criteria to keep in mind when searching for one:
- The image must have a white background (#ffffff); anything off-white or transparent will be populated by words for the word cloud.
- The shape(s) within the image should be well-defined and composed of mostly non-white colors.
- Although this tutorial uses .png images, as long as the image is small, a .jpg image file can be used instead.
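If an otherwise suitable image has an off-white background, one possible workaround is to push near-white pixels to pure white once the image has been loaded as a NumPy array (covered in Step 3). The sketch below is only an illustration on a tiny hand-made array; the whiten_background helper and the threshold of 240 are assumptions, not values taken from this tutorial:

```python
import numpy as np


def whiten_background(pixels, threshold=240):
    """Return a copy of an RGB(A) pixel array with near-white pixels set to 255.

    Hypothetical helper; `threshold` is an arbitrary cut-off chosen for the demo.
    """
    cleaned = pixels.copy()
    # A pixel counts as "near white" when all three color channels are bright
    near_white = np.all(cleaned[..., :3] >= threshold, axis=-1)
    cleaned[near_white, :3] = 255
    return cleaned


# 2x2 demo image: pure white, off-white, blue, and black pixels
sample = np.array(
    [
        [[255, 255, 255], [250, 248, 251]],
        [[12, 30, 200], [0, 0, 0]],
    ],
    dtype=np.uint8,
)
cleaned = whiten_background(sample)
print(cleaned[0, 1])
```

A lower threshold removes more of the background but may also erase light areas inside the shape, so it is worth checking the result visually.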
Googling for “word cloud mask images” will yield some promising results. For this tutorial, we are going to use the following image:
After downloading the image, let’s rename it to bowie.png and save it in our my_wordcloud parent directory.
Next, let’s assign the full path of our bowie.png file to a variable called bowie_image_path. Then, we will import the Image module from the PIL library to create an object representation of our image with the file path:
    import os
    from PIL import Image
    …
    bowie_image_path = os.path.join(current_directory, "bowie.png")
    bowie_image = Image.open(bowie_image_path)
Note: The Image.open() method is not to be confused with Python’s built-in open() function.
With our image now stored inside the variable bowie_image, let’s save the my_wordcloud.py file and move to the next step.
Step 3: Create mask with image path
For this step, we will import and use NumPy to create the “mask” for our image. By this, we mean we are going to shape out the part of the image we’d like to generate a word cloud from.
We will create an ndarray of pixel values. Pure white pixels (a value of 255) are left out of the mask, while pixels of any other color mark where words can be drawn. Using the bowie_image we made in the last step, let’s add the following to the my_wordcloud.py file:
    import os
    from PIL import Image
    import numpy as np
    …
    bowie_mask = np.array(bowie_image)
    print(bowie_mask)
Here, we imported the numpy library and can access its methods with the np alias. Then, we invoked the np.array() method to create a new ndarray called bowie_mask.
Let’s run our program thus far with the python3 my_wordcloud.py command. The print() statement will check that our bowie_mask array was created correctly. We should see (mostly) values of 255, confirming that our image had a proper white background and the mask can be applied to our word cloud.
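Because NumPy abbreviates large arrays when printing, a short summary can be easier to read than the raw output. The following sketch, run here on a small synthetic array rather than the Bowie image, reports the shape, data type, and share of pure white pixels; summarize_mask is a hypothetical helper and not part of the tutorial’s code:

```python
import numpy as np


def summarize_mask(mask):
    """Return the shape, dtype, and fraction of pure white pixels in a mask array."""
    if mask.ndim == 3:
        # Color image: a pixel is white only if every color channel is 255
        white = np.all(mask[..., :3] == 255, axis=-1)
    else:
        # Grayscale image: a single value per pixel
        white = mask == 255
    return {
        "shape": mask.shape,
        "dtype": str(mask.dtype),
        "white_fraction": float(white.mean()),
    }


# Synthetic 4x4 RGB "image": white background with a dark 2x2 square
demo_mask = np.full((4, 4, 3), 255, dtype=np.uint8)
demo_mask[1:3, 1:3] = (20, 20, 20)
summary = summarize_mask(demo_mask)
print(summary)
```

In my_wordcloud.py, the same function could be called as summarize_mask(bowie_mask); a white fraction close to zero would suggest the background is not pure white.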
Let’s remove the print() statement, save the my_wordcloud.py file, and head to the next step.
Step 4: Find text to generate
In this step, we are going to find some text to use for generating our word cloud. All we need to generate a word cloud is a single string of text. The string can be locally defined in the same .py file or it can be saved to an external file and accessed from there.
For this tutorial, we will use a text (.txt) file filled with lyrics from Bowie songs featuring the word “star.” They can be accessed here.
Let’s create a new file called bowie_star_lyrics.txt, copy/paste the lyrics into it, and save the file.
Then, inside the my_wordcloud.py file, let’s add the following code:
    with open(os.path.join(current_directory, "bowie_star_lyrics.txt")) as f:
        lyrics = f.read()
We did the following in the snippet above:
- We used the os.path.join() method to return the full path string for our bowie_star_lyrics.txt file.
- The built-in open() function created an object representation of our file with the path string.
- Inside the with block, the contents of the file were accessed with the .read() method and assigned to a variable called lyrics.
- Lastly, the file was closed after the program exited the with block.
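Text copied from the web often contains curly quotes or accented characters, and open() falls back to a platform-dependent encoding when none is given. The sketch below shows one way to read a file with an explicit encoding, assuming the file was saved as UTF-8; the read_text helper, the temporary file, and its placeholder contents are made up for the demo:

```python
import os
import tempfile


def read_text(path):
    """Read a whole text file as UTF-8 (hypothetical helper for illustration)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Write a small placeholder file to a temporary folder, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_lyrics.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("café star")
    demo_lyrics = read_text(demo_path)

print(demo_lyrics)
```

In the tutorial’s script, the same idea would amount to adding encoding="utf-8" to the existing open() call.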
Let’s save our my_wordcloud.py file once more and go to the next step.
Step 5: Create and generate word cloud
It is now time to begin building the word cloud! To start, we will import the following from the wordcloud library like so:
    import os
    from PIL import Image
    import numpy as np
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
Note: STOPWORDS and WordCloud are both case-sensitive.
The WordCloud class contains all of the methods we need for generating the word cloud. The STOPWORDS set is used to filter out superfluous filler words like “an”, “and”, “the”, and “they.” More words can be added with the .add() method if desired. The ImageColorGenerator will be utilized later on in this step.
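To make the effect of stop-word filtering concrete, here is a small standard-library sketch that counts words in a sentence while skipping a stop-word set. It illustrates the general idea only and is not how the wordcloud library is implemented; DEMO_STOPWORDS and count_words are hypothetical stand-ins:

```python
import re
from collections import Counter

# Tiny stand-in for a stop-word set, made up for this demo
DEMO_STOPWORDS = {"a", "an", "and", "the", "they"}


def count_words(text, stopwords):
    """Count lowercase words in text, ignoring anything in stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


counts = count_words("The star and the sky and a star", DEMO_STOPWORDS)
print(counts.most_common())
```

Words with higher counts are the ones a word cloud would draw larger, which is why removing filler words keeps them from crowding out the more meaningful ones.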
Let’s now create an instance of the WordCloud class and assign it to a variable with the following code:
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
    …
    wordcloud = WordCloud(
        background_color="white",
        mask=bowie_mask,
        collocations=False,
        stopwords=STOPWORDS,
        contour_color="white",
        contour_width=1,
    )
We did the following in the snippet above:
- The bowie_mask we previously made was assigned to the mask parameter to “shape” our word cloud.
- We also assigned the STOPWORDS set to the (lowercase) stopwords parameter.
- The default background_color of a word cloud is black, so we changed it to "white".
- The collocations parameter was set to False to break up word combinations and better distribute the text throughout the word cloud.
- We outlined our word cloud with the contour_color and contour_width parameters.
Next, we will invoke the .generate() method on our wordcloud variable, passing in the lyrics string we defined earlier in this tutorial:
    wordcloud = WordCloud(
        background_color="white",
        mask=bowie_mask,
        collocations=False,
        stopwords=STOPWORDS,
        contour_color="white",
        contour_width=1,
    )
    wordcloud.generate(lyrics)
    print(wordcloud)
This will populate our word cloud with the words read from the bowie_star_lyrics.txt file, without the STOPWORDS.
If we save the file, run python3 my_wordcloud.py, and confirm our wordcloud variable is stored in memory, then the word cloud was created and generated correctly. The output should look similar to the following, although the memory address will differ:
    <wordcloud.wordcloud.WordCloud object at 0x7fb404433be0>
After confirming our wordcloud was created, let’s remove the print() statement.
The last matter we’ll cover in this step is attaching a color generator to our word cloud. Let’s add the following to the my_wordcloud.py file:
    wordcloud.generate(lyrics)
    image_colors = ImageColorGenerator(bowie_mask)
Note: An RGB image must be passed to the ImageColorGenerator constructor.
The image_colors object maps the color(s) of the generated words to the original image as closely as possible.
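The RGB requirement can become an issue when an image carries a transparency (alpha) channel. One possible way to prepare such an image, sketched here with plain NumPy on a two-pixel array, is to blend it over a white background so that transparent areas become white. The flatten_onto_white helper is hypothetical and is not something this tutorial or the wordcloud library provides:

```python
import numpy as np


def flatten_onto_white(rgba):
    """Blend an (H, W, 4) uint8 RGBA array over a white background.

    Hypothetical helper for illustration; returns an (H, W, 3) uint8 array.
    """
    rgb = rgba[..., :3].astype(np.float64)
    # Scale alpha to the 0..1 range and keep a trailing axis for broadcasting
    alpha = rgba[..., 3:4].astype(np.float64) / 255.0
    blended = rgb * alpha + 255.0 * (1.0 - alpha)
    return np.rint(blended).astype(np.uint8)


# One opaque dark pixel and one fully transparent pixel
demo_rgba = np.array(
    [[[10, 20, 30, 255], [0, 0, 0, 0]]],
    dtype=np.uint8,
)
demo_rgb = flatten_onto_white(demo_rgba)
print(demo_rgb.shape)
```

With Pillow, an array of this form could be obtained from an image via np.array(image.convert("RGBA")) before flattening it.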
Let’s save the my_wordcloud.py file and head to the final step.
Step 6: Show word cloud on pyplot figure
The last library we will use for this tutorial will be Matplotlib. The methods from the library’s pyplot interface will display our new word cloud. Let’s first import matplotlib.pyplot at the top of our my_wordcloud.py file:
    import os
    from PIL import Image
    import numpy as np
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
    import matplotlib.pyplot as plt
As with NumPy, we use the alias plt as shorthand for the tools we imported. Let’s now add the following to our file:
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
In the snippet above, we used the following methods:
- The .imshow() method creates and draws the figure on which our wordcloud is placed; the "bilinear" setting helps smooth out the image.
- We turned off the labels of the x- and y-axes with .axis("off").
- Lastly, the figure with our word cloud was displayed with the .show() method.
Let’s save our my_wordcloud.py file and run it with python3 my_wordcloud.py. The following figure window should appear on our screen:
The word cloud figure should render onto the screen within a few seconds after running the file. The generated lyrics should be shaped like the Bowie image.
The last task in this tutorial is to apply coloring to the word cloud with the image_colors object. Let’s close the figure window and add the following change to the plt.imshow() method:
    plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear")
    plt.axis("off")
    plt.show()
The .recolor() method will display our wordcloud with the same colors as the original image. If we save our my_wordcloud.py file and re-run it, we should see the following figure:
Conclusion
We have successfully rendered our first word cloud! All in all, we were able to generate a word cloud with fewer than 40 lines of code.
In this tutorial, we learned how to utilize various Python libraries and modules to create, mask, generate, and display a word cloud.
Author
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'