Tutorial: Improve text visualization in Python with Wordcloud

If your data contains text and you want an interesting visualization of it, graphs such as bar or line charts are not a natural fit, because they are not strongly associated with the idea of text analysis. A word cloud is a better and more engaging choice.

Thuong Nguyen


Image by Dallas Reedy on unsplash.com

This tutorial is here to give you a simple and easy way to visualize your text data using Wordcloud in Python.

First, here is the list of tools and packages you need:

from wordcloud import WordCloud
from PIL import Image
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

Why do we need these packages?

Wordcloud: generates the word cloud image from the text.

PIL (Image): lets Python open and read image files (.jpg or .png).

Numpy and pandas: used to process our data.

Matplotlib: displays the visualization.

Our data

Load your dataset into a pandas DataFrame. As an example, I use a dataset from Kaggle. Our data looks like the snapshot below. The column ‘text’ is the one we want to visualize.

Image by author
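If your data lives in a CSV file, loading it could look like the minimal sketch below. The in-memory CSV and the ‘sentiment’ column are made up for illustration; only the ‘text’ column name comes from this tutorial. Dropping rows with missing text is an extra precaution, because missing values would break the ‘join’ used later.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; with a real file you would pass its path instead
csv_text = io.StringIO(
    "sentiment,text\n"
    "positive,Loved the service today\n"
    "negative,\n"
    "neutral,Flight was on time\n"
)

data_sen = pd.read_csv(csv_text)

# Empty text cells are read as NaN; drop them so every row holds a real string
data_sen = data_sen.dropna(subset=["text"]).reset_index(drop=True)

print(data_sen)
```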

Step 1: Capture and clean the necessary strings

It is important to clean our data in this case, so I use regex and string methods to clean the content of the column ‘text’, which belongs to the pandas DataFrame ‘data_sen’.

(replace removes unwanted patterns, strip removes empty space at the beginning and end of a string, and title uppercases the first letter of each word)

Then, we convert the pandas Series ‘li’ to a list ‘l’.

Finally, we use the ‘join’ method to get the final text, which combines all the clean text in the column.

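# regex=True is required in recent pandas versions, where str.replace treats patterns as plain text by default
li = (data_sen.text
      .str.replace(r'#.+$', '', regex=True)            # remove everything from the first '#' to the end
      .str.replace(r'@[a-zA-Z0-9]+', '', regex=True)   # remove @mentions
      .str.replace('https', '', regex=False)           # remove the literal text 'https'
      .str.strip()                                     # trim leading and trailing spaces
      .str.title())                                    # uppercase the first letter of each word
l = li.to_list()     # pandas Series -> Python list
te = ' '.join(l)     # one long string to feed into WordCloud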
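To see what the cleaning does, here is a small sketch on a made-up DataFrame (the sample tweets are hypothetical). It follows the same steps, with two variations that are my assumptions rather than part of the original code: the URL pattern removes whole links instead of only the word ‘https’, and repeated spaces are collapsed into one.

```python
import pandas as pd

# Hypothetical sample standing in for the Kaggle data
data_sen = pd.DataFrame({"text": [
    "  great flight today @united #travel",
    "@jetblue thanks for the help",
    "see https://t.co/abc for details",
]})

cleaned = (
    data_sen["text"]
    .str.replace(r"#.+$", "", regex=True)           # drop everything from the first '#'
    .str.replace(r"@[a-zA-Z0-9]+", "", regex=True)  # drop @mentions
    .str.replace(r"https?://\S+", "", regex=True)   # assumption: drop whole URLs
    .str.replace(r"\s+", " ", regex=True)           # assumption: collapse repeated whitespace
    .str.strip()
    .str.title()
)

te_demo = " ".join(cleaned.to_list())
print(te_demo)
```

Whether to remove whole URLs depends on your data; with only ‘https’ removed, the rest of each link stays in the text and can show up in the cloud.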

Step 2: Create a mask (optional)

If you want your visualization to look fancy and creative, I recommend applying a mask. A mask is like a frame taken from an image/photo: your text appears within its shape.

Since our example data is about tweets, I use the Twitter logo as a mask.

mask1 = np.array(Image.open("image.jpg"))

Twitter’s logo, used to make the mask

A photo suitable for a mask has a white background, or at least some white area. Python treats the pure white area as masked out and fills the colored area with the word cloud.
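Photos often have a background that is almost white but not exactly, and those pixels would then be filled with words. One way to handle this is to turn every near-white pixel into pure white first. The sketch below does that; the helper name ‘to_mask’, the threshold of 240 and the synthetic test image are all assumptions for illustration, not part of the Wordcloud library.

```python
import numpy as np
from PIL import Image

def to_mask(img, threshold=240):
    """Return a 2-D uint8 array: 255 where the image is near-white, 0 elsewhere."""
    gray = np.array(img.convert("L"))  # grayscale values from 0 to 255
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Synthetic stand-in image: off-white background with a blue square in it
demo = Image.new("RGB", (60, 40), (250, 250, 250))
demo.paste((29, 161, 242), (10, 10, 30, 30))

mask_demo = to_mask(demo)
print(mask_demo.shape, mask_demo[0, 0], mask_demo[15, 15])
```

With a real photo you would call to_mask(Image.open("image.jpg")) and pass the result as the mask argument. Words are drawn only where the mask is not 255, which here is the square.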

Step 3: Create the Wordcloud image

Simply run code like this:

w=WordCloud(background_color="white",mask=mask1,width=mask1.shape[1],height=mask1.shape[0]).generate(te)

Step 4: Display the visualization

In step 3, we created a Wordcloud image named ‘w’, but it is not yet in a form we can display. We need Matplotlib to display it. The code to do the job is below:

plt.figure(figsize=(20,10))
plt.imshow(w,interpolation="bilinear")
plt.axis('off')
plt.margins(x=0,y=0)
plt.show()
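If you would rather save the picture than show it in a window, Matplotlib can write the figure to a file. The sketch below is a minimal example under some assumptions: a random NumPy array stands in for ‘w’, the output goes to an in-memory buffer instead of a real file, and the dpi value is arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in image array; in the tutorial you would pass the WordCloud object `w`
img = np.random.default_rng(0).integers(0, 256, size=(100, 200, 3), dtype=np.uint8)

fig, ax = plt.subplots(figsize=(8, 4))
ax.imshow(img, interpolation="bilinear")
ax.axis("off")

# With a real file you would pass a path such as "wordcloud.png" (hypothetical name)
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150, bbox_inches="tight", pad_inches=0)
plt.close(fig)

png_bytes = buf.getvalue()
print(len(png_bytes), "bytes written")
```

The Wordcloud object also has its own to_file method, which writes the image directly without going through Matplotlib.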

So now you will have a result like the one in this image. Ting ting! 🌟

Image by author

P/S: If you like my tutorial, just react. Your interaction will be a motivation and orientation for me to write more topics like this. Thanks 💗

Thuong Nguyen

I am a Data Analyst interested in logical thinking, technology & programming.