The Linux Rain Linux General/Gaming News, Reviews and Tutorials

Python for Data Science: Data Visualization

By Kalyani Rajalingham, published 01/02/2021 in Tutorials


Python can generate anything from simple to very complex graphs. In this segment, we’ll learn how to plot with Python, using the matplotlib and seaborn libraries.

Simple Linear Plot

The first graph we should learn how to plot is a simple linear plot. Suppose that we have the following:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Graph")
plt.show()

In this case, plt.plot(x, y) defines the x and y values to plot. plt.xlabel() and plt.ylabel() label the axes, and plt.title() inserts a title. plt.show() displays the graph; when the code is run as a script, the graph will not show up without this last call.
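
If no display is available (for example, on a server), the same graph can be written to an image file instead of being shown. The following is a minimal sketch, not part of the original example: it assumes the non-interactive "Agg" backend is acceptable, and the file name line_graph.png is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

# plt.subplots() returns the figure and its axes as explicit objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line Graph")

# Hypothetical output file name; the format is inferred from the extension
fig.savefig("line_graph.png")
```

Here the axes object is used directly (ax.plot, ax.set_title) rather than the plt.* functions; both styles draw the same graph.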

Two Lines

In this case, we wish to draw two lines on one graph. To tell the lines apart, we give each one a name with the “label” argument (“Line 1” and “Line 2”), which plt.legend() then displays.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [10, 20, 30, 40, 50]
y2 = [25, 36, 45, 55, 66]

line1 = plt.plot(x, y1, marker='o', label='Line 1')
line2 = plt.plot(x, y2, marker='o', label='Line 2')
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Line Graphs")
plt.legend()
plt.show()

To add a legend for two or more lines, the “label” argument (for example, label="Line 1") is necessary: plt.legend() only lists lines that have been given a label.
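
To see why the label matters, the sketch below labels only one of the two lines and then asks the axes which entries a legend would contain. It uses the "Agg" backend so that it runs without a display; that choice is an assumption made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

fig, ax = plt.subplots()
ax.plot(x, [10, 20, 30, 40, 50], marker="o", label="Line 1")
ax.plot(x, [25, 36, 45, 55, 66], marker="o")  # deliberately left without a label

# Collect the entries that a legend would display
handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['Line 1']

ax.legend()  # the legend shows "Line 1" only
```

The second line is still drawn, but it is missing from the legend because it has no label.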

FacetGrid

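In this instance, we’ll use a sample dataset that ships with seaborn (load_dataset fetches it from seaborn’s online data repository, so an internet connection may be needed the first time). First, let’s import what we need: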

import matplotlib.pyplot as plt
import seaborn as sns

Next, let’s load the dataset we want:

data = sns.load_dataset("tips")

You can print a sample of the “tips” dataset (the first five rows) with print(data.head()).

However, you can choose another dataset; the following lists the available names:

print(sns.get_dataset_names())

Now, we need to create the templates. Here, we first specify the dataset that we will use, then the row variable, and then the column variable. Because “sex” and “time” each take two values, this generates a 2 × 2 grid of four blank graphs.

graph = sns.FacetGrid(data, row="sex", col="time")

Now, let’s choose to add a scatter plot to the empty templates. Here, using the map function, we ask that a scatter plot be drawn with the x-axis as “total_bill” and y-axis as “tip”.

graph = graph.map(plt.scatter, 'total_bill', 'tip')

plt.show()

Joint Plot

A joint plot shows two variables plotted against each other in the centre, with the distribution of each variable drawn along the margins.

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")
sns.jointplot(x="tip", y="total_bill", data=graph, kind="reg")
plt.show()

With the data argument, we specify the dataset, and with the kind argument, we ask for a regression (however, you can specify others, such as "scatter", "kde", "hist" or "hex").

JointGrid

In this particular graph, I’m going to join two different kinds of plot into one. Using .plot_joint(sns.kdeplot, fill=True), we ask for a filled kdeplot as the main plot. Using .plot_marginals(sns.boxplot), we add boxplots on the margins.

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")
matrix = sns.JointGrid(data=graph, x="total_bill", y="tip")
matrix = matrix.plot_joint(sns.kdeplot, fill=True)
matrix = matrix.plot_marginals(sns.boxplot)
plt.show()

Rel Plot

A rel plot (relational plot) shows the relationship between two variables; by default, it draws a scatter plot. Here, the hue argument colours the points by “sex”, and the size argument varies the point size by “time”.

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="sex", size="time", data=graph)
plt.show()

Regression Plot

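As the name suggests, this plot fits and draws regression lines. lmplot draws a scatter plot with a fitted linear regression line, and the col and row arguments split the data into one panel per combination of “time” and “sex”.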

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")
sns.lmplot(x="tip", y="total_bill", col="time", row="sex", data=graph)
plt.show()

Pair Plot

In a pair plot, you get a matrix of graphs, one for each pair of numeric columns. The hue argument allows us to separate categorical data; in this case, the data points are coloured orange and blue based on sex.

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")

sns.pairplot(graph, hue="sex")

plt.show()

PairGrid

PairGrid gives you a lot more control over the plots that you see in a pair plot. For example:

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("penguins")
matrix = sns.PairGrid(graph, hue='sex')
matrix.map_diag(plt.hist)
matrix.map_upper(plt.scatter)
matrix.map_lower(sns.kdeplot)
plt.show()

Here, we use map_diag, map_upper, and map_lower to specify the type of graph we want on the diagonal, in the upper triangle, and in the lower triangle of the grid. We have asked for histograms on the diagonal, scatter plots in the upper right section, and kdeplots in the lower left section.

HeatMap

A heatmap displays a matrix of numbers as a grid of coloured cells. A common use is to display a correlation matrix. As such, the first thing to do is to generate the correlation matrix using .corr(). Once the matrix has been generated, you just plot it. In this case, the annot argument writes the numbers onto the graph.

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("tips")
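# numeric_only=True skips non-numeric columns such as "sex" (needs pandas 1.5 or newer)
matrix = graph.corr(numeric_only=True)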
sns.heatmap(matrix, annot=True)
plt.show()
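
Correlation is only defined for numeric columns, while “tips” also contains text-like columns such as “sex” and “day”. The sketch below shows one way to handle this that does not depend on the pandas version: select the numeric columns explicitly before calling .corr(). The small table is made up for illustration and only mimics some column names of “tips”.

```python
import pandas as pd

# Made-up numbers; "day" stands in for the non-numeric columns of "tips"
data = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Keep only the numeric columns, then compute pairwise correlations
matrix = data.select_dtypes(include="number").corr()
print(matrix)
```

The result is a square table with one row and one column per numeric variable, which is exactly the shape that sns.heatmap expects.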

Alternatively, a heatmap can display any table of values. Here, the “flights” dataset is pivoted into a year-by-month table of passenger counts:

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("flights")

matrix = graph.pivot_table(index="year", columns="month", values="passengers")

sns.heatmap(matrix)
plt.show()
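
To make the pivoting step concrete, here is a sketch on a tiny, made-up table with the same column names as “flights” (the passenger numbers are invented). pivot_table turns one row per (year, month) pair into one row per year and one column per month.

```python
import pandas as pd

# Made-up long-form data: one row per (year, month) observation
long_form = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [100, 110, 120, 130],
})

# Wide form: years become the rows, months become the columns
matrix = long_form.pivot_table(index="year", columns="month", values="passengers")
print(matrix)
```

If several rows share the same year and month, pivot_table averages them by default.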

ClusterMap

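In a clustermap, the rows and columns of the heatmap are re-ordered so that similar ones sit next to each other, based on hierarchical clustering (this relies on SciPy being installed). The “species” column is removed first because it is not numeric.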

import matplotlib.pyplot as plt
import seaborn as sns

graph = sns.load_dataset("iris")
species = graph.pop("species")  # remove the non-numeric column before clustering
sns.clustermap(graph)
plt.show()
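
Note that .pop() changes the table in place: it removes the column and returns it. The sketch below demonstrates this on two made-up rows that mimic some column names of “iris”.

```python
import pandas as pd

# Made-up rows that only mimic some column names of "iris"
data = pd.DataFrame({
    "sepal_length": [5.1, 7.0],
    "petal_length": [1.4, 4.7],
    "species": ["setosa", "versicolor"],
})

# pop() returns the column and deletes it from the original table
species = data.pop("species")

print(list(data.columns))  # ['sepal_length', 'petal_length']
print(list(species))       # ['setosa', 'versicolor']
```

After the call, only numeric columns remain in the table, so it can be passed to sns.clustermap.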

Happy graphing!



About the author

Kalyani Rajalingham is from Sri Lanka, lives in Canada, and is a Linux and code lover.

Tags: data-science python graphs data-visualization tutorials