The good-looking cousin of stacked area charts
Even though they might look intimidating, stream graphs can be fairly easy to draw.
They're almost identical to stacked area charts, but without a fixed axis at the bottom. This small change can reduce the steep angles and give the chart more fluidity, often at the cost of accuracy.
Still, this visualization is excellent at displaying patterns, trends, and changes in composition. Overall, the distinctive shape of stream graphs makes them more appealing and, in my opinion, more aesthetically pleasing.
This article will go through the basics of plotting stream graphs with Matplotlib, from line charts and area charts to streams.
Setting up
We'll need Pandas, Matplotlib, and a dummy dataset for the following examples. Our data frame has a column with the year, one with a value, and a last one with a category.
import matplotlib.pyplot as plt
import pandas as pd

url = 'https://gist.githubusercontent.com/Thiagobc23/6cbe0f2ae9fe39032b6c3e623817a8ff/raw/4ad9f5eb8998f2d548b371b7c5e91f36098b87b0/dummy_stack.csv'
df = pd.read_csv(url)
# list of categories, in the order they appear in the file
channels = df.Channel.unique()

df.groupby('Channel').describe()
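If the gist is unreachable, a stand-in with the same layout is enough to follow along. The sketch below assumes the three columns are named Date, Value, and Channel, as the later snippets suggest; the category names and the numbers are made up and are not the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dummy_stack.csv: same three columns, invented values.
rng = np.random.default_rng(42)
years = np.arange(2010, 2020)
names = ['A', 'B', 'C']  # hypothetical category names

rows = [
    {'Date': year, 'Value': int(rng.integers(10, 100)), 'Channel': name}
    for name in names
    for year in years
]
df = pd.DataFrame(rows)
channels = df.Channel.unique()

print(df.head())
```

Every category has one row per year, which is what the stacking loops in the following sections rely on when they add arrays of equal length.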
Lines
We'll start simple with a line chart, iterating through a list of categories and plotting each line individually.
fig, ax = plt.subplots(1, figsize=(16,4))

for c in channels:
    # rows of the current category only
    temp = df[df['Channel'] == c]
    plt.plot(temp.Date, temp.Value)

plt.legend(channels)
plt.show()
The line chart isn't bad for visualizing this data; it gives us a pretty good idea of what's happening. But we can't compare the total values here, and that's the crucial difference between line charts and stacked area charts or stream graphs.
Areas
Before we move to stream graphs, we'll draw an area chart. To plot it, we'll accumulate each line's values and use the running total as the baseline for the next line.
fig, ax = plt.subplots(1, figsize=(16,4))

stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    # running total: this category on top of all previous ones
    stack = stack + temp['Value'].values
    plt.plot(temp.Date, stack)

plt.legend(channels)
plt.show()
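The accumulation in the loop above is a cumulative sum across categories. As a minimal sketch with made-up numbers (three categories over four dates, not the article's data), NumPy can produce every top and bottom edge at once:

```python
import numpy as np

# Hypothetical values: one row per category, one column per date.
values = np.array([
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [4, 3, 2, 1],
])

# Running total down the categories: row i is the top edge of category i.
tops = np.cumsum(values, axis=0)
# Each category sits on the top edge of the previous one; the first sits on zero.
bottoms = tops - values

print(tops)
print(bottoms)
```

The last row of tops is the total per date, which is the quantity a plain line chart does not show.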
Now we can fill the area of each category with color; for that, we'll use the fill_between function.
We'll pass it three arguments, x, y1, and y2, where y1 defines the bottom of the area and y2 the top.
fig, ax = plt.subplots(1, figsize=(16,4))

stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    stack = stack + temp['Value'].values
    # bottom edge: the previous total; top edge: the new total
    plt.fill_between(temp.Date, stack - temp['Value'].values, stack)

plt.legend(channels)
plt.show()
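Matplotlib also ships a stackplot method that does the stacking and filling in one call. The sketch below is an alternative to the loop above rather than the approach this article follows; the years and values are invented, and the Agg backend and the output file name are assumptions made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one row per category, one column per year.
years = np.arange(2015, 2020)
labels = ['A', 'B', 'C']
values = np.array([
    [3, 4, 6, 5, 7],
    [2, 2, 3, 4, 4],
    [5, 4, 3, 3, 2],
])

fig, ax = plt.subplots(1, figsize=(16, 4))
# stackplot stacks the rows on top of each other, starting from zero by default
polys = ax.stackplot(years, values, labels=labels)
ax.legend(loc='upper left')
fig.savefig('stacked_area.png')  # hypothetical output file
```

The explicit loop is still worth knowing, since it gives full control over the bottom and top edges, which is what the stream version needs.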
Streams
There are different approaches to drawing a stream graph. We'll keep it simple and place our baseline exactly at the center.
To find the center, we need the sum of all categories for each date, divided by 2.
# half of the total per date; subtracting it centers the stack around zero
adjust = df.groupby('Date')['Value'].sum().values / 2

fig, ax = plt.subplots(1, figsize=(16,4))

stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    stack = stack + temp['Value'].values
    plt.fill_between(temp.Date, stack - temp['Value'].values - adjust, stack - adjust)

plt.legend(channels)
plt.show()
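To see why subtracting half of the total centers the chart, here is a small numeric sketch with made-up values. After the shift, the lowest edge of the stack mirrors the highest edge around zero on every date.

```python
import numpy as np

# Hypothetical values: one row per category, one column per date.
values = np.array([
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [4, 3, 2, 2],
], dtype=float)

# Half of the total per date, the same quantity the article calls "adjust".
adjust = values.sum(axis=0) / 2

tops = np.cumsum(values, axis=0) - adjust
bottoms = tops - values

print(bottoms[0])   # lowest edge of the stream
print(tops[-1])     # highest edge of the stream
print(np.allclose(bottoms[0], -tops[-1]))
```

The thickness of each band is unchanged by the shift; only its vertical position moves, which is why individual values become harder to read off the axis.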
There it is, our stream graph.
Going further
Even though it's quite simple to draw a stream graph, it can be hard to make a good one. Many factors can change how the data is perceived.
For example, this paper explores different algorithms and ways of defining the baseline and the order of the categories.
The following and last example explores a different shape of the dataset, some interpolation, and a slight adjustment to the baseline.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import pchip

url = 'https://gist.githubusercontent.com/Thiagobc23/6cbe0f2ae9fe39032b6c3e623817a8ff/raw/4ad9f5eb8998f2d548b371b7c5e91f36098b87b0/dummy_stack.csv'
df = pd.read_csv(url)
channels = df.Channel.unique()
yrs = df.Date.unique()

# reshape the df, one column for each category / "Channel"
df = pd.pivot_table(df, values='Value', index='Date',
                    columns='Channel', aggfunc='sum').reset_index()

# create the df with the new x
df2 = pd.DataFrame(np.linspace(0, len(df)-1, num=len(df)*4))
df2.columns = ['x']

# interpolate each line
for c in channels:
    x = np.arange(0, len(df))
    y = df[c]
    f2 = pchip(x, y)

    xnew = np.linspace(0, len(df)-1, num=len(df)*4)
    df2[c] = f2(xnew)

# figure and colors
fig, ax = plt.subplots(1, figsize=(160,40), facecolor='#2C2C2C')
ax.set_facecolor('#2C2C2C')

colors = ['#DB504A', '#FC9F5B', '#7DCFB6', '#FFF275', '#63B0CD']

# get the center value for each date
adjust = df2[channels].sum(axis=1)/2
# adjust the adjust :)
adjust = adjust * 0.3

stack = np.zeros(len(df2))
for i, c in enumerate(channels):
    # y1 is the bottom line of each category
    y1 = stack-adjust

    # plot
    plt.fill_between(df2.x, y1, y1+df2[c], edgecolor='w', lw=0, color=colors[i])

    # stack is the cumulative value of the bottom categories
    stack = stack + df2[c]

# ticks
plt.xticks(np.arange(0, len(yrs)), yrs, color='lightgrey', fontsize=80)
plt.yticks([])

# remove spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# grid
ax.set_axisbelow(True)
ax.xaxis.grid(color='lightgrey', linestyle='dashed', alpha=1, lw=5)

#plt.legend(channels)
plt.savefig('stream.svg')
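The interpolation step is the least obvious part of the example above, so here it is in isolation. This is a minimal sketch with made-up values, using SciPy's PchipInterpolator class. It resamples five points onto a grid four times denser, the same ratio as in the example, and checks two properties that matter for a stream graph: the curve still passes through the original points, and it stays within the range of the data.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical series: one value per date, indexed 0..4.
x = np.arange(5)
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Piecewise cubic Hermite interpolation through the original points.
f = PchipInterpolator(x, y)

# Denser grid over the same range, four times as many points.
xnew = np.linspace(0, len(x) - 1, num=len(x) * 4)
ynew = f(xnew)

print(np.allclose(f(x), y))                        # passes through the data
print(ynew.min() >= y.min(), ynew.max() <= y.max())  # stays within the data range
```

Staying within the data range matters here because an interpolant that overshoots could dip below zero and produce bands with negative thickness.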
Conclusion
Stream graphs can be fairly forgiving since we're not focused on accurate data representation, but that doesn't mean we should be careless.
We need to be careful when designing and presenting stream graphs. Good practices, such as explicitly telling our audience what the visual focuses on and which questions the chart is intended to answer, can go a long way.
Overall, I believe the chart has two decisive advantages: it's distinctive enough to capture people's attention and accurate enough to demonstrate a simple pattern or trend, especially if we annotate the values and are clear about what we're trying to show.