Wednesday, July 27, 2022

Stream Graphs Fundamentals with Python’s Matplotlib | by Thiago Carvalho | Jul, 2022


The great-looking cousin of stacked area charts

Stream Graph (Image by Author)

Even though they may look intimidating, stream graphs are quite simple to draw.

They're almost identical to stacked area charts, but without a fixed axis at the bottom. This small change can reduce steep angles and give the chart more fluidity, usually at the cost of accuracy.

Stream Graph (Image by Author)

Still, this visualization is excellent at showing patterns, trends, and changes in composition. Overall, the unique shape of stream graphs makes them more engaging and, in my opinion, more aesthetically pleasing.

This article will go through the basics of plotting stream graphs with Matplotlib, from line charts and area charts to streams.

Setting up

We'll need Pandas, Matplotlib, and a dummy dataset for the following examples. Our data frame has three columns: Date, with the year; Value, with a numeric value; and Channel, with the category.

import matplotlib.pyplot as plt
import pandas as pd
url = 'https://gist.githubusercontent.com/Thiagobc23/6cbe0f2ae9fe39032b6c3e623817a8ff/raw/4ad9f5eb8998f2d548b371b7c5e91f36098b87b0/dummy_stack.csv'
df = pd.read_csv(url)
# list of categories, in order of appearance
channels = df.Channel.unique()
df.groupby('Channel').describe()
Data frame (Image by Author)
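If the gist URL is unavailable, a synthetic dataset with the same long format (one row per Date/Channel pair) can stand in for it. This is a minimal sketch: the helper name make_dummy_stack, the channel names, the year range, and the random values are all hypothetical and do not reproduce the author's data.

```python
import numpy as np
import pandas as pd


def make_dummy_stack(years=range(2010, 2022), channels=("A", "B", "C", "D", "E"), seed=0):
    """Hypothetical helper: long-format frame with Date, Value and Channel columns."""
    rng = np.random.default_rng(seed)
    rows = []
    for channel in channels:
        # a random walk per channel, clipped so every value stays positive
        walk = np.clip(50 + rng.normal(0, 10, len(years)).cumsum(), 1, None)
        rows += [{"Date": y, "Value": float(v), "Channel": channel}
                 for y, v in zip(years, walk)]
    # sort by Date so each channel's rows line up with the same sequence of dates
    return pd.DataFrame(rows).sort_values(["Date", "Channel"]).reset_index(drop=True)


df = make_dummy_stack()
channels = df.Channel.unique()
print(df.head())
```

Sorting by Date matters because the later loops filter one channel at a time and assume every channel's rows follow the same date order.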

Lines

We'll start simple with a line chart, iterating through the list of categories and plotting each line separately.

fig, ax = plt.subplots(1, figsize=(16,4))
for c in channels:
    # rows for one category only
    temp = df[df['Channel'] == c]
    plt.plot(temp.Date, temp.Value)
plt.legend(channels)
plt.show()
Line Chart (Image by Author)

The line chart isn't bad for visualizing this data; it gives us a pretty good idea of what's happening. But we can't compare the total values here, and that's the key difference between line charts and stacked area or stream graphs.

Areas

Before we move on to stream graphs, we'll draw an area chart. To plot it, we'll accumulate each line's values and use that running total as the baseline for the next line.

fig, ax = plt.subplots(1, figsize=(16,4))
stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    # running total: each line sits on top of the previous ones
    stack = stack + temp['Value'].values
    plt.plot(temp.Date, stack)
plt.legend(channels)
plt.show()
Unfilled Area Chart (Image by Author)
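The loop above keeps a running total in `stack`. The same layer edges can be computed in one step by pivoting to a wide table (one column per channel) and taking a cumulative sum across the columns. A sketch, assuming every channel has exactly one row per Date; the tiny frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical long-format data: one row per (Date, Channel) pair
df = pd.DataFrame({
    "Date":    [2020, 2020, 2021, 2021, 2022, 2022],
    "Channel": ["A", "B", "A", "B", "A", "B"],
    "Value":   [1.0, 2.0, 3.0, 1.0, 2.0, 2.0],
})
channels = df.Channel.unique()

# wide table: rows are dates, columns are channels (kept in the original order)
wide = df.pivot(index="Date", columns="Channel", values="Value")[channels]

# top edge of each layer = running total across channels
tops = wide.cumsum(axis=1)
# bottom edge of each layer = top edge minus the layer's own height
bottoms = tops - wide

# sanity check: identical to the explicit loop with a running `stack`
stack = 0
for c in channels:
    stack = stack + df[df["Channel"] == c]["Value"].values
    assert np.allclose(stack, tops[c].values)

print(tops)
```

The last column of `tops` is the total per date, which is exactly what the stacked chart lets us read off and the line chart did not.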

Now we can fill the area of each category with color; for that, we'll use the fill_between function.

We'll pass it three parameters, x, y1, and y2, where y1 defines the bottom of the area and y2 the top (y2 defaults to 0 if omitted).

fig, ax = plt.subplots(1, figsize=(16,4))
stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    stack = stack + temp['Value'].values
    # fill from the previous total (bottom) up to the new total (top)
    plt.fill_between(temp.Date, stack - temp['Value'].values, stack)
plt.legend(channels)
plt.show()
Stacked Area Chart (Image by Author)
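`plt.legend(channels)` relies on the plotting order matching the list of names. A slightly more robust variant, sketched below, passes a `label` to each `fill_between` call and lets `ax.legend()` collect them. The small dataset and the output filename are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical long-format data
df = pd.DataFrame({
    "Date":    [2020, 2020, 2021, 2021, 2022, 2022],
    "Channel": ["A", "B", "A", "B", "A", "B"],
    "Value":   [1.0, 2.0, 3.0, 1.0, 2.0, 2.0],
})
channels = df.Channel.unique()

fig, ax = plt.subplots(figsize=(16, 4))
stack = 0
for c in channels:
    temp = df[df["Channel"] == c]
    values = temp["Value"].values
    stack = stack + values
    # y1 = bottom of the layer, y2 = top of the layer; label feeds the legend
    ax.fill_between(temp["Date"].values, stack - values, stack, label=c)
legend = ax.legend()
fig.savefig("stacked_area.png")
```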

Streams

There are different approaches to drawing a stream graph. We'll keep it simple and place our baseline exactly at the center.

To find the center, we need the sum of all categories for each date, divided by 2.

# half of each date's total = the center of the stack
adjust = df.groupby('Date')['Value'].sum().values / 2
fig, ax = plt.subplots(1, figsize=(16,4))
stack = 0
for c in channels:
    temp = df[df['Channel'] == c]
    stack = stack + temp['Value'].values
    # same stacked areas as before, shifted down by half the total
    plt.fill_between(temp.Date, stack - temp['Value'].values - adjust, stack - adjust)
plt.legend(channels)
plt.show()
Stream Graph (Image by Author)

There it is, our stream graph.
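Subtracting half of each date's total shifts the whole stack so that it is symmetric around zero: the bottom of the first layer lands at -total/2 and the top of the last layer at total - total/2 = +total/2. A small NumPy check of that arithmetic, with hypothetical values (rows are dates, columns are channels):

```python
import numpy as np

# hypothetical values: 3 dates (rows) x 3 channels (columns)
values = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 1.0, 1.0],
    [2.0, 2.0, 0.0],
])

adjust = values.sum(axis=1) / 2                  # half of each date's total
tops = values.cumsum(axis=1) - adjust[:, None]   # top edge of every layer, centered
bottoms = tops - values                          # bottom edge of every layer

print(bottoms[:, 0])  # first layer starts at -total/2 for every date
print(tops[:, -1])    # last layer ends at +total/2 for every date
```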

Going further

Even though it's quite simple to draw a stream graph, it can be hard to make a good one. Many factors can change how the data is perceived.

For example, this paper explores different algorithms for defining the baseline and the order of the categories.

Source: Chuan Bu, Quanjie Zhang, Qianwen Wang, Jian Zhang, Michael Sedlmair, Oliver Deussen, Yunhai Wang. SineStream: Improving the Readability of Streamgraphs by Minimizing Sine Illusion Effects. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2020), 2021. http://www.yunhaiwang.net/infoVis2020/sinestream/index.html
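Matplotlib itself offers a few baseline strategies through the `baseline` parameter of `Axes.stackplot`: `'zero'` draws a regular stacked area chart, `'sym'` centers the stack around zero like the manual version above, `'wiggle'` minimizes the sum of the squared slopes, and `'weighted_wiggle'` does the same with weights for each layer's size (the documented "streamgraph" layout). A sketch comparing them on hypothetical random data; it does not implement the SineStream ordering or baseline from the cited paper, and the output filename is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# hypothetical data: 5 channels (rows) over 12 years (columns)
rng = np.random.default_rng(1)
years = np.arange(2010, 2022)
layers = rng.uniform(1, 10, size=(5, len(years)))

baselines = ["zero", "sym", "wiggle", "weighted_wiggle"]
fig, axes = plt.subplots(len(baselines), 1, figsize=(16, 12), sharex=True)
for ax, baseline in zip(axes, baselines):
    # stackplot accepts a 2D array: each row becomes one stacked layer
    polys = ax.stackplot(years, layers, baseline=baseline)
    ax.set_title(f"baseline='{baseline}'")
    ax.set_yticks([])
fig.savefig("stackplot_baselines.png")
```

Placing the four variants on shared x-axes makes it easy to see how the choice of baseline alone changes the perceived slopes of the same layers.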

The following and final example explores a different shape of the dataset, some interpolation, and a slight adjustment to the baseline.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import pchip
url = 'https://gist.githubusercontent.com/Thiagobc23/6cbe0f2ae9fe39032b6c3e623817a8ff/raw/4ad9f5eb8998f2d548b371b7c5e91f36098b87b0/dummy_stack.csv'
df = pd.read_csv(url)
channels = df.Channel.unique()
yrs = df.Date.unique()
# reshape the df, one column for each category / "Channel"
df = pd.pivot_table(df, values='Value', index='Date',
                    columns='Channel', aggfunc='sum').reset_index()
# create the df with the new x (4 points per original date)
df2 = pd.DataFrame(np.linspace(0, len(df)-1, num=len(df)*4))
df2.columns = ['x']
# interpolate each line
for c in channels:
    x = np.arange(0, len(df))
    y = df[c]
    f2 = pchip(x, y)

    xnew = np.linspace(0, len(df)-1, num=len(df)*4)
    df2[c] = f2(xnew)

#########
fig, ax = plt.subplots(1, figsize=(160,40), facecolor='#2C2C2C')
ax.set_facecolor('#2C2C2C')
colors = ['#DB504A', '#FC9F5B', '#7DCFB6', '#FFF275', '#63B0CD']
# get the center value for each date
adjust = df2[channels].sum(axis=1)/2
# adjust the adjust :) keep only 30% of the centering offset
adjust = adjust * 0.3
stack = np.zeros(len(df2))
for i, c in enumerate(channels):
    # y1 is the bottom line of each category
    y1 = stack - adjust

    # plot
    plt.fill_between(df2.x, y1, y1+df2[c], edgecolor='w', lw=0, color=colors[i])

    # stack is the cumulative value of the bottom categories
    stack = stack + df2[c]

# ticks
plt.xticks(np.arange(0, len(yrs)), yrs, color='lightgrey', fontsize=80)
plt.yticks([])
# remove spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
# grid
ax.set_axisbelow(True)
ax.xaxis.grid(color='lightgrey', linestyle='dashed', alpha=1, lw=5)
#plt.legend(channels)
plt.savefig('stream.svg')
*Title and annotations added after exporting the graph.
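The final example relies on `pchip` (scipy's name for `scipy.interpolate.PchipInterpolator`) to upsample each channel four times before drawing. PCHIP is designed to preserve the shape of the data and avoid overshoots between points, so a layer that is always positive stays positive and the smoothed stream does not invent bumps. A small sketch of that step on hypothetical values for a single channel:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# hypothetical yearly values for one channel, including a sharp drop and a flat stretch
y = np.array([5.0, 9.0, 1.0, 1.0, 6.0])
x = np.arange(len(y))

f = PchipInterpolator(x, y)
# four points per original point, as in the final example above
x_new = np.linspace(0, len(y) - 1, num=len(y) * 4)
y_new = f(x_new)

# the smoothed curve stays within the range of the original data
print(y_new.min(), y_new.max())
```

An ordinary cubic spline through the same points could dip below 1 right after the drop, which in a stream graph would show up as a layer with negative thickness.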

Conclusion

Stream graphs can be quite forgiving since we're not focused on accurate data representation, but that doesn't mean we should be careless.

We need to be careful when designing and presenting stream graphs. Good practices, such as explicitly telling our audience what the visual focuses on and what questions the chart is meant to answer, can go a long way.

Overall, I believe the chart has two decisive advantages: it's unique enough to capture people's attention and accurate enough to show a simple pattern or trend, especially if we annotate the values and are clear about what we're trying to show.
