Using Python to visualize changes in rank over time
Ranking data is data placed in a numerically ordered sequence. It is an easy way to communicate information because it helps the reader understand the order effortlessly. Ranking is a good idea for handling multiple observations or categorical data.
However, things change all the time. As time goes by, positions in a ranking can be constantly altered. Visualizing the positions of the ranks over a period helps reveal change and progress.
This article will guide you through some ideas for visualizing changes in rank over time.
Let’s get started.
Get data
To show that the methods mentioned here can be applied to real-world datasets, I’ll use the ‘Air Pollution in Seoul’ dataset from Kaggle (link). The data was provided by the Seoul Metropolitan Government (link). The data is used under the terms of the Creative Commons License CC-BY.
The dataset consists of air pollution data: SO2, NO2, CO, O3, PM10, and PM2.5 recorded between 2017 and 2019 in 25 districts of Seoul, South Korea.
In this article, we will work with carbon monoxide (CO), a common air pollutant that is harmful to humans. The measurement unit is parts per million (ppm).
Import data
After downloading the dataset, start by importing the libraries.
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Use pandas to read ‘Measurement_summary.csv’.
df = pd.read_csv('<file location>/Measurement_summary.csv')
df.head()
Explore data
Exploring the dataset as the first step is always a good idea. Fortunately, the result below shows that we do not have to deal with missing values.
df.info()
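Besides reading the summary, the missing values can also be counted per column. The sketch below uses a tiny made-up DataFrame (the values are illustrative assumptions, not taken from the Seoul dataset); with the real data, the same method could be called on df.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    'Station code': [101, 102, 103],
    'CO': [0.5, np.nan, 0.7],
})

# Count the missing values in each column
n_missing = toy.isna().sum()
print(n_missing)
```

A column full of zeros in this count means there is nothing to clean up before plotting.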
Let’s look at the number of unique values in the variable ‘Station code’.
df['Station code'].nunique()
## output
## 25
There are 25 districts in total.
set(df['Station code'])
## output
## {101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114,
## 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125}
Select and prepare data
As an example, I’ll select Station codes 111-118. If you want to plot other station numbers, feel free to modify the code below.
list_stations = [111, 112, 113, 114, 115, 116, 117, 118]
## .copy() avoids pandas' SettingWithCopyWarning when columns are added later
df_select = df[df['Station code'].isin(list_stations)].copy()
df_select.head()
The retrieved dataset is not ready to be plotted. Some columns need to be created or modified before use.
## create year_month, year and month columns
year_month = [i[0:7] for i in list(df_select['Measurement date'])]
df_select['year_month'] = year_month
df_select['year'] = [i[0:4] for i in year_month]
df_select['month'] = [i[-2:] for i in year_month]
## create district name column
district = [i.split(', ')[2] for i in df_select['Address']]
df_select['District'] = district
## change Station code column type
df_select = df_select.astype({'Station code': str})
## groupby with location and point of time
## numeric_only=True skips text columns such as 'Address' (needed in pandas 2.0+)
df_month = df_select.groupby(['Station code','District',
                              'year_month','year','month']).mean(numeric_only=True)
df_month.reset_index(inplace=True)
df_month.head()
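As an alternative to slicing strings, the date parts can be derived with pandas’ datetime tools. This is a minimal sketch on made-up rows; it assumes the ‘Measurement date’ values look like ‘YYYY-MM-DD HH:MM’, and the names toy and toy_month are hypothetical.

```python
import pandas as pd

# Toy rows; the date format is an assumption about the real column
toy = pd.DataFrame({
    'Measurement date': ['2017-01-01 00:00', '2017-01-01 01:00',
                         '2017-02-01 00:00'],
    'CO': [1.2, 0.8, 0.6],
})

# Parse once, then format the parts that are needed
dates = pd.to_datetime(toy['Measurement date'])
toy['year_month'] = dates.dt.strftime('%Y-%m')
toy['year'] = dates.dt.strftime('%Y')
toy['month'] = dates.dt.strftime('%m')

# Monthly mean of CO only, so no text column gets averaged
toy_month = toy.groupby(['year_month', 'year', 'month'],
                        as_index=False)['CO'].mean()
print(toy_month)
```

Parsing the dates also catches malformed values early, which string slicing would silently pass through.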
Here comes an important step. The main idea of this article is to create visualizations for ranking data. Next, we will create a column that ranks the districts’ CO amount (ppm) at each time point.
keep = []
for i in list(set(df_month['year_month'])):
    df = df_month[df_month['year_month']==i].copy()
    ## rank 1 is the highest CO amount within the month
    order = df['CO'].rank(ascending=False)
    df['rank'] = [int(i) for i in order]
    keep.append(df)
df_month = pd.concat(keep)
df_month.sort_values(['year_month', 'Station code'], ascending=True,
                     inplace=True, ignore_index=True)
df_month.head()
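The same per-month ranking can also be written without a loop by combining groupby with rank. This is a minimal sketch on made-up data; method='min' is an assumption about how ties should be handled and is not necessarily what the loop above produces for tied values.

```python
import pandas as pd

# Toy monthly data (values are made up)
toy = pd.DataFrame({
    'year_month': ['2017-01', '2017-01', '2017-01',
                   '2017-02', '2017-02', '2017-02'],
    'District': ['A', 'B', 'C', 'A', 'B', 'C'],
    'CO': [0.9, 0.5, 0.7, 0.4, 0.8, 0.6],
})

# Rank within each month; rank 1 is the highest CO value.
# method='min' gives tied rows the same (lowest) rank.
toy['rank'] = (toy.groupby('year_month')['CO']
                  .rank(ascending=False, method='min')
                  .astype(int))
print(toy)
```

Because the ranks are computed in place, the row order is preserved and no concatenation or re-sorting is needed afterwards.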
Before continuing, we will define a dictionary of colors to facilitate the plotting process.
# extract a color palette; the palette can be changed
list_dist = list(set(df_select['District']))
pal = list(sns.color_palette(palette='Spectral',
                             n_colors=len(list_dist)).as_hex())
dict_color = dict(zip(list_dist, pal))
Data visualization
This article intends to offer some visualization ideas for ranking data over time. Thus, the obtained result should be easy to understand while allowing the reader to compare the data ranks between different points in time.
Something needs to be clarified before continuing. Each graph has its pros and cons. Of course, nothing is perfect. Some ideas presented here may be just for an eye-catching effect. But they all have the same purpose of showing the changes in data ranks over time.
The charts in this article can be categorized into two groups: animations and charts.
Animation
Besides being a good way to catch attention, animation can easily show the changes in rank over time.
1. Comparing bar height with an Animated bar chart
Plotly is a useful graphing library for making interactive and animated graphs. The concept of applying an animated bar chart is to fix each district’s position. Each bar will be annotated with the ranking number. By doing this, the amount of CO can be compared over time.
import plotly.express as px
fig = px.bar(df_month, x='District', y='CO',
             color='District', text='rank',
             color_discrete_map= dict_color,
             animation_frame='year_month',
             animation_group='Station code',
             range_y=[0,1.2],
             labels={ 'CO': 'CO (ppm)'},
            )
fig.update_layout(width=1000, height=600, showlegend=False,
                  xaxis = dict(tickmode = 'linear', dtick = 1))
fig.update_traces(textfont_size=16, textangle=0)
fig.show()
Voila!!
The attached result above may look fast since this is just an example of the outcome. Don’t worry; there is a pause button to pause the animation and a slider to select a specific time point.
2. Racing with an Animated scatter plot
Now let’s change the point of view by moving each district according to its rank at different points in time. The sizes of the scatter dots can be used to show the CO amount.
To facilitate plotting with Plotly, we need to add two more kinds of columns to the DataFrame: the position on the X-axis and the text for annotation.
ym = list(set(year_month))
ym.sort()
df_month['posi'] = [ym.index(i) for i in df_month['year_month']]
df_month['CO_str'] = [str(round(i,2)) for i in df_month['CO']]
df_month['CO_text'] = [str(round(i,2))+' ppm' for i in df_month['CO']]
df_month.head()
Next, plot an animated scatter plot.
import plotly.express as px
fig = px.scatter(df_month, x='posi', y='rank',
                 size= 'CO',
                 color='District', text='CO_text',
                 color_discrete_map= dict_color,
                 animation_frame='year_month',
                 animation_group='District',
                 range_x=[-2,len(ym)],
                 range_y=[0.5,6.5]
                )
fig.update_xaxes(title='', visible=False)
fig.update_yaxes(autorange='reversed', title='Rank',
                 visible=True, showticklabels=True)
fig.update_layout(xaxis=dict(showgrid=False),
                  yaxis=dict(showgrid=True))
fig.update_traces(textposition='middle left')
fig.show()
Ta-da…
Charts
Animated charts are usually limited to expressing one point in time at once. To show multiple time points together, other charts and methods can be applied.
3. Drawing lines with a Bump chart
Basically, a bump chart uses multiple lines to show the changes in ranking over time. Plotting a bump chart with Plotly lets users filter the result and see more information when hovering the cursor over each data point, as shown in the result below.
import plotly.express as px
fig = px.line(df_month, x = 'year_month', y = 'rank',
              color = 'District',
              color_discrete_map= dict_color,
              markers=True,
              hover_name = 'CO_text')
fig.update_traces(marker=dict(size=11))
fig.update_yaxes(autorange='reversed', title='Rank',
                 visible=True, showticklabels=True)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.update_layout(xaxis=dict(showgrid=False),
                  yaxis=dict(showgrid=False) )
fig.show()
4. Creating a photo collage of bar charts
A simple bar chart can express the ranking at one time point. With many time points, we can create many bar charts and then combine them into a photo collage. Start by using the Seaborn library to create a bar chart.
df_select = df_month[df_month['year_month']=='2017-01']
fig, ax = plt.subplots(figsize=(15, 6))
sns.set_style('darkgrid')
sns.barplot(data = df_select,
            x = 'District', y ='CO',
            order=df_select.sort_values('CO', ascending=False)['District'],
            palette=dict_color)
ax.bar_label(ax.containers[0],
             labels=df_select.sort_values('CO', ascending=False)['CO_str'],
             label_type='edge', size=11)
plt.ylabel('CO (ppm)')
plt.title('2017-01')
plt.show()
Use a for loop to create the bar charts at different time points. Please be aware that the code below will export the charts to your computer for importing later.
keep_save = []
for t in ym:
    df_ = df_month[df_month['year_month']==t]
    fig, ax = plt.subplots(figsize=(8.5, 5))
    sns.set_style('darkgrid')
    sns.barplot(data = df_,
                x = 'District', y ='CO',
                order = df_.sort_values('CO', ascending=False)['District'],
                palette=dict_color)
    ax.bar_label(ax.containers[0],
                 labels=df_.sort_values('CO', ascending=False)['CO_str'],
                 label_type='edge', size=11)
    plt.ylim([0, 1.2])
    plt.ylabel('CO (ppm)')
    plt.title(t)
    plt.tight_layout()
    ## save each chart and remember its file name
    s_name = t + '_bar.png'
    keep_save.append(s_name)
    plt.savefig(s_name)
    plt.show()
Create a function to combine the charts. I found excellent code for combining many plots at this link on Stack Overflow.
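For readers who want a starting point, here is a minimal sketch of such a function using Pillow. It is not the Stack Overflow code referred to above; the parameter names list_names and output_name, the row-by-row filling order, and the resizing of every image to one grid cell are all assumptions. The demo at the bottom builds four small solid-color images instead of real charts.

```python
import os
import tempfile
from PIL import Image

def get_collage(n_col, n_row, width, height, list_names, output_name):
    """Paste images onto an n_col x n_row grid and save the result.

    Hypothetical sketch: each image is resized to one grid cell.
    """
    cell_w, cell_h = width // n_col, height // n_row
    collage = Image.new('RGB', (width, height), 'white')
    for idx, path in enumerate(list_names[:n_col * n_row]):
        row, col = divmod(idx, n_col)  # fill row by row, left to right
        with Image.open(path) as img:
            tile = img.convert('RGB').resize((cell_w, cell_h))
        collage.paste(tile, (col * cell_w, row * cell_h))
    collage.save(output_name)
    return collage

# Demo with four made-up 40x30 images on a 2x2 grid
tmp_dir = tempfile.mkdtemp()
paths = []
for i, color in enumerate(['red', 'green', 'blue', 'yellow']):
    path = os.path.join(tmp_dir, f'{i}.png')
    Image.new('RGB', (40, 30), color).save(path)
    paths.append(path)

out_path = os.path.join(tmp_dir, 'collage.png')
collage = get_collage(2, 2, 80, 60, paths, out_path)
print(collage.size)
```

With 36 monthly charts on a grid of 12 columns and 3 rows, filling row by row places one year on each row.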
Apply the function.
## get_collage(n_col, n_row, width, height, save_name, 'output.png')
# width = n_col * figure width
# height = n_row * figure height
get_collage(12, 3, 12*850, 3*500, keep_save, 'order_bar.png')
Ta-da…
The result shows each district’s monthly CO amount while presenting the ranking order over time. Thus, we can compare the district ranks and the amount of pollution at many time points at the same time.
5. Fancy the bar charts with a Circular bar chart
With the same concept as the previous idea, we can turn normal bar charts into circular bar charts (aka race track plots) and combine them into a photo collage.
As previously mentioned, everything has its pros and cons. The bars on a circular chart may be hard to compare because of the unequal length ratio of each bar. However, this can be considered a good option for creating an eye-catching effect.
Start with an example of creating a circular bar chart.
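A minimal sketch of one way to draw a circular bar chart with Matplotlib’s polar axes is given here. The district names and CO values are made up, and the 270-degree maximum sweep, the 1.2 ppm ceiling, and the empty center are assumptions chosen for illustration rather than the exact settings behind the charts in this article.

```python
import math
import matplotlib.pyplot as plt

# Toy data, already sorted from highest to lowest CO (values are made up)
districts = ['A', 'B', 'C', 'D']
co = [0.9, 0.7, 0.6, 0.4]
max_value = 1.2  # assumed value that maps to the full 270-degree sweep

# Convert each value to the angle (in radians) its bar should cover
angles = [v / max_value * 1.5 * math.pi for v in co]

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location('N')  # bars start at 12 o'clock
ax.set_theta_direction(-1)       # and run clockwise

# On polar axes, barh draws arcs: y is the radius, width is the angle.
# Radii start at 2 so the center stays empty; highest CO is outermost.
radii = [len(districts) + 1 - i for i in range(len(districts))]
bars = ax.barh(radii, angles, height=0.8)
ax.set_ylim(0, len(districts) + 2)
ax.set_xticks([])
ax.set_yticks([])
ax.spines['polar'].set_visible(False)

# Label each arc at its starting point
for r, name, v in zip(radii, districts, co):
    ax.text(0, r, f'{name} {v:.1f} ', ha='right', va='center')
```

Calling plt.show() displays the figure, and plt.savefig() would export it in the same way as the bar charts earlier.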
Apply a for loop to get the other circular bar charts. The results will be exported to your computer for import later.
Use the function to obtain a photo collage.
get_collage(12, 3, 12*860, 3*810, keep_cir, 'order_cir.png')
6. Another way to fancy the bar charts with a Radial bar chart
Radial bar charts change the direction of the bars so that they start from the center. This is another idea for catching attention. However, it can be seen that bars not located near each other are hard to compare.
Start with an example of a radial bar chart.
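The following is a minimal sketch of a radial bar chart drawn with Matplotlib’s polar axes. The district names and CO values are made up, and the bar width, starting angle, and 1.2 ppm radial limit are assumptions for illustration, not the exact settings used for the charts in this article.

```python
import math
import matplotlib.pyplot as plt

# Toy data (values are made up)
districts = ['A', 'B', 'C', 'D']
co = [0.9, 0.7, 0.6, 0.4]

# Spread the bars evenly around the circle
n = len(districts)
theta = [2 * math.pi * i / n for i in range(n)]
bar_width = 2 * math.pi / n * 0.9  # leave a small gap between bars

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
ax.set_theta_zero_location('N')  # first bar points to 12 o'clock
ax.set_theta_direction(-1)       # following bars go clockwise

# On polar axes, bar draws wedges that grow outward from the center
bars = ax.bar(theta, co, width=bar_width)
ax.set_ylim(0, 1.2)
ax.set_xticks(theta)
ax.set_xticklabels(districts)

# Annotate each bar with its value just beyond its tip
for t, v in zip(theta, co):
    ax.text(t, v + 0.05, f'{v:.1f}', ha='center', va='center')
```

Sorting the values before plotting places bars of similar length next to each other, which eases the comparison problem mentioned above.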
Apply a for loop to create the other radial bar charts. The results will also be exported to your computer for import later.
Apply the function to obtain a photo collage.
get_collage(12, 3, 12*800, 3*800, keep_rad, 'order_rad.png')
7. Using color with a Heat map
Generally, the heat map is a common chart for presenting data in a two-dimensional layout and showing values with colors. With our dataset, the color can be applied to show the rank numbers.
Start by creating a pivot table with pd.pivot().
df_pivot = pd.pivot(data=df_month, index='District',
                    columns='year_month', values='rank')
df_pivot
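To make the reshaping step concrete, here is a small sketch of the same pivot on made-up data. The names toy and toy_pivot and the rank values are hypothetical.

```python
import pandas as pd

# Toy long-format data: one row per district and month (values are made up)
toy = pd.DataFrame({
    'District': ['A', 'B', 'A', 'B'],
    'year_month': ['2017-01', '2017-01', '2017-02', '2017-02'],
    'rank': [1, 2, 2, 1],
})

# Wide format: one row per district, one column per month
toy_pivot = toy.pivot(index='District', columns='year_month', values='rank')
print(toy_pivot)
```

Each cell of the wide table holds one district’s rank in one month, which is the layout a heat map expects.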
After getting the pivot table, we can easily create a heat map with just a few lines of code.
plt.figure(figsize=(20,9.5))
sns.heatmap(df_pivot, cmap='viridis_r', annot=True, cbar=False)
plt.show()
With the color and annotation, we can spot the district with the highest (yellow) and lowest (dark blue) amount of CO. The change in ranking can be seen over time.
Summary
This article has presented seven visualization ideas with Python code to express the changes in data ranks over time. As previously mentioned, everything has its pros and limits. The important thing is finding the right chart that suits the data.
I’m sure there are more graphs for ranking data over time than those mentioned here. This article only offers some ideas. If you have any suggestions or recommendations, please feel free to leave a comment. I’d be happy to see it.
Thanks for reading.
These are my data visualization articles that you may find interesting:
- 8 Visualizations with Python to Handle Multiple Time-Series Data (link)
- 6 Visualization with Python Tricks to Handle Ultra-Long Time-Series Data (link)
- 9 Visualizations with Python to show Proportions instead of a Pie chart (link)
- 9 Visualizations with Python that Catch More Attention than a Bar Chart (link)