Sorting pandas DataFrames using multiple columns
When inspecting our data, we may sometimes want or even need to sort it based on one or multiple columns. This simple process can help us investigate a specific use case, explore edge cases and so on.
In today’s tutorial we’ll explain in detail how to sort pandas DataFrames in either ascending or descending order. We will also demonstrate how to involve multiple columns when sorting the data, and discuss how to sort a subset of columns in ascending order and the remaining subset in descending order.
First, let’s create an example DataFrame that we will reference throughout this tutorial in order to demonstrate a few concepts and showcase how to effectively sort pandas DataFrames.
import pandas as pd
df = pd.DataFrame(
[
(1, 'A', 140, False, 3.5),
(2, 'B', 210, True, 4.0),
(3, 'A', 562, True, 1.1),
(4, 'D', 133, False, 2.3),
(5, 'C', 109, False, 9.8),
(6, 'C', None, True, 3.9),
(7, 'B', 976, False, 7.8),
(8, 'D', 356, False, 4.5),
(9, 'C', 765, True, 2.1),
],
columns=['colA', 'colB', 'colC', 'colD', 'colE']
)

print(df)
colA colB colC colD colE
0 1 A 140.0 False 3.5
1 2 B 210.0 True 4.0
2 3 A 562.0 True 1.1
3 4 D 133.0 False 2.3
4 5 C 109.0 False 9.8
5 6 C NaN True 3.9
6 7 B 976.0 False 7.8
7 8 D 356.0 False 4.5
8 9 C 765.0 True 2.1
pandas.DataFrame.sort_values() is the method you need to use in order to sort a DataFrame based on specific conditions. In the next few sections we’ll discuss a few potential use cases and demonstrate how to use sort_values in order to get the desired result.
Sorting on one column
Now let’s suppose that we want to sort the DataFrame we created a moment ago based on the values of column colC. All we need to do is specify the column name in the by argument:
>>> df.sort_values(by='colC')
The result will contain all the records ordered by colC in ascending order (the default).
colA colB colC colD colE
4 5 C 109.0 False 9.8
3 4 D 133.0 False 2.3
0 1 A 140.0 False 3.5
1 2 B 210.0 True 4.0
7 8 D 356.0 False 4.5
2 3 A 562.0 True 1.1
8 9 C 765.0 True 2.1
6 7 B 976.0 False 7.8
5 6 C NaN True 3.9
Alternatively, you can specify ascending=False in order to sort the DataFrame on column colC in descending order:
>>> df.sort_values(by='colC', ascending=False)
colA colB colC colD colE
6 7 B 976.0 False 7.8
8 9 C 765.0 True 2.1
2 3 A 562.0 True 1.1
7 8 D 356.0 False 4.5
1 2 B 210.0 True 4.0
0 1 A 140.0 False 3.5
3 4 D 133.0 False 2.3
4 5 C 109.0 False 9.8
5 6 C NaN True 3.9
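As a minimal sketch of what the calls above return, the snippet below uses a small hypothetical frame (df_small, not the tutorial's df) to show that sort_values hands back a new, sorted DataFrame by default, that the rows keep their original index labels, and that the original frame is left in its initial order.

```python
import pandas as pd

# Hypothetical three-row frame used only for this illustration
df_small = pd.DataFrame({"name": ["x", "y", "z"], "score": [30, 10, 20]})

# sort_values returns a new DataFrame; df_small itself is not modified
sorted_df = df_small.sort_values(by="score")

# The index labels travel with their rows, so they appear reordered
print(sorted_df.index.tolist())

# The original frame still holds the rows in their initial order
print(df_small["score"].tolist())
```

This is why the index column in the outputs above looks shuffled: it still carries the labels of the unsorted DataFrame.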
Sorting on multiple columns
Now let’s suppose we want to order the DataFrame based on two columns, namely colB and colC. All we need to do this time is provide the column names as a list and pass it to the by argument:
>>> df.sort_values(by=['colB', 'colC'])
The above statement will sort the DataFrame in ascending order based on the values of columns colB and colC. Rows are ordered by colB first, and colC decides the order among rows that share the same colB value:
colA colB colC colD colE
0 1 A 140.0 False 3.5
2 3 A 562.0 True 1.1
1 2 B 210.0 True 4.0
6 7 B 976.0 False 7.8
4 5 C 109.0 False 9.8
8 9 C 765.0 True 2.1
5 6 C NaN True 3.9
3 4 D 133.0 False 2.3
7 8 D 356.0 False 4.5
Note, however, that the order in which the column names are specified matters. In other words, sort_values(by=['colB', 'colC']) and sort_values(by=['colC', 'colB']) won’t produce the same results:
>>> df.sort_values(by=['colC', 'colB'])
colA colB colC colD colE
4 5 C 109.0 False 9.8
3 4 D 133.0 False 2.3
0 1 A 140.0 False 3.5
1 2 B 210.0 True 4.0
7 8 D 356.0 False 4.5
2 3 A 562.0 True 1.1
8 9 C 765.0 True 2.1
6 7 B 976.0 False 7.8
5 6 C NaN True 3.9
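In the example DataFrame colC has no repeated values, so the second column never gets to break a tie. The sketch below uses a small hypothetical frame (pairs, with made-up group and value columns) in which both columns contain duplicates, which makes the effect of the column order easier to see.

```python
import pandas as pd

# Hypothetical frame where both columns contain repeated values
pairs = pd.DataFrame({
    "group": ["b", "a", "b", "a"],
    "value": [1, 2, 2, 1],
})

# Primary key: group, ties broken by value
by_group_first = pairs.sort_values(by=["group", "value"])

# Primary key: value, ties broken by group
by_value_first = pairs.sort_values(by=["value", "group"])

print(by_group_first)
print(by_value_first)
```

The first column in the list acts as the primary sort key and each following column is only consulted for rows that are still tied.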
Sorting in ascending or descending order with multiple columns
Going forward, we can even specify the sorting order for each column when sorting on multiple ones. This means we can sort in ascending order on one column and in descending order on others.
In order to achieve this, all we need to do is pass ascending a list of boolean values, one for every column specified in the by argument, each indicating whether we want to sort that column in ascending order or not.
The following command will sort the DataFrame based on colB (ascending order) and colC (descending order).
>>> df.sort_values(by=['colB', 'colC'], ascending=[True, False])
colA colB colC colD colE
2 3 A 562.0 True 1.1
0 1 A 140.0 False 3.5
6 7 B 976.0 False 7.8
1 2 B 210.0 True 4.0
8 9 C 765.0 True 2.1
4 5 C 109.0 False 9.8
5 6 C NaN True 3.9
7 8 D 356.0 False 4.5
3 4 D 133.0 False 2.3
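The same idea can be sketched on a smaller hypothetical frame (teams, with made-up team and points columns), where each boolean in ascending lines up positionally with the column at the same position in by.

```python
import pandas as pd

# Hypothetical frame: two teams with two scores each
teams = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "points": [5, 7, 9, 3],
})

# team ascending (True), points descending (False) within each team
ranked = teams.sort_values(by=["team", "points"], ascending=[True, False])

print(ranked)
```

Within each team the highest points value comes first, while the teams themselves stay in alphabetical order.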
Dealing with missing values
You may have noticed already that missing values are always placed at the very end of the results, regardless of the order we choose.
We can change this behaviour and instead place them at the very top of the results by simply specifying na_position='first' (defaults to 'last').
>>> df.sort_values(by='colC', ascending=True, na_position='first')
colA colB colC colD colE
5 6 C NaN True 3.9
4 5 C 109.0 False 9.8
3 4 D 133.0 False 2.3
0 1 A 140.0 False 3.5
1 2 B 210.0 True 4.0
7 8 D 356.0 False 4.5
2 3 A 562.0 True 1.1
8 9 C 765.0 True 2.1
6 7 B 976.0 False 7.8
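As a small sketch of how na_position interacts with the sort direction, the snippet below uses a hypothetical frame (readings, with made-up sensor and value columns) and sorts it in descending order twice: once with the default placement and once with na_position='first'.

```python
import pandas as pd

# Hypothetical frame with one missing value
readings = pd.DataFrame({
    "sensor": ["s1", "s2", "s3"],
    "value": [2.0, float("nan"), 1.0],
})

# Default: the NaN row goes last even though the order is descending
nan_last = readings.sort_values(by="value", ascending=False)

# na_position='first' moves the NaN row to the top instead
nan_first = readings.sort_values(by="value", ascending=False, na_position="first")

print(nan_last)
print(nan_first)
```

The placement of missing values is controlled only by na_position, independently of the ascending argument.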
Final Thoughts
In today’s short tutorial we demonstrated how sorting works in pandas. Given a fairly simple DataFrame, we showcased how to sort the data based on one or even multiple columns.
We also demonstrated how to sort in descending or ascending order, and even how to decide which columns should be sorted in descending order and which in ascending order.
Finally, we showcased how to choose whether missing values appear at the top or bottom of the results, regardless of the order in which we sort the data.