Introduction
Data cleaning is a vital step in any data science project. In Python, the Pandas DataFrame is a commonly used data structure for data manipulation and analysis.
In this Byte, we will focus on handling non-NaN (Not a Number) values in DataFrame columns. We will learn how to count non-NaN values per column, calculate the total number of non-NaN values, and treat empty strings as NA values.
Counting Non-NaN Values in DataFrame Columns
Pandas provides the count() function to count the non-NaN values in DataFrame columns. Let's start by importing the pandas library and creating a simple DataFrame.
import pandas as pd
import numpy as np

data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}
df = pd.DataFrame(data)
print(df)
Output:
   Name   Age
0   Tom  20.0
1  Nick  21.0
2  John  19.0
3   NaN   NaN
Now we can count the non-NaN values in each column using the count() method:
print(df.count())
Output:
Name    3
Age     3
dtype: int64
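count() also accepts an axis argument, so the same call can count non-NaN values per row instead of per column. The following is a minimal sketch using the same sample values; the variable names per_column and per_row are only illustrative.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}
df = pd.DataFrame(data)

# Default (axis=0): one count per column
per_column = df.count()

# axis=1: one count per row
per_row = df.count(axis=1)

print(per_column)
print(per_row)
```

A row count of 0, as for the last row here, identifies rows that are entirely missing.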
Calculating Total Non-NaN Values in a DataFrame
If you want the total number of non-NaN values in the whole DataFrame, you can combine count() with sum().
print(df.count().sum())
Output:
6
This shows that the DataFrame contains a total of 6 non-NaN values.
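As a cross-check, the total can be compared with the number of cells and the number of missing values. This is a small sketch on the same sample values; the names total_cells, total_non_nan, and total_nan are illustrative, not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John', np.nan],
                   'Age': [20, 21, 19, np.nan]})

total_cells = df.size                    # rows * columns
total_non_nan = int(df.count().sum())    # non-NaN cells
total_nan = int(df.isna().sum().sum())   # missing cells

print(total_cells, total_non_nan, total_nan)
```

The non-NaN and NaN totals should always add up to df.size, which makes this a quick sanity check after cleaning.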
Treating Empty Strings as NA Values
In some cases, you might want to treat empty strings as NA values. You can use the replace() function to replace empty strings with np.nan.
data = {'Name': ['Tom', 'Nick', '', 'John'],
        'Age': [20, 21, '', 19]}
df = pd.DataFrame(data)
print(df)
Output:
   Name Age
0   Tom  20
1  Nick  21
2
3  John  19
Now, replace the empty strings with np.nan:
df.replace('', np.nan, inplace=True)
print(df)
Output:
   Name   Age
0   Tom  20.0
1  Nick  21.0
2   NaN   NaN
3  John  19.0
Note: This operation changes the DataFrame in place. If you want to keep the original DataFrame intact, omit the inplace=True argument and assign the result to a new variable.
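The dtype of a mixed column such as Age after replace() may differ between pandas versions, so it can help to convert numeric columns explicitly. The sketch below is one possible approach, not the only one: it works on a copy (the name cleaned is illustrative), replaces empty strings in the text column, and uses pd.to_numeric with errors='coerce' for the numeric column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', '', 'John'],
                   'Age': [20, 21, '', 19]})

cleaned = df.copy()  # leave the original DataFrame untouched

# Text column: turn empty strings into NaN
cleaned['Name'] = cleaned['Name'].replace('', np.nan)

# Numeric column: values that cannot be parsed as numbers
# (including '') become NaN, and the column becomes float64
cleaned['Age'] = pd.to_numeric(cleaned['Age'], errors='coerce')

print(cleaned.dtypes)
print(cleaned.count())
```

After this step, count() reports 3 non-NaN values in each column, and the original df still contains the empty strings.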
Using notna() to Count Non-Missing Values
A slightly more direct way to filter and count non-NaN values is with the notna() method.
Let's start with a simple DataFrame:
import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}
df = pd.DataFrame(data)
print(df)
This will output:
    Name   Age         City
0   John  28.0     New York
1   Anna   NaN  Los Angeles
2   None   NaN         None
3   Mike  32.0      Chicago
4  Sarah  29.0       Boston
You can see that our DataFrame has some missing values (NaN or None).
Now, if you want to count the non-missing values in the 'Name' column, you can use notna():
print(df['Name'].notna().sum())
This will output:
4
The notna() function returns a Boolean Series in which True represents a non-missing value and False represents a missing value. The sum() function then counts the True values, because each True is treated as 1 and each False as 0.
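Because notna() returns a Boolean mask, the same mask can also filter rows, and calling notna() on the whole DataFrame gives per-column counts. The following sketch assumes the columns are named Name, Age, and City; the variable names named and per_column are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}
df = pd.DataFrame(data)

# Filter: keep only the rows where 'Name' is present
named = df[df['Name'].notna()]

# Count: non-missing values in every column at once
per_column = df.notna().sum()

print(named)
print(per_column)
```

For this data, df.notna().sum() gives the same per-column numbers as df.count(); the mask form is mainly useful when the rows themselves are needed as well.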
Conclusion
In this Byte, we've learned how to count non-NaN values in DataFrame columns. Handling missing data is an important step in data preprocessing. The notna() function, among other functions in Pandas, provides a straightforward way to count non-missing values in DataFrame columns.