Tuesday, August 15, 2023

Counting Non-NaN Values in DataFrame Columns


Introduction

Data cleaning is a vital step in any data science project. In Python, the Pandas DataFrame is a commonly used data structure for data manipulation and analysis.

In this Byte, we'll focus on handling non-NaN (Not a Number) values in DataFrame columns. We'll learn how to count non-NaN values per column, how to total them across the whole DataFrame, and how to treat empty strings as NA values.

Counting Non-NaN Values in DataFrame Columns

Pandas provides the count() method to count the non-NaN values in DataFrame columns. Let’s start by importing the pandas library and creating a simple DataFrame.

import pandas as pd
import numpy as np

# The last row is missing in both columns
data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}

df = pd.DataFrame(data)

print(df)

Output:

   Name   Age
0   Tom  20.0
1  Nick  21.0
2  John  19.0
3   NaN   NaN

Now, we can count the non-NaN values in each column using the count() method:

print(df.count())

Output:

Name    3
Age     3
dtype: int64
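The same method also works on a single column and along rows. The snippet below is a minimal sketch that rebuilds the DataFrame from above; the variable names age_count and row_counts are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John', np.nan],
                   'Age': [20, 21, 19, np.nan]})

# count() on a single column (a Series) returns one integer
age_count = df['Age'].count()

# axis=1 counts the non-NaN values in each row instead of each column
row_counts = df.count(axis=1)

print(age_count)            # 3
print(row_counts.tolist())  # [2, 2, 2, 0]
```

The row-wise form is handy for spotting rows that are mostly or entirely empty, such as the last row here.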

Calculating Total Non-NaN Values in DataFrame

If you want to get the total number of non-NaN values in the DataFrame, you can use the count() method combined with sum().

print(df.count().sum())

Output:

6

This means that there are a total of 6 non-NaN values in the DataFrame: 3 in each of the two columns.
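As a quick cross-check, the same total can be reached in a couple of other ways. This is a sketch rather than a recommendation of one form over another; the total_via_* names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John', np.nan],
                   'Age': [20, 21, 19, np.nan]})

# Per-column counts, then added together
total_via_count = int(df.count().sum())

# Boolean mask of non-missing cells, summed over columns and then overall
total_via_notna = int(df.notna().sum().sum())

# All cells minus the missing ones
total_via_size = int(df.size - df.isna().sum().sum())

print(total_via_count, total_via_notna, total_via_size)  # 6 6 6
```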

Treating Empty Strings as NA Values

In some cases, you might want to treat empty strings as NA values. By default, Pandas counts an empty string as a regular value, so you can use the replace() method to replace empty strings with np.nan.

data = {'Name': ['Tom', 'Nick', '', 'John'],
        'Age': [20, 21, '', 19]}

df = pd.DataFrame(data)

print(df)

Output:

   Name Age
0   Tom  20
1  Nick  21
2          
3  John  19

Now, replace the empty strings with np.nan:

# Swap every empty string for NaN, modifying df directly
df.replace('', np.nan, inplace=True)

print(df)

Output:

   Name   Age
0   Tom  20.0
1  Nick  21.0
2   NaN   NaN
3  John  19.0

Note: This operation changes the DataFrame in place. If you want to keep the original DataFrame intact, omit the inplace=True argument and assign the result of replace() to a new variable.
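Real data often contains whitespace-only strings as well as truly empty ones. The sketch below shows one possible way to handle both without touching the original DataFrame, using replace() with regex=True. The City column and the cleaned variable are assumptions added for illustration, and dtype=object is passed only so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: '' is empty, '   ' is whitespace only
df = pd.DataFrame({'Name': ['Tom', 'Nick', '', 'John'],
                   'City': ['Boston', '   ', '', 'Chicago']},
                  dtype=object)

# Without inplace=True, replace() returns a new DataFrame.
# The pattern ^\s*$ matches strings that are empty or contain only whitespace.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

print(df.count().tolist())       # original is unchanged: [4, 4]
print(cleaned.count().tolist())  # blanks are now NaN: [3, 2]
```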

Using notna() to Count Non-Missing Values

A slightly more direct way to filter and count non-NaN values is with the notna() method.

Let’s start with a simple DataFrame:

import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}

df = pd.DataFrame(data)

print(df)

This will output:

    Name   Age         City
0   John  28.0     New York
1   Anna   NaN  Los Angeles
2   None   NaN         None
3   Mike  32.0      Chicago
4  Sarah  29.0       Boston

You can see that our DataFrame has some missing values (NaN or None).

Now, if you want to count the non-missing values in the ‘Name’ column, you can use notna():

print(df['Name'].notna().sum())

This will output:

4

The notna() method returns a Boolean Series where True represents a non-missing value and False represents a missing value. The sum() method is then used to count the number of True values, since each True counts as 1 and each False as 0.
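Because notna() returns a Boolean mask, it can be applied to the whole DataFrame or used to filter rows. Here is a short sketch using the same data as above; the names per_column and with_age are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
                   'Age': [28, None, None, 32, 29],
                   'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']})

# On a whole DataFrame, notna().sum() gives the same per-column result as count()
per_column = df.notna().sum()
print(per_column.tolist())  # [4, 3, 4]

# The same mask can select only the rows where 'Age' is present
with_age = df[df['Age'].notna()]
print(with_age['Name'].tolist())  # ['John', 'Mike', 'Sarah']
```

If all you need is the number, count() is shorter. notna() is the better fit when you also want to keep or inspect the rows themselves.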

Conclusion

In this Byte, we've learned how to count non-NaN values in DataFrame columns with count(), how to total them with sum(), and how to treat empty strings as NA values with replace(). Handling missing data is a vital step in data preprocessing. The notna() method, among other functions in Pandas, provides a straightforward way to count non-missing values in DataFrame columns.
