How to Select Columns in Pandas Based on a String Prefix

August 18, 2023

Introduction

Pandas is a powerful Python library for working with and analyzing data. One operation you may need to perform when working with data in Pandas is selecting columns based on their string prefix. This is useful when you have a large DataFrame and want to focus on specific columns that share a common prefix.

In this Byte, we’ll explore a few ways to achieve this, including a list comprehension, a boolean Series built from the column names, DataFrame.loc, and DataFrame.filter().

Select All Columns Starting with a Given String

Let’s start with a simple DataFrame:

import pandas as pd

data = {
    'item1': [1, 2, 3],
    'item2': [4, 5, 6],
    'stuff1': [7, 8, 9],
    'stuff2': [10, 11, 12]
}
df = pd.DataFrame(data)
print(df)

Output:

   item1  item2  stuff1  stuff2
0      1      4       7      10
1      2      5       8      11
2      3      6       9      12

To select columns that start with ‘item’, you can use a list comprehension:

selected_columns = [column for column in df.columns if column.startswith('item')]
print(df[selected_columns])

Output:

   item1  item2
0      1      4
1      2      5
2      3      6
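The same idea extends to several prefixes at once, because Python's str.startswith() accepts a tuple of candidates. The sketch below is only an illustration: the helper name columns_with_prefix and the extra 'other' column are made up for this example and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'item1': [1, 2, 3],
    'item2': [4, 5, 6],
    'stuff1': [7, 8, 9],
    'other': [10, 11, 12],
})

def columns_with_prefix(frame, *prefixes):
    """Hypothetical helper: labels of `frame` that start with any given prefix."""
    # *prefixes is already a tuple, which str.startswith accepts directly.
    # Column labels are not guaranteed to be strings, so non-strings are skipped.
    return [c for c in frame.columns
            if isinstance(c, str) and c.startswith(prefixes)]

selected = columns_with_prefix(df, 'item', 'stuff')
print(selected)  # ['item1', 'item2', 'stuff1']

subset = df[selected]
```

The isinstance check is a defensive choice for DataFrames whose columns mix strings with integers or other label types.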

Creating a Series to Select Columns

Another way to select columns by prefix is to create a Series from the DataFrame columns and then call the str.startswith() method on it. This returns a boolean Series in which True indicates that the column name starts with the given string. Because .loc aligns a boolean Series with the column labels, the Series is built with index=df.columns so that the two line up.

selected_columns = pd.Series(df.columns, index=df.columns).str.startswith('item')
print(df.loc[:, selected_columns])

Output:

   item1  item2
0      1      4
1      2      5
2      3      6
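A boolean mask can also be inverted to keep every column that does not start with the prefix. The following is a minimal sketch using the same example columns; converting the mask to a NumPy array with to_numpy() is one way to sidestep index alignment, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'item1': [1, 2, 3],
    'item2': [4, 5, 6],
    'stuff1': [7, 8, 9],
    'stuff2': [10, 11, 12],
})

# Boolean mask as a NumPy array, one entry per column.
mask = pd.Series(df.columns).str.startswith('item').to_numpy()
print(mask.tolist())  # [True, True, False, False]

# ~ negates the mask: keep everything that does NOT start with 'item'.
not_item = df.loc[:, ~mask]
print(list(not_item.columns))  # ['stuff1', 'stuff2']
```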

Using DataFrame.loc to Select Columns

The DataFrame.loc indexer is primarily label-based, but it can also be used with a boolean array. The older ix indexer had a number of problems; it was deprecated and then removed in pandas 1.0. .loc will raise a KeyError when the requested labels are not found.

Consider the following example:

selected_columns = df.columns[df.columns.str.startswith('item')]
print(df.loc[:, selected_columns])

Output:

   item1  item2
0      1      4
1      2      5
2      3      6

Here, df.columns.str.startswith('item') first produces a boolean array that’s True for columns starting with ‘item’. Indexing df.columns with that array gives the matching column labels, which we then pass to the .loc indexer to select the corresponding columns. This approach works directly on the column Index, so it avoids building an intermediate list or Series, although with a typical number of columns any speed difference is likely to be negligible.
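The difference between a missing label and a prefix that matches nothing is worth seeing side by side. This is a small sketch with a made-up two-column DataFrame; the prefix 'missing' and the label 'missing1' are placeholders chosen so that nothing matches.

```python
import pandas as pd

df = pd.DataFrame({
    'item1': [1, 2, 3],
    'stuff1': [7, 8, 9],
})

# A prefix that matches nothing gives an all-False mask, so the result
# is a DataFrame with the same rows and zero columns.
no_match = df.loc[:, df.columns.str.startswith('missing')]
print(no_match.shape)  # (3, 0)

# Asking .loc for a label that does not exist raises KeyError instead.
try:
    df.loc[:, 'missing1']
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Checking for an empty result after a prefix match is therefore up to the caller, since no exception signals it.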

Applying DataFrame.filter() for Column Selection

The filter() method of a pandas DataFrame provides a flexible way to select columns based on their names. It’s especially useful when dealing with large datasets that have many columns.

filter() selects columns by their labels, not by the values they contain. The like parameter keeps every column whose name contains the given string anywhere in the label. To select columns by prefix only, we can use the regex parameter instead.

Here’s an example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['apple', 'banana', 'cherry', 'date'],
    'product_price': [1.2, 0.5, 0.75, 1.3],
    'product_weight': [150, 120, 50, 60]
})

# Select columns that start with 'product'
df_filtered = df.filter(regex='^product')

print(df_filtered)

This will output:

   product_id product_name  product_price  product_weight
0         101        apple           1.20             150
1         102       banana           0.50             120
2         103       cherry           0.75              50
3         104         date           1.30              60

In the above code, the ^ symbol is a regular expression anchor that matches the start of a string. Therefore, '^product' will match all column names that start with ‘product’.
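To make the difference between like and regex concrete, here is a short sketch. The column names are invented for illustration, and wrapping the prefix in re.escape() is a suggested precaution for prefixes that contain regex metacharacters such as a dot.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102],
    'old_product_id': [1, 2],
    'price.usd': [1.2, 0.5],
    'price_usd': [1.3, 0.6],
})

# like= keeps any label that contains the text anywhere.
by_like = list(df.filter(like='product').columns)
print(by_like)  # ['product_id', 'old_product_id']

# regex= with ^ keeps only labels that start with the text.
by_regex = list(df.filter(regex='^product').columns)
print(by_regex)  # ['product_id']

# Without escaping, '.' matches any character, so both price columns match.
unescaped = list(df.filter(regex='^price.').columns)
print(unescaped)  # ['price.usd', 'price_usd']

# re.escape() makes the '.' literal, so only the dotted name matches.
prefix = 'price.'
escaped = list(df.filter(regex='^' + re.escape(prefix)).columns)
print(escaped)  # ['price.usd']
```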

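Note: The filter() function returns a new DataFrame containing only the matching columns; the call itself leaves the original DataFrame unchanged. Whether the result shares memory with the original depends on the pandas version and its copy settings, so call .copy() on the result if you need an independent object to modify.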

Conclusion

In this Byte, we explored different ways to select columns in a pandas DataFrame based on a string prefix. We learned how to use a list comprehension, how to create a boolean Series and use it to select columns, how to use the DataFrame.loc indexer, and how to apply the DataFrame.filter() function. Each of these methods has its own advantages and use cases, and the right choice depends on the specific requirements of your data analysis task.
