Thursday, September 14, 2023

Check if Elements in a List Match a Regex in Python


Introduction

Let’s say you have a list of home addresses and want to see which of them are on an “Avenue”, “Ave”, “Lane”, and so on. Given the variability of physical addresses, you’d probably want to use a regular expression to do the matching. But how do you apply a regex to a list? That’s exactly what we’ll look at in this Byte.

Why Match Lists with Regular Expressions?

Regular expressions are one of the best, if not the best, ways to do pattern matching on strings. In short, they can be used to check if a string contains a specific pattern, replace parts of a string, and even split a string based on a pattern.

Here is another reason you may want to use a regex on a list of strings: you have a list of email addresses and you want to filter out all the invalid ones. You can use a regular expression to define the pattern of a valid email address and apply it to the whole list in one go. There are countless examples like this of why you’d want to use a regex over a list of strings.

Python’s Regex Module

Python’s re module provides built-in support for regular expressions. You can import it as follows:

import re

The re module has several functions for working with regular expressions, such as match(), search(), and findall(). We’ll use these functions to check whether any element in a list matches a regular expression.
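As a quick orientation, the sketch below runs the three functions against one string so their differences are visible side by side. The sample text and variable names are made up for illustration.

```python
import re

text = "banana bread"

# match() only succeeds if the pattern matches at the start of the string
at_start = re.match(r"an", text)

# search() succeeds if the pattern matches anywhere in the string
anywhere = re.search(r"an", text)

# findall() returns every non-overlapping match as a list of strings
all_hits = re.findall(r"an", text)

print(at_start)          # None
print(anywhere.group())  # an
print(all_hits)          # ['an', 'an']
```

Both match() and search() return None when nothing is found, which is why they can be used directly in an if statement.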

Link: For more information on using regex in Python, check out our article, Introduction to Regular Expressions in Python

Using the match() Function

To check if any element in a list matches a regular expression, you can use a loop to iterate over the list and the re module’s match() function to check each element. Here’s an example:

import re

# List of strings
list_of_strings = ['apple', 'banana', 'cherry', 'date']

# Regular expression pattern for strings starting with 'a'
pattern = '^a'

for string in list_of_strings:
    if re.match(pattern, string):
        print(string, "matches the pattern")

In this example, the match() function checks whether each string in the list starts with the letter ‘a’. The output will be:

apple matches the pattern

Note: The ^ character in the regular expression pattern indicates the start of the string. So, ^a matches any string that starts with ‘a’. Since re.match() only ever matches at the beginning of a string, the ^ is technically redundant here, but it makes the intent explicit.
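If you only need a yes/no answer to “does any element match?”, one possible variation is to combine a compiled pattern with the built-in any(). This is a minimal sketch; the names starts_with_a and has_match are chosen here for illustration.

```python
import re

list_of_strings = ['apple', 'banana', 'cherry', 'date']

# Compile once and reuse the resulting pattern object
starts_with_a = re.compile(r'^a')

# any() stops at the first element that produces a match
has_match = any(starts_with_a.match(s) for s in list_of_strings)

print(has_match)  # True
```

Because any() short-circuits, the remaining elements are not checked once a match has been found.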

This is a basic example, but you can use more complex regular expression patterns to match more specific conditions. For example, here is a simplified regex for matching an email address:

([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+
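To illustrate how a pattern like this might be applied to a whole list, here is a hedged sketch that filters a list of candidate strings. It uses re.fullmatch() so the entire string has to fit the pattern; the sample addresses are invented, and the pattern is a simplification rather than a complete email validator.

```python
import re

# Simplified pattern for illustration; real-world email validation is more involved
email_pattern = re.compile(
    r'([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+'
)

# Hypothetical input data
candidates = [
    'alice@example.com',
    'bob.smith@mail.example.org',
    'not-an-email',
    'carol@localhost',
    '@example.com',
]

# Keep only the strings that match the pattern from start to end
valid = [c for c in candidates if email_pattern.fullmatch(c)]

print(valid)  # ['alice@example.com', 'bob.smith@mail.example.org']
```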

Using the search() Function

While re.match() is great for checking the start of a string, re.search() scans through the string and returns a Match object if it finds a match anywhere in the string. Let’s tweak our previous example to find any string that contains “Hello”.

import re

my_list = ['Hello World', 'Python Hello', 'Goodbye World', 'Say Hello']
pattern = "Hello"

for element in my_list:
    if re.search(pattern, element):
        print(f"'{element}' matches the pattern.")

The output will be:

'Hello World' matches the pattern.
'Python Hello' matches the pattern.
'Say Hello' matches the pattern.

As you can see, re.search() found the strings that contain “Hello” anywhere, not just at the start.
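When the goal is to collect the matching strings rather than print them, the same loop can be written as a list comprehension. A minimal sketch, assuming the same sample data as above:

```python
import re

my_list = ['Hello World', 'Python Hello', 'Goodbye World', 'Say Hello']
pattern = re.compile(r"Hello")

# Keep every element in which the pattern occurs anywhere
matching = [s for s in my_list if pattern.search(s)]

print(matching)  # ['Hello World', 'Python Hello', 'Say Hello']
```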

Using the findall() Function

The re.findall() function returns all non-overlapping matches of a pattern in a string, as a list of strings. This is useful when you want to extract all occurrences of a pattern from a string. Let’s use this function to find all occurrences of “Hello” in our list.

import re

my_list = ['Hello Hello', 'Python Hello', 'Goodbye World', 'Say Hello Hello']
pattern = "Hello"

for element in my_list:
    matches = re.findall(pattern, element)
    if matches:
        print(f"'{element}' contains {len(matches)} occurrence(s) of 'Hello'.")

The output will be:

'Hello Hello' contains 2 occurrence(s) of 'Hello'.
'Python Hello' contains 1 occurrence(s) of 'Hello'.
'Say Hello Hello' contains 2 occurrence(s) of 'Hello'.
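If you would rather keep the counts for later use, one option is to build a dictionary that maps each string to its number of matches. This is a sketch only; the name counts is an illustrative choice.

```python
import re

my_list = ['Hello Hello', 'Python Hello', 'Goodbye World', 'Say Hello Hello']
pattern = re.compile(r"Hello")

# Map each string to the number of non-overlapping matches it contains
counts = {s: len(pattern.findall(s)) for s in my_list}

print(counts)
# {'Hello Hello': 2, 'Python Hello': 1, 'Goodbye World': 0, 'Say Hello Hello': 2}
```

Unlike the loop above, this version also records strings with zero matches, which can be filtered out afterwards if they are not needed.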

Working with Nested Lists

What happens if our list contains other lists? The re module’s functions won’t work directly on nested lists, just as they wouldn’t work directly on the top-level list in the previous examples. We need to flatten the list or iterate through each sub-list.

Let’s consider a list of lists, where each sub-list contains strings. We want to find out which strings contain “Hello”.

import re

my_list = [['Hello World', 'Python Hello'], ['Goodbye World'], ['Say Hello']]
pattern = "Hello"

for sub_list in my_list:
    for element in sub_list:
        if re.search(pattern, element):
            print(f"'{element}' matches the pattern.")

The output will be:

'Hello World' matches the pattern.
'Python Hello' matches the pattern.
'Say Hello' matches the pattern.

We first loop through each sub-list in the main list. Then, for each sub-list, we loop through its elements and apply re.search() to find the matching strings.
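The flattening approach mentioned earlier could look roughly like this, using itertools.chain.from_iterable from the standard library. Note that this sketch assumes exactly one level of nesting; deeper structures would need a recursive approach.

```python
import re
from itertools import chain

my_list = [['Hello World', 'Python Hello'], ['Goodbye World'], ['Say Hello']]
pattern = re.compile(r"Hello")

# chain.from_iterable lazily yields the elements of each sub-list in order
flat = chain.from_iterable(my_list)

matching = [s for s in flat if pattern.search(s)]

print(matching)  # ['Hello World', 'Python Hello', 'Say Hello']
```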

Working with Mixed Data Type Lists

Python lists are versatile and can hold a variety of data types. This means you can have a list with integers, strings, and even other lists. That flexibility is useful, but it also means you have to deal with potential issues when the data types matter for your operation. Regular expressions only work on strings. So, what happens when we have a list with mixed data types?

import re

mixed_list = [1, 'apple', 3.14, 'banana', '123', 'abc123', '123abc']

regex = r'\d+'  # matches any sequence of digits

for element in mixed_list:
    if isinstance(element, str) and re.match(regex, element):
        print(f"{element} matches the regex")
    else:
        print(f"{element} doesn't match the regex or is not a string")

In this case, the output will be:

1 doesn't match the regex or is not a string
apple doesn't match the regex or is not a string
3.14 doesn't match the regex or is not a string
banana doesn't match the regex or is not a string
123 matches the regex
abc123 doesn't match the regex or is not a string
123abc matches the regex

We first check if the current element is a string. Only then do we check if it matches the regular expression. This is because the re.match() function expects a string as input. If you try to use it on an integer or a float, Python will raise a TypeError. Note also that 'abc123' does not match, because re.match() only looks for digits at the start of the string.
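To see the error for yourself, the following sketch passes an integer to re.match() and catches the resulting exception. Converting the value with str() first is shown as one possible workaround, on the assumption that treating numbers as text makes sense for your data.

```python
import re

regex = r'\d+'
raised = False

try:
    # Passing a non-string value raises a TypeError
    re.match(regex, 123)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# One possible workaround: convert the value to a string first
converted = re.match(regex, str(123))
print(converted.group())  # 123
```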

Conclusion

Python’s re module provides several functions for matching regex patterns in strings. In this Byte, we learned how to use these functions to check if any element in a list matches a regular expression. We also saw how to handle nested lists and lists with mixed data types. Regular expressions can be complex, so take your time to understand them. With a bit of practice, you’ll find that they can be used to solve many problems when working with strings.
