Wednesday, January 4, 2023
HomeData ScienceValidate Balanced Parenthesis utilizing SQL | by Dhruv Matani | Jan, 2023

Validate Balanced Parenthesis utilizing SQL | by Dhruv Matani | Jan, 2023


Validating if a string accommodates balanced parenthesis is a sensible drawback ensuing from string/expression parsing/validation in varied sensible situations. On this article, we have a look at find out how to validate such a string containing solely open ‘(’ and shut ‘)’ parenthesis utilizing simply declarative SQL.

Picture by Elena Mozhvilo on Unsplash

Earlier Article: Longest Rising Subsequence of an array in SQL

Given a string containing solely open and closed parenthesis, can you identify if these parenthesis are balanced? Being balanced means:

  1. Each open parenthesis ‘(’ has precisely one closing parenthesis ‘)’ after it
  2. A closing parenthesis ‘)’ is at all times matched with precisely one open parenthesis ‘(’ that seems earlier than it

The next are examples of legitimate balanced parenthesis strings:

  1. ((()()))
  2. ()()()()
  3. (()()(()()))

The next are instance of invalid balanced parenthesis:

  1. ((()( — right here, the primary 2 and the final open parenthesis don’t have an identical closing parenthesis
  2. ()()()()) — right here, the final closing parenthesis doesn’t match every other unmatched open parenthesis

The enter desk has 2 columns:

  1. idx: The issue index. i.e. the first string to examine may have idx = 1, the 2nd string to examine may have idx = 2, and so forth.
  2. parens: A string containing solely open and shut parenthesis. This string must be checked for well-formedness.
CREATE TABLE inputs(idx SERIAL PRIMARY KEY, parens TEXT NOT NULL);

INSERT INTO inputs(parens) VALUES
('((()()))'), ('((()('), ('()()()())'), ('()()()()'), ('(()()(()()))');

SELECT * FROM inputs;

The enter desk for the balanced parenthesis drawback (Picture by writer)

In crucial programming languages, this drawback is often solved by sustaining a stack which receives an ‘(’ each time it’s encountered within the enter string. Therefore, the stack accommodates solely ‘(’ characters. Each time a ‘)’ is seen within the enter, it’s matched with an ‘(’ on the highest of the stack, and the ‘)’ character is popped off.

Since we have now just one sort of open (and shut) brackets, we are able to get rid of the stack and keep only a counter, which counts what number of unmatched ‘(’ characters have been seen thus far. Each time an ‘(’ character is encountered, we increment the counter, and each time a ‘)’ character is encountered, we decrement the counter.

  1. Detrimental counter worth: If the counter ever reaches a adverse (< 0) worth, it signifies an unmatched closing parenthesis.
  2. Remaining counter worth: After all of the characters within the enter string are processed, if the worth of the counter is != 0, it signifies an issue. A optimistic worth signifies an unmatched ‘(’ character within the enter, and a adverse worth is already thought-about above.

The SQL code for this answer seems like this:

WITH as_split AS (
-- First break up every enter string such that each character is on its
-- personal row. We be certain that we tag every enter row with the unique
-- index of the enter string from which it got here in order that we all know which
-- drawback it is part of.
SELECT
idx,
UNNEST(
STRING_TO_ARRAY(
parens, NULL
)
) AS ch
FROM inputs
),

with_annotation AS (
-- Annotate characters from every drawback (distinctive index) with its
-- place utilizing ROW_NUMBER(). Additionally annotate an open paren with
-- +1 and an in depth paren with a -1 quantity, in order that we are able to keep
-- a working sum of those counters later.
SELECT
idx,
ch,
ROW_NUMBER() OVER(PARTITION BY idx) AS row_num,
CASE WHEN ch = '(' THEN +1 ELSE -1 END AS ctr
FROM as_split
),

with_running_sum AS (
-- Discover the working sum for characters in every drawback. Be aware that we're
-- fixing all the issues without delay (assume "batch API") as a substitute of feeding
-- feeding every drawback as soon as into the answer machine.
SELECT
idx,
ch,
ctr,
row_num,
SUM(ctr) OVER(PARTITION BY idx ORDER BY row_num ASC) AS running_sum,
MAX(row_num) OVER(PARTITION BY idx) AS max_row_num
FROM with_annotation
),

with_result AS (
-- The result's legitimate provided that we by no means hit a adverse working sum
-- (which signifies that we have now an additional shut paren than a corresponding
-- open paren) and if we finish with a working sum of 0. If we finish with a
-- working sum > 0 then we have now a further open paren that's not
-- matched with a corresponding shut paren.
SELECT
idx,
CASE WHEN MIN(running_sum) < 0 THEN TRUE ELSE FALSE END AS has_negative,
CASE WHEN SUM(
CASE WHEN row_num = max_row_num THEN running_sum ELSE 0 END
) = 0 THEN TRUE ELSE FALSE END AS ends_with_zero_sum
FROM with_running_sum
GROUP BY 1
)

SELECT
lhs.idx,
rhs.parens,
CASE
WHEN has_negative OR NOT ends_with_zero_sum THEN FALSE
ELSE TRUE END
AS is_valid_parens
FROM with_result lhs INNER JOIN inputs rhs
ON lhs.idx = rhs.idx
ORDER BY lhs.idx ASC;

There are fairly a number of intermediate tables used within the answer above to assist readability and to separate out the varied processing steps.

That is the primary time we have now used the UNNEST key phrase in SQL. That is additionally the primary time I’ve written an answer that batch-processes a number of inputs and solves them without delay. I leverage the idx discipline which signifies the index of the enter string. All intermediate tables use the idx discipline to separate out options to totally different issues.

The results of the O(n) answer (Picture by writer)

Estimated Price: The estimated price for this question on a desk with 5 distinct enter rows is 45k. Most of this price appears to be coming from using the window aggregation features.

Whereas I’ve tagged the runtime to be O(n), it could rely on how the database engine internally executes the question. For instance, if the engine notices that the project of the row_num column utilizing ROW_NUMBER() ends in rows which have a strictly growing worth for that column, and the database is ready to protect that row order throughout CTE tables, then it may well keep away from doing a kind later when it encounters an ORDER BY clause within the window perform execution right here.

SUM(ctr) OVER(PARTITION BY idx ORDER BY row_num ASC) AS running_sum,

The ORDER BY within the OVER() clause above is important to make sure that we get a working sum, and never an total sum for the complete partition.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments