Technically, the CASE expression in SQL evaluates the conditions mentioned in the WHEN clauses, in order. As soon as a condition is True, the value mentioned in the corresponding THEN clause is returned. If none of the conditions is True, the value mentioned in the ELSE part is returned.
Before starting with the examples, let’s quickly refresh the CASE..WHEN..THEN expression and how to write it. ♻️
The most frequently used form of this statement is writing multiple WHEN..THEN clauses under a single CASE expression.
For example, let’s create a new column, Sales_Manager_Initials, by putting conditions on the Sales_Manager column as below.
SELECT Sales_Manager,
Product_Category,
CASE
WHEN Sales_Manager = 'Pablo' THEN 'P'
WHEN Sales_Manager = 'Kristen' THEN 'K'
ELSE Sales_Manager
END AS Sales_Manager_Initials
FROM sql_practice.dummy_sales_data_v1;
📍 Note: The ELSE clause in the above code is optional. If you don’t mention it, then all the rows where the conditions in both WHEN clauses are False will be filled with NULL.
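To see the NULL behavior for yourself, here is a minimal sketch using Python's built-in sqlite3 module (SQLite, not necessarily the database used in this article). The in-memory table `sales` and its three rows are hypothetical stand-ins for sql_practice.dummy_sales_data_v1.

```python
import sqlite3

# Hypothetical in-memory stand-in for sql_practice.dummy_sales_data_v1;
# the manager names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Sales_Manager TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)", [("Pablo",), ("Kristen",), ("Abdul",)])

# Same CASE as in the article, but without an ELSE branch.
rows = conn.execute("""
    SELECT Sales_Manager,
           CASE
               WHEN Sales_Manager = 'Pablo' THEN 'P'
               WHEN Sales_Manager = 'Kristen' THEN 'K'
           END AS Sales_Manager_Initials
    FROM sales
    ORDER BY rowid
""").fetchall()

for row in rows:
    print(row)
# ('Pablo', 'P')
# ('Kristen', 'K')
# ('Abdul', None)   <- no WHEN matched and there is no ELSE, so SQL NULL (None in Python)
```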
Alternatively, when the condition being tested is on just one column, as in the above example, SQL offers a simplified way of writing the CASE..WHEN expression as below.
SELECT Sales_Manager,
Product_Category,
CASE Sales_Manager
WHEN 'Pablo' THEN 'P'
WHEN 'Kristen' THEN 'K'
ELSE Sales_Manager
END AS Sales_Manager_Initials
FROM sql_practice.dummy_sales_data_v1;
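So, you get exactly the same output without mentioning the column name in each WHEN clause. Simply write the keyword CASE followed by the name of the column on which you want to apply the condition. Note that this shorter form only compares the column for equality with each WHEN value; for ranges or conditions on several columns, you need the first form.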
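As a quick check, the sketch below runs both forms against the same in-memory SQLite table (through Python's sqlite3 module) and compares the results. The table name `sales` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in table with made-up manager names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Sales_Manager TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)", [("Pablo",), ("Kristen",), ("Abdul",)])

# Searched form: the full condition is written in every WHEN.
searched = conn.execute("""
    SELECT CASE
               WHEN Sales_Manager = 'Pablo' THEN 'P'
               WHEN Sales_Manager = 'Kristen' THEN 'K'
               ELSE Sales_Manager
           END
    FROM sales ORDER BY rowid
""").fetchall()

# Simple form: the column is named once, right after CASE.
simple = conn.execute("""
    SELECT CASE Sales_Manager
               WHEN 'Pablo' THEN 'P'
               WHEN 'Kristen' THEN 'K'
               ELSE Sales_Manager
           END
    FROM sales ORDER BY rowid
""").fetchall()

print(searched == simple)  # True
print(simple)              # [('P',), ('K',), ('Abdul',)]
```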
I didn’t find any execution-time or efficiency differences between these two methods, so which one to follow is a matter of personal preference.
Now that you know how to write a CASE-WHEN expression, let’s jump into the practical use cases.
Creating new columns based on existing ones is one of the most common data transformations, and most of the time you need to impose conditions on the existing columns.
As you saw in the above example, you can conditionally test the values in a single column and create a new column based on them. However, the WHEN clause gives you the flexibility to add conditions on multiple columns as well. ✅
For example, suppose that for all the orders where the Sales Manager is Pablo and the Product Category is Office, you want to add 50 to the Quantity. At the same time, for all other combinations of Sales Manager and Product Category, you want to keep the Quantity column unchanged.
So you need conditions on two columns: Sales_Manager = 'Pablo' and Product_Category = 'Office'. Moreover, both conditions must be True to get the new values for Quantity. Hence, you need to join these conditions using a logical AND.
SELECT Sales_Manager,
Product_Category,
Quantity,
CASE
WHEN Sales_Manager = 'Pablo'
AND Product_Category = 'Office'
THEN Quantity+50
ELSE Quantity
END AS New_Quantity
FROM sql_practice.dummy_sales_data_v1;
As you can see, the two conditions in the WHEN clause are joined with the keyword AND. Only when both conditions are True is the expression in the THEN clause evaluated, i.e. Quantity+50, as you can see in the above picture.
Here, ELSE Quantity keeps the Quantity column unchanged for all other combinations of Sales Manager and Product Category.
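The sketch below runs this query in SQLite through Python's sqlite3 module on a hypothetical `sales` table with three made-up rows. Only the row that satisfies both conditions is changed.

```python
import sqlite3

# Hypothetical stand-in table; rows chosen to exercise each branch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Sales_Manager TEXT, Product_Category TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Pablo", "Office", 10),     # both conditions True   -> 10 + 50
        ("Pablo", "Fashion", 20),    # only the first is True  -> unchanged
        ("Kristen", "Office", 30),   # only the second is True -> unchanged
    ],
)

rows = conn.execute("""
    SELECT Sales_Manager, Product_Category, Quantity,
           CASE
               WHEN Sales_Manager = 'Pablo'
                    AND Product_Category = 'Office'
               THEN Quantity + 50
               ELSE Quantity
           END AS New_Quantity
    FROM sales ORDER BY rowid
""").fetchall()

new_quantities = [row[3] for row in rows]
print(new_quantities)  # [60, 20, 30]
```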
Moving ahead, sometimes you need to split the data into different categories for further analysis. You can do this in the data extraction phase itself using the CASE expression in SQL.
As you will be using if-else logic to split the data into categories, the CASE..WHEN expression is the right tool for this job.
Data can be split into multiple categories based on numerical as well as non-numerical columns.
Unlike the GROUP BY clause, which collapses the rows into groups, CASE-WHEN simply creates a new column that holds the name of the group each row of the dataset belongs to. ✅
For example, you have the quantity for each order in the sales data. Suppose you want to categorize each order as ‘High’, ‘Low’, or ‘Medium’ Order Volume based on the column Quantity. You can use the CASE-WHEN statement as below:
SELECT OrderID,
Quantity,
CASE
WHEN Quantity > 65 THEN 'High'
WHEN Quantity < 15 THEN 'Low'
ELSE 'Medium'
END AS Order_Volume
FROM sql_practice.dummy_sales_data_v1;
Simple!
All you need to do is mention the conditions on which you want to split the dataset rows into different groups. Because the WHEN clauses are checked from top to bottom and the first True one decides the result, each row gets exactly one category.
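The sketch below (SQLite via Python's sqlite3, with a hypothetical `sales` table and made-up quantities) runs the query above. It then runs a deliberately reordered version with overlapping conditions to show that the first True WHEN wins.

```python
import sqlite3

# Hypothetical stand-in table; 65 and 15 are included to probe the boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (OrderID INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 70), (2, 10), (3, 40), (4, 65), (5, 15)])

# The article's query: the comparisons are strict, so 65 and 15 land in 'Medium'.
volumes = [r[0] for r in conn.execute("""
    SELECT CASE
               WHEN Quantity > 65 THEN 'High'
               WHEN Quantity < 15 THEN 'Low'
               ELSE 'Medium'
           END
    FROM sales ORDER BY OrderID
""")]
print(volumes)    # ['High', 'Low', 'Medium', 'Medium', 'Medium']

# Overlapping conditions: 70 satisfies both WHENs, but the first one wins.
reordered = [r[0] for r in conn.execute("""
    SELECT CASE
               WHEN Quantity >= 15 THEN 'Medium'
               WHEN Quantity > 65 THEN 'High'
               ELSE 'Low'
           END
    FROM sales ORDER BY OrderID
""")]
print(reordered)  # ['Medium', 'Low', 'Medium', 'Medium', 'Medium']
```

When the conditions overlap like this, putting the most specific condition first avoids surprises.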
You can also include multiple conditions in the WHEN clause, as you did in the previous example. This type of categorization also works when your conditions are on columns from different tables.
For example, suppose you have two tables, Orders and Shipments, as below.
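As I didn’t have these tables ready, I used CTEs to create them as below. A CTE only exists for the statement it belongs to, so this WITH block has to be placed immediately before the SELECT query that uses it.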
WITH Orders AS
(
SELECT OrderID,
Sales_Manager,
Quantity
FROM sql_practice.dummy_sales_data_v1
ORDER BY OrderID
LIMIT 20
),Shipments AS
(
SELECT OrderID,
Product_Category,
Shipping_Address,
Shipping_Cost
FROM sql_practice.dummy_sales_data_v1
ORDER BY OrderID
LIMIT 20
)
You can learn more about using CTEs here.
Now, suppose you want to categorize the data from the Orders table into ‘High Priority’, ‘Medium Priority’, and ‘Low Priority’ based on the values in the columns Quantity (from Orders) and Shipping_Cost (from Shipments).
This might look complicated, but it is quite straightforward with the CASE..WHEN expression as below.
SELECT O.OrderID,
O.Quantity,
S.Shipping_Cost,
CASE
WHEN O.Quantity > 50
OR S.Shipping_Cost > 30
THEN 'High Priority'
WHEN O.Quantity < 15
OR S.Shipping_Cost < 10
THEN 'Low Priority'
ELSE 'Medium Priority'
END AS Order_Type
FROM Orders O
LEFT JOIN Shipments S
ON O.OrderID = S.OrderID
You need to use the JOIN clause and table aliases. The rest is the same simple CASE..WHEN expression. You can see in the above picture that a new column, Order_Type, is created with the priority value for each OrderID.
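Because this is a LEFT JOIN, an order without a matching shipment gets NULL for Shipping_Cost, and a comparison with NULL is never True. The sketch below illustrates this with SQLite via Python's sqlite3. For brevity it creates Orders and Shipments as ordinary tables with made-up rows instead of the CTEs above, and order 4 deliberately has no shipment.

```python
import sqlite3

# Hypothetical Orders and Shipments tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Quantity INTEGER)")
conn.execute("CREATE TABLE Shipments (OrderID INTEGER, Shipping_Cost REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 60), (2, 10), (3, 30), (4, 30)])
conn.executemany("INSERT INTO Shipments VALUES (?, ?)", [(1, 5.0), (2, 40.0), (3, 20.0)])

priority_rows = conn.execute("""
    SELECT O.OrderID,
           O.Quantity,
           S.Shipping_Cost,
           CASE
               WHEN O.Quantity > 50 OR S.Shipping_Cost > 30 THEN 'High Priority'
               WHEN O.Quantity < 15 OR S.Shipping_Cost < 10 THEN 'Low Priority'
               ELSE 'Medium Priority'
           END AS Order_Type
    FROM Orders O
    LEFT JOIN Shipments S
        ON O.OrderID = S.OrderID
    ORDER BY O.OrderID
""").fetchall()

for row in priority_rows:
    print(row)
# (1, 60, 5.0, 'High Priority')
# (2, 10, 40.0, 'High Priority')    <- first WHEN is True via Shipping_Cost
# (3, 30, 20.0, 'Medium Priority')
# (4, 30, None, 'Medium Priority')  <- NULL cost: neither WHEN is True
```

Order 2 also shows that the order of the WHEN clauses matters: its quantity would qualify as Low Priority, but the High Priority check comes first.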
Moving ahead, sometimes your dataset contains different values with the same meaning, which makes the data inconsistent and non-uniform.
The simplest example: the gender column in certain datasets can contain values such as ‘W’, ‘F’, ‘Woman’, and ‘Female’, all of which indicate that the gender is ‘Female’. So why should we have different terms to indicate the same meaning?
It’s always better to standardize the dataset to make it more consistent.
The next use case demonstrates this in detail.
Data standardization is used to make data more consistent, i.e. to ensure that each column has the same type of content and that a single term or value indicates a single meaning.
The CASE-WHEN statement is very useful for transforming inconsistent or non-uniform data into a more structured and uniform form. Using this SQL functionality, you can always create a new column with standardized values and use it for further analysis.
Let’s see this with an example.
Suppose that in this dataset you want the order status as either Completed or Not Completed, i.e. you want to standardize the status to these two values. However, the Status column contains values such as Delivered, Not Delivered, Not Shipped, and Shipped.
The CASE WHEN statement can be used in this scenario as below:
SELECT OrderID,
Status,
CASE
WHEN Status = 'Delivered' THEN 'Completed'
ELSE 'Not Completed'
END AS Order_Status
FROM sql_practice.dummy_sales_data_v1;
As you see in the above picture, Order_Status contains only two values. Only if the Status is Delivered is the Order_Status Completed; in all other cases, it is Not Completed.
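Here is a minimal sketch of this standardization in SQLite (via Python's sqlite3) with a hypothetical `sales` table. One made-up row has a NULL Status. It also ends up as Not Completed, because NULL = 'Delivered' is not True and the ELSE branch catches it.

```python
import sqlite3

# Hypothetical stand-in table; order 5 has a missing (NULL) status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (OrderID INTEGER, Status TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "Delivered"), (2, "Not Delivered"), (3, "Not Shipped"), (4, "Shipped"), (5, None)],
)

# Each result row is (OrderID, Order_Status), so dict() maps id -> status.
order_status = dict(conn.execute("""
    SELECT OrderID,
           CASE
               WHEN Status = 'Delivered' THEN 'Completed'
               ELSE 'Not Completed'
           END AS Order_Status
    FROM sales
"""))
print(order_status)
# {1: 'Completed', 2: 'Not Completed', 3: 'Not Completed',
#  4: 'Not Completed', 5: 'Not Completed'}
```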
Well, I agree this was a rather weak example, but once you understand the use case, you can apply it in complex scenarios.
Also, data analytics is incomplete without data aggregations. In my last article, 5 Pandas Group By Tricks You Should Know in Python, you read about using aggregate functions together with grouping.
How about using aggregate functions on a conditional selection of data?
Yes, this is possible in SQL, and there is always an “easy way” using CASE.
This is one of the more complicated use cases of CASE statements, and using aggregate functions with it can sometimes be tricky.
So, let’s try to understand it with an example.
Suppose you want to get the total number of orders each Sales Manager handled with the status Delivered and with the status Not Delivered.
Normally, you would need to count the orders belonging to each combination of Sales Manager and Status. But with a CASE statement, you don’t need to form every combination. You can simply use the aggregate function COUNT as below.
SELECT Sales_Manager,
COUNT(CASE WHEN Status = 'Delivered' THEN OrderID ELSE NULL END) AS Delivered_Orders,
COUNT(CASE WHEN Status = 'Not Delivered' THEN OrderID ELSE NULL END) AS NonDelivered_Orders
FROM sql_practice.dummy_sales_data_v1
GROUP BY Sales_Manager;
You should put the CASE..WHEN..THEN inside the aggregate function, as above. Here, in the column Delivered_Orders, you are counting all the orders WHEN the Status is Delivered. COUNT skips the NULLs that the CASE returns for every other status, which gives the final output as below.
Simple as that!
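The sketch below reproduces the query on a small hypothetical table in SQLite via Python's sqlite3. The table name and rows are made up. A Shipped order is counted in neither column, because both CASE expressions return NULL for it.

```python
import sqlite3

# Hypothetical stand-in table with made-up orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (OrderID INTEGER, Sales_Manager TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "Pablo", "Delivered"),
        (2, "Pablo", "Delivered"),
        (3, "Pablo", "Not Delivered"),
        (4, "Pablo", "Shipped"),          # counted in neither column
        (5, "Kristen", "Not Delivered"),
        (6, "Kristen", "Delivered"),
    ],
)

counts = conn.execute("""
    SELECT Sales_Manager,
           COUNT(CASE WHEN Status = 'Delivered' THEN OrderID ELSE NULL END) AS Delivered_Orders,
           COUNT(CASE WHEN Status = 'Not Delivered' THEN OrderID ELSE NULL END) AS NonDelivered_Orders
    FROM sales
    GROUP BY Sales_Manager
    ORDER BY Sales_Manager
""").fetchall()
print(counts)  # [('Kristen', 1, 1), ('Pablo', 2, 1)]
```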
You can also get the same results using another aggregate function, SUM(). You only need to change the CASE-WHEN statement slightly, as below.
SELECT Sales_Manager,
SUM(CASE WHEN Status = 'Delivered' THEN 1 ELSE 0 END) AS Delivered_Orders,
SUM(CASE WHEN Status = 'Not Delivered' THEN 1 ELSE 0 END) AS NonDelivered_Orders
FROM sql_practice.dummy_sales_data_v1
GROUP BY Sales_Manager;
In the above CASE statement, you are actually creating an intermediate (invisible) column of 1 and 0 based on the Status, and then adding up this column’s values to get the required output.
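To confirm that the two approaches agree, the sketch below runs both on the same hypothetical SQLite table through Python's sqlite3. It leaves out the explicit ELSE NULL in the COUNT version. That makes no difference, because a CASE without ELSE already returns NULL.

```python
import sqlite3

# Hypothetical stand-in table with made-up orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (OrderID INTEGER, Sales_Manager TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Pablo", "Delivered"), (2, "Pablo", "Not Delivered"), (3, "Pablo", "Delivered"),
     (4, "Kristen", "Shipped"), (5, "Kristen", "Delivered")],
)

# COUNT skips the NULLs produced for non-matching rows.
count_version = conn.execute("""
    SELECT Sales_Manager,
           COUNT(CASE WHEN Status = 'Delivered' THEN OrderID END),
           COUNT(CASE WHEN Status = 'Not Delivered' THEN OrderID END)
    FROM sales GROUP BY Sales_Manager ORDER BY Sales_Manager
""").fetchall()

# SUM adds up the intermediate 1/0 values.
sum_version = conn.execute("""
    SELECT Sales_Manager,
           SUM(CASE WHEN Status = 'Delivered' THEN 1 ELSE 0 END),
           SUM(CASE WHEN Status = 'Not Delivered' THEN 1 ELSE 0 END)
    FROM sales GROUP BY Sales_Manager ORDER BY Sales_Manager
""").fetchall()

print(sum_version)                   # [('Kristen', 1, 0), ('Pablo', 2, 1)]
print(count_version == sum_version)  # True
```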
Similarly, you can use other aggregate functions as well.
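For instance, the sketch below computes the delivered quantity per Sales Manager with SUM, AVG, and MAX. It uses SQLite via Python's sqlite3 and a hypothetical `sales` table with made-up quantities.

```python
import sqlite3

# Hypothetical stand-in table with made-up quantities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Sales_Manager TEXT, Status TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Pablo", "Delivered", 10), ("Pablo", "Delivered", 30), ("Pablo", "Not Delivered", 50),
     ("Kristen", "Delivered", 25), ("Kristen", "Shipped", 20)],
)

agg_rows = conn.execute("""
    SELECT Sales_Manager,
           SUM(CASE WHEN Status = 'Delivered' THEN Quantity ELSE 0 END) AS Delivered_Qty,
           AVG(CASE WHEN Status = 'Delivered' THEN Quantity END)        AS Avg_Delivered_Qty,
           MAX(CASE WHEN Status = 'Delivered' THEN Quantity END)        AS Max_Delivered_Qty
    FROM sales
    GROUP BY Sales_Manager
    ORDER BY Sales_Manager
""").fetchall()
print(agg_rows)  # [('Kristen', 25, 25.0, 25), ('Pablo', 40, 20.0, 30)]
```

Note that the AVG and MAX expressions have no ELSE. With ELSE 0, zeros from the non-delivered orders would be pulled into the average and understate it. NULLs, by contrast, are simply skipped.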
So far, you have seen CASE used within the SELECT statement. However, it can be used in the ORDER BY clause as well.
In SQL, the ORDER BY clause is used to arrange the output dataset in ascending or descending order. Using CASE inside ORDER BY gives you more flexibility and lets you sort data in ways beyond the standard ones.
For example, suppose you want to arrange all the ‘Delivered’ orders in ascending order of unit price if they are shipped to the UK, and in ascending order of delivery time if they are shipped to any other country.
With a standard ORDER BY clause, this seems an impossible task.
But using CASE with ORDER BY makes your life much easier. 💯
Let’s understand it using the query below.
SELECT OrderID,
Shipping_Address,
UnitPrice,
Delivery_Time
FROM sql_practice.dummy_sales_data_v1
WHERE Status = 'Delivered'
AND Shipping_Address IN ('UK', 'India')
AND Delivery_Time <19
AND UnitPrice BETWEEN 140 AND 150
ORDER BY CASE
WHEN Shipping_Address = 'UK' THEN UnitPrice
ELSE Delivery_Time
END;
📍 Note: the conditions in the WHERE clause are not required to get the results. I used them only to make the results easier to see.
As shown in the above query, you can specify your sorting conditions in the ORDER BY clause using the CASE..WHEN expression to get the output as below.
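So when Shipping_Address is the UK, the orders are arranged in ascending order of UnitPrice, and for any other value of Shipping_Address, they are arranged in ascending order of Delivery_Time, as seen above. Keep in mind that the CASE produces one combined sort key, so the two groups are not sorted separately but interleaved by that key. With the WHERE filters above, every India key (a Delivery_Time below 19) is smaller than every UK key (a UnitPrice between 140 and 150), so the India orders come first.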
Similarly, to get all the records in descending order, you can simply add the keyword DESC after the END keyword in the above query.
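The sketch below uses SQLite via Python's sqlite3 with hypothetical rows and no WHERE filters, which makes the single-sort-key behavior visible. A made-up UK order with a low unit price lands between the India orders, and adding DESC reverses the whole ordering.

```python
import sqlite3

# Hypothetical stand-in table with made-up orders; no WHERE filters are applied.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (OrderID INTEGER, Shipping_Address TEXT, UnitPrice REAL, Delivery_Time INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "UK", 145.0, 10), (2, "India", 140.0, 18), (3, "UK", 141.0, 30),
     (4, "India", 150.0, 5), (5, "UK", 12.0, 40)],
)

# One sort key per row: UnitPrice for UK orders, Delivery_Time otherwise.
sort_key = """
    CASE
        WHEN Shipping_Address = 'UK' THEN UnitPrice
        ELSE Delivery_Time
    END
"""
asc_ids = [r[0] for r in conn.execute(f"SELECT OrderID FROM sales ORDER BY {sort_key}")]
desc_ids = [r[0] for r in conn.execute(f"SELECT OrderID FROM sales ORDER BY {sort_key} DESC")]

print(asc_ids)   # [4, 5, 2, 3, 1]  UK order 5 (key 12.0) sits between India orders
print(desc_ids)  # [1, 3, 2, 5, 4]
```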
That’s it! 🏆