Simple Logistic Regression for Ordinal Variables in R | by Md Sohel Mahmood | Nov, 2022

November 20, 2022

Statistics in R Collection

Introduction

Logistic regression becomes quite interesting when we deal with ordinal variables. Unlike the purely binary case, the predictor variable, the response variable, or both can now be ordinal. For example, the predictor may be ordinal while the response is binary, or vice versa. An ordinal variable is a categorical variable whose values have a natural order. For example, education level can take the values “High School”, “Bachelors”, “Masters” and “Doctorate”, which are distinct categories arranged from lowest to highest. In this article, we’ll dive into simple logistic regression for ordinal variables.
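In R, an ordinal variable is commonly stored as an ordered factor. The snippet below is a minimal sketch using the four education levels mentioned above; the vector name `education` and its sample values are made up for illustration.

```r
# Hypothetical sample of education levels (illustrative values only)
education <- c("Masters", "High School", "Doctorate", "Bachelors", "High School")

# Declare the levels in their natural order so R treats the factor as ordinal
education <- factor(
  education,
  levels = c("High School", "Bachelors", "Masters", "Doctorate"),
  ordered = TRUE
)

# Ordered factors support comparisons that respect the level order
below_masters <- education < "Masters"

# as.integer() returns the position of each value in the level order
education_rank <- as.integer(education)

print(below_masters)
print(education_rank)
```

Because the order is declared explicitly, R does not fall back on alphabetical order, which would wrongly place “Bachelors” before “High School”.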

We have covered simple and multiple logistic regression in previous articles. Readers can refer to those articles for a better understanding of logistic regression.

Dataset

Here, we’ll use the Adult Data Set from the UCI Machine Learning Repository. This dataset has demographic data on more than 30000 individuals, including race, education, occupation, gender, working hours per week, income level and so on.

To perform an ordinal logistic regression study, we need to modify the given data a little. But first of all, let’s pose a study question.

What is the impact of education level on the level of income?

To answer this question, we need label-encoded data for education as well as for income level. The given dataset has education levels ranging from first grade all the way to Doctorate. The income level is binary and indicates whether the individual has an income greater than $50000 or not. So, we have an ordinal predictor variable and a binary response variable. Let’s execute the analysis in R.

Link to Excel file in GitHub (adult — v2 — for github.xlsx)

Implementation in R

We have first done the label encoding for the education column. Since this is ordinal data, we need to set the proper order and assign numerical values. The order and the assigned values are shown below. We’ll use the Education_code column to try to answer the question at hand, that is, to see whether education level has any impact on income.
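As a rough illustration of this kind of label encoding, the sketch below maps a few education labels to integer codes by their position in an ordered vector. The level names, their order, and the data frame `df` are assumptions made for this example; they are not the article's actual coding table.

```r
# Illustrative subset of ordered education levels (assumed for this sketch)
education_levels <- c("HS-grad", "Bachelors", "Masters", "Doctorate")

# Hypothetical data frame standing in for the education column
df <- data.frame(
  Education = c("Bachelors", "HS-grad", "Doctorate", "Masters", "HS-grad"),
  stringsAsFactors = FALSE
)

# match() returns each value's position in the ordered level vector,
# which serves as its numeric code (1 = lowest level)
df$Education_code <- match(df$Education, education_levels)

print(df)
```

Any label missing from `education_levels` would receive `NA`, which makes unmapped categories easy to spot before modelling.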

Our income data is binary, which means we have only two levels: income > $50000 and income ≤ $50000. We have also encoded the respective data as 1 (income > $50000) and 0 (income ≤ $50000).
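The binary encoding described here can be sketched in one line of R. The label strings `">50K"` and `"<=50K"` and the vector names are assumptions for illustration; the actual labels in the data file may differ.

```r
# Hypothetical income labels (the exact strings in the file are an assumption)
income <- c("<=50K", ">50K", "<=50K", ">50K", ">50K")

# 1 if income is above $50000, 0 otherwise
income_code <- ifelse(income == ">50K", 1L, 0L)

# Quick check of how many individuals fall in each class
print(table(income_code))
```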

In R, we’ll first read the modified data and pass the required columns into the clm() function.
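To give a feel for what such a call looks like, here is a minimal sketch that fits a cumulative link model with `clm()` from the `ordinal` package. It runs on simulated data rather than the Adult data: the column name `Income_code`, the 1 to 16 code range, and the simulated coefficients are all assumptions for this example.

```r
library(ordinal)  # provides clm()

# Simulated stand-in for the modified data (not the real Adult data)
set.seed(42)
n <- 500
education_code <- sample(1:16, n, replace = TRUE)

# Assumed relationship: higher education code raises the odds of high income
p_high_income <- plogis(-4 + 0.3 * education_code)
income_code <- rbinom(n, size = 1, prob = p_high_income)

data <- data.frame(
  Education_code = education_code,
  Income_code = income_code
)

# clm() expects the response to be a factor
data$Income_code <- factor(data$Income_code, levels = c(0, 1), ordered = TRUE)

# Logit link: with a two-level response this matches binary logistic regression
model <- clm(Income_code ~ Education_code, data = data, link = "logit")

print(summary(model))
```

A positive coefficient for `Education_code` in the summary would indicate that higher education codes are associated with higher odds of being in the income > $50000 group.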