Friday, November 15, 2024
Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | by Khuyen Tran | Jan, 2023


You are given a labeled dataset and asked to predict the labels of a new one. What would you do?

The first approach you would probably try is to train a machine learning model to find rules for labeling new data.

This is convenient, but it is hard to know why the machine learning model comes up with a particular prediction. You also can't incorporate your domain knowledge into the model.

Instead of depending on a machine learning model to make predictions, is there a way to set the rules for data labeling based on your own knowledge?

That's when human-learn comes in handy.

human-learn is a Python package for creating rule-based systems that are easy to construct and compatible with scikit-learn.

To install human-learn, type:

pip install human-learn

In the previous article, I talked about how to create a human-learning model by drawing:

In this article, we will learn how to create a model with a simple function.

Feel free to play with and fork the source code of this article here:

To evaluate the performance of a rule-based model, let's start by predicting a dataset using a machine learning model.

We will use the Occupancy Detection Dataset from the UCI Machine Learning Repository as the example for this tutorial.

Our task is to predict room occupancy based on temperature, humidity, light, and CO2. A room is not occupied if Occupancy=0 and is occupied if Occupancy=1.

After downloading the dataset, unzip it and read the data:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Get train and test data
train = pd.read_csv("occupancy_data/datatraining.txt").drop(columns="date")
test = pd.read_csv("occupancy_data/datatest.txt").drop(columns="date")

# Get X and y
target = "Occupancy"
train_X, train_y = train.drop(columns=target), train[target]
val_X, val_y = test.drop(columns=target), test[target]

Take a look at the first ten records of the train dataset:

train.head(10)

Train scikit-learn's RandomForestClassifier model on the training dataset and use it to predict the test dataset:

# Create the model
forest_model = RandomForestClassifier(random_state=1)

# Train and predict
forest_model.fit(train_X, train_y)
machine_preds = forest_model.predict(val_X)

# Evaluate
print(classification_report(val_y, machine_preds))

The score is pretty good. However, we are unsure how the model comes up with these predictions.

Let's see if we can label the new data with simple rules.

There are four steps to create rules for labeling data:

  1. Generate a hypothesis
  2. Observe the data to validate the hypothesis
  3. Start with simple rules based on the observations
  4. Improve the rules

Generate a Hypothesis

Light in a room is a good indicator of whether the room is occupied. Thus, we can assume that the brighter a room is, the more likely it is to be occupied.

Let's see if this is true by looking at the data.

Observe the Data

To validate our guess, let's use a box plot to find the difference in the amount of light between an occupied room (Occupancy=1) and an empty room (Occupancy=0).

import plotly.express as px
import plotly.graph_objects as go

feature = "Light"
px.box(data_frame=train, x=target, y=feature)

We can see a significant difference in the median between an occupied and an empty room.
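
The same comparison can also be made numerically by computing the median light level per class. This is a minimal sketch that assumes a small made-up DataFrame (`toy`) in place of the real training data; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `train` DataFrame
toy = pd.DataFrame(
    {
        "Light": [0, 5, 12, 30, 420, 455, 480, 510],
        "Occupancy": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)

# Median light level for empty (0) and occupied (1) rooms
medians = toy.groupby("Occupancy")["Light"].median()
print(medians)
```

On the real data, replacing `toy` with the training DataFrame should give the two medians that the box plot displays.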

Start with Simple Rules

Now, we will create rules for whether a room is occupied based on the light in that room. Specifically, if the amount of light is above a certain threshold, Occupancy=1, and Occupancy=0 otherwise.

But what should that threshold be? Let's start by picking 100 as the threshold and see what we get.

To create a rule-based model with human-learn, we will:

  • Write a simple Python function that specifies the rules
  • Use FunctionClassifier to turn that function into a scikit-learn model

import numpy as np
from hulearn.classification import FunctionClassifier

def create_rule(data: pd.DataFrame, col: str, threshold: float = 100):
    # Label a row 1 (occupied) when the value in `col` exceeds the threshold
    return np.array(data[col] > threshold).astype(int)

mod = FunctionClassifier(create_rule, col="Light")
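
To make the idea of "turning a function into a model" more concrete, here is a rough sketch of what such a wrapper might look like. It is not human-learn's implementation: `SimpleFunctionClassifier`, `light_rule`, and `toy_X` are hypothetical names, and the real FunctionClassifier does more (for example, it exposes the keyword arguments as parameters that scikit-learn tools can tune).

```python
import numpy as np


class SimpleFunctionClassifier:
    """Hypothetical, simplified stand-in for a function-based classifier."""

    def __init__(self, func, **kwargs):
        self.func = func
        self.kwargs = kwargs

    def fit(self, X, y):
        # Nothing is learned from the data; only the labels seen are recorded
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Predictions come straight from the user-supplied rule
        return self.func(X, **self.kwargs)


def light_rule(data, col, threshold=100):
    return (np.asarray(data[col]) > threshold).astype(int)


toy_X = {"Light": [20, 150, 90, 400]}  # made-up readings
clf = SimpleFunctionClassifier(light_rule, col="Light").fit(toy_X, [0, 1, 0, 1])
print(clf.predict(toy_X))
```

The point of the sketch is that the rule itself carries all the logic, while the wrapper only supplies the fit/predict interface that scikit-learn code expects.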

Predict the test set and evaluate the predictions:

mod.fit(train_X, train_y)
preds = mod.predict(val_X)
print(classification_report(val_y, preds))

The accuracy is better than what we got earlier using RandomForestClassifier!

Improve the Rules

Let's see if we can get a better result by experimenting with several thresholds. We will use parallel coordinates to analyze the relationship between specific values of light and room occupancy.

from hulearn.experimental.interactive import parallel_coordinates

parallel_coordinates(train, label=target, height=200)

From the parallel coordinates, we can see that a room with light above 250 Lux has a high chance of being occupied. The optimal threshold that separates an occupied room from an empty room seems to be somewhere between 250 Lux and 750 Lux.

Let's find the best threshold in this range using scikit-learn's GridSearchCV.

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(mod, cv=2, param_grid={"threshold": np.linspace(250, 750, 1000)})
grid.fit(train_X, train_y)

Get the best threshold:

best_threshold = grid.best_params_["threshold"]
best_threshold
> 364.61461461461465

Plot the threshold on the box plot.

Use the model with the best threshold to predict the test set:

human_preds = grid.predict(val_X)
print(classification_report(val_y, human_preds))

The threshold of 365 gives a better result than the threshold of 100.

Using domain knowledge to create rules with a rule-based model is nice, but there are some disadvantages:

  • It doesn't generalize well to unseen data
  • It's difficult to come up with rules for complex data
  • There is no feedback loop to improve the model

Thus, combining a rule-based model and an ML model helps data scientists scale and improve the model while still being able to incorporate their domain expertise.

One straightforward way to combine the two models is to decide whether you want to reduce false negatives or false positives.

Reduce False Negatives

You might want to reduce false negatives in scenarios such as predicting whether a patient has cancer (it's better to mistakenly tell patients that they have cancer than to fail to detect cancer).

To reduce false negatives, choose the positive label when the two models disagree.
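
With binary 0/1 labels, this policy amounts to a logical OR of the two prediction arrays. A minimal sketch, assuming made-up predictions for the rule-based model (`human_preds`) and the ML model (`machine_preds`):

```python
import numpy as np

# Made-up predictions from the rule-based model and the ML model
human_preds = np.array([1, 0, 1, 0])
machine_preds = np.array([1, 1, 0, 0])

# Predict positive if either model predicts positive (logical OR)
combined = np.logical_or(human_preds, machine_preds).astype(int)
print(combined)  # [1 1 1 0]
```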

Reduce False Positives

You might want to reduce false positives in scenarios such as recommending videos that might be violent to kids (it's better to mistakenly withhold kid-friendly videos than to recommend adult videos to kids).

To reduce false positives, choose the negative label when the two models disagree.
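
With binary 0/1 labels, this policy amounts to a logical AND of the two prediction arrays. A minimal sketch, assuming made-up predictions for the rule-based model (`human_preds`) and the ML model (`machine_preds`):

```python
import numpy as np

# Made-up predictions from the rule-based model and the ML model
human_preds = np.array([1, 0, 1, 0])
machine_preds = np.array([1, 1, 0, 0])

# Predict positive only if both models predict positive (logical AND)
combined = np.logical_and(human_preds, machine_preds).astype(int)
print(combined)  # [1 0 0 0]
```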

You can also use other, more complex policy layers to decide which prediction to choose.
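
As one hypothetical example of such a policy (an illustration, not something taken from this article), you could trust the ML model only when it is confident and fall back to the rule otherwise. The probabilities and the `confidence_cutoff` value below are made up.

```python
import numpy as np

# Made-up inputs: rule predictions and the ML model's probability of class 1
human_preds = np.array([1, 0, 1, 0])
machine_proba = np.array([0.95, 0.55, 0.10, 0.48])

confidence_cutoff = 0.8  # hypothetical value; tune for the task

machine_preds = (machine_proba >= 0.5).astype(int)
# The ML model is "confident" when its probability is far from 0.5
machine_confident = np.maximum(machine_proba, 1 - machine_proba) >= confidence_cutoff

# Trust the ML model when confident, otherwise fall back to the rule
combined = np.where(machine_confident, machine_preds, human_preds)
print(combined)
```

With a fitted scikit-learn classifier, the probabilities would typically come from `predict_proba(X)[:, 1]`.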

For a deeper dive into how to combine an ML model and a rule-based model, I recommend checking out this excellent video by Jeremy Jordan.

Congratulations! You have just learned what a rule-based model is and how to combine it with a machine learning model. I hope this article gives you the knowledge needed to develop your own rule-based model.
