
Elastic Net Regression: From Sklearn to Tensorflow | by Eligijus Bujokas | Sep, 2022


How to build an equivalent elastic net regression in sklearn and TensorFlow using Python

Photo by Mike Cox on Unsplash

This article is intended for practitioners who want to compare the sklearn and Keras implementations of elastic net regression: mainly, how to go from the Sklearn loss function to the Keras (TensorFlow) loss function.

Main parts of the article:

  • A brief introduction to regularization in regression.
  • Sklearn implementation of the elastic net.
  • TensorFlow implementation of the elastic net.
  • Going from one framework to the other.
  • Writing a custom loss function.
  • Comparing the results.

All the code is in my repository: https://github.com/Eligijus112/regularization-python

Regression is a process in machine learning that determines the relationship between the mean value of the response variable Y and the features X.

Simple regression equation; Image by author

The prediction of a model is denoted as y with a hat and is calculated by estimating the coefficients beta from data:

Prediction equation; Image by author

The betas without a hat denote the theoretical model and the betas with a hat denote the practical model, which is obtained using the observed data. How do we calculate the coefficients beta?
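Written out, the two equations above look roughly like this (k denotes the number of features):

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon$$

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_k x_{ik}$$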

The calculation starts with defining an error. An error shows how much a prediction differs from the true value.

Error for observation i; Image by author

We would obviously like the error to be as small as possible, because that would indicate that our found coefficients approximate the underlying dependent variable Y as well as possible given our features. Additionally, we want to give the same weight to negative and positive errors (among other reasons), so we introduce the squared error term:

The squared error for observation i; Image by author

In standard linear regression, our goal is to minimize the squared error across all observations. Thus, we can define a function called the mean squared error:

MSE function; Image by author

The x and y data are fixed; we cannot change what we observe. The only leverage we have is to try fitting different beta coefficients to our data to make the MSE value as low as possible. That is the overall goal of linear regression: to minimize the defined loss function using only the beta coefficients.
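In symbols, with n observations, the loss we minimize is:

$$MSE(\hat{\beta}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$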

In order to minimize the MSE value, we can use various optimization algorithms. One of the most popular is stochastic gradient descent, or SGD for short.

The gradient part means that we calculate the partial derivatives with respect to all the coefficients:

Partial derivatives for all i; Image by author

The descent part means that we iteratively update each coefficient beta:

Descent step; Image by author

M is the number of iterations.

The alpha constant is a positive value called the learning rate.

The stochastic part means that we do not use all the samples in the dataset in a single iteration. We use a subset of the data, also called a mini-batch, when updating the coefficients beta.
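To make the three parts concrete, here is a minimal NumPy sketch of one SGD epoch for plain linear regression. It is illustrative only and not the code used later in the article; it assumes X already contains a column of ones for the intercept and y is a 1-D array.

import numpy as np

def sgd_epoch(X, y, beta, lr=0.0001, batch_size=64):
    # One pass over the data: shuffle, split into mini-batches, update beta on each batch
    idx = np.random.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Gradient of the mini-batch MSE with respect to beta
        grad = -2.0 / len(batch) * Xb.T @ (yb - Xb @ beta)
        # Descent step: move against the gradient, scaled by the learning rate
        beta = beta - lr * grad
    return beta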

Regularization in regression is a technique that avoids building overly complex models in order to reduce the risk of overfitting. There are many types of regularization, but the three most popular ones are lasso, ridge, and elastic net.

We regularize by adding certain terms to the loss function that we optimize. In the case of lasso regression, the traditional MSE becomes:

Generic lasso formula; Image by author

We add the sum of absolute coefficient values to the loss function. The bigger the absolute sum of the coefficients, the higher the loss. Thus, when optimizing, the algorithm gets penalized for large coefficients.

Generic ridge formula; Image by author

Ridge regression puts an even bigger penalty on large coefficients because their magnitude gets squared. One drawback of ridge regression is that it does not set any coefficients to exactly 0, whereas lasso can be used for feature reduction because it can set some of the coefficients to zero.

Generic elastic net formula; Image by author

Elastic net combines both ridge and lasso regression. The lambda values usually range between 0 and 1.
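In generic form, the elastic net loss combines the two penalties above (my notation; lambda 1 scales the lasso term and lambda 2 the ridge term):

$$L(\hat{\beta}) = MSE(\hat{\beta}) + \lambda_1 \sum_{j=1}^{k} |\hat{\beta}_j| + \lambda_2 \sum_{j=1}^{k} \hat{\beta}_j^2$$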

Now that we have a brief theoretical overview out of the way, we can move on to the practical parts. The data we will use is taken from Kaggle¹ and covers house sales in the US from 2014 to 2015.

Raw data; Table by author

We will try to explain the price of a house using the 6 features available.

We will standardize the features. In this article, I want to showcase the comparison between TensorFlow and Sklearn, so I will not spend much time on feature engineering.
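The standardization itself only takes a couple of lines with scikit-learn; a minimal sketch, assuming the raw data sits in the data frame d with the column list features used below:

from sklearn.preprocessing import StandardScaler

# Rescaling every feature to zero mean and unit variance
scaler = StandardScaler()
d[features] = scaler.fit_transform(d[features])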

Standardized dataset; Table by author

Let us try the Sklearn implementation of Elastic Net:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html

The loss function used in Sklearn is:

Scikit-learn elastic net loss function; Image by author

Let us set alpha = 1 and lambda (called the l1_ratio in Sklearn) to 0.02.

import pandas as pd
from sklearn.linear_model import ElasticNet

# Fitting the model to data using sklearn
el = ElasticNet(alpha=1.0, l1_ratio=0.02)
el.fit(d[features], d[y_var])
# Extracting the coefficients
coefs = el.coef_
# Creating a dataframe with the coefficients
coefs_df = pd.DataFrame({'feature': features, 'coef_sk': coefs})
# Appending the intercept
coefs_df = coefs_df.append({'feature': 'intercept', 'coef_sk': el.intercept_[0]}, ignore_index=True)

The resulting coefficient data frame is:

Elastic Net regression results; Table by author

TensorFlow is a very popular deep learning framework. It lets us use elastic net regularization as well, via the L1L2 class: https://keras.io/api/layers/regularizers/#l1l2-class.

Let us compare the two formulas, the sklearn one and the TensorFlow one.
Scikit-learn:

Scikit-learn elastic net loss; Image by author

TensorFlow:

As we can see, TensorFlow does not divide the MSE part of the equation by 2. In order to have comparable results, we need to create our own custom MSE function for TensorFlow:

Custom MSE that mimics sklearn; Code by author
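The custom loss appears as a code image in the post; a minimal sketch of what it could look like, assuming it is the NMSE class used in model.compile below and that it simply halves the usual mean squared error:

import tensorflow as tf

class NMSE(tf.keras.losses.Loss):
    # MSE divided by 2, mimicking the 1/(2n) factor in sklearn's ElasticNet objective
    def call(self, y_true, y_pred):
        return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1) / 2.0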

After writing this custom loss function, we have the following equation:

Custom loss for TensorFlow; Equation by author

Now we can set both equations equal and derive formulas for lambda 1 and lambda 2 in TensorFlow in terms of alpha and the l1_ratio from sklearn.

Left-hand side: sklearn, right-hand side: TensorFlow; Equation by author

Thus we are left with:
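In plain notation (with rho denoting the l1_ratio):

$$\lambda_1 = \alpha \cdot \rho, \qquad \lambda_2 = \frac{\alpha \, (1 - \rho)}{2}$$

Here lambda 1 and lambda 2 are the constants passed to the Keras L1L2 regularizer.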

A function to transform sklearn regularization parameters into TensorFlow regularization parameters:

From sklearn to TensorFlow; Code by author
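The conversion function is shown as a code image in the post; given the derivation above, a sketch of it could look like this (the name elastic_net_to_keras is taken from the training snippet below):

def elastic_net_to_keras(alpha: float, l1_ratio: float):
    # Maps sklearn's ElasticNet(alpha, l1_ratio) to Keras L1L2(l1, l2) constants
    l1 = alpha * l1_ratio
    l2 = alpha * (1 - l1_ratio) / 2
    return l1, l2

For example, alpha = 1.0 and l1_ratio = 0.02 give l1 = 0.02 and l2 = 0.49.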

Now let's put everything together and train a TensorFlow model with regularization:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import L1L2

# Defining the original constants
alpha = 1.0
l1_ratio = 0.02
# Inferring the l1 and l2 params
l1, l2 = elastic_net_to_keras(alpha, l1_ratio)
# Defining a simple regression neural net
numeric_input = Input(shape=(len(features), ))
output = Dense(1, activation='linear', kernel_regularizer=L1L2(l1, l2))(numeric_input)
model = Model(inputs=numeric_input, outputs=output)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.0001)
# Compiling the model with the custom sklearn-style MSE loss
model.compile(optimizer=optimizer, loss=NMSE(), metrics=['mse'])
# Fitting the model to data using keras
history = model.fit(
    d[features].values,
    d[y_var].values,
    epochs=100,
    batch_size=64
)

When alpha = 1.0 and the l1_ratio is 0.02, the constants for TensorFlow regularization are 0.02 and 0.49.

The training looks smooth:

Model training; Image by author

Comparing the TensorFlow and sklearn regression coefficients:

import numpy as np

# Creating the coefficient frame for TF
coefs_df_tf = pd.DataFrame({
    'feature': features + ['intercept'],
    'coef_tf': np.append(model.get_weights()[0], model.get_weights()[1])
})
# Merging the two dataframes
coefs_df_merged = coefs_df.merge(coefs_df_tf, on='feature')
coefs_df_merged['percent_diff'] = (coefs_df_merged['coef_sk'] - coefs_df_merged['coef_tf']) / coefs_df_merged['coef_sk'] * 100

The resulting data frame is:

Coefficient comparison; Table by author

The coefficients differ by a very small amount. This is due to the fact that, under the hood, sklearn and TensorFlow initialize the coefficients with different functions, some randomness is involved, and we can always tune the number of epochs and the batch size to get closer and closer results.

In this article we:

  • Had a brief introduction to regularization.
  • Created a custom TensorFlow loss function.
  • Created a function that mimics sklearn regularization in TensorFlow.
  • Created and compared two models.

Happy learning and coding!
