
What Occam’s Razor Means in Machine Learning


A practical explanation of this law of simplicity

Photo by Максим Рыжкин on Unsplash

“Everything should be made as simple as possible, but not simpler.”

— Albert Einstein

A while ago, I wrote an article about the No Free Lunch Theorem, a theorem inspired by philosophy which suggests that no single machine learning algorithm is the universally best-performing algorithm for all problems.

Like the No Free Lunch Theorem, another concept in machine learning that has origins in philosophy is Occam’s Razor. Occam’s Razor basically states that given competing theories and explanations, the simplest ones should be preferred. In the words of Ockham, “entities should not be multiplied beyond necessity.”

Occam’s Razor is an idea that may have originated with earlier philosophers. In fact, Aristotle (384–322 BC) stated that “we may assume the superiority, other things being equal, of the demonstration which derives from fewer postulates or hypotheses.” Ptolemy (c. AD 90 – c. 168) made an even simpler, yet equivalent statement: “We consider it a good principle to explain the phenomena by the simplest hypothesis possible.”

Aristotle was the first to state the idea behind Occam’s Razor. Image source: After Lysippos, Public domain, via Wikimedia Commons

It wasn’t until over a thousand years later that an English Franciscan friar and theologian named William of Ockham made the statement “entities should not be multiplied beyond necessity”, which later became known as Occam’s Razor. In other words, with all else being equal, simpler solutions to problems are preferred over more complex ones.

Occam’s Razor is a direct consequence of basic probability theory. By definition, more complex theories involve more assumptions. The more assumptions we add to a theory, the greater the probability that one of the assumptions is incorrect. If an assumption doesn’t make our theory more accurate, it only increases the probability that the entire theory is wrong.

For example, if we wanted to explain why the sky is blue, a simple explanation based on the properties of light is more likely to be correct than an explanation that involves aliens in space scattering blue dust into the Earth’s atmosphere to give the sky its color. The more complex explanation involves more assumptions, many of which have not been scientifically validated, and thus the more complex theory, although it predicts the same result, is more likely to be incorrect.
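To put a rough number on that intuition, here is a tiny back-of-the-envelope sketch (the 95 percent figure is purely an illustrative assumption): if a theory rests on several independent assumptions, the chance that all of them hold shrinks quickly as assumptions pile up.

```python
# Illustrative sketch: probability that a theory is fully correct when it rests
# on n independent assumptions, each assumed to be right with probability 0.95.
p_single = 0.95  # hypothetical per-assumption probability

for n in range(1, 11):
    p_all_correct = p_single ** n
    print(f"{n:2d} assumptions -> P(all correct) = {p_all_correct:.2f}")

# One assumption: 0.95. Ten assumptions: roughly 0.60, even though each
# assumption individually looks very safe.
```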

Occam’s Razor sounds simple, but what does it really mean in practical machine learning? In 1998, Pedro Domingos formally applied Occam’s Razor to machine learning, introducing the following implications, which he called “Occam’s Two Razors”:

  • First razor: Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself.
  • Second razor: Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error.

Domingos noted that the first razor is true but the second, though implied by Occam’s Razor, isn’t necessarily true. In fact, making decisions based on a model’s performance on training data that it has already seen is bad practice.

So if you want to use Occam’s Razor in practice, here’s how to do it: if two models have the same performance on the validation/testing dataset, select the simpler model because it’s more likely to generalize well. Occam’s Razor is really just an instance of the bias-variance tradeoff in machine learning.
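As a minimal sketch of that selection rule (scikit-learn on a synthetic dataset; the models, the 5-fold cross-validation, and the 0.01 tolerance are all assumptions made for illustration, not part of the argument above):

```python
# Minimal sketch: if two models score about the same on held-out data,
# prefer the simpler one. Synthetic data and arbitrary settings for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

simple_model = DecisionTreeClassifier(max_depth=3, random_state=42)
complex_model = DecisionTreeClassifier(max_depth=None, random_state=42)

simple_score = cross_val_score(simple_model, X, y, cv=5).mean()
complex_score = cross_val_score(complex_model, X, y, cv=5).mean()

# If validation performance is effectively tied, take the simpler model.
if abs(simple_score - complex_score) < 0.01:  # tolerance is an arbitrary choice
    chosen = simple_model
else:
    chosen = simple_model if simple_score > complex_score else complex_model

print(f"shallow tree: {simple_score:.3f}, unconstrained tree: {complex_score:.3f}")
print(f"chosen model: {chosen}")
```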

The bias-variance tradeoff. Source: Fundamentals of Clinical Data Science, licensed under CC BY 4.0.

As demonstrated in the diagram above, Occam’s Razor is a statement of a general optimization problem in machine learning. When selecting a model to use for any problem, we want a model that’s complex enough to avoid underfitting and simple enough to avoid overfitting.
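One hedged way to see that tradeoff in code, assuming a synthetic dataset and tree depth as the complexity knob:

```python
# Sketch of the underfitting/overfitting tradeoff: sweep tree depth and compare
# training accuracy with cross-validated (held-out) accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)  # flip_y adds label noise

for depth in (1, 3, 5, 10, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    val_acc = cross_val_score(tree, X, y, cv=5).mean()
    train_acc = tree.fit(X, y).score(X, y)
    print(f"max_depth={depth}: train={train_acc:.2f}, validation={val_acc:.2f}")

# Very shallow trees underfit (both scores low); unconstrained trees overfit
# (training accuracy near 1.0 while validation accuracy stops improving).
```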

Sometimes Occam’s Razor is misinterpreted as stating that simpler models are always better than more complex models. However, this idea violates both the predicate of Occam’s first razor (“given two models with the same generalization error”) and the No Free Lunch Theorem, which basically states that no single machine learning algorithm is universally superior to all other algorithms across all machine learning problems. When choosing between two models, we can only say the simpler model is better if its generalization error is equal to or less than that of the more complex model.

While Occam’s Razor only applies when a simpler model achieves the same or better generalization error than more complex models, there are practical situations where we may choose a simpler model even when its generalization error is higher than that of the more complex models. In fact, simpler models may provide the following advantages:

  • Less memory usage.
  • Faster inference times.
  • Better explainability.

For example, consider a fraud detection problem where a decision tree achieves an accuracy of 98 percent and a neural network achieves an accuracy of 99 percent. Let’s also assume that this problem requires fast inference times and that our deployment server has memory limitations. Finally, let’s add the requirement that the model’s predictions need to be explained to a regulatory body within the company that’s working on this use case.

In this case, a decision tree is a much better model than a neural network given the additional requirements of this problem. The decision tree is likely a smaller model with faster inference times and is much easier to explain than a neural network. Unless the one percent drop in accuracy is critical in the context of this problem, it would be a practical decision to choose the decision tree over the neural network.
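A rough, hypothetical sketch of how those constraints could be weighed side by side (synthetic, imbalanced data stands in for real fraud records, and both model configurations are arbitrary choices):

```python
# Rough sketch: compare a decision tree and a small neural network on accuracy,
# serialized model size, and batch inference time (synthetic "fraud" data).
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=5000, n_features=30, weights=[0.97],
                           random_state=1)  # imbalanced classes, like fraud data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

tree = DecisionTreeClassifier(max_depth=5, random_state=1).fit(X_train, y_train)
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=1).fit(X_train, y_train)

for name, model in (("decision tree", tree), ("neural network", net)):
    size_kb = len(pickle.dumps(model)) / 1024       # serialized model size
    start = time.perf_counter()
    model.predict(X_test)                           # batch inference time
    latency_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: accuracy={model.score(X_test, y_test):.3f}, "
          f"size={size_kb:.0f} KB, inference={latency_ms:.1f} ms")

# The tree's logic can also be shown to a reviewer as plain if/else rules:
print(export_text(tree, max_depth=2))
```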

Occam’s Razor is a philosophical idea that can be applied in machine learning. In the context of machine learning, the razor suggests that with all else being equal, a simpler model should be preferred over a more complex model. This statement doesn’t mean that simpler models are universally better than complex models, but rather that a model should be complex enough to learn the patterns in a dataset yet simple enough to avoid overfitting.

So if you want to use Occam’s Razor without getting cut, make sure you compare the generalization errors of different models and consider the practical requirements of the problem you’re solving before deciding to choose a simpler model over a more complex one.

References

  1. B. Duignan, Occam’s Razor, (2021), Britannica.
  2. P. Domingos, Occam’s Two Razors: The Sharp and the Blunt, (1998), KDD’98: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining.

Join my mailing list to get updates on my data science content. You’ll also get my free Step-By-Step Guide to Solving Machine Learning Problems when you sign up! You can also follow me on Twitter for content updates.

And while you’re at it, consider joining the Medium community to read articles from thousands of other writers as well.


