Introduction
When working with strings or large amounts of text, you’re likely to run into situations where you need to count how many times a specific substring occurs within another string.
In this article, we’ll take a look at how to use JavaScript to count the number of substring occurrences in a string. We’ll go through the different approaches and methods for obtaining that number.
But before we begin, let’s first define what a substring is.
What Is a Substring?
A substring is a clearly defined sequence of consecutive characters in a string. For example, if we have the string “My name is John Doe”, then “name is” is a substring, but “is name” is not because it’s no longer a consecutive sequence (we’ve changed the order of the words). Individual words such as “is” and “name” are substrings as well.
Note: “y name is Jo” is also a valid substring of “My name is John Doe”. In other words, substrings are not always whole words, and they can be much less readable.
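As a quick illustration of the definition, here is a minimal sketch that uses the built-in String.prototype.includes() method on a sample sentence. The variable names are chosen only for this example.

```javascript
// Sample sentence used only for this illustration.
const sentence = "My name is John Doe";

// includes() is true only when the argument appears as a consecutive run of characters.
const hasNameIs = sentence.includes("name is");       // consecutive characters
const hasIsName = sentence.includes("is name");       // words in a different order
const hasPartial = sentence.includes("y name is Jo"); // not whole words, still consecutive

console.log(hasNameIs, hasIsName, hasPartial); // true false true
```

Note that includes() only tells us whether a substring is present, not how many times it occurs.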
There are many ways to count substring occurrences in JavaScript, but the two main approaches are the split() method and regular expressions.
Count the Number of Substrings in String With split() Method
split() is a JavaScript string method that splits a string into an array of substrings while leaving the original string unchanged. It accepts a separator and splits the string wherever that separator occurs. If no separator is supplied, split() returns an array with only one element, the original string.
Note: Probably the most obvious example of a separator is the blank space. When you pass it to the split() method, the original string is sliced wherever a blank space occurs, so split() returns an array of the individual words from the original string.
In this article, we’ll use one handy trick to get the number of occurrences of a substring in a string: we’ll pass the substring as the separator to the split() method. That way, we can derive the number of occurrences from the length of the array that split() returns:
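The behavior described above can be sketched with a short sample string. The string and variable names are made up for this illustration.

```javascript
// Sample text used only for this illustration.
const text = "Jane gave Mike an orange";

// Without a separator, the whole string comes back as the only element.
const noSeparator = text.split();

// With a blank space as the separator, we get the individual words.
const words = text.split(" ");

console.log(noSeparator); // [ 'Jane gave Mike an orange' ]
console.log(words);       // [ 'Jane', 'gave', 'Mike', 'an', 'orange' ]
console.log(text);        // the original string is unchanged
```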
let myString = "John Doe has 5 oranges which Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
let mySubString = "orange";
// Splitting on the substring yields one more piece than there are occurrences
let count = myString.split(mySubString).length - 1;
console.log(count); // 3
The code above returned 3, but myString contains only one instance of the standalone word “orange”. Let’s inspect what happened by examining the array created after we’ve split the original string with “orange” as the separator:
console.log(myString.split(mySubString));
This will give us:
['John Doe has 5 ', 's which Jane Doe has only 2 ', 's, Jane gave Mike 1 of her ', ' so she is now left with only 1 Orange.']
Essentially, the split() method removed all occurrences of the string “orange” from the original string and sliced it at the places where the substring was removed.
Note: Notice how that applies to the string “oranges”: “orange” is its substring, therefore split() removes “orange” and leaves us only with “s”.
Since we’ve found three occurrences of the string “orange”, the original string was sliced in three places, which produced four substrings. That’s why we need to subtract 1 from the array length when we calculate the number of occurrences of the substring.
That’s all good, but there is one more orange in the original string: the last word is “Orange”. Why haven’t we counted it in the previous example? That’s because the split() method is case-sensitive, so it treats “orange” and “Orange” as different strings.
If you need to make your code case-insensitive, a good solution is to first convert both the string and the substring to the same letter case before checking for occurrences:
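To make the “pieces minus one” reasoning concrete, here is a small sketch with two edge cases: no occurrence at all, and occurrences at the very start and end of the string. The helper name countBySplit is hypothetical and exists only for this example.

```javascript
// Hypothetical helper for this illustration: number of pieces minus one.
const countBySplit = (text, sub) => text.split(sub).length - 1;

// No occurrence: split() returns the whole string as one piece, so the count is 0.
const noneFound = countBySplit("apple pie", "orange");

// Two occurrences at the edges: the pieces are ["", "-", ""], so the count is 2.
const twoFound = countBySplit("orange-orange", "orange");

console.log(noneFound, twoFound); // 0 2
```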
let myString = "John Doe has 5 oranges which Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
let mySubString = "ORANGE";
// Convert both strings to lowercase so the comparison ignores letter case
let myStringLC = myString.toLowerCase();
let mySubStringLC = mySubString.toLowerCase();
let count = myStringLC.split(mySubStringLC).length - 1;
console.log(count); // 4
Finally, we can make our code reusable by wrapping it in a function:
const countOccurence = (string, word) => {
    // Lowercase both arguments to make the count case-insensitive
    let stringLC = string.toLowerCase();
    let wordLC = word.toLowerCase();
    let count = stringLC.split(wordLC).length - 1;
    return count;
};
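A short usage sketch of the function above, together with one edge case worth knowing about: splitting on an empty string breaks the text into single characters, so the count becomes meaningless. The guarded variant countOccurrenceGuarded is a hypothetical addition, not part of the original function.

```javascript
// Same logic as the function above, repeated so this example runs on its own.
const countOccurence = (string, word) => {
    const stringLC = string.toLowerCase();
    const wordLC = word.toLowerCase();
    return stringLC.split(wordLC).length - 1;
};

// Hypothetical variant: treat an empty search term as "no occurrences".
const countOccurrenceGuarded = (string, word) =>
    word === "" ? 0 : countOccurence(string, word);

// Sample text used only for this illustration.
const fruitNote = "Orange juice, orange peel and ORANGE zest";

const total = countOccurence(fruitNote, "orange");   // case-insensitive count
const emptyUnguarded = countOccurence("abc", "");    // "abc" splits into 3 characters
const emptyGuarded = countOccurrenceGuarded("abc", "");

console.log(total, emptyUnguarded, emptyGuarded); // 3 2 0
```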
Count the Number of Substrings in String With RegEx
Another method for counting the number of occurrences is to use regular expressions (RegEx). They are patterns of characters used to search, match, and validate strings. Probably the most common use case for regular expressions is form validation: checking whether a string is a (valid) email, a phone number, etc. But in this article, we’ll use them to count the number of occurrences of a substring in a string.
If you’d like to learn more about regular expressions in JavaScript, you should read our comprehensive guide, “Guide to Regular Expressions and Matching Strings in JavaScript”.
First of all, we need to define a regular expression that will match the substring we’re looking for. Assuming we want to find the number of occurrences of the string “orange” in a larger string, our regular expression will look as follows:
let regex = /orange/gi;
In JavaScript, we write a regular expression pattern between two forward slashes: /pattern/. Optionally, after the second forward slash, you can put a list of flags, special characters used to alter the default behavior when matching patterns.
For example, by default, regular expressions match only the first occurrence of the pattern in a search string. Also, matching is case-sensitive, which is perhaps not what we want when searching for substrings. Because of that, we’ll introduce the two flags we’ll be using for the purpose of this article:
g – makes sure that we get all occurrences of the pattern (not just the first one)
i – makes sure that matching is case-insensitive
Note: You can choose which flags to use based on your needs. They are not mandatory.
Now, let’s use the previously created regular expression to count the number of occurrences of the string “orange” in myString:
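The effect of each flag can be sketched by running the same pattern with different flag combinations. The sample string and variable names below are assumptions made for this illustration.

```javascript
// Sample text used only for this illustration.
const sample = "Orange, orange, ORANGE";

// No flags: only the first case-sensitive match is returned.
const noFlags = sample.match(/orange/);

// g only: all case-sensitive matches.
const globalOnly = sample.match(/orange/g);

// g and i together: all matches, regardless of letter case.
const globalInsensitive = sample.match(/orange/gi);

console.log(noFlags[0], noFlags.index); // orange 8
console.log(globalOnly);                // [ 'orange' ]
console.log(globalInsensitive);         // [ 'Orange', 'orange', 'ORANGE' ]
```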
let myString = "John Doe has 5 oranges which Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
let regex = /orange/gi;
// match() returns an array of all matches, or null when nothing matches
let count = (myString.match(regex) || []).length;
console.log(count); // 4
Note: We’ve added || [] because match() returns null when there is no match. In that case the empty array is used instead, so the number of occurrences will be 0.
Alternatively, we can use the RegExp() constructor to create a regular expression. It accepts a search pattern as the first argument, and flags as the second:
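Here is a minimal sketch of the no-match case, using a sample string that contains no “orange” at all. It also shows String.prototype.matchAll() as one possible alternative that never returns null. That alternative is a swapped-in technique and not something this article otherwise relies on.

```javascript
// Sample text used only for this illustration; it contains no "orange".
const fruitText = "apples and pears";

// Without the fallback, match() gives us null, and null has no length property.
const rawResult = fruitText.match(/orange/gi);

// With the fallback, we safely get 0.
const safeCount = (fruitText.match(/orange/gi) || []).length;

// matchAll() returns an iterator (it requires the g flag), so no fallback is needed.
const matchAllCount = Array.from(fruitText.matchAll(/orange/gi)).length;

console.log(rawResult, safeCount, matchAllCount); // null 0 0
```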
let myString = "John Doe has 5 oranges which Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
// Same pattern and flags as /orange/gi, built with the constructor
let regex = new RegExp("orange", "gi");
let count = (myString.match(regex) || []).length;
console.log(count); // 4
Additionally, we can make this reusable by wrapping it in a separate function:
let countOcurrences = (str, word) => (str.match(new RegExp(word, "gi")) || []).length;
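One caveat when building a regular expression from an arbitrary string: characters such as . or ? have a special meaning in a pattern. The sketch below assumes we want to count the search term literally, so it escapes those characters first. Both escapeRegExp and countLiteral are hypothetical helper names introduced for this example.

```javascript
// Hypothetical helper: escape characters that are special inside a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: count literal, case-insensitive occurrences of word in str.
const countLiteral = (str, word) =>
    (str.match(new RegExp(escapeRegExp(word), "gi")) || []).length;

// Sample text used only for this illustration.
const price = "Total: 1.5 or 105 or 1x5";

// Unescaped, the dot matches any character, so "105" and "1x5" are counted too.
const unescapedCount = (price.match(new RegExp("1.5", "g")) || []).length;

// Escaped, only the literal "1.5" is counted.
const escapedCount = countLiteral(price, "1.5");

console.log(unescapedCount, escapedCount); // 3 1
```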
Strict Matching Exact Phrases
Sometimes, you’ll want to match a strict word or phrase, so that “oranges” isn’t included in your counts, nor any other word that contains “orange” but isn’t strictly “orange”. This is a more specific use case of searching for strings within strings, and is fortunately fairly easy!
let regex = /\Worange\W/gi;
By wrapping our term within \W \W (where \W matches any non-word character), we’re matching strictly for “orange” (case-insensitive), and this regex would match only twice in our sentence (both “oranges” aren’t matched).
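Because \W has to consume an actual character on each side, it cannot match a term sitting at the very start or end of the string. One possible alternative, sketched here as a swapped-in technique and not the approach used above, is the \b word-boundary assertion, which matches a position and not a character. The sample string is made up for this illustration.

```javascript
// Sample text used only for this illustration: the term appears at both ends.
const edgeCase = "Orange or oranges? I prefer orange";

// \W needs a non-word character before and after the term, so both edges are missed.
const withNonWord = (edgeCase.match(/\Worange\W/gi) || []).length;

// \b only asserts a word boundary, so the start and end of the string count too.
const withBoundary = (edgeCase.match(/\borange\b/gi) || []).length;

console.log(withNonWord, withBoundary); // 0 2
```

In both cases “oranges” is skipped, since the “s” that follows is a word character.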
Benchmarking Performance
When we run both methods using the JS Benchmark, the split method consistently comes out faster than the regex method, though the difference isn’t really noticeable even for fairly large text corpora. You’ll probably be fine using either.
Note: Don’t rely on these benchmarks as your final decision. Instead, test them out yourself to determine which one is the best fit for your specific use case.
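If you’d like to try this yourself, a rough timing sketch could look like the following. It is not the JS Benchmark setup mentioned above. It assumes a runtime with a global performance object (falling back to Date.now() otherwise), and the haystack text, helper names, and repeat count are all arbitrary choices for this example. Single-run timings like these are noisy, so treat the numbers as a rough indication only.

```javascript
// Use the high-resolution timer when available, otherwise fall back to Date.now().
const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();

// Arbitrary test input: 10,000 repetitions of a sentence with two occurrences each.
const haystack = "An orange is orange. ".repeat(10000);

const countWithSplit = (text, sub) => text.split(sub).length - 1;
const countWithRegex = (text, regex) => (text.match(regex) || []).length;

// Hypothetical helper: run fn once, log how long it took, and return its result.
const timeIt = (label, fn) => {
    const start = now();
    const result = fn();
    console.log(`${label}: ${(now() - start).toFixed(3)} ms`);
    return result;
};

const splitCount = timeIt("split", () => countWithSplit(haystack, "orange"));
const regexCount = timeIt("regex", () => countWithRegex(haystack, /orange/g));

// Both approaches should agree on the count.
console.log(splitCount, regexCount); // 20000 20000
```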
Conclusion
In this article, we learned about two standard methods for calculating the number of occurrences of substrings in a string. We also benchmarked the results, noting that it doesn’t really matter which approach you take as long as it works for you.