
Using Fourier Transform of Vector Representations Derived from BERT Embeddings for Semantic Closeness Analysis | by Yuli Vasiliev | Jan, 2023


Photo by Igor Shabalin

Exploring the mutual influence of words in a sentence by comparing different representations of BERT embeddings

BERT embeddings offer great opportunities when it comes to extracting meaning from text programmatically. It seems that everything we (as well as machines) need to make sense of a text is hidden in these numbers; it's just a matter of manipulating those numbers properly. I discussed this idea in my recent post Discovering Trends in BERT Embeddings of Different Levels for the Task of Semantic Context Determining. This article continues that discussion, exploring whether vector representations obtained by applying the Fourier transform to BERT embeddings can be useful in meaning-extraction NLP tasks.

As you probably know from physics, the Fourier transform lets us understand the frequencies inside a signal. Does a vector representing a word embedding look like a signal? That is, could knowledge of the frequency domain derived via the Fourier transform be of any use when processing BERT embeddings to find semantic closeness? Simply put, does the Fourier transform make sense when analyzing BERT embeddings? Let's check it out.
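Before diving in, here is a tiny illustration (not taken from the original post; the numbers are made up) of what "treating a vector as a signal" means in practice: np.fft.fft turns the components of a vector into complex frequency coefficients, whose magnitudes we will later compare with cosine similarity.

import numpy as np

# An arbitrary 10-component vector standing in for a (truncated) embedding
vec = np.array([0.12, -0.45, 0.33, 0.08, -0.27, 0.51, -0.02, 0.19, -0.36, 0.44])
spectrum = np.fft.fft(vec) / len(vec)  # normalized complex Fourier coefficients
print(np.abs(spectrum))                # magnitude spectrum of the "signal"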

The example discussed in the rest of this article assumes you have the model defined as discussed in the example in the previous post. We'll also need the representations derived from the sample sentences, as discussed in that post. Note, however, that this article uses a set of samples that is a bit different from those used in the above post. So, before creating the representations, you'll need to define the following samples:

sents = []
sents.append('I had a very good apple.')
sents.append('I had a very good orange.')
sents.append('I had a very good journey.')

As you can see, all of the sentences have the same set of direct object modifiers, while the direct objects differ in each case. The purpose of our experiment is to see how the semantic closeness of the direct objects affects the proximity of their modifiers.

The process of tokenizing and generating the hidden states for the sample sentences was covered in the above post, so we won't cover it again here.
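For readers who don't have the previous post at hand, here is a minimal sketch of the kind of code that produces the tokenized_text and hidden_states structures used below. It assumes the Hugging Face transformers library and the bert-base-uncased model, and may differ in detail from the original setup:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
model.eval()

tokenized_text = []
hidden_states = []
for sent in sents:
    # Convert the sentence to token ids, including [CLS] and [SEP]
    ids = tokenizer.encode(sent, add_special_tokens=True)
    tokenized_text.append(ids)
    with torch.no_grad():
        outputs = model(torch.tensor([ids]))
    # outputs.hidden_states is a tuple of 13 tensors (the input embeddings
    # plus the outputs of the 12 encoder layers), each of shape
    # [1, sequence_length, 768]
    hidden_states.append(outputs.hidden_states)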

Let's focus on the modifiers ("a/very/good") of the direct object ("apple/orange/journey") in each sentence. To use the right indexing when obtaining their embeddings, let's first look at the tokens in our sentences:

for i in range(len(sents)):
    print(tokenizer.convert_ids_to_tokens(tokenized_text[i]))

['[CLS]', 'i', 'had', 'a', 'very', 'good', 'apple', '.', '[SEP]']
['[CLS]', 'i', 'had', 'a', 'very', 'good', 'orange', '.', '[SEP]']
['[CLS]', 'i', 'had', 'a', 'very', 'good', 'journey', '.', '[SEP]']

Since we're interested in the modifiers of the direct object (in this particular example, we have three modifiers: a, very, good), we need the tokens with indexes 3, 4, and 5, so we shift the index by 3. Below, we obtain the contextual embedding of each modifier in each sentence, keeping the modifier embeddings in a list defined per sentence:

l12_1 = []
l12_2 = []
l12_3 = []
for i in range(3):
    # Layer 12 (the last encoder layer), batch item 0, tokens 3..5,
    # first 10 components of each embedding
    l12_1.append(hidden_states[0][12][0][i+3][:10].numpy())
    l12_2.append(hidden_states[1][12][0][i+3][:10].numpy())
    l12_3.append(hidden_states[2][12][0][i+3][:10].numpy())

Let's now look at how the semantic closeness of the direct objects in different sentences affects the closeness of their respective modifiers.

from scipy import spatial
for i in range(3):
    print(1 - spatial.distance.cosine(l12_1[i], l12_2[i]))

0.9003266096115112
0.9178041219711304
0.8865049481391907

In the above output, we can see that the contextual embeddings of the Apple and Orange modifiers show a high level of closeness. This is quite understandable, because the direct objects themselves, apple and orange, are very close.

The representations derived for the Apple and Journey modifiers are not as close, as can be seen from the following:

for i in range(3):
    print(1 - spatial.distance.cosine(l12_1[i], l12_3[i]))

0.49141737818717957
0.7987119555473328
0.6404531598091125

The Orange and Journey modifiers are not supposed to be that close either:

for i in range(3):
    print(1 - spatial.distance.cosine(l12_2[i], l12_3[i]))

0.7402883768081665
0.8417230844497681
0.7215733528137207

Let's now derive more sophisticated representations from the embeddings provided by BERT. To start with, let's get the initial (layer-0) embeddings for the modifiers in each sentence:

l0_1 = []
l0_2 = []
l0_3 = []
for i in range(3):
    # Layer 0 holds the initial (non-contextual) input embeddings
    l0_1.append(hidden_states[0][0][0][i+3][:10].numpy())
    l0_2.append(hidden_states[1][0][0][i+3][:10].numpy())
    l0_3.append(hidden_states[2][0][0][i+3][:10].numpy())

Now we can, for example, divide the contextual embeddings (generated in the 12th encoder layer) by the corresponding initial embeddings, as discussed in the previous post, to get new embedding representations to be used in further analysis.

import numpy as np
l0_12_1 = []
l0_12_2 = []
l0_12_3 = []
for i in range(3):
    l0_12_1.append(np.log(l12_1[i]/l0_1[i]))
    l0_12_2.append(np.log(l12_2[i]/l0_2[i]))
    l0_12_3.append(np.log(l12_3[i]/l0_3[i]))
for i in range(3):
    # np.log yields NaN for negative quotients; replace those with 0
    l0_12_1[i] = np.where(np.isnan(l0_12_1[i]), 0, l0_12_1[i])
    l0_12_2[i] = np.where(np.isnan(l0_12_2[i]), 0, l0_12_2[i])
    l0_12_3[i] = np.where(np.isnan(l0_12_3[i]), 0, l0_12_3[i])

For analysis purposes, you may also want to create another set of representations, computing just the element-wise difference between the contextual and non-contextual (initial) embeddings.

_l0_12_1 = []
_l0_12_2 = []
_l0_12_3 = []
for i in range(3):
    _l0_12_1.append(l12_1[i]-l0_1[i])
    _l0_12_2.append(l12_2[i]-l0_2[i])
    _l0_12_3.append(l12_3[i]-l0_3[i])

Before proceeding to evaluate the representations we just created, let's create one more set of representations using the Fourier transform, so that we can then compare all of the representations obtained by the different methods.

fourierTransform_1 = []
fourierTransform_2 = []
fourierTransform_3 = []
for i in range(3):
    # FFT of the log-quotient representations, normalized by vector length
    fourierTransform_1.append(np.fft.fft(l0_12_1[i])/len(l0_12_1[i]))
    fourierTransform_2.append(np.fft.fft(l0_12_2[i])/len(l0_12_2[i]))
    fourierTransform_3.append(np.fft.fft(l0_12_3[i])/len(l0_12_3[i]))

Now we can compare the obtained representations for each pair of sentences, per modifier.

print(sents[0])
print(sents[1])
print()
for i in range(3):
    print(tokenizer.convert_ids_to_tokens(tokenized_text[0][i+3]))
    print('diff', 1 - spatial.distance.cosine(_l0_12_1[i], _l0_12_2[i]))
    print('log_quotient', 1 - spatial.distance.cosine(l0_12_1[i], l0_12_2[i]))
    print('fourier', 1 - spatial.distance.cosine(abs(fourierTransform_1[i]), abs(fourierTransform_2[i])))
    print()

The produced output should look as follows:

I had a very good apple.
I had a very good orange.

a
diff 0.8866338729858398
log_quotient 0.43184104561805725
fourier 0.9438706822278501

very
diff 0.9572229385375977
log_quotient 0.9539480209350586
fourier 0.9754009221726183

good
diff 0.8211167454719543
log_quotient 0.5680340528488159
fourier 0.7838190546462953

In the above experiment, we expect to see a high level of similarity between the same modifiers in the first two sentences. Indeed, we can see that the difference and Fourier transform methods did well on this task.

The purpose of the following experiment is to determine the closeness of modifiers whose modified nouns are not that close.

print(sents[0])
print(sents[2])
print()
for i in range(3):
    print(tokenizer.convert_ids_to_tokens(tokenized_text[0][i+3]))
    print('diff', 1 - spatial.distance.cosine(_l0_12_1[i], _l0_12_3[i]))
    print('log_quotient', 1 - spatial.distance.cosine(l0_12_1[i], l0_12_3[i]))
    print('fourier', 1 - spatial.distance.cosine(abs(fourierTransform_1[i]), abs(fourierTransform_3[i])))
    print()

Here's the output:

I had a very good apple.
I had a very good journey.

a
diff 0.5641788840293884
log_quotient 0.5351020097732544
fourier 0.8501702469740261

very
diff 0.8958494067192078
log_quotient 0.5876994729042053
fourier 0.8582797441535993

good
diff 0.6836684346199036
log_quotient 0.18607155978679657
fourier 0.8857107252606878

The above output shows that the difference and log quotient methods performed best when evaluating the closeness of modifiers whose related nouns are not that close.

print(sents[1])
print(sents[2])
print()
for i in range(3):
    print(tokenizer.convert_ids_to_tokens(tokenized_text[0][i+3]))
    print('diff', 1 - spatial.distance.cosine(_l0_12_2[i], _l0_12_3[i]))
    print('log_quotient', 1 - spatial.distance.cosine(l0_12_2[i], l0_12_3[i]))
    print('fourier', 1 - spatial.distance.cosine(abs(fourierTransform_2[i]), abs(fourierTransform_3[i])))
    print()

The output is as follows:

I had a very good orange.
I had a very good journey.

a
diff 0.8232558369636536
log_quotient 0.7186723351478577
fourier 0.8378725099204362

very
diff 0.9369465708732605
log_quotient 0.6996179223060608
fourier 0.9164374584436726

good
diff 0.8077239990234375
log_quotient 0.5284199714660645
fourier 0.9069805698881434

Once again, we can see that the difference and log quotient methods turned out to be the best when evaluating the closeness of modifiers whose related nouns are not that close.

Does the Fourier transform make sense when analyzing BERT embeddings? According to the experiments conducted in this article, we may conclude that this approach can be used effectively in conjunction with other methods.
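If you want to repeat this comparison for other token positions or sentence pairs, the per-modifier logic above can be wrapped into a small helper. The function below is a hypothetical convenience sketch (its name and signature are mine, not from the post); it takes the contextual (layer-12) and initial (layer-0) embeddings of two token sequences and returns the three similarity scores for each position:

import numpy as np
from scipy import spatial

def modifier_similarities(l12_a, l0_a, l12_b, l0_b):
    # Hypothetical helper: for each pair of positions, compute the cosine
    # similarity of the difference, log-quotient, and Fourier-magnitude
    # representations, mirroring the steps shown in the article
    scores = []
    for a12, a0, b12, b0 in zip(l12_a, l0_a, l12_b, l0_b):
        diff_a, diff_b = a12 - a0, b12 - b0
        # NaNs from logs of negative quotients are zeroed, as in the
        # article's np.where step
        lq_a = np.nan_to_num(np.log(a12 / a0), nan=0.0)
        lq_b = np.nan_to_num(np.log(b12 / b0), nan=0.0)
        f_a = np.abs(np.fft.fft(lq_a) / len(lq_a))
        f_b = np.abs(np.fft.fft(lq_b) / len(lq_b))
        scores.append({
            'diff': 1 - spatial.distance.cosine(diff_a, diff_b),
            'log_quotient': 1 - spatial.distance.cosine(lq_a, lq_b),
            'fourier': 1 - spatial.distance.cosine(f_a, f_b),
        })
    return scores

# Example: compare the modifier embeddings of the first and third sentences
print(modifier_similarities(l12_1, l0_1, l12_3, l0_3))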
