To close out our series on building recommendation models with Sketchfab data, I'll venture far from the previous [posts']({{< ref "/blog/implicit-mf-part-2" >}}) factorization-based methods and instead explore an unsupervised, deep learning-based model. You'll find that the implementation is fairly simple with remarkably promising results, which is almost a smack in the face to all of that effort put in earlier.
We're going to build a model-to-model recommender using thumbnail images of 3D Sketchfab models as our input and the visual similarity between models as our recommendation score. I was inspired to do this after reading Christopher Bonnett's post on product classification, so we will follow a similar approach.
Since our goal is to measure visual similarity, we will need to generate features from our images and then calculate some similarity measure between different images using those features. Back in the day, maybe one would employ fancy wavelets or SIFT keypoints or something for creating features, but this is the Era of Deep Learning and manual feature extraction is for old people.
Staying on-trend, we will use a pretrained neural network (NN) to extract features. The NN was originally trained to classify images among 1000 labels (e.g. "dog", "train", etc.). We'll chop off the last 3 fully-connected layers of the network, which perform the final mapping between deep features and class labels, and use the fourth-to-last layer as a long feature vector describing our images.
Conveniently, all of this is very simple to do with the pretrained models in Keras. Keras allows one to easily build deep learning models on top of either Tensorflow or Theano. Keras also now comes with pretrained models that can be loaded and used. For more information about the available models, visit the Applications section of the documentation. For our purposes, we'll use the VGG16 model because that's what other people seemed to use and I don't know enough to have a compelling reason to stray from the norm.
Our task is now as follows:
- Load and process images
- Feed images through the NN
- Calculate image similarities
- Recommend models!
## Load and process images
The first step, which we won't go through here, was to download all of the image thumbnails. There seems to be a standard thumbnail for each Sketchfab model accessible via their API, so I added a function to the rec-a-sketch crawl.py script to automate downloading all of the thumbnails.
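For reference, the download step looks roughly like the sketch below. This is a hypothetical helper rather than the exact crawl.py code, but it uses the same `https://sketchfab.com/i/models/{mid}` endpoint and thumbnail JSON that we will query again later in this post.

```python
import os
import requests

def download_thumbnail(mid, out_dir='../data/model_thumbs'):
    """Hypothetical helper: fetch and save the 200 x 200 thumbnail for one model id."""
    response = requests.get('https://sketchfab.com/i/models/{}'.format(mid)).json()
    urls = [x['url'] for x in response['thumbnails']['images']
            if x['width'] == 200 and x['height'] == 200]
    if not urls:
        return
    img = requests.get(urls[0])
    fname = os.path.join(out_dir, '{}_thumb200.jpg'.format(mid))
    with open(fname, 'wb') as f:
        f.write(img.content)
```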
Let's load in our libraries and take a look at one of these images.
```python
import csv
import sys
import requests
import skimage.io
import os
import glob
import pickle
import time

from IPython.display import display, Image, HTML
from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image as kimage
import numpy as np
import pandas as pd
import scipy.sparse as sp

sys.path.append('../')
import helpers

rand_img = np.random.choice(glob.glob('../data/model_thumbs/*_thumb200.jpg'))
img = skimage.io.imread(rand_img)
img.shape
```

```
(200, 200, 3)
```
We see that the image can be represented as a 3D matrix with two spatial dimensions (200 x 200) and a third RGB dimension. We have to do a couple of preprocessing steps before feeding an image through the VGG16 model: the image must be resized to 224 x 224, the color channels must be normalized, and an extra dimension must be added because Keras expects to receive a batch of multiple images. Conveniently, Keras has built-in functions to handle most of this.
```python
img = kimage.load_img(rand_img, target_size=(224, 224))
x = kimage.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print(x.shape)
```

```
(1, 224, 224, 3)
```
We can now load our model and try feeding the image through.
```python
# include_top=False removes the final fully-connected layers
model = VGG16(include_top=False, weights='imagenet')
pred = model.predict(x)
print(pred.shape)
print(pred.ravel().shape)
```

```
(1, 7, 7, 512)
(25088,)
```
We will later need to flatten the output of the model into a long feature vector (7 x 7 x 512 = 25,088 dimensions). One thing that should be noted is the time that it takes to run a single image through the NN on my 4-core machine:
```python
%%timeit -n5
pred = model.predict(x)
```

```
5 loops, best of 3: 905 ms per loop
```
That is pretty big when you consider the fact that we will be processing 25,000 images! At roughly 0.9 seconds apiece, predicting one image at a time would take over six hours, so we will batch the predictions below. We will now run through the above preprocessing steps for every model that we trained in the previous recommender blog posts. We can find those models by importing our "likes" data, filtering out low-interaction models and users (as before), and keeping the models that are left over.
```python
df = pd.read_csv('../data/model_likes_anon.psv',
                 sep='|', quoting=csv.QUOTE_MINIMAL,
                 quotechar='\\')
df.drop_duplicates(inplace=True)
df = helpers.threshold_interactions_df(df, 'uid', 'mid', 5, 5)

# model_ids to keep
valid_mids = set(df.mid.unique())
```
## Feed images through the NN
With our set of valid model IDs in hand, we can now run through the long process of loading all of the image files, preprocessing them, and running them through the VGG prediction. This takes a long time, and certain steps blow up memory. I have decided to batch things up below and include some print statements so that one can track progress. Beware: this takes a long time!
```python
# Grab relevant filenames
get_mid = lambda x: x.split(os.path.sep)[-1].split('_')[0]
fnames = glob.glob('../data/model_thumbs/*_thumb200.jpg')
fnames = [f for f in fnames if get_mid(f) in valid_mids]
idx_to_mid = {}
batch_size = 500
min_idx = 0
max_idx = min_idx + batch_size
total_max = len(fnames)
n_dims = pred.ravel().shape[0]
px = 224

# Initialize predictions matrix
preds = sp.lil_matrix((len(fnames), n_dims))

while min_idx < total_max - 1:
    t0 = time.time()
    X = np.zeros(((max_idx - min_idx), px, px, 3))
    # For each file in the batch,
    # load it as a row of X
    for i in range(min_idx, max_idx):
        fname = fnames[i]
        mid = get_mid(fname)
        idx_to_mid[i] = mid
        img = kimage.load_img(fname, target_size=(px, px))
        img_array = kimage.img_to_array(img)
        X[i - min_idx, :, :, :] = img_array
        if i % 200 == 0 and i != 0:
            t1 = time.time()
            print('{}: {}'.format(i, (t1 - t0) / i))
            t0 = time.time()
    max_idx = i
    t1 = time.time()
    print('{}: {}'.format(i, (t1 - t0) / i))

    print('Preprocessing input')
    t0 = time.time()
    X = preprocess_input(X)
    t1 = time.time()
    print('{}'.format(t1 - t0))

    print('Predicting')
    t0 = time.time()
    these_preds = model.predict(X)
    shp = ((max_idx - min_idx) + 1, n_dims)
    # Place predictions inside full preds matrix.
    preds[min_idx:max_idx + 1, :] = these_preds.reshape(shp)
    t1 = time.time()
    print('{}'.format(t1 - t0))

    min_idx = max_idx
    max_idx = np.min((max_idx + batch_size, total_max))
```
## Calculate image similarities
I would recommend writing the predictions to disk here (you don't want the kernel to die and lose all that work!). The preds matrix consists of a single row for each image with 25,088 sparse features as columns. To calculate item-item recommendations, we must convert this feature matrix into a similarity matrix.
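A minimal checkpointing sketch using the pickle library imported earlier (the file path is an arbitrary assumption):

```python
# Save the feature matrix and the index-to-model-id mapping so the
# expensive NN pass never has to be repeated.
with open('../data/preds.pkl', 'wb') as f:
    pickle.dump((preds, idx_to_mid), f)

# Restore them in a fresh session:
with open('../data/preds.pkl', 'rb') as f:
    preds, idx_to_mid = pickle.load(f)
```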
```python
def cosine_similarity(ratings):
    sim = ratings.dot(ratings.T)
    if not isinstance(sim, np.ndarray):
        sim = sim.toarray()
    norms = np.array([np.sqrt(np.diagonal(sim))])
    return (sim / norms / norms.T)

preds = preds.tocsr()
sim = cosine_similarity(preds)
```
## Recommend models!
Using the similarity matrix, we can reuse some old functions from previous posts to visualize some of the recommendations. I have added some HTML so that clicking on the images links out to their Sketchfab pages. Let's look at a couple!
```python
def get_thumbnails(sim, idx, idx_to_mid, N=10):
    row = sim[idx, :]
    thumbs = []
    mids = []
    for x in np.argsort(-row)[:N]:
        response = requests.get('https://sketchfab.com/i/models/{}'
                                .format(idx_to_mid[x])).json()
        thumb = [x['url'] for x in response['thumbnails']['images']
                 if x['width'] == 200 and x['height'] == 200]
        if not thumb:
            print('no thumbnail')
        else:
            thumb = thumb[0]
        thumbs.append(thumb)
        mids.append(idx_to_mid[x])
    return thumbs, mids
```
```python
def display_thumbs(thumbs, mids, N=5):
    thumb_html = "<a href='{}' target='_blank'>\
                  <img style='width: 160px; margin: 0px; \
                  border: 1px solid black; display:inline-block' \
                  src='{}' /></a>"
    images = "<div class='line' style='max-width: 640px; display: block;'>"

    display(HTML('<font size=5>' + 'Input Model' + '</font>'))
    link = 'http://sketchfab.com/models/{}'.format(mids[0])
    url = thumbs[0]
    display(HTML(thumb_html.format(link, url)))

    display(HTML('<font size=5>' + 'Similar Models' + '</font>'))
    for (url, mid) in zip(thumbs[1:N+1], mids[1:N+1]):
        link = 'http://sketchfab.com/models/{}'.format(mid)
        images += thumb_html.format(link, url)
    images += '</div>'
    display(HTML(images))
```
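As a usage sketch (the input index is an arbitrary choice):

```python
# Pick an arbitrary model and display its closest visual matches.
thumbs, mids = get_thumbnails(sim, 100, idx_to_mid, N=10)
display_thumbs(thumbs, mids, N=5)
```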
## Conclusion
Wow! With this completely unsupervised method and zero hyperparameter tuning, we get strikingly well-matched images. This might feel somewhat frustrating: why did we spend all that time with those math-heavy, brain-stretching factorization algorithms when we could have just fed everything through a deep learning model? Firstly, it would be difficult to perform user-to-item recommendations or the tag recommendations from the last post this way. Secondly, it seems that this visual similarity model and the implicit feedback models serve different purposes.
The NN does exactly what we expect: it finds similar images. The implicit feedback model finds other models that similar users have liked. What tends to happen is that the likes-based recommendations find models that share similar themes or appeal to certain clusters of users. For example, we may see that various anime characters get grouped together, or renderings of medieval armor and weapons. If we were to feed one of the medieval weapons into the NN, we would instead find other examples of only that exact weapon, likely spanning many periods of time.
I did attempt to combine the LightFM model with this NN model by taking the NN output features and using them as side information in the LightFM model. There were typically ~2500 nonzero NN features for each model, which totally blew up the training time of the LightFM model. It took 30 minutes to compute the precision at k. I shuddered at the thought of calculating learning curves and grid searches, so I gave up! Maybe someday I'll spin up a giant EC2 box and see what happens.
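For concreteness, that experiment looked roughly like the sketch below. The variable names are assumptions: `likes` stands in for the user-by-model interactions matrix built in the earlier posts, and `preds` is the NN feature matrix from above.

```python
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

# Stack an identity block next to the NN features so each model keeps a
# per-item indicator feature alongside its ~2500 nonzero visual features.
item_features = sp.hstack([sp.identity(preds.shape[0]), preds]).tocsr()

lfm = LightFM(loss='warp')
lfm.fit(likes, item_features=item_features, epochs=10)
p_at_k = precision_at_k(lfm, likes, item_features=item_features, k=5).mean()
```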
In the next post, I wrap up this series by writing about how I built a Flask app on AWS called Rec-a-Sketch to serve up interactive Sketchfab recommendations. Thanks for reading!