Everyone knows that Python has risen above its humble beginnings such that it now powers billion dollar companies. Let's not forget Python's roots, though! It's still an excellent language for running quick and dirty scripts that automate some task. While this works fine for automating my own tasks because I know how to navigate the command line, it's a bit much to ask a layperson to somehow install python and dependencies, open Terminal on a Mac (god help you if they have a Windows computer), type a random string of characters, and hit enter. Ideally, you would give the layperson a button, they hit it, and they get their result.
I recently deployed a solution which allowed me to do just this. Even better: it was all free! In this post, I'll talk about how I used Google Sheets as my input form, datasheets to convert Google Sheets to pandas, Zappa for deploying a serverless Flask app, and PuLP for solving a quick integer programming problem to make a simple and free ad hoc optimization service.
Note: all of the code for this service is located on my GitHub.
FML
Every project should start as a problem, and mine was no different. My wife competes in fantasy movie league. This is like fantasy football for movie geeks. The rules are simple:
You are a fantasy movie theater owner. You must decide which movies to play on your 8 screens. Each movie costs a different amount to screen, and the goal is to generate the most box office revenue over the weekend given your available budget. Talking with her, I realized that, if one can do a good job predicting box office revenue for the weekend (the hard part of the problem), then deciding how many screens to play each movie on becomes a simple integer programming allocation problem.
Requirements
Now that we have the problem, what are the requirements?
- A method for inputting a bunch of data:
  - Movie name
  - Expected revenue
  - Cost to screen
- Ability to run the allocation problem from a browser.
- A view of the solution
What's the easiest input form that data scientists hate?
Excel
What’s worse than Excel?
Google Sheets
Datasheets
Thankfully, Squarespace created datasheets. This is a nice library that makes interactions between `pandas` and Google Sheets impressively painless. The library is worth it for the detailed OAuth page alone (I once spent 2 weeks battling Google OAuth pain and really wish this page had existed at the time). What's particularly nice about the OAuth page is that it walks through setting up a service account, which doesn't require the end-user to go through the usual OAuth dance of browser redirects to and from the Google login page. This is nice because these redirects can get messed up when moving from local development to production systems in the cloud (or at least they always get messed up when I try to do it!).
Anywho, the first step was to set up my Google Sheets credentials and download the `client_secrets.json` and `service_key.json` files. With these handy, I can now access my Google Sheets spreadsheet using `datasheets`. The spreadsheet is called `FML`, and its `inputs` tab holds one row per movie: the movie name, its expected revenue, and its cost to screen.
We can pull this into a pandas DataFrame by setting some `datasheets` environment variables to point to our credentials and then creating a `Client`:
import os
import datasheets
os.environ['DATASHEETS_SECRETS_PATH'] = 'client_secrets.json'
os.environ['DATASHEETS_SERVICE_PATH'] = 'service_key.json'
client = datasheets.Client(service=True)
If that goes well, we can now grab our workbook (aka the Google Sheets file) and download the tab of data:
workbook = client.fetch_workbook('FML')
input_tab = workbook.fetch_tab('inputs')
input_data = input_tab.fetch_data()
input_data
| | movie | revenue | cost |
|---|---|---|---|
| 0 | Hotel Transylvania | 13600000.0 | 157.0 |
| 1 | Ant Man | 9100000.0 | 116.0 |
| 2 | Skyscraper | 5300000.0 | 61.0 |
| 3 | Incredibles 2 | 7900000.0 | 89.0 |
| 4 | Jurassic World | 6700000.0 | 76.0 |
| 5 | Purge | 2400000.0 | 28.0 |
| 6 | Sorry to Bother | 1800000.0 | 18.0 |
| 7 | MI: Fallout | 63600000.0 | 756.0 |
| 8 | Mamma Mia | 19800000.0 | 227.0 |
| 9 | Equalizer | 18300000.0 | 201.0 |
| 10 | Unfriended | 1600000.0 | 18.0 |
| 11 | Blindspotting | 3000000.0 | 41.0 |
| 12 | Teen Titans | 13400000.0 | 149.0 |
| 13 | Three Identical Strangers | 1100000.0 | 16.0 |
| 14 | Eighth Grade | 946000.0 | 26.0 |
Allocating Movies
I've written previously about integer programming in Python using the PuLP package, so I'll skip the introductions to integer programming and `pulp`. For this post, I'll just quickly summarize the optimization problem, since it's pretty simple!
We only have a single decision variable in our problem. In the code, I call this `movie_counts`. In math, we can call it $S_{m}$, which corresponds to how many screens we will play movie $m$ on for the weekend. This is an integer decision variable with a lower bound of 0 and an upper bound of 8 (the number of screens we have available in our fantasy movie theater). It is an integer variable because we cannot screen a movie on 2.5 screens.
With our decision variable in hand, we must now define an objective function. We simply want to maximize expected revenue. Let's define a quantity $r_{m}$ which is the amount of money that we expect movie $m$ to bring in (this is the revenue column in the above DataFrame). Our objective function is then simply
$$\sum_{m} r_{m} * S_{m}$$
Finally, we need some constraints. We only have two, but, before I introduce them, I need to introduce one slight wrinkle in fantasy movie league. You get charged \$2 million for every screen that you leave empty. We can incorporate this into our optimization problem by assuming that there is an _extra_ movie called "Empty Screen" and that the expected revenue for that movie is _negative_ \$2 million. Our two constraints can now be defined:
- Every screen must be assigned a movie:
$$ \sum_{m} S_{m} = 8 $$
- We have a limited budget of \$1000. Let's say movie $m$ costs $c_{m}$ to screen. Our budget constraint is thus
$$ \sum_{m} c_{m} * S_{m} \leq 1000 $$
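To make the objective and the two constraints concrete, here is a small sketch that scores a few hand-picked allocations using rows from the DataFrame above. The `evaluate` helper is hypothetical, and the cost of 0 for "Empty Screen" is an assumption for illustration; neither comes from the fml code.

```python
# Expected revenue and cost per movie, copied from the inputs table.
# "Empty Screen" is the extra pseudo-movie; its cost of 0 is an assumption.
REVENUE = {"Teen Titans": 13_400_000, "Unfriended": 1_600_000, "Empty Screen": -2_000_000}
COST = {"Teen Titans": 149, "Unfriended": 18, "Empty Screen": 0}

NUM_SCREENS = 8
BUDGET = 1000


def evaluate(screens):
    """Return (revenue, cost, feasible) for a {movie: screen count} allocation."""
    revenue = sum(REVENUE[m] * s for m, s in screens.items())
    cost = sum(COST[m] * s for m, s in screens.items())
    # Feasible means: all 8 screens are assigned and the budget is respected.
    feasible = sum(screens.values()) == NUM_SCREENS and cost <= BUDGET
    return revenue, cost, feasible


print(evaluate({"Teen Titans": 6, "Unfriended": 2}))    # (83600000, 930, True)
print(evaluate({"Teen Titans": 6, "Empty Screen": 2}))  # (76400000, 894, True)
print(evaluate({"Teen Titans": 8}))                     # (107200000, 1192, False)
```

The second line shows the empty-screen penalty at work: leaving two screens empty costs \$4 million of revenue. The third earns the most on paper but blows the budget, so the solver would never pick it.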
And that's it: one type of decision variable, a simple objective function, and two constraints. If you're interested, I wrap all of the above steps into an `Optimizer` class in the [fml code](https://github.com/EthanRosenthal/fml/blob/master/fml/optimizer.py).
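If you'd rather not click through, the model is small enough to sketch inline. This is a minimal illustration with PuLP, not the `Optimizer` class from the repository: it uses only a handful of rows from the inputs table, the variable names are my own, and the zero cost for "Empty Screen" is an assumption.

```python
import pulp

# A few rows from the inputs table, plus the assumed zero-cost "Empty Screen".
movies = ["Teen Titans", "Equalizer", "Mamma Mia", "Unfriended", "Empty Screen"]
revenue = [13_400_000, 18_300_000, 19_800_000, 1_600_000, -2_000_000]
cost = [149, 201, 227, 18, 0]

NUM_SCREENS = 8
BUDGET = 1000

problem = pulp.LpProblem("fml_sketch", pulp.LpMaximize)

# S_m: integer number of screens for movie m, between 0 and 8.
idx = range(len(movies))
screens = pulp.LpVariable.dicts(
    "screens", idx, lowBound=0, upBound=NUM_SCREENS, cat="Integer"
)

# Objective: total expected revenue.
problem += pulp.lpSum(revenue[i] * screens[i] for i in idx)
# Constraint 1: every screen is assigned a movie (possibly "Empty Screen").
problem += pulp.lpSum(screens[i] for i in idx) == NUM_SCREENS
# Constraint 2: stay within the budget.
problem += pulp.lpSum(cost[i] * screens[i] for i in idx) <= BUDGET

# Solve with the CBC solver that ships with PuLP, silencing its log output.
problem.solve(pulp.PULP_CBC_CMD(msg=False))

status = pulp.LpStatus[problem.status]
allocation = {movies[i]: int(round(screens[i].value())) for i in idx}
print(status, allocation)
```

Indexing the variables by position rather than by movie name sidesteps any worry about spaces or punctuation in titles like "MI: Fallout" ending up in solver variable names.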
With the optimization problem complete, I can pack up the solution as a DataFrame and use `datasheets` to write the data back to the `outputs` tab of the spreadsheet:
solution = ...
outputs_tab = workbook.fetch_tab('outputs')
outputs_tab.insert_data(solution)
Painless Serverless
The final step was to create a tiny Flask app with a button to launch the optimization. I made the simplest barebones site that I could, and then it was time to deploy.
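The app itself isn't shown in this post, so here is a rough sketch of what such a barebones site might look like. The route names, the page markup, and the `run_optimization` placeholder are all assumptions for illustration, not the code from the repository.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# One page, one button. The form posts to the (hypothetical) /calculate route.
PAGE = """
<h1>FML</h1>
<form action="{{ url_for('calculate') }}" method="post">
  <button type="submit">Calculate</button>
</form>
{% if done %}<p>Done! Check the outputs tab of the spreadsheet.</p>{% endif %}
"""


def run_optimization():
    """Hypothetical placeholder: fetch inputs, solve, write the outputs tab."""
    pass


@app.route("/")
def index():
    # Show the page with just the button.
    return render_template_string(PAGE, done=False)


@app.route("/calculate", methods=["POST"])
def calculate():
    # Run the optimization, then show the page again with a confirmation.
    run_optimization()
    return render_template_string(PAGE, done=True)
```

There is deliberately no `app.run()` call here: a serverless deployment only needs the module-level `app` object, and for local testing the `flask run` command can serve it.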
Zappa is a ridiculously cool Python library that lets you run any Python application as an AWS Lambda function and make it all discoverable via API Gateway. What this means is that you can make a Python website and run it in the cloud without an actual server running the code (as long as your website runs quickly and uses few resources). You only pay for each time the website runs, but the first million times per month are free. If my wife happens to run this more than 1 million times, then I'll happily pay money.
I was blown away by how easy `Zappa` was to use. Honestly, the hardest part was figuring out how to install python 3.6 on my linux computer because you have to use `virtualenv` instead of `conda` (though there's a PR to fix that).
I'm just going to copy the documentation on how to get `Zappa` working because this is really all that I had to do:
pip install zappa
zappa init
zappa deploy
After all of your code gets zipped up and sent to the cloud, `Zappa` tells you the cryptic URL at which you can now find your app. You can use custom domains and a gazillion other options, but this is quick and dirty serverless integer programming, remember?
With the website deployed, my wife can now enter data into the spreadsheet, hit the `Calculate` button on the website, and then watch the spreadsheet for the optimal movie screens with nary a command line in sight.