Parameterize notebooks so you can run them programmatically
You’ve trained your machine learning model in a Jupyter notebook. Now you want to run that model on data that comes in every day.
Day in, day out, you create a new copy of the same notebook and run it. You store the copy of the notebook and pass the results to your stakeholders.
In another scenario, you have a new set of data every day that needs to be visualized with the same code in a Jupyter notebook.
So you create a new copy of the same notebook and modify the input. Again, you store the copy of the notebook and pass the results to your stakeholders.
Doesn’t that sound painful?
This could be solved if we had a function run_notebook(input_data). By providing the parameter input_data, the notebook would run with the new input_data. The output would be a copy of the notebook run with the new data.
That’s exactly what Papermill does.
Here’s a quick demo of what it does.
It’s an open-source tool for parameterizing, executing, and analyzing Jupyter notebooks. Just pass parameters to the notebook, and the Jupyter notebook runs automatically.
Use cases of Papermill
Running the same analysis on multiple data sets is time-consuming and error-prone if done manually. For example, a daily reporting dashboard might need to be refreshed with new data each day. Papermill automates this.
Machine learning algorithms in Jupyter notebooks may also be used to generate results daily. Instead of manually running the notebook with fresh data every day, Papermill can be used to generate the results in production.
In this post, I go through:
- Install Papermill
- Get started with Papermill
- Generate a visualization report regularly with Papermill
- Run a machine learning algorithm in production with Papermill
- Other things that you can do with Papermill
- Further readings and resources
Simply run the following in your terminal.
pip install papermill[all]
In Jupyter Notebook
1. Create a Jupyter notebook on your Desktop and name it hello_papermill.
2. Say we want to parameterize a and b to calculate a+b. We’ll create a code block like the one sketched below.
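A minimal sketch of the notebook cells, assuming the default values a = 1 and b = 2 (any defaults will do, since Papermill overrides them at run time):
# first cell: this is the cell we will tag "parameters" in the next step
a = 1
b = 2
# second cell: use the parameters
a + b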
3. Go to the toolbar -> View -> Cell Toolbar -> Tags.
4. Type parameters into the box at the top-right corner of the first cell. Click ‘Add tag’. The notebook is parameterized!
5. Start a terminal and navigate to your Desktop.
For Mac users, the command is:
cd ~/Desktop/
For Windows users, the command is:
cd C:\Users\YOUR_USER_NAME_HERE\Desktop
6. Run your parameterized notebook with this command. It tells Papermill to “run hello_papermill.ipynb with the parameters a=10 and b=20 and store the results in output.ipynb”.
papermill hello_papermill.ipynb output.ipynb -p a 10 -p b 20
7. From the Jupyter interface, open up output.ipynb. You should see something like the automatically generated cells sketched below.
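Papermill copies the notebook and injects a new cell (tagged injected-parameters) right after the parameters cell, containing the values passed on the command line. Assuming the hypothetical defaults from the sketch above, the output notebook would contain roughly:
# parameters cell (original defaults)
a = 1
b = 2
# injected-parameters cell added by Papermill
a = 10
b = 20
# the final cell now evaluates a + b to 30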
Now that we’ve run through the “hello world” of Papermill, let’s dive into the first possible use case.
Before running this code, install the following packages.
pip install yfinance
pip install matplotlib
On my Desktop, I created a Jupyter notebook called plot_stock_price.
This notebook:
- takes in the parameter stock, which is the ticker of the company (MSFT for Microsoft, TSLA for Tesla and META for Meta),
- extracts the stock price of the company,
- plots a graph,
- exports the graph as a file named output_{stock}.png
Here is the code.
import pandas
import yfinance
import matplotlib.pyplot as plt

# selecting a stock ticker
stock = 'MSFT'  # This line is parameterized.
stock_list = [stock]

# downloading the stock data
stock_data = yfinance.download(stock, start="2021-10-01", end="2022-02-14")
stock_data.head(5)

# plot only the Opening price of the ticker
data = stock_data.loc[:, "Open"].copy()

# Plot the data with matplotlib
plt.style.use("seaborn")
fig, ax = plt.subplots(1, figsize=(16, 5))
ax.plot(data)

# Save the image as a PNG file
fig.savefig(f'output_{stock}.png')
print(f"Image saved to output_{stock}.png")
Next, again, open up your terminal and navigate to your Desktop (I provided instructions above). Run this command. It tells Papermill to “run plot_stock_price.ipynb with the parameter stock='TSLA' and store the output notebook as output_TSLA.ipynb”.
papermill plot_stock_price.ipynb output_TSLA.ipynb -p stock 'TSLA'
You should see something like this in your terminal.
Input Notebook: plot_stock_price.ipynb
Output Notebook: output_TSLA.ipynb
Executing: 0%| | 0/8 [00:00<?, ?cell/s]
Executing notebook with kernel: python3
Executing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00, 1.81cell/s]
Finally, check your Desktop again. You should see two files: output_TSLA.ipynb and output_TSLA.png. Great, we have successfully run a parameterized notebook.
The next use case is a familiar one.
You have already trained your machine learning algorithm in a notebook. Now you need to run the algorithm against new data regularly.
Before running this code, install the following packages.
pip install scikit-learn
pip install pandas
pip install numpy
On my Desktop, I created a Jupyter notebook called predict_linear_regression.
This notebook
- trains a linear regression model based on mock data,
- reads in an input file (CSV format) as a parameter,
- creates mock inputs,
- makes predictions based on the inputs in the CSV file, and
- exports the predictions as a CSV file named by the output_name parameter.
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas as pd

# Training data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])

# Convert the array X into a pandas DataFrame
train_df = pd.DataFrame(X, columns=['a', 'b'])

# Create a new column 'y' with the formula y = 3 + a + 2*b
np.random.seed(5)
train_df['y'] = 3 + train_df['a'] + 2*train_df['b']

# Fit a linear regression model
reg = LinearRegression().fit(train_df.drop(columns=['y']),
                             train_df['y'])

# Create a mock input dataset
X1 = np.array([[5, 3], [10, -4]])
input_set1 = pd.DataFrame(X1, columns=['a', 'b'])
input_set1.to_csv('input_set1.csv', index=False)

# Create another mock input dataset
X2 = np.array([[10, -1], [2, -6]])
input_set2 = pd.DataFrame(X2, columns=['a', 'b'])
input_set2.to_csv('input_set2.csv', index=False)

# Make predictions on a mock input dataset
input_name = 'input_set1.csv'    # Parameterized
output_name = 'output_set1.csv'  # Parameterized

# Read the test input
pred_df = pd.read_csv(input_name)

# Make predictions and save them
pred_df['y_pred'] = reg.predict(pred_df)
pred_df.to_csv(output_name, index=False)
Next, again, open up your terminal and navigate to your Desktop (I provided instructions above). Run this command. It tells Papermill to “run predict_linear_regression.ipynb with the parameters input_name='input_set2.csv' and output_name='output_set2.csv', and store the output notebook as predict_output2.ipynb”.
papermill ./predict_linear_regression.ipynb ./predict_output2.ipynb -p input_name './input_set2.csv' -p output_name './output_set2.csv'
You should see something like this in your terminal.
Input Notebook: ./predict_linear_regression.ipynb
Output Notebook: ./predict_output2.ipynb
Executing: 0%| | 0/11 [00:00<?, ?cell/s]
Executing notebook with kernel: python3
Executing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:05<00:00, 2.12cell/s]
Finally, check your Desktop again. You should see two files: predict_output2.ipynb and output_set2.csv. Great, we have successfully run a parameterized notebook for a machine learning use case.
There are more useful functionalities in Papermill. Here I highlight a few interesting ones.
Running notebooks as functions
Instead of executing the notebook from the terminal, you can execute it as a function. Here are two alternatives that achieve the same goal.
Alternative 1: In the terminal, navigate to the directory containing predict_linear_regression.ipynb. Then run the command
papermill ./predict_linear_regression.ipynb ./predict_output2.ipynb -p input_name './input_set2.csv' -p output_name './output_set2.csv'
Alternative 2: In Jupyter, navigate to the directory containing predict_linear_regression.ipynb. Create a new notebook there and run the following.
import papermill as pm

pm.execute_notebook(
    'predict_linear_regression.ipynb',
    'predict_output2.ipynb',
    parameters={'input_name': './input_set2.csv',
                'output_name': './output_set2.csv'}
)
Upload your results to Amazon S3 or the cloud
$ papermill predict_linear_regression.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
Provide a .yml file instead of individual parameters
$ papermill predict_linear_regression.ipynb s3://bkt/output.ipynb -f parameters.yaml
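The -f flag tells Papermill to read parameter values from a YAML file rather than from individual -p flags. A minimal parameters.yaml matching the command above could look like this (the values are simply taken from the earlier example):
alpha: 0.6
l1_ratio: 0.1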
In this blog post, we have learned how to parameterize a notebook with Papermill. It’s a great tool for optimizing your data science workflow. Give it a try!