Wondering how to get Puppeteer to work properly on AWS Lambda?
You're in the right place! In this post, we'll cover the main challenges you can run into while trying to do that. But first, let's start by introducing both Puppeteer and AWS Lambda.
What’s Puppeteer?
Simply put, Puppeteer is software for controlling a (headless) browser. It's a piece of open-source software developed and supported by Google's developer tools team. It lets you simulate user interaction with a browser through a simple API. This is very helpful for things like automated tests or web scraping.
A picture is worth a thousand words. How much is a gif worth? With the little bit of code shown in the gif below, I can log in to a Google account. You simply need to click, enter text, paginate, and scrape all the publicly available data you need.
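To make the "click, enter text, scrape" idea concrete, here is a minimal sketch of a function that drives a Puppeteer page. The URL, the selectors, and the function name searchAndCollect are hypothetical placeholders chosen for illustration; only the page methods themselves (goto, type, click, waitForNavigation, waitForSelector, $$eval) come from Puppeteer's API.

```javascript
// Sketch only: the URL and selectors below are hypothetical placeholders.
// `page` is expected to be a Puppeteer Page object.
async function searchAndCollect(page, query) {
  // Open the page that contains the search form.
  await page.goto('https://example.com/search');

  // Enter text into the search box.
  await page.type('#query', query);

  // Click the submit button and wait for the resulting navigation.
  await Promise.all([
    page.waitForNavigation(),
    page.click('#submit'),
  ]);

  // Wait for results to render, then scrape their titles.
  await page.waitForSelector('.result-title');
  return page.$$eval('.result-title', (nodes) =>
    nodes.map((node) => node.textContent.trim())
  );
}
```

Passing the page in as an argument keeps the browser setup separate from the scraping logic, so the same function can run locally or inside Lambda.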
What’s AWS Lambda?
AWS Lambda is what Amazon describes as "Run code without thinking about servers or clusters." You simply create a function on Lambda and then execute it. It's that easy.
Simply put, you can do everything on AWS Lambda. Okay, everything is a strong word, but almost. For example, it's possible to scrape thousands of public web pages every night with AWS Lambda functions. They can also insert the data into databases.
Getting started with AWS Lambda is simple and inexpensive. You only pay for what you use, and there is even a generous free tier.
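For readers who haven't written one yet, a Node.js Lambda function is essentially an exported async handler. The sketch below is a minimal illustration, not code from the original project; the name field on the event is a hypothetical input.

```javascript
// Minimal Node.js Lambda handler sketch.
// The `name` field on the event is a hypothetical input used for illustration.
const handler = async (event) => {
  const name = (event && event.name) || 'world';
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};

// Lambda invokes the exported function named in the function's handler setting.
module.exports = { handler };
```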
Problem #1: Puppeteer is too big to push to Lambda
AWS Lambda has a 50 MB limit on the zip file you push directly to it. Because it installs Chromium, the Puppeteer package is significantly larger than that. However, this 50 MB limit doesn't apply when you load the function from S3! See the AWS Lambda quotas documentation for details.
AWS Lambda quotas may be tight for Puppeteer
The 50 MB zip limit can be bypassed by uploading from an S3 bucket, although the 250 MB limit on the unzipped package still applies. So we create a bucket in S3, use npm scripts to upload the zip to S3, and then update our Lambda code from that bucket. The scripts in package.json look something like this:
"zip": "npm run build && 7z a -r function.zip ./dist/* node_modules/",
"sendToLambda": "npm run zip && aws s3 cp function.zip s3://chrome-aws && rm function.zip && aws lambda update-function-code --function-name puppeteer-examples --s3-bucket chrome-aws --s3-key function.zip"
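If you want a deploy script to decide automatically whether the S3 route is needed, a size check along these lines could work. This is a sketch under assumptions: the helper names are hypothetical, and 50 MB is taken as 50 * 1024 * 1024 bytes.

```javascript
const fs = require('fs');

// Direct-upload limit for a zipped package, as described above.
// Assumption: 1 MB is treated as 1024 * 1024 bytes.
const DIRECT_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: pick a deployment route for a zip of the given size.
function chooseUploadMethod(zipSizeBytes) {
  return zipSizeBytes > DIRECT_UPLOAD_LIMIT_BYTES ? 's3' : 'direct';
}

// Hypothetical helper: same decision, based on a zip file on disk.
function chooseUploadMethodForFile(zipPath) {
  return chooseUploadMethod(fs.statSync(zipPath).size);
}
```

A build script could call chooseUploadMethodForFile('function.zip') and only run the S3 copy step when it returns 's3'.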
Problem #2: Puppeteer on AWS Lambda doesn't work
By default, Linux (including AWS Lambda) doesn't include the libraries Puppeteer needs in order to run.
Fortunately, a package of Chromium built for AWS Lambda already exists: chrome-aws-lambda. You will need to install it, along with puppeteer-core, in the function you are sending to Lambda.
The regular Puppeteer package isn't needed and, in fact, counts against your 250 MB limit.
npm i --save chrome-aws-lambda puppeteer-core
Then, when you set it up to launch a browser from Puppeteer, it should look like this:
// chrome-aws-lambda ships a Lambda-compatible Chromium and wraps puppeteer-core.
const chromium = require('chrome-aws-lambda');

// Run this inside an async function, such as your Lambda handler.
const browser = await chromium.puppeteer
    .launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    });
Final note
Puppeteer requires more memory than a regular script, so keep an eye on your max memory usage. When using Puppeteer, we recommend at least 512 MB for your AWS Lambda function. Also, don't forget to run await browser.close() at the end of your script. Otherwise, you may end up with your function running until timeout for no reason, because the browser is still alive and waiting for commands.
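One way to make sure the browser is always closed, even when a step throws, is to wrap the work in try/finally. The sketch below assumes the chrome-aws-lambda launch options shown earlier; makeHandler and the url field on the event are hypothetical names used for illustration. Taking chromium as a parameter also makes the handler easy to exercise with a stub.

```javascript
// Sketch: a handler factory that guarantees browser.close() runs.
// `makeHandler` and the `url` event field are hypothetical names.
function makeHandler(chromium) {
  return async function handler(event) {
    let browser = null;
    try {
      browser = await chromium.puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      });
      const page = await browser.newPage();
      await page.goto(event.url);
      return { title: await page.title() };
    } finally {
      // Runs on success and on error, so the function does not hang until timeout.
      if (browser !== null) {
        await browser.close();
      }
    }
  };
}

// In a deployed function you would wire it up roughly like this:
// exports.handler = makeHandler(require('chrome-aws-lambda'));
```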