Chrome Puppeteer

Chrome Puppeteer lets you run and control a browser in the cloud. You can do anything a user can do manually.

Great for taking screenshots, scraping data, and automating tests you would otherwise run by hand.

The Chrome Puppeteer Documentation gives you full API docs for what's possible.

Load a website

Start with a lambda that runs Chrome and loads a website.

Exercise

Move into exercise-4 from the serverless-workshop-exercises GitHub repository.

I've preconfigured it with the necessary dependencies:

aws-lambda
chrome-aws-lambda
puppeteer@3.1.0

Tell serverless to avoid packaging the entire browser.

# serverless.yml
package:
  exclude:
    - node_modules/puppeteer/.local-chromium/**

Add your screenshot function to serverless.yml. Use a GET.

Make sure to specify a large memorySize: (2536 is good) and a long timeout: (30 seconds is the max for a function behind API Gateway). Gives Chrome room to breathe :)
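
As a rough sketch, the function entry could look like the fragment below. The function name screenshot, the handler path, and the URL path are placeholders I made up for illustration; use whatever matches your project layout.

```yaml
# serverless.yml
functions:
  screenshot:                           # hypothetical function name
    handler: dist/screenshot.handler    # hypothetical path, adjust to your build output
    memorySize: 2536                    # MB, plenty of room for Chrome
    timeout: 30                         # seconds
    events:
      - http:
          path: screenshot              # hypothetical URL path
          method: GET
```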

Use the getChrome() method from src/utils to instantiate your browser.

Open a new tab and load a page:

const page = await browser.newPage()
await page.goto(<your url>, {
  waitUntil: ["domcontentloaded", "networkidle2"],
})

Grab the first H1 element and return its value.

const h1value = await page.$eval("h1", (el) => el.innerHTML);
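
Put together, the steps above might look like this minimal sketch. It assumes getChrome() resolves to a Puppeteer browser and launches a fresh one on every call, which is why the sketch closes it at the end. The makeHandler name and the { h1 } response shape are my own choices for illustration, not part of the exercise repo.

```javascript
// Sketch only: getChrome is passed in so the handler can be tried
// without a real browser. Assumption: getChrome() resolves to a
// Puppeteer Browser and starts a new one on each call.
function makeHandler(getChrome, url) {
  return async function handler() {
    const browser = await getChrome();
    try {
      // open a tab and wait for the page to settle
      const page = await browser.newPage();
      await page.goto(url, {
        waitUntil: ["domcontentloaded", "networkidle2"],
      });

      // runs inside the page and returns the first H1's HTML
      const h1value = await page.$eval("h1", (el) => el.innerHTML);

      return { statusCode: 200, body: JSON.stringify({ h1: h1value }) };
    } catch (error) {
      // page.$eval throws when no element matches the selector
      return {
        statusCode: 500,
        body: JSON.stringify({ error: error.message }),
      };
    } finally {
      // don't leave Chrome running after the response is built
      await browser.close();
    }
  };
}
```

You would export the result as your lambda, for example by passing in getChrome from src/utils and the URL you want to load.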

Try getting the URL from query params :)
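
One possible way to do that, sketched under a couple of assumptions: the parameter is called url (my pick, name it whatever you like), and the event is an API Gateway proxy event, where queryStringParameters is null when the request has no query string. The getTargetUrl helper is hypothetical.

```javascript
// Sketch: read ?url=... from the event and accept only http(s) URLs.
// The parameter name "url" is an assumption for illustration.
function getTargetUrl(event, fallback = null) {
  // queryStringParameters is null when there is no query string
  const params = (event && event.queryStringParameters) || {};
  const raw = params.url;
  if (!raw) return fallback;

  try {
    const parsed = new URL(raw); // throws on malformed input
    const allowed = ["http:", "https:"].includes(parsed.protocol);
    return allowed ? parsed.href : fallback;
  } catch (error) {
    return fallback;
  }
}
```

Restricting the protocol keeps visitors from pointing your browser at file: or javascript: URLs.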

Solution

https://github.com/Swizec/serverless-workshop-exercises/commit/0264e4efdc6a647c5690bb23be91f7c27a012a87

Take a screenshot

A fun way to use Puppeteer is taking screenshots.

Exercise

Tell your API Gateway it's okay to serve binary files.

# serverless.yml
provider:
  # ...
  apiGateway:
    binaryMediaTypes:
      - "*/*"

Get the first H1 element again and measure its size.

Screenshots work on pixels, not the DOM. If you pick a large element like body you might run into problems with screenshots being too large for Puppeteer to handle.

const element = await page.$("h1")
const boundingBox = await element.boundingBox()
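
Both calls can come back empty: page.$ resolves to null when nothing matches, and boundingBox() resolves to null when the element isn't visible. Here's a small sketch that guards against both. The getClip helper name is hypothetical.

```javascript
// Sketch: measure an element, failing loudly when it can't be measured.
async function getClip(page, selector) {
  const element = await page.$(selector);
  if (element === null) {
    throw new Error(`No element matches "${selector}"`);
  }

  // { x, y, width, height } in pixels, or null for invisible elements
  const boundingBox = await element.boundingBox();
  if (boundingBox === null) {
    throw new Error(`Element "${selector}" is not visible`);
  }

  return boundingBox;
}
```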

Take a screenshot with:

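// assumes fs is in scope, e.g. const fs = require("fs")
// /tmp is the only writable directory inside a lambda
const imagePath = `/tmp/screenshot-${new Date().getTime()}.png`
await page.screenshot({
  path: imagePath,
  clip: boundingBox,
})
// read the file back as base64 so it can travel in the response body
const data = fs.readFileSync(imagePath).toString("base64")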

Serve it back from your lambda with the correct image headers and content encoding:

return {
  statusCode: 200,
  headers: {
    "Content-Type": "image/png",
  },
  body: data,
  isBase64Encoded: true,
}
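
If you'd rather skip the temporary file, page.screenshot can hand you the base64 string directly through its encoding option. A minimal sketch of that variation, not the approach the exercise uses. The screenshotResponse helper name is hypothetical.

```javascript
// Sketch: screenshot straight to base64, no file on disk.
async function screenshotResponse(page, clip) {
  // with encoding: "base64" the call resolves to a string, not a Buffer
  const data = await page.screenshot({
    type: "png",
    clip,
    encoding: "base64",
  });

  return {
    statusCode: 200,
    headers: {
      "Content-Type": "image/png",
    },
    body: data,
    // tells API Gateway to decode the body before sending it out
    isBase64Encoded: true,
  };
}
```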

Try this one in the browser :)

Solution

https://github.com/Swizec/serverless-workshop-exercises/commit/15c27d6f491ef0ba6b2015d12085a35b0dacd713
