Introduction to workers and why we should use them

Nov 12 '21

As we know, the browser runs all the JavaScript of a web page in a single thread, the main thread. Any excessive JavaScript may block the main thread and make the page look laggy or even unresponsive.

We can improve this by off-loading some tasks to workers, turning the single-threaded app into a multi-threaded, higher-performance, and potentially safer app.

Introduction of workers

There are three kinds of workers in the browser context:

Web worker: including dedicated worker and shared worker
Service worker
Worklet: including PaintWorklet, AudioWorklet, AnimationWorklet, and LayoutWorklet.

I will focus on web workers and service workers in this post, as some worklets are still experimental and all of them are designed for more specific use cases.

Web worker

A web worker is designed as a background thread for running scripts. A web page is allowed to spawn multiple workers to do different tasks, each in an isolated context.

(Image from Workers Overview)

We may create such a thread, dedicated to a single web page, via the Worker API:

const myWorker = new Worker('worker.js');

Or share the thread among pages of the same origin via the SharedWorker API:

const mySharedWorker = new SharedWorker('sharedWorker.js');

Their usage is almost the same, but the connection between a web page and a shared worker goes through an assigned port. To communicate with the shared worker, the web page needs to use that port.

In the main thread:

mySharedWorker.port.postMessage('Test message to worker.');
mySharedWorker.port.onmessage = (e) => {
  console.log('Received data from shared worker:', e.data);
};

In the shared worker:

onconnect = (e) => {
  const port = e.ports[0];
  port.onmessage = (event) => {
    // Use a template literal (backticks) so ${event.data} is interpolated
    port.postMessage(`Hi, I received your message "${event.data}"!`);
  };
};

Communicating with a dedicated worker is more straightforward.

In the main thread:

myWorker.postMessage('Test message to worker.');
// Event handler properties are all lowercase: onmessage, not onMessage
myWorker.onmessage = (e) => {
  console.log('Received data from dedicated worker:', e.data);
};

In the dedicated worker:

onmessage = (e) => {
  postMessage(`Hi, I received your message "${e.data}"!`);
};

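Another difference you may notice is that a dedicated worker is tied to the page that created it, so it is terminated once the user closes the page. A shared worker, by contrast, stays alive as long as at least one page connected to it remains open.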

Service worker

A service worker is designed as an extra layer that sits between the web page and the server, acting like a network proxy.

(Image from Workers Overview)

Unlike a web worker, it has a more complicated lifecycle:

Register in the main thread via navigator.serviceWorker.register(), which downloads the service worker script;
Install if the service worker is newly registered, or the script has been updated;
Activate when no "old" service worker is still controlling the clients under its scope;
Redundant when it is outdated and has been replaced (or when it fails to install).

It is given a set of APIs and events to achieve things like:

caching data
intercepting requests
managing browser push notifications

On top of these capabilities, we can do a lot more interesting things, which we will discuss later.

Their limitations

Both web workers and service workers are quite powerful, but they also have various limitations, including:

No access to the DOM, the window object, or some other APIs like localStorage. This restriction keeps the main thread and the workers isolated so they do not disturb each other. However, it also means that heavy jobs such as DOM operations cannot be off-loaded to a worker.

Messages between the main thread and a worker are copied with the structured clone algorithm, which does not support some data types, such as functions and DOM nodes. And unlike JSON.stringify(), which silently drops values it cannot serialize, passing a value the structured clone algorithm does not support throws an exception (a DataCloneError).

Experimental and limited support for ES modules. As modern browsers support ES modules better, we could cut quite a lot of overhead from compiling our scripts and run ES modules directly in the browser. But that is not quite the case in workers.

As of Mar 1, 2019, only Chrome 80+ supports this feature, while Firefox has an open feature request. No other browsers are known to have support for production usage of worker scripts written as modules.

In these browsers, you may spawn a worker from an ES module script via:

new Worker('worker.js', {
  type: 'module'
});

However, the ES module import and export syntax may not work as well as you expect.

Spawning a worker from another worker barely works, although some browsers claim that we can spawn a nested worker from a dedicated worker. This can be frustrating, as some libraries (e.g. esbuild-wasm) have built-in worker logic, which prevents us from organizing the tasks according to our needs. We then have to relay messages between the library's worker and our own worker through the main thread.
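To see the structured clone limitation in practice, here is a small sketch that can be run in Node.js 17+ or a modern browser console. It uses the global structuredClone() function, which applies the same algorithm as postMessage(); the message object and its fields are made up for illustration.

```javascript
// Compare how JSON.stringify() and structuredClone() treat a value they cannot represent.
// structuredClone() uses the same structured clone algorithm as postMessage().
const message = {
  id: 1,
  createdAt: new Date(0),
  onDone: () => console.log('done'), // functions are not cloneable
};

// JSON.stringify() silently drops the function and turns the Date into a string
const viaJson = JSON.parse(JSON.stringify(message));
console.log(viaJson); // Node.js prints: { id: 1, createdAt: '1970-01-01T00:00:00.000Z' }

// The structured clone algorithm rejects the whole message instead
let cloneError = null;
try {
  structuredClone(message);
} catch (err) {
  cloneError = err;
  console.log(err.name); // DataCloneError
}

// Without the function, structuredClone() succeeds and keeps the Date as a real Date
const { onDone, ...cloneable } = message;
const viaClone = structuredClone(cloneable);
console.log(viaClone.createdAt instanceof Date); // true
```

In other words, a message that would "work" with JSON may fail loudly with postMessage(), so it is worth stripping functions and other unsupported values before posting.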

Performance improvement with workers

Off-loading heavy tasks

As introduced previously in "Introduction of workers", we may assign heavy tasks to workers to avoid blocking the main thread and hurting the user experience.

For example, when we have to group or process raw data from the backend (BE) for display, we may spawn a web worker to do it while the main thread continues rendering other things.

In the main thread, we spawn a web worker, send it the task, and wait for the result:

function getData() {
  const myWorker = new Worker('processData.js');
  return new Promise((resolve, reject) => {
    // Handler properties are lowercase: onmessage / onerror
    myWorker.onmessage = ({ data }) => {
      resolve(data);
      myWorker.terminate();
    };
    myWorker.onerror = (e) => {
      reject(e);
      myWorker.terminate();
    };
    myWorker.postMessage({ sortBy: 'update_time' });
  });
}

getData().then(data => console.log(data));

In processData.js:

function process(rawData, options) {
  const processedData = rawData;
  // Do the processing
  return processedData;
}

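onmessage = ({ data: options }) => {
  fetch('/get_raw_data')
    .then(res => res.json())
    .then(rawData => process(rawData, options))
    .then(data => postMessage(data))
    .catch(err => {
      // A rejected promise does not fire the Worker's error event by itself;
      // rethrowing outside the promise chain lets the main thread's onerror handler reject
      setTimeout(() => { throw err; });
    });
};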
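The process function above is only a placeholder. As a hypothetical example, assuming each raw record is an object with a category field and an ISO-8601 update_time string (neither is defined by the original API), it could sort the records by options.sortBy (newest first) and group them by category:

```javascript
// Hypothetical processing step for processData.js:
// sort records by options.sortBy in descending order, then group them by `category`.
function process(rawData, options) {
  const { sortBy } = options;
  // Copy before sorting so the input array is not mutated
  const sorted = [...rawData].sort((a, b) => {
    if (a[sortBy] === b[sortBy]) return 0;
    return a[sortBy] < b[sortBy] ? 1 : -1; // ISO date strings compare correctly as strings
  });
  const groups = {};
  for (const record of sorted) {
    // Create the group on first use, then append the record
    (groups[record.category] ??= []).push(record);
  }
  return groups;
}

// Example input with made-up records
const rawData = [
  { id: 1, category: 'docs', update_time: '2021-11-01T10:00:00Z' },
  { id: 2, category: 'code', update_time: '2021-11-03T09:00:00Z' },
  { id: 3, category: 'docs', update_time: '2021-11-05T08:00:00Z' },
];
console.log(process(rawData, { sortBy: 'update_time' }));
```

Because this is plain data processing with no DOM access, it can run unchanged inside the worker, and the main thread only receives the already grouped result.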

Another classic use case is zipping/unzipping files in the browser. Workers are often built into the libraries for this, like unzipit, js-untar, etc.

Prefetching data

A service worker provides APIs for caching data and intercepting requests in the browser. This gives us an opportunity to get some data ready in advance.

In the main thread, we simply register the service worker script:

if ("serviceWorker" in navigator) {
  // Make sure the service worker script is served from the right folder: by default its scope
  // is the folder the script lives in, and it only controls pages under that scope
  navigator.serviceWorker.register("/sw.js");
}

And fetch data as normal:

fetch('/get_data').then(res => {
  // If the service worker is active and the prefetched data is ready, you should get the cached response immediately
  // ...
});

In sw.js, we load the data ahead of time and put it into the cache:

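const preFetchUrls = [
  '/get_data'
];
// This runs whenever the browser starts the service worker script
caches.open('prefetchedData').then(prefetchedData => {
  preFetchUrls.forEach(requestUrl => {
    fetch(requestUrl)
      .then(res => {
        // Only cache successful responses
        if (res.ok) return prefetchedData.put(requestUrl, res);
      });
  });
});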

And intercept the request, replacing the response with the cached one if it exists:

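self.addEventListener('fetch', e => {
  const { method, url } = e.request;
  // e.request.url is absolute, so compare its pathname with the relative paths in preFetchUrls
  const { origin, pathname } = new URL(url);
  if (method !== 'GET' || origin !== self.location.origin || !preFetchUrls.includes(pathname)) return;
  // respondWith() expects a Response or a Promise of one, so invoke the async function
  e.respondWith((async () => {
    const prefetchedData = await caches.open('prefetchedData');
    const cachedResponse = await prefetchedData.match(e.request);
    // Fall back to the network if the data has not been prefetched yet
    return cachedResponse || fetch(e.request);
  })());
});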

Caching for loading faster next time with Workbox

In the use case above, we made use of the caching and request-intercepting abilities of service workers to load some data ahead of time. We can also use these abilities to help returning users load the page faster by caching the necessary files.

Here I would recommend Workbox, which greatly simplifies the setup and makes it easy to apply different caching strategies.

In the main thread, we will need to register the service worker script as before:

if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/sw.js");
}

In sw.js, however, we will import the Workbox libraries this time, which means you will need to bundle this file before use.

The example code from the Workbox website is sufficient for most use cases:

import { registerRoute } from 'workbox-routing';
import {
  NetworkFirst,
  StaleWhileRevalidate,
  CacheFirst,
} from 'workbox-strategies';
import { CacheableResponsePlugin } from 'workbox-cacheable-response';
import { ExpirationPlugin } from 'workbox-expiration';

// Cache page navigations (html) with a Network First strategy
registerRoute(
  ({ request }) => request.mode === 'navigate',
  new NetworkFirst({
    cacheName: 'pages',
    plugins: [
      // Ensure that only requests that result in a 200 status are cached
      new CacheableResponsePlugin({
        statuses: [200],
      }),
    ],
  }),
);

// Cache CSS, JS, and Web Worker requests with a Stale While Revalidate strategy
registerRoute(
  ({ request }) =>
    request.destination === 'style' ||
    request.destination === 'script' ||
    request.destination === 'worker',
  new StaleWhileRevalidate({
    cacheName: 'assets',
    plugins: [
      new CacheableResponsePlugin({
        statuses: [200],
      }),
    ],
  }),
);

// Cache images with a Cache First strategy
registerRoute(
  ({ request }) => request.destination === 'image',
  new CacheFirst({
    cacheName: 'images',
    plugins: [
      new CacheableResponsePlugin({
        statuses: [200],
      }),
      // Don't cache more than 50 items, and expire them after 30 days
      new ExpirationPlugin({
        maxEntries: 50,
        maxAgeSeconds: 60 * 60 * 24 * 30, // 30 Days
      }),
    ],
  }),
);

If this is all the logic you need in the service worker, you can also simply use a bundler plugin instead of copying the code and configuring the bundler yourself.

For Webpack users, check this doc: Workbox Webpack plugin.

For Rollup users, you may try the plugin rollup-plugin-workbox.

Other usages with workers

Mocking API without Node.js server

Once we can intercept requests, we can achieve some interesting things like mocking APIs. MSW (Mock Service Worker) is a perfect example.

Basically, we could:

listen to the "fetch" event;
check whether the request matches our rules;
if it matches, respond with a cached or mocked response.

This is just like what I demonstrated in the "Prefetching data" use case.
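The three steps above can be sketched without any library. The rule format and matchMock() below are hypothetical (this is not MSW's API); the sketch only relies on the standard Request, Response, and URL classes, which exist in service workers and in Node.js 18+.

```javascript
// Hypothetical mock rules: method + pathname -> status and JSON body to return
const mockRules = [
  { method: 'GET', path: '/api/user', body: { id: 1, name: 'Alice' } },
  { method: 'POST', path: '/api/login', status: 204 },
];

// Return a mocked Response if the request matches a rule, otherwise null
function matchMock(request) {
  const { pathname } = new URL(request.url);
  const rule = mockRules.find(
    (r) => r.method === request.method && r.path === pathname
  );
  if (!rule) return null;
  const status = rule.status ?? 200;
  // A 204 response must not have a body, so rules without `body` send null
  const body = rule.body === undefined ? null : JSON.stringify(rule.body);
  return new Response(body, {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// In sw.js, the matcher plugs into the fetch event; unmatched requests go to the network:
// self.addEventListener('fetch', (e) => {
//   const mocked = matchMock(e.request);
//   if (mocked) e.respondWith(mocked);
// });
```

Keeping the matching logic in a plain function like this also makes the mock rules easy to test outside the browser.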

As a data processing layer

Another interesting use case is separating a data processing layer from our UI layer.

For example, when we track the usage of an app, the app may send all the information about a component to the worker. The worker then selects the required data and transforms it into the right format before finally sending it to the server.

This layer isolation keeps the relatively non-critical but potentially heavy tracking logic from affecting the user experience, or from crashing the app when something goes wrong.
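A minimal sketch of the transformation step inside such a tracking worker, assuming the UI posts plain objects with hypothetical name, action, and props fields. The field names, toTrackingEvent(), and the '/track' endpoint in the comment are all illustrative, not part of any real API.

```javascript
// Hypothetical transform run inside the tracking worker:
// keep only the fields the server needs and format them consistently.
function toTrackingEvent(componentInfo, now = Date.now()) {
  const { name, action, props = {} } = componentInfo;
  return {
    component: name,
    action,
    // Keep only primitive props; drop nested objects and arrays to keep the payload small
    props: Object.fromEntries(
      Object.entries(props).filter(([, value]) =>
        ['string', 'number', 'boolean'].includes(typeof value)
      )
    ),
    timestamp: new Date(now).toISOString(),
  };
}

// Wiring in the worker (sketch; '/track' is a hypothetical endpoint):
// onmessage = ({ data }) => {
//   fetch('/track', { method: 'POST', body: JSON.stringify(toTrackingEvent(data)) });
// };

const event = toTrackingEvent(
  {
    name: 'SearchBox',
    action: 'submit',
    props: { query: 'workers', size: 3, style: { color: 'red' } },
    children: [],
  },
  0
);
console.log(event);
```

The UI side stays as simple as a single postMessage() call, and any change to the tracking format only touches the worker.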

Syncing in the background

Have you ever encountered super slow data operations, such as uploading multiple large files, or simply a slow backend server? In these cases, the user is likely to have to wait for the server response before leaving the page.

Service workers largely avoid this limitation! A service worker is not tied to a single page, so it can carry on after the page is closed. Browsers may still stop an idle service worker, but with the Background Sync API (currently supported mainly in Chromium-based browsers) the browser can wake it up to retry later. Therefore, we could store the data in the browser first (e.g. in IndexedDB) and let the service worker communicate with the server instead, allowing the user to do other things while waiting.
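The "store first, send later" idea can be sketched as below. flushPending() is a hypothetical helper, not a browser API: pending stands in for records read from IndexedDB, and send for a function that uploads one record (for example with fetch) and rejects on failure. Records that fail are returned so a later attempt can retry them.

```javascript
// Hypothetical helper: try to upload every pending record, return the ones that failed
async function flushPending(pending, send) {
  const stillPending = [];
  for (const item of pending) {
    try {
      await send(item);
    } catch {
      // Keep the record so the next sync attempt can retry it
      stillPending.push(item);
    }
  }
  return stillPending;
}

// Sketch of the Background Sync wiring (Chromium-based browsers).
// In the page, after saving the records to IndexedDB:
//   navigator.serviceWorker.ready.then((reg) => reg.sync.register('upload-files'));
// In sw.js ('upload-files' and uploadFromDb() are hypothetical names):
//   self.addEventListener('sync', (e) => {
//     if (e.tag === 'upload-files') e.waitUntil(uploadFromDb());
//   });
// If the promise passed to waitUntil() rejects, the browser may retry the sync later.
```

Returning the failed records, rather than throwing on the first error, lets one slow or broken upload avoid blocking the rest of the queue.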

And more

There are many more possibilities with workers besides those mentioned above, like the well-known Progressive Web Apps, push notifications, code sandboxes (without DOM access), etc.

Let's be creative and bring the web app to the next level!