Navigating CORS and Cache API for 3rd Party Image Caching

This document outlines my process of troubleshooting a caching issue with third-party images in the Cache API storage, involving a deep dive into Cache Storage API, HTTP Headers, Service Workers, and CORS.

Previous articles on HTTP Headers and caching:
How to use Cache-Control: A Guide to HTTP Cache Headers

Introduction

Problem Overview

In my application, I've used Cache-Control HTTP headers for various assets along with service workers for offline support. Service workers can intercept network requests and provide custom responses. Moreover, they can cache resources using the Cache Storage API, providing performance advantages and enabling offline use.

The Cache Storage API allows you to store copies of responses to requests so you can retrieve them later. It's like having your own private cache separate from the browser's standard HTTP cache.

However, I encountered a problem.

Despite service workers caching resources with the Cache Storage API, images fetched from Cloudinary weren't being captured and stored in Cache Storage.

For context, the application is bundled using Vite, with vite-plugin-pwa configured for solid-js. See on GitHub.

Offline Support

Let's build some context on how Cache Storage and HTTP caching are used.

The Cache Storage API, while another form of caching in the browser, isn't meant to replace HTTP cache strategies. Rather, it works in tandem with the HTTP cache to optimize your application's performance.

Here's a simplified flow to illustrate how these two caching mechanisms interact:

In this flow, the service worker first checks the Cache Storage for a cached response. If it doesn't find one, it sends a network request, which goes through the usual HTTP caching process. The response from the network (which might come from the HTTP cache) is then stored in the Cache Storage for future use.
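The flow above can be sketched roughly as follows. This is a minimal illustration, not the code Workbox generates: cacheThenNetwork and createMemoryCache are hypothetical names, `cache` is any object with async match/put methods (the browser's Cache interface has that shape), and `fetchFn` stands in for fetch(), which goes through the HTTP cache before touching the network.

```javascript
// Sketch of the flow described above, under the assumptions stated in the lead-in.
async function cacheThenNetwork(request, cache, fetchFn) {
  // 1. Look in Cache Storage first.
  const cached = await cache.match(request)
  if (cached) return cached

  // 2. Cache miss: go to the network (possibly answered by the HTTP cache).
  const response = await fetchFn(request)

  // 3. Store a copy for next time, then hand the response back.
  //    Real Response bodies can only be read once, so clone when possible.
  const copy = typeof response.clone === 'function' ? response.clone() : response
  await cache.put(request, copy)
  return response
}

// Hypothetical in-memory stand-in for a Cache, for trying the sketch outside a browser.
function createMemoryCache() {
  const store = new Map()
  return {
    match: async (key) => store.get(key),
    put: async (key, value) => {
      store.set(key, value)
    },
  }
}
```

In a service worker, `cache` would come from `await caches.open(...)` and `fetchFn` would simply be `fetch`.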

Implementing offline functionality through service workers and caching mechanisms provides several key advantages:

Improved User Experience: Offline support allows users to interact with your app even when they're not connected to the internet. This creates a seamless user experience, as users can continue to use your app under any network conditions.
Reduced Data Usage: By serving resources from the cache, you can significantly reduce data usage. This is especially beneficial for users with limited or expensive data plans.
Enhanced Performance: Serving resources from the cache is typically faster than downloading them from the network, leading to quicker load times. This performance boost is particularly noticeable on slow or unreliable networks.
Increased Reliability: With offline support, your app becomes more reliable. Users can access your app at any time, regardless of their network conditions. This can lead to increased user engagement and satisfaction.

Understanding the Problem

Analysis

When you add resources to the Cache Storage API, the browser intelligently checks if those resources are already available in the HTTP cache. If they are, it uses those cached resources, thereby avoiding an additional network request. This is a smart way the browser optimizes resource usage.

You can observe this behavior in the Network tab of your browser's developer tools. For instance, on a page you've visited before, you'll notice that the response is served from the Service Worker, which has stored the resource in Cache Storage:

However, when you refresh the page, the request is served from memory instead:

This demonstrates how the browser smartly chooses the most efficient way to serve resources, whether that's from the service worker, the HTTP cache, or memory.

Service Worker

Service workers are a type of web worker. They're JavaScript files that can control the pages or site they are associated with, intercepting and modifying navigation and resource requests, and caching resources in a very granular fashion to enable offline experiences or to boost performance.

Service workers sit between your web pages and the network, acting as a type of network proxy. They can intercept all outgoing requests from your page, allowing them to be handled appropriately.

If you want to ensure that certain resources are always served from the Cache Storage API, you can do so by handling those requests in your service worker's fetch event handler with something like this:

*.js

/* sw.js */

self.addEventListener('fetch', (event) => {
  const request = event.request
  event.respondWith(
    caches.match(request).then((response) => {
      return response || fetch(request)
    })
  )
})

Configuring Service Workers

In my application, service workers are configured through a Vite plugin for Progressive Web Apps (PWA), vite-plugin-pwa, which utilizes Workbox, a set of libraries that automate service worker generation and asset caching.

Here's how I've set up the service worker in my vite.config.ts:

*.ts

/* vite.config.ts */

import { defineConfig } from 'vite'
import { VitePWA } from 'vite-plugin-pwa'

export default defineConfig({
  // ...
  plugins: [
    // ...
    VitePWA({
      registerType: 'autoUpdate',
      strategies: 'generateSW',
      workbox: {
        cleanupOutdatedCaches: true,
        skipWaiting: true,
        runtimeCaching: [
          // getCache is defined below
          getCache({
            pattern: /^https:\/\/res.cloudinary.com\/hakkei-co/,
            name: 'img-cache',
          }),
        ],
        globPatterns: ['**/*.{js,css,html,ico,png,svg,json}'],
        navigateFallback: null,
      },
      manifest: {...},
    }),
  ],
  // ...
})

The registerType option is set to autoUpdate, so when an updated service worker is found in the background, it is installed and takes over automatically rather than prompting the user.

The strategies option is set to generateSW, instructing Workbox to generate a service worker file.

This article provides a more comprehensive overview of the different service worker caching strategies.

The getCache function is used to handle caching for images fetched from Cloudinary. These images are stored in the Cache Storage for offline access using the CacheFirst strategy:

*.ts

const getCache = ({ name, pattern }: { name: string; pattern: RegExp }) => {
  return {
    urlPattern: pattern,
    handler: 'CacheFirst' as const,
    options: {
      cacheName: name,
      expiration: {
        maxEntries: 500,
        maxAgeSeconds: 60 * 60 * 24 * 365 * 2, // 2 years
      },
      cacheableResponse: {
        statuses: [200],
      },
    },
  }
}

The CacheFirst strategy instructs Workbox to first attempt to fetch the response from the cache. If the response isn't in the cache, it fetches it from the network, stores it in the cache for future use, and then returns it to the user.

Server Configuration

Lastly, let's review the server configuration.

I've set up my server using Netlify and implemented the Content-Security-Policy header. This acts as a gatekeeper, only allowing resources to be loaded from my site ('self') and res.cloudinary.com, where I host my images:

*.toml

[[headers]]
for = "/*"

  [headers.values]
  Cache-Control = "max-age=300"
  Referrer-Policy = "no-referrer"
  Content-Security-Policy = "default-src 'self'; connect-src 'self' https://res.cloudinary.com; img-src 'self' https://res.cloudinary.com"
  X-Content-Type-Options = "nosniff"
  X-Frame-Options = "DENY"
  Vary = "accept, Accept-Encoding"
  X-XSS-Protection = "1; mode=block"
  Strict-Transport-Security = """
    max-age=63072000;
    includeSubDomains;
    preload"""
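To double-check what this policy permits, the header value can be split into its directives. The sketch below is a simplified reading of CSP (parseCsp and sourcesFor are hypothetical helpers, and real CSP matching has more rules than a list lookup); it only shows that Cloudinary is listed for both img-src and connect-src, and that unlisted fetch directives fall back to default-src.

```javascript
// Split a Content-Security-Policy value into { directive: [sources] }.
function parseCsp(policy) {
  const directives = {}
  for (const part of policy.split(';')) {
    const [name, ...sources] = part.trim().split(/\s+/)
    if (name) directives[name.toLowerCase()] = sources
  }
  return directives
}

// Sources that apply to a fetch directive, falling back to default-src.
function sourcesFor(directives, name) {
  return directives[name] ?? directives['default-src'] ?? []
}

const csp = parseCsp(
  "default-src 'self'; connect-src 'self' https://res.cloudinary.com; img-src 'self' https://res.cloudinary.com"
)

// img-src and connect-src both list 'self' and https://res.cloudinary.com
console.log(sourcesFor(csp, 'img-src'))
console.log(sourcesFor(csp, 'connect-src'))
// script-src is not set, so it falls back to default-src ('self' only)
console.log(sourcesFor(csp, 'script-src'))
```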

Next, let's look at the headers Cloudinary sends when serving images:

*.http

Accept-Ranges: bytes
Access-Control-Allow-Origin: *
...
Cache-Control: private, no-transform, immutable, max-age=2592000
...
Content-Length: 24312
Content-Type: image/webp
...
Etag: "35383c92073b3f263f0f8ba3aaedba98"
Server: Cloudinary
...
Strict-Transport-Security: max-age=604800
Timing-Allow-Origin: *
Vary: Accept,User-Agent,Save-Data
X-Content-Type-Options: nosniff

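Cloudinary's Cache-Control header is set to private, no-transform, immutable, max-age=2592000, meaning the browser may cache each image for 30 days (2,592,000 seconds). The private directive limits this to the browser's own cache, excluding shared caches such as proxies.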
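As a quick check of that max-age arithmetic, here is a small sketch that parses the header value and converts max-age to days. parseCacheControl is a hypothetical helper written for this article and handles only the simple comma-separated form shown above.

```javascript
// Parse a Cache-Control value into a map of directive -> value.
// Directives without a value (e.g. "private") map to true.
function parseCacheControl(header) {
  const directives = {}
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=')
    if (!rawName) continue
    const name = rawName.toLowerCase()
    directives[name] = rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '')
  }
  return directives
}

const cloudinary = parseCacheControl('private, no-transform, immutable, max-age=2592000')
const maxAgeDays = Number(cloudinary['max-age']) / (60 * 60 * 24)

console.log(cloudinary.private) // true
console.log(maxAgeDays) // 30
```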

Lastly, the Access-Control-Allow-Origin: * header indicates any site can fetch my images via a CORS request, a necessary setting for some Cloudinary features.

Problem Summary

To recap: I could see the service worker intercepting network requests and responses, the headers appeared to be correctly configured, and all other assets were cached successfully. Even so, assets fetched from Cloudinary were not being cached:

From my Workbox configuration, I expected an img-cache bucket, but it was never created.

*.ts

// vite.config.ts
// ...
  getCache({
    pattern: /^https:\/\/res.cloudinary.com\/hakkei-co/,
    name: 'img-cache',
  }),
// ...
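One thing worth ruling out early is the pattern itself. The sketch below tests the same regular expression against two made-up URLs (both are placeholders for illustration, not real asset paths). Since it matches Cloudinary URLs under the hakkei-co path, the pattern is unlikely to be why the bucket is missing.

```javascript
// The urlPattern from the Workbox configuration.
const pattern = /^https:\/\/res.cloudinary.com\/hakkei-co/

// Hypothetical helper: would this URL be picked up by the img-cache rule?
function matchesImgCache(url) {
  return pattern.test(url)
}

// Placeholder URLs, for illustration only.
console.log(matchesImgCache('https://res.cloudinary.com/hakkei-co/image/upload/sample.webp')) // true
console.log(matchesImgCache('https://example.com/logo.png')) // false
```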

Understanding CORS

When developing web applications, one of the most common challenges developers face is dealing with Cross-Origin Resource Sharing (CORS).

CORS is a mechanism that uses additional HTTP headers to tell browsers to give a web application running at one origin access to selected resources from a different origin.

Access-Control-Allow-Origin

There are several HTTP headers from the Cloudinary response headers involved in CORS, but let's focus on the culprit: Access-Control-Allow-Origin.

The Access-Control-Allow-Origin header, sent in the server's response, indicates which origins can access the resource.

Access-Control-Allow-Origin: * means any origin can access it.

However, the browser only checks this header when the request is made using the CORS protocol.

Now, you might be wondering why we're talking about CORS and this specific header. Well, it's because of a process called preflighting.

Preflight Requests

In the context of CORS, preflight requests are like a safety check. Before the browser sends certain types of cross-origin requests, it sends a preflight request to ask the server, "Can I send this request?"

The server's response, which includes the Access-Control-Allow-Origin header, tells the browser whether it's allowed to send the actual request. This process ensures that cross-origin requests are handled securely, protecting both server resources and user data.

If the response to the preflight indicates that the actual request is allowed, the browser proceeds to send it. If not, the browser stops and reports a CORS error.

In essence, a preflight request is an additional request that is sent before the actual request in certain cross-origin situations.
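Which requests count as "certain cross-origin situations" comes down mostly to the method and headers. The sketch below is a deliberately simplified version of that rule: needsPreflight is a hypothetical helper, and it ignores further conditions in the Fetch specification (for example the Range header, upload listeners, and stream bodies).

```javascript
// Methods and request headers that do not, by themselves, trigger a preflight.
const SAFELISTED_METHODS = ['GET', 'HEAD', 'POST']
const SAFELISTED_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type']
const SAFELISTED_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
]

// Simplified check: does a cross-origin request with this method and
// these headers need a preflight?
function needsPreflight({ method = 'GET', headers = {} }) {
  if (!SAFELISTED_METHODS.includes(method.toUpperCase())) return true
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase()
    if (!SAFELISTED_HEADERS.includes(lower)) return true
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8" when comparing.
      const mime = String(value).split(';')[0].trim().toLowerCase()
      if (!SAFELISTED_CONTENT_TYPES.includes(mime)) return true
    }
  }
  return false
}

console.log(needsPreflight({ method: 'GET' })) // false
console.log(needsPreflight({ method: 'PUT' })) // true
console.log(needsPreflight({ method: 'POST', headers: { 'Content-Type': 'application/json' } })) // true
```

Note that a plain GET for an image falls into the first case, so no preflight is sent for it.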

Network Requests w/ CORS

When you fetch an image using CORS, the browser sends a request to the server hosting the image.

If this is a simple request (like a GET request), the server responds with the image and includes the Access-Control-Allow-Origin header in its response. This header tells the browser which origins are allowed to access the image.

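If you don't set the request mode to cors, the browser treats a cross-origin image request as a no-cors request and ignores the Access-Control-Allow-Origin response header. This was the culprit: the browser's same-origin policy restricts how data received from different origins can be stored and used, so a no-cors cross-origin request yields an opaque response, whose status is reported as 0 and whose body cannot be inspected. An opaque response does not satisfy the statuses: [200] condition in my Workbox configuration, so the images were never written to Cache Storage.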
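A rough sketch of why a no-cors response fails a `statuses: [200]` check like the one in the Workbox configuration. A cross-origin no-cors request produces a response of type opaque whose status reads as 0. The objects below are plain stand-ins for Response, and isCacheable is a hypothetical simplification, not Workbox's actual implementation.

```javascript
// Plain objects standing in for Response; only `type` and `status` matter here.
const corsResponse = { type: 'cors', status: 200 }
const opaqueResponse = { type: 'opaque', status: 0 } // result of a cross-origin no-cors request

// Simplified "cacheable statuses" check.
function isCacheable(response, statuses) {
  return statuses.includes(response.status)
}

console.log(isCacheable(corsResponse, [200])) // true
console.log(isCacheable(opaqueResponse, [200])) // false

// Opting in to status 0 would match, but the response stays unreadable,
// so there is no way to tell a real image from an error page.
console.log(isCacheable(opaqueResponse, [0, 200])) // true
```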

Making a network request with CORS:

Making a network request without CORS:

Wrapping Up: The Solution

The solution to the CORS issue I encountered is specific to the component library I was using, Hope UI. However, the principle of configuring image sources to use CORS is applicable across any framework or library.

Here's how I enabled CORS in my Hope UI component:

*.tsx

<Box
  bg="transparent"
  as="img"
  rounded="$lg"
  class="sketchy"
  src={cld.image('v1709640251/bw_profile').quality('auto').format('auto').toURL()}
  alt="self portrait"
  objectFit="cover"
  h={250}
  w={250}
  crossOrigin="anonymous" // <- enable CORS
/>

CORS mode can also be set explicitly when using the Fetch API:

*.js

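fetch('https://example.com/data', {
  method: 'GET', // or 'POST'
  headers: {
    // application/json is not a CORS-safelisted Content-Type,
    // so sending this header triggers a preflight
    'Content-Type': 'application/json',
    // Any other headers as needed
  },
  mode: 'cors', // cors (the default for fetch), no-cors, or same-origin
})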

With just a single line of code enabling CORS, I was able to fetch and serve images from Cloudinary even when offline.

With CORS enabled, we can observe the response-type column set to cors, which indicates that the response came from a CORS request to a different origin than the one the web page came from.
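The same information can be read programmatically. The helper below is a small sketch (listResponseTypes is a hypothetical name) that walks a cache and reports each entry's response type. It accepts anything with async keys() and match() methods, which is the shape of the browser's Cache interface.

```javascript
// List the URL and response type of every entry in a cache.
async function listResponseTypes(cache) {
  const requests = await cache.keys()
  const rows = []
  for (const request of requests) {
    const response = await cache.match(request)
    rows.push({ url: request.url, type: response.type })
  }
  return rows
}
```

In the browser console, something like `caches.open('img-cache').then(listResponseTypes).then(console.table)` should print one row per cached image, each with type cors once the fix is in place.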

To recap:

When a request is made with CORS enabled, the server can respond with the Access-Control-Allow-Origin header, which tells the browser that the response can be shared with the origin specified in the header.
Note: If this header is not included in the response, the browser will not allow the response to be shared, resulting in a CORS error. (In this case, Cloudinary sends the header automatically.)

In the context of cache storage, the cors response-type means that the response was fetched with CORS enabled and that the server responded with the appropriate Access-Control-Allow-Origin header. This allows the response to be shared with the site, even though it came from a different origin.