Optimizing Content Migrations With Edge Compute

The more I learn about edge computing, the more I look for compelling use cases that developers can relate to. Today, I have one: edge redirects.

Background

Imagine you manage a website with a lot of content. I’m talking, like, hundreds of thousands of articles. One day, you decide it’s time to change the domain name.

By this time, you may have amassed a million links back to your old domain. In order to prevent those links from breaking, you’ll want to set up a redirect server to handle the traffic going to the old domain, and redirect it to the new domain.

If your old server is using NGINX, you can add some configuration that looks like this:

server {
    listen 80;
    listen 443 ssl;
    # Match requests for the old domain only.
    server_name www.old-name.com;
    # Permanently redirect, preserving the protocol, path, and query string.
    return 301 $scheme://www.new-name.com$request_uri;
}

This little snippet takes any request coming to the old URL and passes it along to a new URL. This has worked well enough for a long time, but what if I told you we could make it even better by using edge compute instead?

The Latency Issue

Imagine your servers live somewhere in North America. If a user in Asia goes to your old URL, their request has to travel all the way to North America just to be told “sorry, your princess is in another castle” (aka, the redirect instructions). That redirect response goes all the way back across the Pacific with the new location. Upon arrival, their browser now sends another request back to North America to get the real-real content, and finally the user gets to their original target, an amazing article on HTML forms ;D

Here’s a little diagram I made all by myself to help visualize.

Flow diagram with a long arrow going from a user to the first server, then back to the user, then to the second server, and finally back to the user.
The user requests the old URL, the old server responds with redirect instructions to the new URL, the browser redirects the request to the new URL, and the response is finally sent to the user.

In case I haven’t made it painfully obvious by now, one big issue with this redirect chain is that the user has to wait for two round-trip flights halfway across the world.

This additional time spent getting the redirect instructions can actually be greatly reduced with edge functions.

Edge functions are serverless functions that are deployed all over the world. Akamai, for example, has over 250,000 locations.

By setting up an edge function to handle the old URL, a user’s initial request only needs to travel as far as the nearest edge server to get the redirect instructions.
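As a rough sketch, here’s what that edge function might look like in JavaScript. I’m using the fetch-handler style that many edge platforms share; the exact entry point and deployment details vary by provider, so treat this as a sketch rather than a drop-in config:

```javascript
// Redirect every request for the old domain to the same path on the new
// domain. Hostname below is a placeholder for your real new domain.
function handleRequest(request) {
  const url = new URL(request.url);
  // Swap the old hostname for the new one, keeping path and query intact.
  url.hostname = "www.new-name.com";
  return new Response(null, {
    status: 301,
    headers: { Location: url.toString() },
  });
}
```

Because this runs at the edge, the 301 response comes back from a server near the user instead of from your origin.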

Let’s compare this with the same example above, but this time using edge compute.

When a user in Asia goes to your old URL, their request only has to travel to the nearest edge server location (very possibly in their same city). The princess is still in a different castle (aka, the redirect instructions) but at least they were told right away. Once the browser sees the redirect instructions, everything plays out the same as above, with the exception that now the user didn’t have to wait quite so long to read that amazing article (Seriously, I spent a ton of time on it. Please check it out).

Once again, here is my cool visualization.

Flow diagram with a short arrow going from a user to an edge server, then back to the user, then a long arrow going to the server, and finally back to the user.
The user requests the old URL, the edge server closest to the user responds with redirect instructions, the browser redirects the request to the new URL, and the response is finally sent to the user.

(You see how I made the arrows longer and shorter? That’s a symbolic representation of time. My dog says I’m smart.)

This may save a few hundred milliseconds. In the grand scheme, this may not sound like much. But it’s time spent doing absolutely nothing.

Just waiting.

Not waiting for the server to do some calculations, or the database to update.

Just waiting for the request to fly through the air.

And the thing is, for some organizations, where even tens of milliseconds can mean the difference between making a sale and losing one, this is actually a lot of time to spend doing nothing. So I say don’t.

The Restructuring Issue

This next issue gets a little more interesting. Let’s say, in addition to changing the domain name, you want to change the URL structure for your blog posts.

This may happen because the old website used blog post IDs in the URL, but someone told you it would be a good idea to use the blog post title in the URL.

So your redirects should look like this:

  • From: old-url.com/blog/1
  • To: new-url.com/articles/10-reasons-nugget-why-is-a-good-boy

There are 3 things going on here:

  1. The domain changed
  2. The route “/blog” was renamed to “/articles”
  3. The slug for the individual posts uses the title instead of the ID

The first two requirements are easy enough to handle with NGINX rewrite rules and a regular expression. That last one is a bit of a pain in the butt because there is no way to programmatically determine which new URL an old URL should point to.
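For example, the first two changes could be handled with something like the following NGINX sketch (assuming, for illustration, that everything under /blog maps onto /articles — note that this still can’t turn an ID into a title slug):

```nginx
server {
    listen 80;
    listen 443 ssl;
    server_name www.old-url.com;

    # Renamed route: /blog/... becomes /articles/... on the new domain.
    rewrite ^/blog/(.*)$ https://www.new-url.com/articles/$1 permanent;

    # Everything else just moves to the new domain as-is.
    return 301 https://www.new-url.com$request_uri;
}
```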

One solution might be to have a server accept requests to the old URL, look up the requested blog post in a database using the blog post ID, build the new URL using the blog post title, and respond with redirect instructions to the constructed URL.

That would work, but there are a couple of issues. The database queries will likely add even more latency to the request, and keeping that lookup service running could end up costing you more money.

A better solution, in my opinion, is to create a 1-to-1 mapping of all the old URLs to all the new URLs. That means you will need to create one static rewrite rule for each URL on your old domain. When you have hundreds of thousands of posts, that’s a lot of work.

Fortunately, you’d likely only need to do this one time during the migration, and you could create a script that loops through each entry in the database and generates the rewrite rule for you (yay robots).
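A sketch of what that generator might look like in JavaScript (the post fields and URL shapes here are made up for illustration — in a real migration you’d pull the posts from your database):

```javascript
// Build one NGINX rewrite rule per post, mapping the old ID-based URL
// to the new title-based URL. `posts` is an inline stand-in for rows
// you would fetch from your database.
function buildRewriteRules(posts) {
  return posts.map(
    (post) =>
      `rewrite ^/blog/${post.id}$ https://www.new-url.com/articles/${post.slug} permanent;`
  );
}

const posts = [
  { id: 1, slug: "10-reasons-nugget-why-is-a-good-boy" },
  { id: 2, slug: "how-to-make-an-html-form" },
];

console.log(buildRewriteRules(posts).join("\n"));
```

Run it once during the migration, paste (or include) the output into your server config, and the robots have done the tedious part.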

Unfortunately, this is also the case if you use edge functions. There is no getting around the need to create a 1-to-1 mapping.

To which you may respond…

If we need a URL map anyway, how is this different than the latency issue?

To which I will respond…

I’m glad you asked.

The key (this is a pun that will make sense shortly) difference here is that web servers like NGINX read rules sequentially. That means if we have 100,000 redirect rules and someone asks for the last one, the server has to read through all the previous rules before finally coming to the last rule and responding with the instructions.

Edge compute has another trick up its sleeve: key-value storage. Most of the big edge compute players also have a key-value storage offering.

In the example above, NGINX processing each of the 100,000 redirect rules has a time complexity of O(n). Which is a fancy way of saying that it takes longer to reach the last item in the list as the list grows.

On the other hand, most edge key-value storage services have a lookup complexity of O(1). Which is a fancy way of saying that regardless of how long the list of items is, our lookup times stay the same.

Going back to the example, we can store all our URL mappings in edge key-value storage and use edge functions to dynamically look up the redirect for each request.

Let’s break it down into steps:

  1. User makes request to old-url.com
  2. Request gets handled by the closest edge function to the user.
  3. Edge function finds the redirect URL from the key-value storage based on the requested URL.
  4. Edge function returns the redirect instructions to the user.
  5. User’s browser redirects to the new URL.
  6. New URL’s origin server handles the request.
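The steps above can be sketched as a single edge function. The key-value client here is a stand-in (a plain Map), since each platform exposes its own storage API — on Akamai that would be EdgeKV — and real KV clients are usually async, which is why the handler awaits the lookup:

```javascript
// Stand-in for a platform key-value store. The keys are old paths and
// the values are full new URLs, generated once during the migration.
const kv = new Map([
  ["/blog/1", "https://new-url.com/articles/10-reasons-nugget-why-is-a-good-boy"],
]);

async function handleRequest(request) {
  const { pathname } = new URL(request.url);
  // O(1) lookup: no scanning through 100,000 rules to find a match.
  const target = await kv.get(pathname);
  if (target) {
    return new Response(null, { status: 301, headers: { Location: target } });
  }
  // No mapping found; fall back to a 404 (or proxy through to the origin).
  return new Response("Not found", { status: 404 });
}
```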

I hope that makes sense, but the big takeaway here is that in addition to the benefit described in the previous section regarding latency, we may also see performance improvements for large-scale redirects where wildcards or regex transformations do not make sense.

The Complexity Issue

Before we get too far into this one, I want to admit that I have an inherent bias from being a frontend/JavaScript developer, but I still think this is a compelling point.

In the examples above, I’m making comparisons between edge compute and NGINX, one of the most widely used server technologies in the world.

However (and here’s where my bias shows), I’m willing to bet that NGINX configuration is not the main language for you or your team, and that it adds more complexity to your stack.

The added complexity is worth it for most organizations because NGINX is so good at serving static assets, acting as a reverse proxy, load balancing, and doing all the other things we use it for.

But do we really need that added complexity for a redirect server?

I have to admit that one reason edge compute appeals to me is that most edge platforms support my favorite programming language, JavaScript.

This makes it much easier for me to stay productive. Although I may have to switch context between frontend, backend, and edge runtimes, at least I can keep writing the same language everywhere.

I can’t say my experience is the same with NGINX.

In addition to getting NGINX set up in the first place, any time I need to make changes to the configuration, I need to look up how to do it again in the documentation. Even for things I’ve done a hundred times before.

Another cool benefit of edge compute is that we are dealing with serverless functions that can be scaled up automatically. We, as developers, don’t have to worry about provisioning a server, figuring out how many resources it needs, deciding which region is best to deploy to, and yada yada yada.

We get to just write our functions and let the service provider figure out how to run our code in the most efficient way. And when traffic spikes, it scales up to handle the load automatically.

This isn’t to say that I think NGINX is the wrong choice for a server. In fact, a few paragraphs ago I listed some excellent use cases for NGINX. But for redirects, I’d much rather just deploy an edge function all around the world and use the programming language I’m most familiar with to write the logic. Then I can get back to hanging out with Nugget.

Closing Thoughts

This blog post was mostly inspired by some of the internal conversations I get to be a part of working at Akamai. We work with huge organizations on some of the most technically challenging problems, and the folks I get to work with are SUPER smart.

Unfortunately, a lot of the specifics of these conversations cannot be shared publicly for one reason or another, so I wanted to take a moment to point out some of the really cool things that I think I can share.

Remember the examples above where we talked about migrating 100,000 blog posts to a new domain? That actually happened. It’s based on a real migration that Akamai went through with an internal website with about a third more redirects.

I think the coolest thing about this is that it’s an example of Akamai dogfooding its own product.

And that’s not even getting into customer work. I saw one case where a customer needed custom redirects for almost a million URLs, and it seemed like EdgeWorkers would be the solution.

Anyway, I’m very excited for the future of edge computing, and my hope is that this article has given you at least one solid use case to get excited about it yourself.

So, in conclusion, the main benefits of edge redirects over traditional redirects are:

  • Less idle time spent waiting for requests to travel to a redirect server.
  • Less compute time spent on redirect servers that require large URL mappings.
  • Less complexity than provisioning and scaling a redirect server.

Thank you so much for reading. If you liked this article, please share it. It's one of the best ways to support me. You can also sign up for my newsletter or follow me on Twitter if you want to know when new articles are published.


Originally published on austingil.com.