Preview environment for Code reviews, part 3: Reverse-proxy configuration

Mateusz Palichleb

16 Jan 2023 · 10 minutes read

Before you start

This blog post is the third part of a series of articles that cover building a separate Preview environment for each Code Review (Merge Request).

In the previous article, we successfully implemented CI/CD GitLab pipelines for creating and closing Preview environments in the Cloud.

Now we will focus on the Reverse-proxy server configuration and implementation, which is required to route requests from subdomains to the corresponding AWS S3 bucket subdirectories.

Local instance of Reverse-proxy server

Firstly, our goal is to test it locally inside a Docker container, because this can be done without complex integration with the K8S cluster (which, of course, we will cover in further articles as well).

This approach will help us focus on the Reverse-proxy configuration and make sure we have eliminated issues before we deploy it to the cluster, without worrying about K8S manifests at this stage.

Choosing a server implementation

Let us first consider whether we need a customized, dedicated server created from scratch (e.g. in Scala or Java) or whether it is worth using a ready-made solution.

The role of the Reverse-proxy server is standard as far as the use case is concerned (very common in web application projects), so we have no reason to write our own server. We can use an out-of-the-box solution, developed over the years by the open-source community, which is also well-optimized and written with security in mind.

There are plenty of options on the market (according to the lists of the best web servers published in 2022).

We decided that in our case it was worth considering one of the two most popular and widely used: Apache httpd and Nginx.

Our use case is simple: we need a Reverse-proxy with a filter for wildcard subdomains. Apache httpd requires more configuration for this than Nginx, which is more concise in this respect (and much nicer to configure). In addition, Nginx is somewhat of a standard in Kubernetes clusters, so consistency is also an argument for its selection. The choice ultimately fell on Nginx.

If you are interested in the exact differences in the performance of the two servers, check out one of the hundreds of blog posts comparing them, e.g. Apache vs. Nginx.

Initial (local) server configuration

Ultimately, the Nginx server is to be run in a container inside a Pod in K8S, so we will start by creating a Dockerfile:

FROM nginx:1.23.1

# custom configuration
COPY ./nginx.conf /etc/nginx/conf.d/default.conf

In this case, we are extending the official Nginx 1.23.1 Docker image with a custom configuration in which we will define the proxy.

The nginx.conf configuration file looks as follows:

server_names_hash_bucket_size 64;

server {
    listen 80;

    server_name ~(.*)\.staging\.nice-app\.com$;
    set $subdomain $1;
    set $destination_domain 'nice-app-staging.s3-website.eu-central-1.amazonaws.com';

    location / {
        resolver 192.168.65.5; # docker localhost DNS resolver
        proxy_pass http://$destination_domain/$subdomain$uri$is_args$args;
    }
}

First, we define server_names_hash_bucket_size 64; because the hashes used in the subdomains can be long. By default, this value depends on the size of the processor's cache line (usually 32), but to be safe, we have increased it so that we don't run into problems with longer server names.

We then define a new server that listens on port 80, where subdomains are caught according to the regexp pattern (.*)\.staging\.nice-app\.com$. The hash from the subdomain itself is extracted into the $subdomain variable.

In addition to this, we define the target domain, i.e. the previously obtained URL of the staging bucket in AWS S3.

For each location / path, traffic will be proxied to a URL built from the variables mentioned above and from Nginx's built-in variables: $uri (the request path that follows the domain) plus $is_args and $args (the optional HTTP GET query string).
The formula is: "http://$destination_domain/$subdomain$uri$is_args$args", which points to a subdirectory in the AWS S3 bucket.
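
For illustration, the mapping looks roughly like this (abc123 and the /about?lang=en path are hypothetical examples, not values from our real deployment):

http://abc123.staging.nice-app.com/about?lang=en
→ http://nice-app-staging.s3-website.eu-central-1.amazonaws.com/abc123/about?lang=en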

In addition, we have a DNS resolver defined at the IP address 192.168.65.5, which is available in the local Docker daemon environment. Why is it required? Because we used variables in the proxy_pass directive. When proxy_pass contains variables, Nginx resolves the target hostname at request time rather than once at startup (as it does for a static URL with no variables), so it needs an explicitly configured resolver. The problem is described in many threads on StackOverflow.

Note that this address may differ between computers, operating systems, and Docker versions. Therefore, it is a good idea to check in advance what the IP address is in your local environment (we will not describe how to do this, because there are many ways to do it, and they change between Docker versions).
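
One quick way to check it (a minimal sketch, assuming Docker is running and can pull the busybox image; the exact output depends on your setup) is to print the resolver configuration that Docker injects into a throwaway container:

$ docker run --rm busybox cat /etc/resolv.conf
# the "nameserver" line shows the DNS address that containers use in your environment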

Preparation for testing in the local environment

With a Dockerfile defined and an nginx.conf file with a custom configuration, we can move on to the preparation for testing.

Firstly, we need to have the application files uploaded to the staging bucket. To do this, we create a new branch in the app's Git repository, which we will then use to create a Merge Request in GitLab that will trigger the deployment (we use the CI/CD pipeline implemented in our previous article).

$ git checkout -b feature/test-MR
Switched to a new branch 'feature/test-MR'
$ "test" > TEST.md
$ git add .
$ git commit -m "testing MR deployment"

[feature/test-MR 8266b1c] testing MR deployment
 1 file changed, 1 insertion(+)
 create mode 100644 TEST.md
$ git push --set-upstream origin feature/test-MR

We added the file just for testing purposes, so that when we create the MR, GitLab will run the pipeline that uploads the files to AWS and creates the new subdomain.

We then manually created a new Merge Request in GitLab from the branch feature/test-MR as the source to the branch main as the target.

The build-and-deploy-staging-for-review job in the pipeline generated the following URL: "http://a4f194667b4fa23.staging.nice-app.com/", where a4f194667b4fa23 is the hash of our commit, used as both the subdomain and the subdirectory name in the AWS S3 staging bucket.

Now that we have the files and the hash, we can use an operating system mechanism that allows us to redirect the domain to an IP address in the local environment.

The configuration can be found in the file /etc/hosts (Linux/Mac) or c:\Windows\System32\Drivers\etc\hosts (Windows). Edit this file with administrator rights, add the following line to it, and save the changes:

127.0.0.1   a4f194667b4fa23.staging.nice-app.com

As a result, for the URL "http://a4f194667b4fa23.staging.nice-app.com", the operating system will resolve the domain to the IP address 127.0.0.1, i.e. localhost, where the Docker daemon will be running the Nginx image that we defined a few paragraphs above.

With this procedure, we can test and debug the Reverse-proxy server locally, without using the AWS Route 53 DNS zone "nice-app.com".
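
As a side note, if you prefer not to edit the hosts file at all, curl can map the domain to 127.0.0.1 for a single request with its --resolve option (a quick alternative sketch, using the hash generated by our pipeline):

$ curl --resolve a4f194667b4fa23.staging.nice-app.com:80:127.0.0.1 http://a4f194667b4fa23.staging.nice-app.com/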

Testing in the local environment

Now we need to navigate to the directory of the separate project that contains the files related to the Reverse-proxy server (we list the files to show which ones have been copied there).

$ cd /home/nice-app-developer/projects/staging-reverse-proxy
$ ls
Dockerfile  nginx.conf

Now we can run the Docker container with the image built from these files (remember to have Docker installed and the Docker daemon running).

$ docker build -t reverse-proxy .
$ docker run -it --rm -d -p 80:80 --name reverse-proxy-server reverse-proxy
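
Before querying the server, it is worth checking that the container actually started and that Nginx accepted our configuration (a minimal check based on the container name used above):

$ docker ps --filter name=reverse-proxy-server   # the container should be listed as running
$ docker logs reverse-proxy-server               # any configuration errors reported by Nginx will show up here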

It looks like we can now try to query the server via the subdomain we declared earlier. We should be shown the HTML code of the application whose files (including index.html) have been uploaded to AWS S3. This is example content, meant not to focus on the application itself at this point, but to give an idea of what such a response from the server might look like.

$ curl http://a4f194667b4fa23.staging.nice-app.com/
<html>
<head><title>Nice-app application</title></head>
<body>
<center>Hello!</center>
</body>
</html>

AWS S3 bucket hosting problems

Congratulations! We have successfully tested the Reverse-proxy mechanism in an Nginx server. However, this is only the beginning of the journey, as URLs in AWS S3 bucket hosting have their own specific behaviour and related problems, which we describe below.

Problem 1: A hosting mechanism for trailing slash URLs

Hosting files from an AWS S3 bucket works in such a way that if a slash / is included at the end of the URL (known as a “trailing slash”), the hosting server will first look for a directory under that path and then for an index.html file inside it, returning its content in the response.

That is, for example, for the URL "nice-app-staging.s3-website.eu-central-1.amazonaws.com/some-directory/", the content of the file located under the path some-directory/index.html will be returned. If the file is not there, a 404 error page will be displayed.

The situation changes if we do not specify a trailing slash at the end of the URL. AWS S3 hosting will then check for the existence of a some-directory file (with no extension), completely ignoring the fact that it may be a directory and not a file. In that case, even if we have an index.html file in the some-directory subdirectory, the page will still not display correctly.

Currently, in our Nginx configuration, the proxy_pass directive passes every URL (with and without a trailing slash) to the AWS S3 hosting, which means that for URLs without a trailing slash we can get 404 error pages instead of the actual application page (HTML file content).

Solution

To solve this problem, we can use the rewrite directive, which appends a trailing slash to all URLs that do not already have one:

rewrite (.*[^\/])$ $1/ break;
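
To illustrate the effect of this rule (the paths below are hypothetical examples): the pattern (.*[^\/])$ matches any URI that does not already end with a slash, a slash is appended to the captured value, and the break flag stops further rewrite processing so that proxy_pass receives the modified $uri.

/about                  ->  /about/
/docs/getting-started   ->  /docs/getting-started/
/                       ->  unchanged (it already ends with a slash)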

Problem 2: Trailing slashes appended to asset URLs (e.g. jpg, png, css, js, etc.)

We have resolved the issue with the trailing slash; however, as a result, we have lost access to asset files. The added rewrite rule now appends a slash to URLs of files such as *.jpg, causing AWS S3 to look for an index.html file in a directory whose name ends in .jpg/.

Solution

The simplest way to bypass this problem is to split the URL path routing into two patterns: one for file URLs (which have an extension and no trailing slash), and the other for subpage URLs, which should end with a trailing slash.

    location ~* (.*\.[A-Za-z]+)$ { # all files with extension
        resolver 192.168.65.5;
        proxy_pass http://$destination_domain/$subdomain$uri$is_args$args;
    }

    location / {
        resolver 192.168.65.5; # docker localhost DNS resolver
        # adding a trailing slash to the end of URL if it does not exist (needed for proxy_pass to open AWS S3 directory, then to search for index.html inside)
        rewrite (.*[^\/])$ $1/ break;
        proxy_pass http://$destination_domain/$subdomain$uri$is_args$args;
    }

As you can see in the code above, the additional pattern location ~* (.*\.[A-Za-z]+)$ catches URLs of files with an extension, which do not end with a slash. It then proxies them without the rewrite directive.
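
For example (the paths below are hypothetical and only illustrate the routing), an asset request is handled by the file-extension location and proxied unchanged, while a subpage request falls through to location / and gets the trailing slash appended first:

$ curl http://a4f194667b4fa23.staging.nice-app.com/logo.png   # matched by the file-extension location, no rewrite
$ curl http://a4f194667b4fa23.staging.nice-app.com/about      # matched by location /, rewritten to /about/ before proxying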

Problem 3: Nginx redirects to a decoded URL format

Another problem with files is that their names may contain spaces and other special characters. Unfortunately, Nginx by default exposes the already decoded form of the request path in the $uri variable, and this is what gets passed to proxy_pass.

Everything would be fine with this configuration if AWS S3 accepted such decoded URLs. As you can guess, it does not.

For a file whose name contains spaces, e.g. file with space.png, AWS S3 hosting will display an error saying that it could not find such a file.

In contrast, the file will be found if we use the encoded URL, e.g. file%20with%20space.png (where %20 stands for an encoded space).
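
This behaviour can be verified against the bucket's hosting endpoint directly (a quick sketch, assuming such a file has actually been uploaded under our commit hash):

$ curl -I "http://nice-app-staging.s3-website.eu-central-1.amazonaws.com/a4f194667b4fa23/file%20with%20space.png"
# a HEAD request that should return 200 OK when the file name is percent-encoded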

Solution

The solution to this problem is to re-encode the URL fragment that Nginx has decoded for processing. Since implementing this ourselves would be complicated, we decided to use an extension to the Nginx server that provides such a function: the set_escape_uri directive, which is available in the “OpenResty” distribution of Nginx (via the set-misc-nginx-module).

In order not to waste time installing the extension via additional lines in the Dockerfile, we decided for the demo to use a ready-made Nginx Docker image that already contains the needed extension.

The Dockerfile file will now look as follows:

FROM openresty/openresty:1.21.4.1-3-buster-fat

# custom configuration
COPY ./nginx.conf /etc/nginx/conf.d/default.conf

And the full configuration of nginx.conf:

server_names_hash_bucket_size 64;

server {
    listen 80;

    server_name ~(.*)\.staging\.nice-app\.com$;
    set $subdomain $1;
    set $destination_domain 'nice-app-staging.s3-website.eu-central-1.amazonaws.com';

    location ~* (.*\.[A-Za-z]+)$ {
        resolver 192.168.65.5;
        set_escape_uri $escaped_uri $uri; # OpenResty Nginx module https://github.com/openresty/set-misc-nginx-module#set_escape_uri
        proxy_pass http://$destination_domain/$subdomain$escaped_uri$is_args$args;
    }

    location / {
        resolver 192.168.65.5; # docker localhost DNS resolver
        # adding a trailing slash to the end of URL if it does not exist (needed for proxy_pass to open AWS S3 directory, then to search for index.html inside)
        rewrite (.*[^\/])$ $1/ break;
        proxy_pass http://$destination_domain/$subdomain$uri$is_args$args;
    }
}
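
With the final configuration in place, we can rebuild the image and repeat the local checks (a quick sketch: /some-page is a hypothetical path, and the last test assumes the file from the previous section actually exists in the bucket):

$ docker stop reverse-proxy-server   # the previous container was started with --rm, so stopping also removes it
$ docker build -t reverse-proxy .
$ docker run -it --rm -d -p 80:80 --name reverse-proxy-server reverse-proxy
$ curl http://a4f194667b4fa23.staging.nice-app.com/             # home page, as before
$ curl http://a4f194667b4fa23.staging.nice-app.com/some-page    # rewritten to /some-page/ by location /
$ curl "http://a4f194667b4fa23.staging.nice-app.com/file%20with%20space.png"   # re-encoded by set_escape_uri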

Brief summary

We were able to select a server implementation, create the Reverse-proxy configuration, and solve the problems caused by the URL-handling mechanisms of AWS S3 hosting. Furthermore, we have tested it using a local environment and Docker. Nevertheless, we are still missing a deployment to the cloud to make the server available to the client company's Kubernetes cluster.

In the next article (part 4), we will focus on the CI/CD mechanism for the Reverse-proxy server Docker images. Moreover, the client company's infrastructure for continuous deployments in the Kubernetes cluster will be introduced.
