How to test for expiring TLS/SSL certificates using Gremlin FI

Categories: Chaos Engineering

Introduction

Transport Layer Security (TLS) is an essential component of the modern Internet. It encrypts the connections between users, organizations, and systems, preventing data in-transit from being exposed to third parties. Because of this, it's critical for teams to not only implement TLS for their services, but also make sure that it's working properly.

In this tutorial, we'll show you how to use Gremlin Fault Injection (FI) to test your services for expiring TLS certificates. We'll use the Time Travel attack to move our system clock forward, then retrieve our TLS certificate. Our system compares the certificate's expiration date to its clock: if the clock is within the certificate's validity period, the certificate is considered valid, and if not, the connection fails with an expiration error.

Overview

This tutorial will show you how to:

  • Verify the validity of a certificate.
  • Run a Time Travel test using Gremlin Fault Injection (FI) to check for expiring TLS certificates.
  • Create a Scenario to gradually increase the magnitude of the test.
  • Automate TLS certificate tests using the Gremlin REST API.

Prerequisites

Before starting this tutorial, you'll need:

  • A Gremlin FI account.
  • A web application with a publicly accessible URL with HTTPS enabled.
  • The Gremlin agent running on your web application's host.

To install Gremlin, follow the installation guide in our documentation. Once your host(s) appear in the Gremlin web app, continue on to the first step.

Step 1: Verify the validity of a certificate

Earlier, we mentioned how devices check a certificate's expiration date against the current date and time to verify the certificate's validity. We'll use Time Travel to change the system clock on our host, letting us shift seconds, minutes, hours, days, or even years into the future. We'll move our system forward, send a request to a website, and if we receive an expiration error, we know that our certificate is expiring soon.

To demonstrate what a healthy certificate looks like, let’s run a curl request against our web application:

bash
curl -I https://gremlin-demo-lab-host/

HTTP/2 200
cache-control: max-age=604800
content-length: 543958
content-type: text/plain
date: Mon, 18 Jan 2021 23:06:18 GMT
...

Now let’s try sending a request to a website with an expired certificate:

bash
curl -I https://expired.badssl.com/

We'll see that curl reports back an SSL certificate error. This is what we should look for when considering whether the test has passed or failed:

curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html
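curl only tells us whether a certificate is valid right now. As a complementary check, the sketch below reads a certificate directly with OpenSSL and asks whether it will still be valid a given number of seconds from now. The function names `fetch_cert` and `cert_valid_after` are our own, chosen for illustration, and the sketch assumes the `openssl` command-line tool is installed and that the site listens on port 443.

```shell
# Hypothetical helper: save the certificate served by host $1 (port 443)
# to the PEM file $2.
fetch_cert() {
  openssl s_client -connect "$1:443" -servername "$1" < /dev/null 2> /dev/null \
    | openssl x509 -outform PEM > "$2"
}

# Hypothetical helper: succeed (exit 0) if the PEM certificate in file $1
# will still be valid $2 seconds from now, and fail (exit 1) otherwise.
cert_valid_after() {
  openssl x509 -in "$1" -noout -checkend "$2" > /dev/null
}
```

For example, after running `fetch_cert gremlin-demo-lab-host /tmp/cert.pem`, the command `cert_valid_after /tmp/cert.pem 2678400` fails if the certificate expires within the next 31 days. This does not replace the Time Travel test, which exercises the real client behavior, but it gives us a quick value to compare our results against.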

Step 2: Run a Time Travel test

Now that we know what to look for, let’s perform our test. Our hypothesis is that if we set our system clock forward (e.g. by one month) and send a curl request, we’ll see a successful response. But if curl returns an error, then we know the certificate will expire within the next month.

We’ll log into the Gremlin web app, create a new attack, and select our test host, which is shown here as "gremlin-demo-lab-host":

Selecting targets in the Gremlin web app

Next, we’ll expand the State category and select Time Travel. We’ll keep the length of the experiment set to 60 seconds, block NTP (Network Time Protocol) communication so that our host doesn’t automatically update to the correct time, and set the offset to 2,678,400 seconds (31 days, or roughly one month from now).

Tip: To calculate the offset, use a date/time conversion tool such as the ones provided by timeanddate.com.
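The offsets used in this tutorial can also be worked out in the shell. The sketch below is one way to do it; the function names are our own, and `seconds_between` assumes a `date` command that accepts `-d` with a date string (as GNU date does).

```shell
# Convert a number of days into seconds (one day is 86,400 seconds).
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Print the number of seconds between two UTC timestamps.
# Assumes a date implementation that supports "-d <string>".
seconds_between() {
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( end - start ))
}
```

Running `days_to_seconds 31` prints 2678400, the offset used above, and `days_to_seconds 7` prints 604800, which is one week.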

Time travel attack parameters

Now let’s run the test. While it's running, let’s re-run curl on the test host, since curl compares the certificate against the local (shifted) clock:

shell
curl -I https://gremlin-demo-lab-host/

curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html

Curl returned an error, meaning that our certificate is going to expire within the next month. We’ll click the Halt button in the top-right corner of the Gremlin web app to halt the experiment, which automatically reverts the system clock to the correct time. We’ll record our observations in Gremlin, then work on replacing our certificate. Using Time Travel allowed us to catch this before it became a problem for our customers, while doing so in a safe and controlled way.
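curl can fail for reasons that have nothing to do with certificates, such as a DNS or connection error. The certificate errors shown in this tutorial are reported with exit code 60, so a script can tell the two cases apart. The following is a minimal sketch; `classify_curl_exit` is a name we made up for illustration.

```shell
# Hypothetical helper: map a curl exit code to a short label.
#   0   request succeeded
#   60  certificate verification failed (e.g. "certificate has expired")
#   *   some other failure, such as DNS or a refused connection
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    60) echo "certificate" ;;
    *)  echo "other" ;;
  esac
}

# Example use while the Time Travel attack is running:
#   curl -s -o /dev/null https://gremlin-demo-lab-host/
#   classify_curl_exit "$?"
```

Exit code 60 covers any certificate verification failure, not only expiration, so it's still worth reading curl's error message before concluding that the certificate is about to expire.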

Step 3: Create a Scenario

With a Scenario, we can run multiple Time Travel attacks back-to-back and increase the interval each time. This lets us test over multiple time periods during a single experiment.

To create a Scenario, we’ll click on our previously run Time Travel attack to open the Attack Details page. From here, we’ll click Create Scenario. We’ll call our Scenario "SSL/TLS certificate expiration" and enter a description.

Creating a Scenario from an attack

Entering Scenario details

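Because the Scenario should increase in magnitude with each stage, we’ll first change the offset of the original attack to 86,400 seconds (one day). Next, we’ll click “Add a recent attack”, re-select our previous Time Travel attack, and choose our test host. We’ll change the offset for the second attack to 604,800 seconds (one week). We’ll repeat this step to create a third attack, then change its offset to 2,678,400 seconds (31 days, or roughly one month).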

While the Scenario is running, we’ll run curl in a continuous loop. In the following script, curl makes a request, and if the request is successful, it waits 10 seconds before repeating it. If curl fails, it exits the loop and prints the failure to the console. This script also prints the current system time before each check, so we can see which stage of the Scenario was active when curl failed.

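bash
#!/bin/bash
# Poll the site every 10 seconds until a request fails.
while :; do
  # Print the current (possibly shifted) system time so we can see
  # which stage of the Scenario was active.
  date
  if ! curl -s https://gremlin-demo-lab-host/ > /dev/null; then
    break
  fi
  sleep 10
done
echo "Failed to connect."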

Now let’s run the Scenario and start our script. Once the Scenario hits stage 2, curl returns an error and the loop exits. This tells us our TLS certificate will expire between one day and one week from now.

Aborted Scenario results
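The reasoning behind that conclusion can be written down as a small lookup. This sketch assumes the three stages use offsets of one day, one week, and 31 days, in that order; `expiry_window` is a hypothetical helper name.

```shell
# Hypothetical helper: given the stage (1, 2, or 3) at which curl first
# failed, print the lower and upper bound, in seconds from now, of the
# window in which the certificate expires.
# Assumes stage offsets of 86400, 604800, and 2678400 seconds.
expiry_window() {
  case "$1" in
    1) echo "0 86400" ;;
    2) echo "86400 604800" ;;
    3) echo "604800 2678400" ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}
```

A failure at stage 2 means the request succeeded with a one-day offset but failed with a one-week offset, so `expiry_window 2` prints `86400 604800`.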

If you have a Gremlin account, you can also use a pre-configured Recommended Scenario for this test. Open the Recommended Scenario in the Gremlin web app, click "Add targets and run" to select the hosts you want to run the attack on, then run the Scenario.

Step 4: Automate the Scenario using the Gremlin REST API

The Gremlin REST API provides a RESTful interface for performing actions in Gremlin, such as starting attacks and Scenarios. By using the REST API in our test script, we can automatically initiate our Scenario.

First, let’s reopen our executed Time Travel Scenario in the web app. Click Rerun, then in the bottom-right corner of the page, click Gremlin API Examples. This generates a full curl request that we can use to initiate the attack:

shell
curl -i -X POST 'https://api.gremlin.com/v1/scenarios/<your Scenario ID>/runs?teamId=<your team ID>' -H 'Content-Type: application/json;charset=utf-8' -H 'Authorization: Bearer <your bearer token>' -d '{}'

Next, we’ll copy this command to our curl script and add it just before the loop. We can add a second API command after the end of the loop to halt the experiment if curl fails. This lets us safely roll back after detecting an expired certificate without having to open the Gremlin web app and halt the experiment ourselves. Make sure to replace <your Scenario ID>, <your team ID>, and <your bearer token> with your own values:

bash
#!/bin/bash

# Start the Scenario and capture the API response, which the halt
# request below uses as the run identifier
RUN=$(curl -s -X POST 'https://api.gremlin.com/v1/scenarios/<your Scenario ID>/runs?teamId=<your team ID>' -H 'Content-Type: application/json;charset=utf-8' -H 'Authorization: Bearer <your bearer token>' -d '{}')

# Test your website(s)
while :; do
  # Print the current (possibly shifted) system time before each check
  date
  if ! curl -s https://gremlin-demo-lab-host/ > /dev/null; then
    break
  fi
  sleep 10
done

# Halt the Scenario
curl -X POST 'https://api.gremlin.com/v1/scenarios/halt/<your Scenario ID>/runs/'"$RUN"'?teamId=<your team ID>' -H 'Content-Type: application/json;charset=utf-8' -H 'Authorization: Bearer <your bearer token>' -d '{}'

Now we have a fully scripted chaos experiment that we can schedule using a service like cron, add to our CI/CD pipeline, or run as part of our client-side testing suite.
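When a script like this runs from cron or a CI/CD pipeline, it's usually better to keep the IDs and bearer token out of the script itself. The sketch below shows one way to do that with environment variables. The variable names (`GREMLIN_SCENARIO_ID`, `GREMLIN_TEAM_ID`, `GREMLIN_BEARER_TOKEN`) and function names are our own assumptions, not names defined by Gremlin; the URLs are the same ones used in the script above.

```shell
# Build the "start a Scenario run" URL from a Scenario ID ($1) and team ID ($2).
scenario_run_url() {
  printf 'https://api.gremlin.com/v1/scenarios/%s/runs?teamId=%s\n' "$1" "$2"
}

# Build the "halt a Scenario run" URL from a Scenario ID ($1),
# run identifier ($2), and team ID ($3).
scenario_halt_url() {
  printf 'https://api.gremlin.com/v1/scenarios/halt/%s/runs/%s?teamId=%s\n' "$1" "$2" "$3"
}

# Start the Scenario using values from the environment.
# ${VAR:?message} aborts with an error if the variable is unset or empty.
start_scenario() {
  curl -s -X POST \
    "$(scenario_run_url "${GREMLIN_SCENARIO_ID:?not set}" "${GREMLIN_TEAM_ID:?not set}")" \
    -H 'Content-Type: application/json;charset=utf-8' \
    -H "Authorization: Bearer ${GREMLIN_BEARER_TOKEN:?not set}" \
    -d '{}'
}
```

With these in place, the first step of the script becomes `RUN=$(start_scenario)`, and the secrets can come from the CI/CD system's secret store or a restricted environment file.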

Conclusion

Staying ahead of expiring certificates is vital for keeping your websites and services accessible and secure. Using Gremlin FI to run Time Travel tests lets you quickly and safely test your certificates in any environment, whether your websites are hosted on AWS, GCP, Azure, or on-premises.
